

A new passive measurement method by trinocular stereo vision

Jun Shen, Department of Biomedical Engineering, Southeast University, 2 Sipailou, 210018 Nanjing, P.R. China

Serge Castan and Jian Zhao, Lab. CERFIA, Université Paul Sabatier, 118 Route de Narbonne, 31077 Toulouse, France

Abstract. In the present paper, a new trinocular stereo vision method is proposed. After a brief survey of binocular stereo vision methods, some essential limitations are indicated. A quantitative description of the false target problem in stereo vision is proposed and applied to binocular and trinocular stereo vision. It is shown that a new trinocular stereo vision strategy matching the feature primitives in 3-D space can effectively solve the false targets. The new method for matching trinocular images is then presented. A new calibration method is also proposed, and we show that all linear cameras can be transformed into an equivalent orthonormal one and that both the intrinsic and extrinsic parameters of a trinocular stereo vision system can be calibrated without using any 3-D coordinates, which provides a passive measurement system with an auto-calibration capability.

A system based on the proposed methods has been realized and tested for computer-generated and real images. The experimental results are satisfactory. Erroneous matching is always less than 1% and the precision for 3-D reconstruction is 2 mm at a distance of about 1.5 m.

Keywords. Passive measurement, computer vision, stereo vision, image matching, false target, calibration.

Elsevier Industrial Metrology 1 (1990) 231-259
0921-5956/90/$3.50 © 1990. Elsevier Science Publishers B.V.

1. Introduction

Three-dimensional automatic passive measurement plays an important role in modern vision systems and has found wide applications in practice, e.g. machine parts assembly, mobile robots, calculation of elevation data from remote-sensing images and so on. Stereo vision is one of the most important approaches to obtaining 3-D data from 2-D images. Since Julesz's experiments on random dot stereo pair images showed that human beings can match stereo pair images by using information obtained from low-level processing [1], many binocular stereo vision methods have been proposed.

As is well known, if one takes two or more images of a scene from different positions, the 3-D coordinates of the scene can be solved from the correspondence between the images by triangulation. All binocular stereo vision systems are based on this principle, and image matching is therefore the key problem for stereo vision.

Jun Shen was born in Jiangsu, China, in 1946. He received the BSc degree in Radio and Electronic Engineering from Qinghua (Tsinghua) University, Beijing, China in 1968 and PhD and "Docteur d'Etat" degrees in Computer Science from Paul Sabatier University, Toulouse, France, in 1982 and 1986, respectively. He worked in Radio Broadcasting, Electrical Engineering and Automatic Control in Industry from 1969 to 1978, and then in Signal and Image Processing in China and France. From 1986 to 1987, he worked as a research scientist at the Lab. CERFIA, Toulouse. Since 1987, he has been professor and chairman of the Biomedical Engineering Department at Southeast University, Nanjing, China, and is presently a visiting professor at Paul Sabatier University, Toulouse, France. His current research interests include image processing, computer vision, pattern recognition and neural network applications. He has authored or co-authored over 40 publications and is a member of the editorial boards of the Journal of Pattern Recognition and Artificial Intelligence, published in China, and the international journal Industrial Metrology. Prof. Shen received an Outstanding Paper Honorable Mention from the IEEE Computer Society at the 1986 International Conference on Computer Vision and Pattern Recognition.

Serge Castan is professor at P. Sabatier University, Toulouse. He received the Doctor degree in Applied Mathematics and the "Docteur d'Etat" degree in Mathematics from the University of Toulouse, France, in 1963 and 1968 respectively. He was a research scientist at the Centre National de la Recherche Scientifique (CNRS) (1963-1969), and became Professor in Computer Science (software and computer vision) in 1969. He was Director of the Computer Science Department at the Institute of Technology from 1972 to 1978, co-founder (1970) and Director of the Cybernetique des Entreprises, Reconnaissance des Formes et Intelligence Artificielle (CERFIA, UA CNRS) Laboratory (1985-1989), and co-founder and Co-Director of the Institut de Recherche en Informatique de Toulouse (IRIT, UA CNRS) (1989), where he heads the Image Understanding Department. His major research interests are in computer vision, including: filtering, passive and active 3-D measurement, feature extraction and description (2-D, 3-D and motion), image understanding and parallel architecture for image processing. He is author or co-author of numerous publications in the field. Professor Castan received an Outstanding Paper Honorable Mention from the IEEE Computer Society at the 1986 International Conference on Computer Vision and Pattern Recognition. He is a member of the Association Francaise de Cybernetique Economique et Technique (AFCET) and French representative at the IAPR governing board.

Jian Zhao was born in Nanjing, China, in 1957. He received the BSc degree in Radio and Electronic Engineering from the Chinese University of Science and Technology in 1982, the M.E. (Master of Engineering) degree from the Institute of Electronics, SINICA, in 1985, and the PhD degree in Computer Science from Paul Sabatier University, Toulouse, France, in 1989. From 1982 to 1985 he worked on digital communication at the Institute of Electronics, SINICA, then on image processing at the Biomedical Engineering Department at Southeast University, Nanjing, China. Since September 1986, he has worked on computer vision at the Lab. CERFIA, Paul Sabatier University, Toulouse, France.

In this paper, we first present in Section 2 the state of the art of binocular stereo vision and indicate some essential limitations. A quantitative description of the false target problem is then proposed in Section 3. Based on this description, we analyze the probability that false targets appear in binocular and trinocular stereo vision systems and we show that a trinocular stereo vision system can solve the false targets much better than a binocular one. In particular, a new strategy that matches the feature primitives in 3-D space, i.e., the sets of feature pixels, can effectively reduce the false targets even without using heuristic constraints, which is the object of Section 4. A new method for matching trinocular stereo images based on this idea is then presented in Section 5.

Because the intrinsic and extrinsic parameters of the cameras are needed to determine the 3-D data of the scene, the calibration problem is discussed in Section 6. We show that both the intrinsic and extrinsic parameters of a trinocular stereo vision system can be calibrated from the correspondence between some pixels on the three images. No 3-D coordinates of points in 3-D space are needed a priori. Note that 3-D coordinates are necessary for traditional calibration methods. By use of this technique, one can therefore automatically determine the shape of the objects in a scene even with an uncalibrated trinocular stereo vision system. If the distance between two points in the scene is known, the size and absolute position of the objects can be determined. Obviously, such a system with a self-calibrating capability is important in practical applications.

The implementation of the whole trinocular stereo vision system based on the new strategy is presented in Section 7. The system is implemented and tested for real images. The experimental results, both for the matching and for the calibration technique, show that the proposed trinocular stereo vision system is robust and reliable, and no a priori knowledge or heuristic constraints about the scene are needed. Some concluding remarks are given in Section 8.

2. Binocular stereo vision and its limitations

As is well known, there exist differences between the images of the same scene taken from different positions. Even when no complicated optic properties of the objects' surfaces are considered, these differences will exist because of the perspective projection. The disparity between the images depends on the parameters of the camera system and the shape and position of the objects in the scene. Inversely, one can deduce the scene in the 3-D space from the disparity between the images if the intrinsic and extrinsic parameters of the cameras are known.

The intrinsic and extrinsic parameters of the cameras can be found by calibration techniques, which will be discussed later. The disparity between the images is determined in general by image matching techniques. By using different features and different strategies in the matching process, different stereo vision methods are obtained. We can divide the current binocular stereo vision methods into eight classes as follows:


(1) Using local similarity between grey value images. One of the most direct methods to find the correspondence between images is the use of correlation. The locations corresponding to a maximum of correlation are matched. Levine et al. [2] and Yakimovsky and Cunningham [3] used correlations between regions to match the stereo pair images, while others also used correlations between small windows [4-7]. In order to remove the false targets caused by noise and by the existence of many similar regions, Yakimovsky and Cunningham used windows varying in size and shape according to the images. The shortcoming of the use of correlation between grey value images is that it takes in general much computation and the correlation function is not sufficiently sharp at the local maximum to assure a good matching precision. Because of these shortcomings, it is difficult to use this technique in industrial applications, where nearly real-time processing is needed. But in cases where the scene is textured and there exist no important perspective distortions caused by depth changes, grey value image correlation can be used to match the stereo pair images. An example is the calculation of elevation data from remote sensing images.
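As a concrete illustration of this class of methods, the sketch below matches a window around a left-image pixel against candidate windows along the same scanline of the right image by normalized cross-correlation. The function names, window size, disparity range and the parallel-camera assumption (scanline = epipolar line) are ours for illustration, not a reproduction of any of the cited systems.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_along_scanline(left, right, row, col, half=2, max_disp=16):
    """Find the disparity maximizing NCC between a (2*half+1)^2 window
    around (row, col) in the left image and candidate windows on the
    same scanline of the right image."""
    win_l = left[row - half:row + half + 1, col - half:col + half + 1]
    best_d, best_score = 0, -1.0
    for d in range(0, max_disp + 1):
        c = col - d
        if c - half < 0:
            break
        win_r = right[row - half:row + half + 1, c - half:c + half + 1]
        score = ncc(win_l, win_r)
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score
```

The flat sliding loop also makes the text's cost remark concrete: every candidate disparity requires a full window correlation, which is why the survey notes the heavy computation of this class of methods.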

(2) Feature point correlation. In order to obtain more reliable matching results, it is reasonable to match at first a limited number of feature pixels in the images rather than all the pixels. Moravec used as feature pixels those pixels which have a local maximum of the minimum of grey value variances in several directions, and a from-coarse-to-fine correlation is used [8,9]. Barnard and Thompson also used Moravec's feature pixels. They calculated the probability of correspondence between windows of size 5 × 5 and used the continuity of disparities to match the images [10]. Because Moravec's feature pixels are often isolated from each other, much work must be done to reconstruct the whole 3-D scene from the correspondence between these feature pixels.
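A minimal sketch of Moravec's interest measure as described above: for each pixel, the minimum over several shift directions of the sum of squared window differences, so that only corner-like pixels (which change under every shift) score high. The window size, the four-direction shift set, and the omission of the final local-maximum test are our simplifications.

```python
import numpy as np

def moravec_interest(img, win=2):
    """Moravec interest measure: for each pixel, the minimum over four
    shift directions of the sum of squared window differences.  Feature
    pixels would then be thresholded local maxima of this measure
    (that selection step is omitted here for brevity)."""
    h, w = img.shape
    shifts = [(0, 1), (1, 0), (1, 1), (1, -1)]
    interest = np.zeros((h, w))
    for r in range(win, h - win - 1):
        for c in range(win + 1, w - win - 1):
            window = img[r - win:r + win + 1, c - win:c + win + 1]
            ssds = []
            for dr, dc in shifts:
                shifted = img[r - win + dr:r + win + 1 + dr,
                              c - win + dc:c + win + 1 + dc]
                ssds.append(((window - shifted) ** 2).sum())
            interest[r, c] = min(ssds)
    return interest
```

Taking the minimum over directions is what makes the detected pixels isolated, as the text notes: a pixel on a straight edge scores zero because shifting along the edge leaves the window unchanged.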

(3) Matching zero-crossings and edge pixels. Because zero-crossings or edge pixels contain important information on the images, they are widely used as feature pixels in stereo vision. Marr and Poggio proposed to match the stereo pair images by using zero-crossings of directional derivatives [11], while Grimson realized a stereo vision system using zero-crossings of the Laplacian of Gaussian [12]. Other methods using zero-crossings can be found in [13-15]. Baker matched the stereo pair images by local correlation of vertical edge pixels [16]. One can use segments of edges or zero-crossings to accelerate the matching process. These segments can be extracted by edge detection and polygonal approximation. Shen and Castan proposed the necessary and sufficient conditions of corresponding polygons in stereo-pair images [17]. Ayache proposed an algorithm using hypothesis verification to match edge segments [18]. Hwang used a graph structure to describe the vertices and edges of polygonal objects and the relations between them, and image matching is thus transformed into a graph matching problem [19]. These methods work well for scenes composed of polygonal objects. Of course, their performance depends much on that of the edge detection algorithm and of the polygonal approximation. Because one may have different results from polygonal approximation for the left and right images, it is sometimes difficult to match them, in particular when the perspective distortion is important.
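The polygonal approximation step mentioned above can be sketched, for instance, with the classic split recursion (Douglas-Peucker): keep the chain endpoints and recursively split at the point farthest from the current chord while that distance exceeds a tolerance. This is an illustrative stand-in, not necessarily the approximation scheme used in the cited systems.

```python
import numpy as np

def polygonal_approx(points, tol=1.0):
    """Douglas-Peucker polygonal approximation of an ordered pixel chain."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points.tolist()
    p0, p1 = points[0], points[-1]
    chord = p1 - p0
    norm = np.hypot(chord[0], chord[1])
    if norm == 0:
        dists = np.hypot(points[:, 0] - p0[0], points[:, 1] - p0[1])
    else:
        # perpendicular distance of every chain point to the chord p0-p1
        dists = np.abs(chord[0] * (points[:, 1] - p0[1]) -
                       chord[1] * (points[:, 0] - p0[0])) / norm
    i = int(np.argmax(dists))
    if dists[i] <= tol:
        return [points[0].tolist(), points[-1].tolist()]
    left = polygonal_approx(points[:i + 1], tol)
    right = polygonal_approx(points[i:], tol)
    return left[:-1] + right   # drop the duplicated split point
```

The sensitivity the text warns about is visible here: the split points chosen for the left and right images depend on the tolerance and on small edge-detection differences, so the two polygons need not have corresponding vertices.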

(4) Using dynamic programming. Because corresponding pixels on the stereo pair images should lie on the same epipolar plane, one can search the pixel correspondence only along the epipolar lines. So dynamic programming can be used to solve the matching problem [20-22]. One of the advantages of these methods is that they allow some distortion between the images. But for practical stereo vision systems and complicated scenes, it will be difficult to match the stereo pair images only by one-dimensional dynamic programming.
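A toy version of such a one-dimensional dynamic program, aligning one epipolar line of each image: each cell holds the minimal cost of aligning the two line prefixes, with a diagonal "match" move and skip moves at a fixed occlusion penalty. The cost model and penalty value are illustrative assumptions, not taken from [20-22].

```python
import numpy as np

def dp_scanline(left, right, occlusion=1.0):
    """Match two 1-D intensity profiles (one epipolar line each) by
    dynamic programming; returns the matched index pairs."""
    n, m = len(left), len(right)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, :] = occlusion * np.arange(m + 1)
    cost[:, 0] = occlusion * np.arange(n + 1)
    back = np.zeros((n + 1, m + 1), dtype=int)  # 0=match, 1=skip left, 2=skip right
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            moves = (cost[i - 1, j - 1] + abs(left[i - 1] - right[j - 1]),
                     cost[i - 1, j] + occlusion,
                     cost[i, j - 1] + occlusion)
            back[i, j] = int(np.argmin(moves))
            cost[i, j] = moves[back[i, j]]
    # backtrack the matched pairs
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if back[i, j] == 0:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif back[i, j] == 1:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]
```

The skip moves are what give these methods their tolerance to distortion: a pixel occluded in one image simply pays the fixed penalty instead of forcing a bad match.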

(5) Matching by binary Laplacian images. A binary image is a simple and effective representation of grey value images. Nishihara proposed to match stereo pair images under random non-structured illumination by use of a binary Laplacian image [23]. Shen proposed fast algorithms for binary image correlation and used them to match stereo pair images under natural illumination [24]. The use of binary image correlation reduces much of the complexity of computation and can improve the sharpness of the correlation function at local maxima. In order to use simultaneously geometrical and structural as well as local and global similarities between the images, Shen proposed to match the stereo pair images by pyramidal graphs of the binary Laplacian images [25].
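The computational advantage of binary images can be made concrete: the sign of a discrete Laplacian gives one bit per pixel, and window similarity then reduces to an XOR plus a popcount on packed words instead of per-pixel arithmetic. The 4-neighbour Laplacian and the byte-packing scheme below are our illustration, not the specific operators of [23-25].

```python
import numpy as np

def binary_laplacian(img):
    """Sign bits of a discrete 4-neighbour Laplacian: 1 where positive."""
    lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
           np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
    return (lap > 0).astype(np.uint8)

def binary_similarity(a, b):
    """Fraction of agreeing bits between two binary windows."""
    return float((a == b).mean())

def packed_similarity(a, b):
    """The same similarity on packed words: one XOR and one popcount
    per 8-pixel byte instead of per-pixel comparisons."""
    differing = sum(bin(int(x)).count("1")
                    for x in np.bitwise_xor(np.packbits(a), np.packbits(b)))
    return 1.0 - differing / a.size
```

On word-oriented hardware this packing processes 8 (or 32, 64) pixels per operation, which is the source of the speed-up the text attributes to binary image correlation.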

(6) Matching the results of region segmentation. As is well known, images can be segmented into regions, and one can use the similarity between regions to match the stereo images [26]. Shapiro used the regions, their boundaries and the relations between them to find the correspondence [27]. The methods using the similarity between regions allow some perspective distortions between images, but for complicated scenes, region segmentation often takes much computation time and the obtained result is not very stable.

(7) Image matching with the correction of perspective distortion. As mentioned above, there exist perspective distortions between the images taken from different positions. So all the methods which do not take into account these distortions can be used only in the case where these distortions are negligible. It is therefore interesting to be able to correct them before or in the matching process. Norvelle used correlation to match remote sensing images, and in the matching process, the matched points are used to estimate the perspective distortions between the next non-matched windows and thus to correct them [28]. Shen and Castan proposed the invariant features in different perspective projections and the stereo correspondence equation system [29]. Mori used the matched boundary points to estimate the shape of the surface and thus correct the perspective distortion [30]. These methods can solve the perspective distortions to some extent, but because of the complexity of the surface reflectance property and the noise, there is still a long way to go in order to realize practical automatic stereo vision systems based on these methods.


(8) Different interpretations of the continuity principle of stereo vision. Because objects in the physical world are in general continuous except at the boundaries [11], the continuity principle is an important tool to solve the false targets in stereo vision and it is widely used in different stereo vision systems. An important approach is the use of relaxation. Other important examples are: disparity continuity [11], figural continuity [31,32], the disparity gradient limit [33-35], the stereo correspondence equation system [29], coherence [36] and the disparity functional [37]. On the one hand, these methods pay much attention to the fundamentals of stereo vision and contribute to machine stereo vision research; on the other hand, it is still difficult to use them immediately in practical systems, though some of them have obtained good results for real images.

From the brief survey of the binocular stereo vision methods above, we see that there exist some essential difficulties for binocular stereo vision:

(1) All binocular stereo vision systems match the stereo pair images by the similarity between images, no matter which features or similarity measures are taken for comparison. But the hypothesis that the stereo pair images are similar to each other is not always valid, in particular when the camera positions are quite different from each other or the distance between the objects and the cameras is relatively small. A binocular stereo vision system tries to find the 3-D data of the scene from the difference between two images but uses a methodology to match the images by the similarity between them, which therefore implies an essential contradiction between the goal of such a system and the strategy used to attain this goal. Though binocular systems can work for some specific applications, it will be difficult to realize a robust and reliable binocular system which works well for general scenes.

(2) The precision of a binocular stereo vision system very much depends on the distance between the cameras. But when the distance between the cameras is increased to improve the precision, the similarity hypothesis between the images will be more violated and it will be more difficult to match the images by similarity. This leads to a contradiction between the precision and the efficiency of the system.

(3) Almost all the binocular stereo vision systems match, at least in the first step, the feature pixels or feature primitives. But when the direction of a primitive (for example, the direction of an edge) forms a small angle with the epipolar line, the precision of the system will be bad. In particular, if the primitive lies in the epipolar line direction, it is almost impossible to match the pixels in the primitive by a binocular system.

(4) If the 3-D scene is composed of regularly repeated and uniformly distributed patterns, use of similarity between images will not be sufficient to solve the false target problem because many different matchings can all meet the similarity criterion.

To overcome these difficulties, multi-camera (typically three-camera) stereo vision systems are proposed. An early multi-camera system was presented in [8,9], where multiple images were used to reduce the difficulties of matching the successive images, and the distance between the first and the last cameras is large enough to assure a good precision of 3-D reconstruction from stereo vision. Other examples of trinocular stereo vision systems are given in [38-40]. The three cameras are arranged in a non-colinear way and the images are matched pair by pair. The results based on two-image matching are then analyzed together to eliminate possible errors and to improve the precision of results. As far as the matching strategy is concerned, no essential difference exists between these trinocular systems and binocular ones. So the essential difficulties of binocular systems still exist in these systems, though some are reduced to some extent. Thus the following question is proposed: since we have three images of a scene taken from different positions, why should we limit ourselves to the methodology of binocular stereo vision?

We think that a good strategy to solve the false targets is to destroy the conditions under which they are produced. Taking three or more images and matching them in 3-D space makes possible the realization of this idea. A pioneering work in this domain is that of Yachida et al. [41]. They matched the feature pixels (edge pixels) of the three images by perspective geometrical constraints in 3-D space. Another example is presented in [42], where the mid-points of edge segments are taken as the feature pixels. Once the mid-point of a segment is matched, the segment is considered matched, and therefore, this algorithm seems faster than the preceding one. According to the authors, the erroneous matchings obtained without using similarity amount to about 15% and they should be removed afterwards by the use of similarity between the images. Yachida argued that with his new matching strategy, the false targets appear only in "rare" cases by coincidence. But how "rarely" do the false targets appear? Is matching feature pixels efficient enough to remove the false targets? Which new matching strategy should we use in a trinocular stereo vision system, and to what extent can it work better than a binocular one? To answer these questions, a theoretical analysis is necessary, which will be the subject of Sections 3 and 4.

3. A quantitative description of false targets

In this section, we first propose a quantitative description of the false target problem and use it to analyze binocular and trinocular stereo vision systems when matching is done on the pixel level.

In Fig. 1, P is a feature point in 3-D space, and P1, P2 and P3 are the images of P taken by cameras 1, 2 and 3 respectively. In our analysis we shall suppose that the feature point P is "seen" by all three cameras. Obviously, if we know P1 and P2 are corresponding pixels, i.e., they are the images of the same point P in 3-D space, the position of P can be uniquely determined. Projecting P onto image 3, we should find the feature pixel P3, which is the image of P taken by camera 3.
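The verification idea described here can be sketched in code: triangulate P from the candidate pixels in images 1 and 2, project it into image 3, and accept the triplet only if a feature pixel lies nearby. The simple pinhole projection matrices and the linear (DLT) triangulation below are illustrative assumptions for the sketch, not the calibration model developed later in the paper.

```python
import numpy as np

def triangulate(Ma, Mb, xa, xb):
    """Linear (DLT) triangulation: recover the 3-D point whose projections
    through the 3x4 matrices Ma, Mb are the pixels xa, xb."""
    A = np.vstack([xa[0] * Ma[2] - Ma[0],
                   xa[1] * Ma[2] - Ma[1],
                   xb[0] * Mb[2] - Mb[0],
                   xb[1] * Mb[2] - Mb[1]])
    X = np.linalg.svd(A)[2][-1]          # null vector of A (homogeneous point)
    return X[:3] / X[3]

def project(M, X):
    """Pinhole projection of a 3-D point through a 3x4 matrix M."""
    x = M @ np.append(X, 1.0)
    return x[:2] / x[2]

def trinocular_check(M1, M2, M3, x1, x2, x3, tol=1.0):
    """Accept the candidate triplet x1#x2#x3 only if the point triangulated
    from cameras 1 and 2 reprojects into image 3 within tol pixels of x3."""
    X = triangulate(M1, M2, x1, x2)
    return bool(np.linalg.norm(project(M3, X) - np.asarray(x3)) < tol)
```

This is exactly the mechanism the analysis below quantifies: a false binocular match survives only if, by coincidence, a feature pixel happens to sit near the reprojection in image 3.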

Consider at first the binocular system composed of the cameras 1 and 2. Let L2 be the epipolar line in image 2 corresponding to the feature pixel P1. If there exist no feature pixels other than P2 on the epipolar line L2, no false targets will occur and P1 and P2 can be easily matched.


Suppose that the image is composed of n epipolar lines of n pixels (in parallel or slightly convergent stereo vision systems, the image lines are often arranged to be the epipolar lines) and the average density of feature pixels in the image is p. So we shall have n²p feature pixels in each image, corresponding to n²p feature points in the 3-D scene.

As there are n different epipolar lines in image 2, the probability that the image of a feature point in 3-D space taken by camera 2 does not lie on the line L2 will be (1 - 1/n). As mentioned above, in a binocular stereo vision system, for a feature pixel P1, which is the image of point P in the 3-D space taken by camera 1, no false targets will occur if and only if none of the (n²p - 1) feature points other than P in the scene projects its image on the epipolar line L2, where L2 is the epipolar line on image 2 corresponding to the feature pixel P1. Suppose that the probabilities that we have a feature point at a position in the 3-D space are independent from one position to another; then the probability that no false targets occur for a binocular stereo vision system is:

PB2 = (1 - 1/n)^(n²p-1).  (1)

However, for a trinocular stereo vision system, a corresponding triplet P1#P2#P3 (i.e. P1, P2 and P3 are the images of some point P in the 3-D space taken respectively by the cameras 1, 2 and 3) can be accepted only when the three straight lines O1P1, O2P2 and O3P3 have a common intersection in the 3-D space.

Now consider another feature pixel P'2 on the epipolar line L2. For images 1 and 2, it is possible that P'2 corresponds to P1, which yields a feature point Ei in the space (Fig. 1). But for a trinocular system, Ei should produce in turn a feature pixel at the position Pe3 of image 3. So if there is no feature pixel at Pe3, we know immediately that P'2 can not match P1, i.e., the feature pixel P'2 on the epipolar line L2 does not yield the false target problem. But if there really exists a feature pixel at the position Pe3, the false target Ei does occur. This case can take place only when some feature points, such as Q, exist on the straight line passing through O3 and Pe3 (Fig. 1).

Fig. 1. Feature pixel matching for binocular and trinocular systems.

We divide the whole space in the field of vision of camera 3 into n × n subspaces S(i,j) (i,j = 1, ..., n), each of which is the solid angle determined by O3 and the boundaries of pixel (i,j) on image 3. A feature point in the 3-D space can lie in any one of the n × n subspaces, so the probability that it does not lie in the subspace determined by O3 and Pe3 is (1 - 1/n²). Thus the probability that none of the (n²p - 1) feature points other than P lies in the subspace O3Pe3 is:

PX = (1 - 1/n²)^(n²p-1),  (2)

which is the probability that the feature pixel P'2 does not give rise to the false target problem.

The probability that we have k feature pixels other than P2 on L2 is:

Pk = C(n²p-1, k) (1/n)^k (1 - 1/n)^(n²p-1-k),  (3)

where C(·, ·) represents the combination number. No false targets will occur for the trinocular stereo vision system if and only if none of the k feature pixels gives rise to false targets. From eqns. (2) and (3), the probability that the false target problem does not occur for a trinocular system with feature pixel matching is:

PT = Σ(k=0 to n²p-1) C(n²p-1, k) (1/n)^k (1 - 1/n)^(n²p-1-k) [(1 - 1/n²)^(n²p-1)]^k.  (4)

Because the contour points are relatively stable feature points in the scenes, edge pixels or zero-crossings in the images can be taken as feature pixels. In general, the density of edge pixels and zero-crossings in real images is in the range 0.1-0.2 [43] and the image size parameter n >> 1. Under these conditions and by use of Poisson's theorem, we have from eqns. (1) and (4):

PB2 ≈ (1 - p)^n  (5)

and PT ≈ exp(-np²).  (6)

Comparing eqns. (1) and (4) or the approximated expressions (5) and (6), we see that for a trinocular stereo vision system, the probability that the false target problem does not occur is much larger than for a binocular one. To show the magnitude of these probabilities, let us consider the following example. Suppose that the image size is 256 x 256, and the average density of feature pixels is p = 0.15. We then have from eqns. (5) and (6):

PB2 ≈ 8.54 × 10^-19,  (7)


PT ≈ 0.003.  (8)

From this example, we see that in feature pixel matching, the probability that no false target problem occurs for a feature pixel in one of the stereo pair images is practically zero in a binocular system, and this probability becomes much larger in a trinocular system. But even though this probability is much raised, it remains very small. In order to effectively solve the false target problem, a new matching strategy for trinocular stereo vision systems is therefore needed, which will be analyzed in the following section.
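The approximations (5) and (6) and the example values (7) and (8) can be checked directly; the following few lines are our illustrative verification, not part of the original paper:

```python
import math

n, p = 256, 0.15              # image size and feature-pixel density of the example

PB2 = (1 - p) ** n            # eqn (5): binocular system, pixel-level matching
PT = math.exp(-n * p ** 2)    # eqn (6): trinocular system, pixel-level matching

print(f"PB2 = {PB2:.2e}")     # eqn (7): 8.54e-19
print(f"PT  = {PT:.3f}")      # eqn (8): 0.003
```

The eighteen orders of magnitude between the two values is the quantitative content of the comparison above: trinocular pixel-level matching helps enormously, yet PT itself is still far from 1, motivating the primitive-level strategy of the next section.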

4. A new matching strategy for trinocular stereo vision

In this section, we analyze the false target problem at the primitive matching level.

In Fig. 2, let S be a set of feature points in the 3-D scene, and S1, S2 and S3, called feature primitives in the images, its images taken by the three cameras respectively. When we search the primitive in image 2 corresponding to the feature primitive S1, there exist in general more than one, say (h + 1), primitive candidates, of which only one is the real corresponding primitive S2. Therefore, supplementary constraints must be added to binocular systems to solve the false targets. But in a trinocular stereo vision system, if S1 corresponds to S2, they will determine a primitive S in the 3-D space, and S should in turn project its image S3 on image 3.


Fig. 2. False target problem in feature primitive matching.


Now consider another feature primitive candidate S'2 in image 2. S1 and S'2 will determine a fabricated feature primitive Ei in the 3-D space, and Ei should produce a feature primitive Se3 in image 3. Obviously, if there is no feature primitive at the position Se3, we know S'2 can not match S1, i.e., the primitive S'2 will not give rise to false targets for S1. But if Se3 does exist by coincidence, the false target problem occurs. Suppose Se3 is composed of N pixel positions; the false targets produced by the primitives S1 and S'2 will take place only when there exist one or more feature points in each solid angle determined by the optic centre O3 and a pixel of Se3.

Take a pixel Pi ∈ Se3; O3 and Pi determine a solid angle Ai in the space (Fig. 2). From the preceding section, we know the probability that there exist no feature points in the solid angle is PX ≈ 1 - p, i.e., the probability that we have at least one feature point in Ai is p.

Se3 being composed of N pixels, the subspace determined by the optic centre O3 and Se3 is therefore composed of N solid angles Ai, i = 1, ..., N. The probability that there exists at the same time at least one feature point in each one of the solid angles Ai, i = 1, ..., N, is given by

QN = p^N,  (9)

which means that the probability that the feature primitive S'2 in image 2 does not give rise to false targets for S1 is:

PN = 1 - p^N.  (10)

As we have in image 2 h possible feature primitives like S'2, which are the candidates other than S2 corresponding to the primitive S1, no false target problems will occur for matching S1 and these corresponding candidates in image 2 if and only if none of the S'i, i = 1, ..., h, gives rise to false targets. So the probability that we can determine the feature primitive S in the 3-D space corresponding to S1 by use of only the geometrical constraints deduced from the 3-D perspective projection principle is:

PTm = (1 - p^N)^h.  (11)

We see from eqn. (11) that the probability of finding the primitive S corresponding to S1, i.e., that no false targets occur, is much increased for a trinocular stereo vision system when matching is done at the feature primitive level. Practically we have almost no false targets in this case. Let us again take the example above to show this conclusion. Suppose the image size is 256 × 256 (n = 256), the density of feature points is p = 0.15, and we match feature primitives composed of at least 8 pixels, i.e., N = 8. Suppose for a feature primitive in image 1, there exist on average 25 corresponding candidates in image 2, i.e., h = 24 (there are in general fewer candidates in practical cases). Equation (11) then gives:

PTm = 0.999994,  (12)

which means we have practically no false target problem.
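The chain (9)-(11) is easy to check numerically; the short sketch below reproduces the example discussed in the text (the variable names are ours, chosen only for this sketch):

```python
# Probability that primitive S1 is matched without false targets (eqn. (11)),
# evaluated for the example in the text: p = 0.15, N = 8, h = 24.
p = 0.15   # density of feature points per pixel position
N = 8      # number of pixels in the projected primitive Se3
h = 24     # number of candidate primitives in image 2 other than S2

Q_N = p ** N                 # eqn. (9): all N solid angles contain a feature point
P_N = 1.0 - Q_N              # eqn. (10): one candidate gives no false target
P_Tm = P_N ** h              # eqn. (11): none of the h candidates gives one

print(round(P_Tm, 6))        # 0.999994, as in eqn. (12)
```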


242 J. Shen et al. / Passive measurement by trinocular stereo vision

One may argue that, considering the errors of feature pixel extraction caused by discretization and noise in practical cases, a matching S1 // S'i // Se3 should be accepted when a large enough part of Se3 is composed of feature pixels, because a too strict acceptance criterion will cause erroneous rejection of true matchings. This argument is reasonable, and an analysis of these cases is therefore necessary.

Suppose that a matching S1 // S'i // Se3 can be accepted as true when at least q x N (0 < q <= 1, a threshold) pixels of Se3, but not necessarily all N pixels, are feature pixels. With a statistical analysis similar to the preceding one, we find that the probability that no false target problem occurs in this case is:

P'Tm = [1 - Sum_{k=qN}^{N} C(N,k) p^k (1 - p)^(N-k)]^h,  (13)

where C(N,k) denotes the binomial coefficient.

Taking again the example above with q = 0.75, we have from eqn. (13):

P'Tm = 0.939.  (14)

Equation (14) shows that even if non-exact matching is accepted as a consequence of the errors introduced by discretization and noise, false targets occur with a small probability (in the present example, 1 - 0.939 = 0.061). We conclude that a trinocular stereo vision system matching feature primitives in 3-D space shows an important advantage in avoiding the false target problem, the most difficult problem in stereo vision.
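Equation (13) can be evaluated the same way. The text does not specify how the threshold q x N is rounded to an integer pixel count, so the value computed under our rounding assumption may differ somewhat from eqn. (14); the qualitative conclusion, a false-target probability that stays small, is unaffected:

```python
from math import comb, ceil

def prob_no_false_targets(p, N, h, q=1.0):
    """Eqn. (13): probability that none of the h candidate primitives
    produces a false target when a triplet is accepted as soon as at
    least q*N of the N projected pixels are feature pixels.
    q = 1 recovers the strict criterion of eqn. (11)."""
    k_min = ceil(q * N)          # rounding convention: our assumption
    p_one = sum(comb(N, k) * p**k * (1 - p)**(N - k)
                for k in range(k_min, N + 1))
    return (1.0 - p_one) ** h

strict = prob_no_false_targets(0.15, 8, 24)          # exact matching, eqn. (11)
relaxed = prob_no_false_targets(0.15, 8, 24, 0.75)   # tolerant matching, eqn. (13)
```

As expected, the tolerant criterion lowers the probability somewhat, but it remains close to 1.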

5. Implementation of the new matching strategy

To realize the new matching strategy of trinocular stereo vision proposed in Section 4, one should first extract the feature primitives from the three original trinocular images, which will be presented in Section 7. Here we present the algorithm matching the feature primitives after they have been extracted.

Let {L1, ..., Lm}, {R1, ..., Rn} and {D1, ..., Ds} be the primitive lists for the left, right and down images of a scene, respectively, where Li (or Rj, Dq) is a feature primitive, i.e., a sequence of feature pixels characterized by some parameters.

Taking a primitive Li in the left image, we first examine the approximate direction of the primitive (note that Li is not necessarily a straight line). If it approaches the direction of the epipolar line for the left and right images rather than that for the left and down images, we look for the possible corresponding primitive candidates of Li in the down image. Otherwise we look for them in the right image. The reason for doing so is that the epipolar geometry calculation is more precise when the direction of the primitive differs more from that of the epipolar line.

Without loss of generality, suppose that the possible corresponding candidates of Li should be looked for in the right image. We take as candidates those primitives Rj in the right image which satisfy the following conditions:


(1) Epipolar constraint: at least one part of Rj and of Li should be in the same epipolar zone, because from projective geometry, corresponding pixels in two images of a scene taken from different positions should lie in the same epipolar plane.

(2) Disparity constraint: the disparity between Li and Rj should be in some range determined by a priori knowledge about the depth of the scene. If no such knowledge is available, all disparity values are permitted.

Let R = {Ri1, ..., Rit} be the set of primitives corresponding to Li which satisfy the conditions above. We try to match Li with one of these candidates. Taking Riu in R, u in {1, ..., t}, we calculate the correspondence between Li and Riu pixel by pixel.

For a pixel PiL of Li, the corresponding pixel PiR in Riu can be determined simply by the epipolar geometry. If such a pixel does not exist, we pass to the next pixel of Li. Otherwise, the pixel in Li and this corresponding pixel in Riu determine a point P in 3-D space, which is in turn projected onto the third image (here the down image). If the projection of P on the third image is not a feature pixel, it is not possible to match PiL with PiR and we do nothing but pass to the next pixel of Li. Otherwise, i.e., if this projection is a feature pixel belonging to a feature primitive Dq, q in {1, ..., s}, in the third image, the index for the possible corresponding triplet, denoted N[Li, Riu, Dq], is increased by one and we pass to the next pixel of Li.

This process of matching the pixels is repeated until all the pixels of Li have been treated. We thus obtain a number of possible primitive correspondence triplets, each associated with an index N[Li, Riu, Dq], q = 1, ..., Qi, where Qi is the number of possible corresponding primitive triplets containing Li and Riu. The triplet {Li, Riu, DQ}, Q in {1, ..., Qi}, is taken for further examination if and only if

N[Li, Riu, DQ] > N[Li, Riu, Dq], for q = 1, ..., Qi and q != Q.

Similarly we match Li with the other possible primitive candidates in R until all the candidates Rix, x = 1, ..., t, have been treated.

We thus get t possible correspondence triplets, each associated with an index N. The triplet {Li, Riv, DQ'} giving the maximal index is then taken as the matching result, and the following percentages of matched pixels in the primitives are calculated:

P(Li) = N[Li, Riv, DQ']/M(Li),

P(Riv) = N[Li, Riv, DQ']/M(Riv),

P(DQ') = N[Li, Riv, DQ']/M(DQ'),

where M(.) means the number of all pixels of the primitive. If at least one of the three percentages is larger than a threshold T (0 < T < 1), the matched triplet is registered; otherwise no correspondence is found.

When the above matching process for Li is finished, another primitive in the left image is taken and the primitive matching process is applied to the new primitive. In this process, if the newly found matched triplet contains a primitive already


registered in a previously matched triplet, i.e., if there is a contradiction between the new and the old matching results, then the indices N of the two triplets are compared and the one with the lower index is removed.

The primitive matching process is repeated until all feature primitives in the left image have been treated. We thus obtain a number of registered triplets of matched primitives. Then all matched pixels of the matched primitives are removed from the three images, and the matching process described above continues from the unmatched primitives and the remaining parts of the matched primitives, and so on. The matching algorithm terminates when no new correspondence triplet can be found.
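The voting scheme described above can be sketched as follows. The geometric machinery (epipolar correspondence, 3-D reconstruction and reprojection) is abstracted into two caller-supplied functions, `correspond` and `project_third`; these names, and the toy data in the usage example, are our own illustration, not part of the paper's implementation:

```python
from collections import defaultdict

def match_primitive(Li, right_candidates, correspond, project_third, down_primitives):
    """One round of triplet voting for a left primitive Li.

    Li               -- sequence of left-image pixels
    right_candidates -- {name: pixel sequence} for the candidates Rj
                        (already filtered by the epipolar and disparity constraints)
    correspond       -- (pL, Rj) -> matching right pixel or None
                        (stands in for the epipolar-line intersection)
    project_third    -- (pL, pR) -> predicted pixel in the down image
                        (stands in for 3-D reconstruction plus reprojection)
    down_primitives  -- {name: set of pixels} for the primitives Dq
    Returns the winning triplet (Rj_name, Dq_name, votes), or None.
    """
    votes = defaultdict(int)                      # the index N[Li, Rj, Dq]
    for rj_name, Rj in right_candidates.items():
        for pL in Li:
            pR = correspond(pL, Rj)
            if pR is None:
                continue                          # no epipolar correspondent
            pD = project_third(pL, pR)
            for dq_name, Dq in down_primitives.items():
                if pD in Dq:                      # reprojection hits a feature pixel
                    votes[(rj_name, dq_name)] += 1
    if not votes:
        return None
    (rj, dq), n = max(votes.items(), key=lambda kv: kv[1])
    return rj, dq, n

# Toy usage: rows play the role of epipolar lines, columns the role of disparity.
Li = [(k, 0) for k in range(5)]
right_candidates = {"R1": [(k, 3) for k in range(5)],
                    "R2": [(k, 7) for k in range(3)]}
correspond = lambda pL, Rj: next((q for q in Rj if q[0] == pL[0]), None)
project_third = lambda pL, pR: (pL[0], pL[1] + pR[1])   # toy stand-in
down_primitives = {"D1": {(k, 3) for k in range(5)},
                   "D2": {(k, 7) for k in range(2)}}
best = match_primitive(Li, right_candidates, correspond, project_third, down_primitives)
# best is ("R1", "D1", 5): the full-length candidate wins the vote.
```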

Note that the perspective projection geometry is the basis of the primitive matching algorithm, i.e., a matched triplet should reconstruct a feature primitive in 3-D space. In this process, no geometrical or structural similarity between the primitives in the images is needed.

6. Calibration of the system

From the presentation above, we see that both the intrinsic and extrinsic parameters of the cameras are needed in the process of matching the trinocular stereo vision images in 3-D space. As is well known, these parameters are also necessary for the reconstruction of the 3-D scene from the matching results. A calibration method is therefore necessary, which will be discussed in this section.

6.1. Linear camera model and mono-view calibration

In our system, we use the linear camera model, and the mono-view calibration technique is used to determine the intrinsic parameters of the cameras, especially the image centers.

The linear camera model has been frequently discussed in the literature, e.g. [3,44,45]. Here we show by vectorial geometry that all linear cameras can be calibrated by linear optimization and that they can be transformed into an equivalent orthonormal camera.

Figure 3 shows a model of a linear camera: C is the optical centre of the camera and O1 is the origin of the coordinate system on the image plane pi. Let H and V be the vectors from a pixel to its neighbours in the line and column directions respectively, and let P(X, Y, Z) be a point in 3-D space and M(i, j) its image. OwXYZ is the reference frame in 3-D space. We have:

CM = A' + iV + jH,  (15)

where A' = CO1. First we suppose V orthogonal to H (for example, for a CCD camera), i.e., V . H = 0. Draw CO2 perpendicular to pi, with O2 the intersection with the image plane pi. Let A = CO2 and A0 =


Fig. 3. Linear camera model.

A/||A||. We create a vector

H1 = H + aA0  (16)

such that

H1 . A' = 0,  (17)

where . means the inner product between vectors. We have

[(OwP - OwC) . H1]/[(OwP - OwC) . A0] = j(H . H)/||A||.  (18)

Let

P = OwP, C = OwC, and H' = ||A|| H1/(H . H),  (19)

we then have

P . H' - Ch - j[P . A0 - Ca] = 0,  (20)

where Ch = C . H', Ca = C . A0 and ||A0|| = 1. Similarly we can define

V1 = V + bA0 and V' = ||A|| V1/(V . V),

and have

P . V' - Cv - i[P . A0 - Ca] = 0,  (21)

with Cv = C . V'. So if we have n points Pk, k = 1, ..., n, in the 3-D space and their images taken by the camera, then, taking account of ||A0|| = 1, the parameters A0, H', V' and Ch, Cv


and Ca can be calculated by linear optimization, and C can in turn be determined from Ch, Cv and Ca because H', V' and A0 have already been calculated.

Letting H0 = ||A|| H/(H . H) and V0 = ||A|| V/(V . V), we get

H0 = H' - (H' . A0)A0 and V0 = V' - (V' . A0)A0.  (22)

We see that the linear camera is thus thoroughly calibrated, and that it is equivalent to a linear camera with optical axis A0, line vector H0 and column vector V0, where H0, V0 and A0 are mutually orthogonal. Note that ||A0|| = 1, i.e., the focal distance of this equivalent camera is 1; we therefore call such a camera an orthonormal camera. By use of this equivalent model, the relation between a point P and its image M(i, j) on the image plane pi of the real camera can be expressed by:

OwP = C + mu[A0 + (j - j0)He + (i - i0)Ve],  (23)

with j0 = H' . A0, i0 = V' . A0, He = H0/(H0 . H0) and Ve = V0/(V0 . V0), where mu is a scale factor.
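To make eqn. (23) concrete, the following sketch builds an equivalent orthonormal camera with illustrative parameter values (not taken from the paper's experiments) and verifies that projection and back-projection through the model are consistent:

```python
import numpy as np

# Round trip through the equivalent orthonormal camera of eqn. (23).
# All numerical values here are illustrative, not the paper's.
C = np.array([0.0, 0.0, 0.0])        # optical centre
A0 = np.array([0.0, 0.0, 1.0])       # unit optical axis
H0 = np.array([2e-3, 0.0, 0.0])      # equivalent line vector, orthogonal to A0
V0 = np.array([0.0, 2.5e-3, 0.0])    # equivalent column vector, orthogonal to A0 and H0
He, Ve = H0 / (H0 @ H0), V0 / (V0 @ V0)
i0, j0 = 128.0, 128.0                # image centre

def back_project(i, j, mu):
    """Eqn. (23): the 3-D point with depth factor mu imaged at pixel (i, j)."""
    return C + mu * (A0 + (j - j0) * He + (i - i0) * Ve)

def project(P):
    """Invert eqn. (23), using the mutual orthogonality of A0, H0, V0 and ||A0|| = 1."""
    d = P - C
    mu = d @ A0                      # component of P - C along the optical axis
    return i0 + (d @ V0) / mu, j0 + (d @ H0) / mu, mu

P = back_project(100.0, 150.0, 2000.0)
i, j, mu = project(P)                # recovers (100.0, 150.0, 2000.0)
```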

This calibration technique can be generalized to the case where the line and column directions are not orthogonal to each other. Suppose a linear camera has line and column vectors E1 and E2, where E1 is not necessarily perpendicular to E2. We can define the vectors H and V such that

H = E1,

V = E2 - tE1, with t = (E1 . E2)/(E1 . E1).

Evidently we have H . V = 0, i.e., H is perpendicular to V. A pixel (i, j) measured in the coordinate system with E1 and E2 as the base vectors will have the coordinates (i', j') in the system with H and V as the base vectors, where

i' = i, j' = j + ti.  (24)

Substituting eqn. (24) into eqns. (20) and (21), we have

P . H' - Ch - (j + ti)(P . A0 - Ca) = 0,  (25)

P . V' - Cv - i(P . A0 - Ca) = 0.  (26)

So given n points in 3-D space and their images taken by the camera, we can solve by linear optimization for V', Cv, A0 and Ca from the system of eqns. (26) for the n points, and in turn for H', Ch and t from the system of eqns. (25), because A0 and Ca have already been solved.
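The linear step can be sketched as follows: each point contributes one homogeneous linear equation in the eight unknowns (V', Cv, A0, Ca) of eqn. (26), so the parameter vector can be recovered, up to the normalization ||A0|| = 1 and a sign, as the null vector of the stacked system. All numerical values below are illustrative, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth for the synthetic test (illustrative values):
A0 = np.array([0.1, 0.0, 1.0]); A0 /= np.linalg.norm(A0)   # unit optical axis
Vp = np.array([3e-4, 2e-3, 1e-4])                          # V'
C = np.array([100.0, -50.0, 30.0])                         # optical centre
Cv, Ca = C @ Vp, C @ A0

# n points and their row coordinates i, generated from eqn. (26):
P = rng.uniform(500.0, 2000.0, size=(20, 3))
i = (P @ Vp - Cv) / (P @ A0 - Ca)

# One row per point:  P.V' - Cv - i*(P.A0) + i*Ca = 0,
# with unknown vector x = (V', Cv, A0, Ca).
M = np.hstack([P, -np.ones((20, 1)), -i[:, None] * P, i[:, None]])
x = np.linalg.svd(M)[2][-1]        # null vector: smallest right singular vector
x /= np.linalg.norm(x[4:7])        # enforce ||A0|| = 1
if x[4:7] @ A0 < 0:                # a null vector is defined only up to sign
    x = -x
```

With exact synthetic data the recovered vector matches the ground truth; with noisy pixel data the same least-squares machinery gives the best fit in the singular-value sense.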

6.2. Calibration of the trinocular system

Of course, we can calibrate each camera in a trinocular stereo vision system by the mono-view calibration technique presented above, which needs the 3-D coordinates of a number of points in space and their images.

In practice, it is not convenient to calibrate a system by some standard objects of which we measure the 3-D coordinates of points every time. In this section, we


show that both the intrinsic and extrinsic parameters of a trinocular system can be calibrated from a number of triplets of corresponding pixels, except for the distances between the cameras, which can only be determined up to a scale factor. If the distance between two points in 3-D space is known, these distances can be uniquely determined. With this technique, one can therefore calibrate a trinocular system without any 3-D coordinate data.

Figure 4 shows a trinocular system. For a point Pn in space and its images M1n, M2n and M3n taken respectively by cameras 1, 2 and 3, the straight lines C1M1n, C2M2n and C3M3n should have a common intersection in space, which gives:

(C1 - C2) . (F1n x F2n) = 0,

(C2 - C3) . (F2n x F3n) = 0,  (27)

(C3 - C1) . (F3n x F1n) = 0,

where

Fkn = A0k + (jkn - j0k)hk + (ikn - i0k)vk, for k = 1, 2 and 3.

x represents the vectorial product between two vectors; A0k, hk and vk are the equivalent vectors of camera k (k = 1, 2 and 3) in the directions of the optical axis, the lines and the columns, as defined in eqn. (23); i0k and j0k are the coordinates of the image centre on the image plane of camera k, as defined in eqn. (23). A0k, hk and vk satisfy the following constraints (equivalent orthonormal camera model):

A0k = -(hk x vk)/(||hk|| ||vk||),

hk . vk = 0, for k = 1, 2, 3.  (28)


Fig. 4. A trinocular stereo vision system.


Without loss of generality, we can construct the world coordinate system such that the optical centre of camera 1, i.e., C1, is taken as the origin, C2 - C1 lies along a principal axis, and the plane passing through the camera centres C1, C2 and C3 is a coordinate plane. The system of equations (27) then becomes:

C2 . (F1n x F2n) = 0,

(C2 - C3) . (F2n x F3n) = 0,  (29)

C3 . (F3n x F1n) = 0.

If no extrinsic or intrinsic parameters of the cameras are known, we have 23 unknowns in the equation system (29): two for the directions of C2 and C2 - C3; three for each hk (k = 1, 2, 3); two for each vk (because hk . vk = 0); and two for each image centre (j0k, i0k) (k = 1, 2, 3). They could thus be solved for by nonlinear optimization from at least eight triplets of corresponding pixels. The problem is that there exists a family of false solutions, and simulations show that the calibration results are error-prone.

Fortunately, in practical applications such as mobile robots, the cameras can be calibrated the first time by the mono-view calibration technique. After this calibration, when the cameras are used in a vision or passive measurement system, one often needs to adjust the relative directions and positions of the cameras and their focal distances (for example, in the case of a zoom) in order to obtain the requested scene or to see some details of the objects. In these cases, some intrinsic and extrinsic parameters of the cameras will change, and it is not convenient to recalibrate them every time with standard objects and measured 3-D coordinates. This poses the problem of how to calibrate a trinocular system without using 3-D coordinates.

Of course, when the relative positions and/or directions of the cameras are changed, they can be measured approximately by an automatic mechanical system, and the change of the focal distances can be known from the zoom adjustment. In general, these measured parameters after adjustment are not precise, because a precise measurement would be expensive and sometimes difficult. So the problem to be solved is to obtain precise calibration results from the approximate parameters. A technique calibrating such a system only from triplets of corresponding pixels provides a mobile and adjustable stereo vision system with a self-calibration capability and is therefore important in practical applications.

For CCD cameras, the image centre is fixed on the image plane, so after an adjustment the image centres of the equivalent orthonormal cameras, i.e., (j0k, i0k), k = 1, 2, 3, do not change; they are already known from the earlier mono-view calibration. We therefore have only 17 rather than 23 unknowns to solve for. Moreover, because we have approximate values for them, as mentioned above, the difference between the precise values and the initial approximate values before calibration is limited (in general, less than 20%). Under these conditions, we can use the equation system (29) to calibrate the 17 parameters of a trinocular system by minimizing the deviation criterion Cd from at least 6 triplets of corresponding pixels (providing 18 equations for 17 unknowns), with


Cd = {[C2 . (F1n x F2n)]/[||C2|| ||F1n x F2n||]}^2
   + {[(C2 - C3) . (F2n x F3n)]/[||C2 - C3|| ||F2n x F3n||]}^2
   + {[C3 . (F3n x F1n)]/[||C3|| ||F3n x F1n||]}^2.  (30)

If the absolute distance between two points in the 3-D space is known, the distance between the cameras can be easily determined.
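As a consistency check of the criterion: for three exactly calibrated cameras and exact pixel coordinates, every normalized term of eqn. (30) vanishes. The sketch below uses the symmetric form with (C1 - C2), etc., which reduces to eqn. (30) when C1 is taken as the origin; the camera parameters are illustrative values of our own:

```python
import numpy as np

def image_coords(cam, P):
    """Pixel coordinates of P in an equivalent orthonormal camera
    cam = (C, A0, h, v, i0, j0), with A0 unit and A0, h, v mutually orthogonal."""
    C, A0, h, v, i0, j0 = cam
    d = P - C
    mu = d @ A0
    return i0 + (d @ v) / (mu * (v @ v)), j0 + (d @ h) / (mu * (h @ h))

def F_vec(cam, P):
    """The vector Fkn of eqn. (27), built from the image of P."""
    C, A0, h, v, i0, j0 = cam
    i, j = image_coords(cam, P)
    return A0 + (j - j0) * h + (i - i0) * v

def deviation(cams, points):
    """Normalized deviation criterion, cf. eqn. (30)."""
    C1, C2, C3 = (cam[0] for cam in cams)
    total = 0.0
    for P in points:
        F1, F2, F3 = (F_vec(cam, P) for cam in cams)
        for Ca, Cb, Fa, Fb in ((C1, C2, F1, F2), (C2, C3, F2, F3), (C3, C1, F3, F1)):
            n = np.cross(Fa, Fb)
            total += ((Ca - Cb) @ n / (np.linalg.norm(Ca - Cb) * np.linalg.norm(n))) ** 2
    return total

# Three consistent synthetic cameras (parallel axes, distinct centres) and two points:
axes = (np.array([0.0, 0.0, 1.0]), np.array([2e-3, 0.0, 0.0]), np.array([0.0, 2.5e-3, 0.0]))
cams = [(np.array([0.0, 0.0, 0.0]), *axes, 128.0, 128.0),
        (np.array([1000.0, 0.0, 0.0]), *axes, 128.0, 128.0),
        (np.array([500.0, 600.0, 0.0]), *axes, 128.0, 128.0)]
points = [np.array([200.0, 100.0, 1500.0]), np.array([-300.0, 250.0, 2000.0])]
# deviation(cams, points) is zero up to floating-point rounding.
```

In a real calibration this criterion would be driven to a minimum over the 17 free parameters by a nonlinear optimizer, starting from the approximate values discussed above.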

7. Implementation of the new trinocular passive measurement system and experimental results

Based on the new methods presented above, a passive measurement system by trinocular stereo vision has been realized.

The sets of edge pixels in the images are taken as the feature primitives in our system. Edge pixels in each of the trinocular stereo images are detected by the DRF method [46-48]. An edge following algorithm gives the lists of connected edge pixels in the three images. These connected edges are then decomposed into edge primitives as follows:

(1) A primitive should not be branched, i.e., if an edge pixel is a common pixel of many edges, it is taken as an ending pixel of edge primitives.

(2) The starting and terminating pixels of edges are taken as ending pixels of edge primitives.

(3) A chain of connected edge pixels starting and terminating at two ending pixels, and passing through no other ending pixels, is taken as an edge primitive.

(4) If there is a significant change of edge direction at some point of a primitive, the primitive is divided into two primitives, and so on.

After such processing, we obtain a list of edge primitives for each of the three images. Note that an edge primitive is represented by a sequence of edge pixels, so no polygonal approximation error is introduced.
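Rule (4) above can be sketched as follows; the window size and the angular threshold are illustrative choices of ours, since the text does not fix them:

```python
import math

def split_at_corners(chain, max_turn_deg=45.0, step=3):
    """Split a connected chain of edge pixels where the local edge direction,
    measured over a window of `step` pixels, turns by more than max_turn_deg."""
    def direction(a, b):
        return math.atan2(b[1] - a[1], b[0] - a[0])
    cuts = [0]
    for k in range(step, len(chain) - step):
        turn = direction(chain[k], chain[k + step]) - direction(chain[k - step], chain[k])
        turn = (turn + math.pi) % (2 * math.pi) - math.pi    # wrap to (-pi, pi]
        if abs(turn) > math.radians(max_turn_deg) and k - cuts[-1] > step:
            cuts.append(k)                                   # new ending pixel
    cuts.append(len(chain))
    return [chain[a:b] for a, b in zip(cuts, cuts[1:]) if b > a]

# An L-shaped chain is split into its two straight runs near the corner.
chain = [(x, 0) for x in range(10)] + [(9, y) for y in range(1, 10)]
pieces = split_at_corners(chain)
```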

Once we have obtained the lists of feature primitives {L1, ..., Lm}, {R1, ..., Rn} and {D1, ..., Ds} of the three images, they are matched by the algorithm presented in Section 5.

The mono-view calibration technique of Section 6.1 is used to calibrate the cameras the first time, especially the image centres. After adjustment, the system is calibrated by the technique presented in Section 6.2. Note that with this technique, one obtains a system which is able to determine the shape and the relative distances of the objects in a 3-D scene even when the system is not calibrated a priori and no 3-D coordinates are available.

Our system has been tested on computer-generated data and on real scenes; the experimental results are satisfactory.

Tables 1 and 2 show some calibration results for computer-generated data, and Table 3 shows the results for the real images presented in Fig. 5. All results are obtained by the technique proposed in Section 6.2. Experiments show our


Table 1
Result of simulation of trinocular calibration (pixel coordinates in real numbers)

          Real parameters (x, y, z)^T                     Initial values for calibration (x, y, z)^T   Calibration results (x, y, z)^T

Camera 1
C1        ( 0.0000000E+0,  0.0000000E+0,  0.0000000E+0)   (0.0000000E+0, 0.0000000E+0, 0.0000000E+0)   ( 0.0000000E+0,  0.0000000E+0,  0.0000000E+0)
Ao1       ( 0.9871245E-1,  0.1491893E-1,  0.9950042E+0)   (0.0000000E+0, 0.0000000E+0, 0.1000000E+1)   ( 0.9838518E-1,  0.1524698E-1,  0.9950316E+0)
h1        ( 0.1592073E-2,  0.1592065E-4,  0.1581852E-3)   (0.2000000E-2, 0.0000000E+0, 0.0000000E+0)   ( 0.1592000E-2,  0.1590984E-4,  0.1576551E-3)
v1        (-0.2275145E-4,  0.1999678E-2, -0.2772570E-4)   (0.0000000E+0, 0.2500000E-2, 0.0000000E+0)   (-0.2279687E-4,  0.1999842E-2, -0.2837718E-4)

Camera 2
C2        ( 0.1000000E+4,  0.0000000E+0,  0.0000000E+0)   (0.1000000E+4, 0.0000000E+0, 0.0000000E+0)   ( 0.1000000E+4,  0.0000000E+0,  0.0000000E+0)
Ao2       (-0.1966811E+0,  0.2803622E-1,  0.9800666E+0)   (0.0000000E+0, 0.0000000E+0, 0.1000000E+1)   (-0.1970412E+0,  0.2835779E-1,  0.9799850E+0)
h2        ( 0.1562085E-2,  0.1562085E-3,  0.3090128E-3)   (0.2000000E-2, 0.0000000E+0, 0.0000000E+0)   ( 0.1561742E-2,  0.1563476E-3,  0.3094883E-3)
v2        (-0.1805390E-3,  0.1989656E-2, -0.9314778E-4)   (0.0000000E+0, 0.2500000E-2, 0.0000000E+0)   (-0.1805704E-3,  0.1989532E-2, -0.9387752E-4)

Camera 3
C3        ( 0.5000000E+3,  0.6000000E+3,  0.0000000E+0)   (0.4000000E+3, 0.4000000E+3, 0.0000000E+0)   ( 0.4999123E+3,  0.6000184E+3,  0.0000000E+0)
Ao3       ( 0.2539957E-1, -0.1472638E+0,  0.9887711E+0)   (0.0000000E+0, 0.0000000E+0, 0.1000000E+1)   ( 0.2506890E-1, -0.1469702E+0,  0.9888232E+0)
h3        ( 0.1566613E-2, -0.3133225E-3, -0.8690823E-4)   (0.2000000E-2, 0.0000000E+0, 0.0000000E+0)   ( 0.1566622E-2, -0.3132481E-3, -0.8627590E-4)
v3        ( 0.4032533E-3,  0.1939036E-2,  0.2784338E-3)   (0.0000000E+0, 0.2500000E-2, 0.0000000E+0)   ( 0.4037863E-3,  0.1939393E-2,  0.2780236E-3)


Table 2
Result of simulation of trinocular calibration (pixel coordinates in integers)

          Real parameters (x, y, z)^T                     Initial values for calibration (x, y, z)^T   Calibration results (x, y, z)^T

Camera 1
C1        ( 0.0000000E+0,  0.0000000E+0,  0.0000000E+0)   (0.0000000E+0, 0.0000000E+0, 0.0000000E+0)   ( 0.0000000E+0,  0.0000000E+0,  0.0000000E+0)
Ao1       ( 0.9871245E-1,  0.1491893E-1,  0.9950042E+0)   (0.0000000E+0, 0.0000000E+0, 0.1000000E+1)   ( 0.9702264E-1,  0.1190221E-1,  0.9952111E+0)
h1        ( 0.1592073E-2,  0.1592065E-4,  0.1581852E-3)   (0.2000000E-2, 0.0000000E+0, 0.0000000E+0)   ( 0.1592375E-2,  0.1374745E-4,  0.1554043E-3)
v1        (-0.2275145E-4,  0.1999678E-2, -0.2772570E-4)   (0.0000000E+0, 0.2500000E-2, 0.0000000E+0)   (-0.1941408E-4,  0.1999784E-2, -0.2202370E-4)

Camera 2
C2        ( 0.1000000E+4,  0.0000000E+0,  0.0000000E+0)   (0.1000000E+4, 0.0000000E+0, 0.0000000E+0)   ( 0.1000000E+4,  0.0000000E+0,  0.0000000E+0)
Ao2       (-0.1966811E+0,  0.2803622E-1,  0.9800666E+0)   (0.0000000E+0, 0.0000000E+0, 0.1000000E+1)   (-0.1978625E+0,  0.2464162E-1,  0.9799201E+0)
h2        ( 0.1562085E-2,  0.1562085E-3,  0.3090128E-3)   (0.2000000E-2, 0.0000000E+0, 0.0000000E+0)   ( 0.1561966E-2,  0.1522897E-3,  0.3115580E-3)
v2        (-0.1805390E-3,  0.1989656E-2, -0.9314778E-4)   (0.0000000E+0, 0.2500000E-2, 0.0000000E+0)   (-0.1769430E-3,  0.1990310E-2, -0.8577723E-4)

Camera 3
C3        ( 0.5000000E+3,  0.6000000E+3,  0.0000000E+0)   (0.4000000E+3, 0.4000000E+3, 0.0000000E+0)   ( 0.5027568E+3,  0.5988467E+3,  0.0000000E+0)
Ao3       ( 0.2539957E-1, -0.1472638E+0,  0.9887711E+0)   (0.0000000E+0, 0.0000000E+0, 0.1000000E+1)   ( 0.2288202E-1, -0.1487464E+0,  0.9886107E+0)
h3        ( 0.1566613E-2, -0.3133225E-3, -0.8690823E-4)   (0.2000000E-2, 0.0000000E+0, 0.0000000E+0)   ( 0.1566517E-2, -0.3146949E-3, -0.8360701E-4)
v3        ( 0.4032533E-3,  0.1939036E-2,  0.2784338E-3)   (0.0000000E+0, 0.2500000E-2, 0.0000000E+0)   ( 0.4044335E-3,  0.1938236E-2,  0.2822660E-3)

Precision of 3-D reconstruction: 0.14%.


Table 3
Result of trinocular calibration for the real images in Fig. 5

          Initial values for calibration (x, y, z)^T   Calibration results (x, y, z)^T

Camera 1
C1        (0.0000000E+0, 0.0000000E+0, 0.0000000E+0)   ( 0.0000000E+0,  0.0000000E+0,  0.0000000E+0)
Ao1       (0.0000000E+0, 0.0000000E+0, 0.1000000E+1)   ( 0.2273972E+0,  0.1448655E+0,  0.9629666E+0)
h1        (0.8000000E-3, 0.0000000E+0, 0.0000000E+0)   ( 0.7210453E-3, -0.5519570E-4, -0.1619659E-3)
v1        (0.0000000E+0, 0.8000000E-3, 0.0000000E+0)   ( 0.3049020E-4,  0.7509213E-3, -0.1201661E-3)
i01       0.1117321E+3                                 0.1117321E+3
j01       0.1966048E+3                                 0.1966048E+3

Camera 2
C2        (0.4500000E+3, 0.0000000E+0, 0.0000000E+0)   ( 0.4500000E+3,  0.0000000E+0,  0.0000000E+0)
Ao2       (0.0000000E+0, 0.0000000E+0, 0.1000000E+1)   (-0.7470439E-1,  0.1557139E+0,  0.9849734E+0)
h2        (0.8000000E-3, 0.0000000E+0, 0.0000000E+0)   ( 0.7541406E-3, -0.4815133E-4,  0.6480933E-4)
v2        (0.0000000E+0, 0.8000000E-3, 0.0000000E+0)   ( 0.5902828E-4,  0.7672613E-3, -0.1168189E-3)
i02       0.1240654E+3                                 0.1240654E+3
j02       0.1836594E+3                                 0.1836594E+3

Camera 3
C3        (0.2500000E+3, 0.2500000E+3, 0.0000000E+0)   ( 0.2977080E+3,  0.2982275E+3,  0.0000000E+0)
Ao3       (0.0000000E+0, 0.0000000E+0, 0.1000000E+1)   ( 0.1695253E-1, -0.4228019E-1,  0.9989620E+0)
h3        (0.8000000E-3, 0.0000000E+0, 0.0000000E+0)   ( 0.7677822E-3, -0.5582697E-4, -0.1539220E-4)
v3        (0.0000000E+0, 0.8000000E-3, 0.0000000E+0)   ( 0.5791323E-4,  0.7875553E-3,  0.3234979E-4)
i03       0.1305385E+3                                 0.1305385E+3
j03       0.1801512E+3                                 0.1801512E+3

Precision of 3-D reconstruction: 0.19%. (18 triplets of correspondence are used in the calibration.)

Fig. 5. Real images for trinocular calibration (left image, right image and down image).

trinocular calibration technique to work well when the difference between the initial estimate of the parameters (i.e., the imprecise measures) and their real values is not larger than 30%, which is easily satisfied in practice.

Figures 6-8 show some matching results for real images. Example 1 is a scene composed of uniformly distributed grids, which are very difficult to match with a binocular system but are easily matched by our trinocular system without errors. Examples 2 and 3 are other indoor scenes. In example 3, two trinocular stereo views are taken before and after a movement of the trinocular system, and the 3-D scenes reconstructed from each of the two trinocular views are superposed.


Fig. 6. Example 1: (a) original images; (b) detected edges; (c) matched edges.

We see that they confirm each other, which provides a demonstration of the precision of our system. These experimental results show that our trinocular stereo vision system is robust and reliable, while no a priori knowledge or constraints about the scene are used. With the actual system, the erroneous matchings are always less than 1% and the precision is 2 mm at a distance of about 1.5 m.

8. Conclusion

Binocular stereo vision systems face the essential difficulty that matching is done by use of some similarity between the images, which is not always valid because of the different perspective effects on the images. We think that a trinocular stereo vision system matching groups of feature points, called feature primitives, by use of the 3-D perspective geometry is more robust. A quantitative description of the false target problem is proposed, and a statistical analysis shows an efficient way to solve the false targets, i.e., a new strategy matching the stereo images in 3-D space and at the feature primitive level rather than in image space or at pixel level. The calibration problem is also discussed. We show that both the extrinsic



Fig. 7. Example 2: (a) original images; (b) detected edges.



Fig. 7. Example 2: (c) matched edges.

and intrinsic parameters of a trinocular stereo vision system can be calibrated by nonlinear optimization without the 3-D coordinates of points in space, which makes it possible to have a stereo vision system with an auto-calibration capability, as is required in many applications. A system for passive measurement by trinocular stereo vision has thus been realized and tested on computer-generated and real images. The experimental results are satisfactory.

The new method shows the following advantages [49, 50]:

(1) It is based on perspective geometry and needs no similarity between the images. The system is therefore robust and can be used in various situations.

(2) Very different views can be used, which assures a good precision for 3-D scene reconstruction.

(3) Trinocular stereo calibration without using 3-D coordinates makes it possible to determine automatically the intrinsic and extrinsic parameters of the system, which is important for practical applications.

(4) Uniformly distributed and regularly repeated patterns, which are very difficult for binocular systems, can be easily matched.

(5) It solves the false target problem very well, and the result is precise and reliable.


Fig. 8. Example 3: (a) original images; (b) detected edges; (c) matched edges; (d) superposition of the same scene reconstructed from two trinocular views (visualized by projections in three directions).

References

[1] B. Julesz, Binocular depth perception of computer-generated patterns, Bell Syst. Tech. J. 39 (1960) 1125-1161.

[2] M.D. Levine, D. Handly and G. Yaki, Computer determination of depth maps, Comput. Graph. Image Process. 2 (1973) 134-150.

[3] Y. Yakimovsky and R. Cunningham, A system for extracting three-dimensional measurements from a stereopair of TV cameras, Comput. Graph. Image Process. 7 (1978) 195-210.

[4] D.B. Gennery, Modeling the environment of an exploring vehicle by means of stereo vision, AIM-399, STAN-CS-80-805, Computer Science Dept., Stanford Univ., 1980.

[5] M.J. Hannah, Computer Matching of Areas in Stereo Imagery, Technical Report, Stanford, CA, 1974.

[6] R.L. Henderson, W.J. Miller and C.B. Grosch, A flexible approach to digital stereo mapping, Photogramm. Engr. Remote Sens. (1978) 1499-1512.

[7] L.H. Quam, Computer Comparison of Pictures, Stanford, CA, 1971.

[8] H.P. Moravec, Towards automatic visual obstacle avoidance, Proc. 5th Int. Joint Conf. on Artificial Intelligence, Cambridge, MA, 1977.

[9] H.P. Moravec, Visual mapping by a robot rover, Proc. 6th Int. Joint Conf. on Artificial Intelligence, 1979, pp. 598-620.

[10] S.T. Barnard and W.B. Thompson, Disparity analysis of images, IEEE Trans. Pattern Anal. Mach. Intelligence 2 (4) (1980).

[11] D. Marr and T. Poggio, A computational theory of human stereo vision, Proc. R. Soc. London, Ser. B 204 (1979) 301-328.

[12] W.E.L. Grimson, From Images to Surfaces, MIT Press, Cambridge, MA, 1981.

[13] Y. Shirai, Three-dimensional computer vision, in: G.G. Dodd and L. Rossol (eds.), Computer Vision and Sensor-Based Robots, Plenum Press, New York, 1979, pp. 187-206.

[14] W. Hoff and N. Ahuja, Depth from stereo, Proc. 4th Scand. Conf. on Image Analysis, 1985, pp. 761-768.

[15] Y.C. Kim and J.K. Aggarwal, Finding range from stereo images, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 1985, pp. 289-294.

[16] H.H. Baker, Edge based stereo correlation, Proc. ARPA Image Understanding Workshop, Univ. Maryland, 1980.

[17] J. Shen and S. Castan, A method for finding straight line and plane correspondence in stereopair images, Proc. Int. Conf. on Acoustics, Speech, Signal Processing, San Diego, CA, 1984.

[18] N. Ayache and B. Faverjon, Un algorithme de stéréoscopie passive utilisant la prédiction et vérification récursive d'hypothèses, Proc. 5th AFCET Congress, 1985.

[19] J.J. Hwang and E.L. Hall, Matching of feature objects using relational table from stereo image, Comput. Graph. Image Process. 20 (1982) 22-42.

[20] M. Benard, Restitution automatique en stéréophotogrammétrie, PhD Thesis, E.N.S.T., Paris, 1983.

[21] R. Mohr and B. Wrobel, La correspondance en stéréovision, vue comme une recherche d'un chemin optimal, Proc. 4th Conf. Reconnaissance des Formes et Intelligence Artificielle, Paris, 1984, pp. 71-80.

[22] E.R. Haddow, J.F. Boyce and S.A. Lloyd, A new stereo algorithm, Proc. 4th Scand. Conf. on Image Analysis, 1985, pp. 175-182.

[23] H.K. Nishihara, PRISM: A practical real-time imaging stereo matcher, A.I. Memo. 780, MIT, Cambridge, MA, 1984.

[24] J. Shen, A new fast algorithm of stereo vision, Proc. SPIE'85, Cannes, 1985. [25] J. Shen, S. Castan and J. Zhao, Stereo vision by pyramidal graph matching, Proc. SPIE'87,

Cannes, 1987. [26] J. Cocquerez and A. Gagalowitz, Mise en correspondance de r6gions dans une paire d'images

st6r6o, Proc. 3rd colloque Image, Paris, 1987. [27] L. Shapiro, A fast structural matching algorithm with applications in stereo vision, Proc. 4th

Scand. Conf. on Image Analysis, 1985. [28] F.R. Norvelle, Interactive digital correlation techniques for automatic compilation of elevation

data, Tech. paper, ASP, 47th Ann. Meeting, Washington DC, 1981, pp. 554-567. [29] J. Shen and S. Castan, A stereo vision algorithm taking into account the perspective distortions,

Proc. Int. Conf. on Pattern Recognition, Montreal, 1984. [30] K.I. Mori, M. Kidode and H. Asada, An interactive prediction and correction method for

automatic stereo comparison, Comput. Graph. linage Process. 2 (3/4) (1973) 393-401. [31] R.D. Arnold, Local context in matching edges for stereo vision, Proc. Workshop on Image

Understanding, 1978, pp. 64-72. [32] J.E.W. Mayhew and J.P. Frisby, Computational and psychological studies towards a theory of

human stereopsis, Artif Intell. 17 (1981) 349-385. [33] J.J. Koenderink and A.J. van Doom, Geometry of binocular vision and a model for stereopsis,

Biol. Cybernet. 21 (1976) 29-35. [34] R.D. Arnold and T.O. Binford, Geometric constraints in stereo vision, Proe. Soc. Photo-Opt.

lnstrum. Eng. 238 (1980) 281-292. [35] P. Burt and B. Julesz, A disparity gradient limit for binocular fusion, Science 208 (1980) 615-617. [36] K. Prazdny, Detection of binocular disparities, Biol. Cybern. 52 (1985) 93-99. [37] R.D. Eastman and A.M. Waxman, Using disparity functionals for stereo correspondence and

surface reconstruction, Comput. Vision Graph. Image Process. 39 (1) (1987).

Page 29: A new passive measurementmethod by trinocular stereo vision

J. Shen et al. / Passive measurement by trinocular stereo vision 259

[-38] A. Gerhard, H. Platzer, J. Steurer and R. Lenz, Depth extraction by stereo triples and a fast correspondence estimation algorithm, Proc. 8th Int. Conf. on Pattern Recognition, Paris, 1986.

[39] Y. Ohta, M. Watanabe and K. Ikeda, Improving depth map by right angled trinocular stereo, Proc. 8th Int. Conf. on Pattern Recognition, Paris, 1986.

[-40] Y. Vincent, Perception et modelisation de l'environnement d'un robot mobile: une approche par st6r6o vision, PhD Thesis, univ. Paul Sabatier, Toulouse, 1986.

E41] M. Yachida, Y. Kitamura and M. Kimachi, Trinocular vision: New approach for correspon- dence problem, Proc. 8th Int. Conf. on Pattern Recognition, Paris, 1986.

[42] N. Ayache and F. Lustman, Fast and reliable passive stereovision using three cameras, Int. Workshop on Industrial Applications and Machine Intelligence, Tokyo, 1987.

[43] W.K. Pratt, Digital Image Processing, Wiley, New York, 1978. [-44] O. Faugeras and G. Toscani, The calibration problem for stereo, Proc. CVPR'86, Miami, FL,

1986. [-45] T. Tsai, An efficient and accurate camera calibration technique for 3D machine vision, Proc.

IEEE Conf. on Computer Vision and Pattern Recognition, Miami, FL, 1986. [46] J. Shen and S. Castan, An optimal linear operator for edge detection, Proc. IEEE Conf. on

Computer Vision and Pattern Recognition, Miami, FL, 1986. [-47] J. Shen and S. Castan, Edge detection based on multi-edge models, Proc. SPIE'87, (Real Time

Image Processing), Cannes, 1987. [-48] J. Shen and S. Castan, Further results on DRF method for edge detection, Proc. 9th Int. Conf

on Pattern Recognition, Rome, 1988. E49] J. Shen and S. Castan, A new strategy for multi-camera stereo vision, Proc. 5th Scand. Conf. on

Image Analysis, Stockholm, June 1987. [50] J. Shen, S. Castan and J. Zhao, A new trinocular stereo vision method, Proc, 6th Scand. Conf.

on Image Analysis, Oulu, June 1989.