TRANSCRIPT
Introduction
We have seen that, in order to compute disparity in a pair of stereo images, we need to determine the local shift (disparity) of one image with respect to the other image.
In order to perform the matching which allows us to compute the disparity, we have two problems to solve:
Which points do we select for matching? We require distinct 'feature points'.
How do we match feature points in one image with feature points in the second image?
[Figure: left and right stereo images, each containing 5 feature points]
Feature matching for stereo image analysis
There are some fairly obvious guidelines in solving both problems.
Feature points must be :
Local (extended line segments are no good, we require local disparity)
Distinct (markedly 'different' from neighbouring points)
Invariant (to rotation, scale, illumination)
The matching process must be :
Local (thus limiting the search area)
Consistent (leading to ‘smooth’ disparity estimates)
Approaches to feature point selection
Previous approaches to feature point selection have been :
Moravec interest operator - based on thresholding local grey-level squared differences
Symmetric features - circular features, spirals
Line segment endpoints
Corner points
A feature extraction algorithm based on corner point detection is described in Haralick & Shapiro, pp. 334, which allows corner point locations to be accurately estimated along with the covariance matrix of the location estimate. The approach uses least-squares estimation and leads to the same covariance matrix as was obtained for the optical flow estimation algorithm!
Assume that the corner point lies at location $(x_c, y_c)$. We require an estimate $(\hat{x}_c, \hat{y}_c)$ of the location together with a covariance matrix, which we obtain by considering the intersection of line segments within an $m$-point window spanning the corner point location.
Suppose we observe 2 'strong' edges in the window. These edges will meet at the corner point estimate $(\hat{x}_c, \hat{y}_c)$ (unless they are parallel).
An algorithm for corner point estimation
What if we observe 3 strong edges?
The location estimate $(\hat{x}_c, \hat{y}_c)$ minimises the sum of squared perpendicular distances to each line segment. Thus, in this case, $(\hat{x}_c, \hat{y}_c)$ is selected to minimise:

$$n_1^2 + n_2^2 + n_3^2$$

[Figure: three edge segments through $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ with perpendicular distances $n_1$, $n_2$, $n_3$ to the corner estimate $(\hat{x}_c, \hat{y}_c)$]
How do we characterise an edge segment?
Each edge passes through some point $(x_i, y_i)$ and has a gradient $\nabla f(x_i, y_i) = \nabla f_i$ at that point. The edge segment is characterised by its perpendicular distance $l_i$ to the origin and the gradient vector direction $\theta_i$:

$$(\cos\theta_i, \sin\theta_i) = \frac{\nabla f_i}{|\nabla f_i|}$$

$$l_i = (x_i, y_i)(\cos\theta_i, \sin\theta_i)^T = x_i\cos\theta_i + y_i\sin\theta_i$$

Finally, we can quantify the certainty with which we believe an edge to be present passing through $(x_i, y_i)$ by the gradient vector squared-magnitude $w_i = |\nabla f_i|^2$.

In order to establish a mathematical model of the corner location, we assume that the perpendicular distance of the edge segment $(l_i, \theta_i)$ from the corner is modelled as a random variable $n_i$ with variance $\sigma_n^2$:

$$l_i = x_c\cos\theta_i + y_c\sin\theta_i + n_i$$
Assuming our observations are the quantities $(l_i, \theta_i)$ for each location $i$ in the $m$-point window, we can employ a weighted least-squares procedure to estimate $(x_c, y_c)$:

$$(\hat{x}_c, \hat{y}_c) = \arg\min_{(x_c, y_c)} \sum_{i=1}^{m} w_i n_i^2$$

where $w_i = |\nabla f_i|^2$.
[Figure: edge segment through $(x_i, y_i)$ with gradient $\nabla f_i$, perpendicular distance $l_i$ to the origin, and perpendicular distance $n_i$ to the corner estimate $(\hat{x}_c, \hat{y}_c)$]
Defining:

$$\epsilon(x_c, y_c) = \sum_{i=1}^{m} w_i n_i^2 = \sum_{i=1}^{m} w_i (l_i - x_c\cos\theta_i - y_c\sin\theta_i)^2$$

the estimate is obtained by setting the partial derivatives to zero:

$$\left.\frac{\partial\epsilon(x_c, y_c)}{\partial x_c}\right|_{x_c=\hat{x}_c} = 0 \qquad \left.\frac{\partial\epsilon(x_c, y_c)}{\partial y_c}\right|_{y_c=\hat{y}_c} = 0$$
These equations straightforwardly lead to a matrix-vector equation:

$$\begin{pmatrix} \sum_{i=1}^{m} w_i\cos^2\theta_i & \sum_{i=1}^{m} w_i\cos\theta_i\sin\theta_i \\ \sum_{i=1}^{m} w_i\cos\theta_i\sin\theta_i & \sum_{i=1}^{m} w_i\sin^2\theta_i \end{pmatrix} \begin{pmatrix} \hat{x}_c \\ \hat{y}_c \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{m} l_i w_i\cos\theta_i \\ \sum_{i=1}^{m} l_i w_i\sin\theta_i \end{pmatrix}$$
Substituting:

$$l_i = x_i\cos\theta_i + y_i\sin\theta_i \qquad |\nabla f_i|\cos\theta_i = \frac{\partial f_i}{\partial x} \qquad |\nabla f_i|\sin\theta_i = \frac{\partial f_i}{\partial y}$$

and writing $f_{x_i} = \partial f_i/\partial x$, $f_{y_i} = \partial f_i/\partial y$ (so that $w_i\cos^2\theta_i = f_{x_i}^2$ and so on), the matrix-vector equation becomes:

$$\begin{pmatrix} \sum_{i=1}^{m} f_{x_i}^2 & \sum_{i=1}^{m} f_{x_i}f_{y_i} \\ \sum_{i=1}^{m} f_{x_i}f_{y_i} & \sum_{i=1}^{m} f_{y_i}^2 \end{pmatrix} \begin{pmatrix} \hat{x}_c \\ \hat{y}_c \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{m} \left( x_i f_{x_i}^2 + y_i f_{x_i}f_{y_i} \right) \\ \sum_{i=1}^{m} \left( x_i f_{x_i}f_{y_i} + y_i f_{y_i}^2 \right) \end{pmatrix}$$

so that:

$$\begin{pmatrix} \hat{x}_c \\ \hat{y}_c \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{m} f_{x_i}^2 & \sum_{i=1}^{m} f_{x_i}f_{y_i} \\ \sum_{i=1}^{m} f_{x_i}f_{y_i} & \sum_{i=1}^{m} f_{y_i}^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{i=1}^{m} \left( x_i f_{x_i}^2 + y_i f_{x_i}f_{y_i} \right) \\ \sum_{i=1}^{m} \left( x_i f_{x_i}f_{y_i} + y_i f_{y_i}^2 \right) \end{pmatrix}$$
We can estimate $\sigma_n^2$ by:

$$\hat{\sigma}_n^2 = \frac{1}{m-2}\sum_{i=1}^{m} w_i (l_i - \hat{x}_c\cos\theta_i - \hat{y}_c\sin\theta_i)^2$$
The covariance matrix is then given by:

$$C(\hat{x}_c, \hat{y}_c) = \hat{\sigma}_n^2 \begin{pmatrix} \sum_{i=1}^{m} \left(\frac{\partial f_i}{\partial x}\right)^2 & \sum_{i=1}^{m} \frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y} \\ \sum_{i=1}^{m} \frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y} & \sum_{i=1}^{m} \left(\frac{\partial f_i}{\partial y}\right)^2 \end{pmatrix}^{-1}$$
This is the same covariance matrix we obtained for the optical flow estimation algorithm.
Window selection and confidence measure for the corner location estimate
How do we know our window contains a corner point? We require a confidence estimate based on the covariance matrix $C(\hat{x}_c, \hat{y}_c)$. As in the case of the optical flow estimate, we focus on the eigenvalues $\lambda_1 \ge \lambda_2$ of $C(\hat{x}_c, \hat{y}_c)$ and define a confidence ellipse:
We require 2 criteria to be fulfilled in order to accept the corner point estimate :
The maximum eigenvalue $\lambda_1$ should be smaller than a threshold value
The confidence ellipse should not be too elongated, which would imply that the corner location estimate is much more precise in one direction than the other
In order to satisfy the second criterion, we can quantify the circularity of the confidence ellipse by the form factor $q$:

$$q = 1 - \left(\frac{\lambda_1 - \lambda_2}{\lambda_1 + \lambda_2}\right)^2 = \frac{4\lambda_1\lambda_2}{(\lambda_1 + \lambda_2)^2} = \frac{4\det(C)}{\mathrm{tr}^2(C)}$$
It's easy to check that $0 \le q \le 1$, with $q \to 0$ for an elongated ellipse and $q = 1$ for a circular ellipse. Thus we can threshold the value of $q$ in order to decide whether to accept the corner location estimate or not.
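The corner estimator and its confidence measure can be sketched in a few lines of NumPy. This is our own illustrative code, not from the notes: it assumes the window's gradient components and pixel coordinates are supplied as flat arrays, and it divides the residual sum by $m-2$ ($m$ observations, two parameters), which is an assumption on our part.

```python
import numpy as np

def corner_estimate(fx, fy, xs, ys):
    """Weighted least-squares corner location from window gradients.

    fx, fy : gradient components at each of the m window pixels
    xs, ys : pixel coordinates of the window pixels
    Returns the estimate (xc, yc), its covariance C and form factor q.
    """
    # Normal-equation matrix: sums of squared gradient products
    M = np.array([[np.sum(fx * fx), np.sum(fx * fy)],
                  [np.sum(fx * fy), np.sum(fy * fy)]])
    # Right-hand side: l_i w_i (cos, sin) rewritten in terms of gradients
    b = np.array([np.sum(xs * fx * fx + ys * fx * fy),
                  np.sum(xs * fx * fy + ys * fy * fy)])
    xc, yc = np.linalg.solve(M, b)

    # Weighted residuals: sqrt(w_i)*n_i = (x_i-xc)*fx_i + (y_i-yc)*fy_i
    r = (xs - xc) * fx + (ys - yc) * fy
    m = fx.size
    sigma2 = np.sum(r * r) / (m - 2)      # residual variance (assumed m-2 dof)

    C = sigma2 * np.linalg.inv(M)         # covariance of (xc, yc)
    lam = np.linalg.eigvalsh(C)           # eigenvalues in ascending order
    tr2 = (lam[0] + lam[1]) ** 2
    q = 4.0 * lam[0] * lam[1] / tr2 if tr2 > 0 else 0.0  # form factor
    return (xc, yc), C, q
```

For noise-free data the covariance collapses to zero; in practice the window gradients come from a smoothed image and both thresholds on $\lambda_1$ and $q$ are applied before accepting the corner.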
The general problem is to pair each corner point estimate in one stereo image with a corresponding estimate in the other stereo image. The disparity estimate is then the displacement vector between the matched points.
We can represent each possible match $i \leftrightarrow j$ for feature points $i, j = 1..m$ by a bipartite graph with each graph node being a feature point. The correspondence matching problem is then to prune the graph so that one node in the left image is only matched with one node in the right image.
There are a number of feature correspondence algorithms that have been developed, ranging from relaxation-labelling type algorithms to neural network techniques. Typically these algorithms work on the basis of:
Feature similarity
Smoothness of the underlying disparity field
Correspondence matching for disparity computation
We can define feature point $f_i$, corresponding to node $i$ in the left stereo image. In our case, $f_i$ is a corner feature with image location $(x_{c_i}, y_{c_i})$.

We need to match $f_i$ with some feature $f'_j$ in the right stereo image. Such a match would produce a disparity estimate $\mathbf{d}_{ij} = (x_{c_i} - x'_{c_j},\ y_{c_i} - y'_{c_j})$ at position $(x_{c_i}, y_{c_i})$.
We will outline a matching algorithm based on relaxation labelling which makes the following assumptions :
We can define a similarity function $w_{ij}$ between matching feature points $f_i$ and $f'_j$. Matching feature points would be expected to have a large value of $w_{ij}$. One possibility might be:

$$w_{ij} = \frac{1}{1 + \alpha s_{ij}}$$

where $\alpha$ is a constant and $s_{ij}$ is the normalised sum-of-squared grey-level differences between pixels in the $m$-point windows centred on the two feature points.
Disparity values for neighbouring feature points are similar. This is based on the fact that most object surfaces are smooth and large jumps in the depth values at neighbouring points on the object surface, corresponding to large jumps in disparity values at neighbouring feature points, are unlikely.
Our relaxation labelling algorithm is based on the probability $p_i(\mathbf{d}_{ij})$ that feature point $f_i$ is matched to feature point $f'_j$, resulting in a disparity value of $\mathbf{d}_{ij}$. We can initialise this probability using our similarity function:

$$p_i^{(0)}(\mathbf{d}_{ij}) = \frac{w_{ij}}{\sum_j w_{ij}}$$
We next compute the increment in $p_i(\mathbf{d}_{ij})$ by considering the contribution from all of the neighbouring feature points of $f_i$ which, themselves, have disparities close to $\mathbf{d}_{ij}$:

$$q_i^{(r)}(\mathbf{d}_{ij}) = \sum_{k:\ f_k\ \text{neighbour of}\ f_i}\ \ \sum_{l:\ \mathbf{d}_{kl}\ \text{close to}\ \mathbf{d}_{ij}} p_k^{(r)}(\mathbf{d}_{kl})$$
Finally, we update our probabilities according to:

$$p_i^{(r+1)}(\mathbf{d}_{ij}) = \frac{p_i^{(r)}(\mathbf{d}_{ij})\, q_i^{(r)}(\mathbf{d}_{ij})}{\sum_j p_i^{(r)}(\mathbf{d}_{ij})\, q_i^{(r)}(\mathbf{d}_{ij})}$$
This is an iterative algorithm, whereby all feature points are updated simultaneously. The algorithm converges to a state where none of the probabilities associated with each feature point changes by more than some pre-defined threshold value. At this point the matches are defined such that:
$$f_i \leftrightarrow f'_j \quad \text{if} \quad p_i^{(R)}(\mathbf{d}_{ij}) = \max_k\, p_i^{(R)}(\mathbf{d}_{ik})$$
Example
Stereo pair (left and right images) generated by a ray tracing program.
A selection of successfully matched corner features (using the above corner detection and matching algorithms) and the final image disparity map, which is inversely proportional to the depth at each image point.
Introduction
This area of computer vision research attempts to reconstruct the structure of the imaged 3D environment and the 3D motion of objects in the scene from optical flow measurements made on a sequence of images.
Applications include autonomous vehicle navigation and robot assembly. Typically a video camera (or more than one camera for stereo measurements) is attached to a mobile robot. As the robot moves, it can build up a 3D model of the objects in its environment.
Motion and 3D structure from optical flow
The fundamental relation between the optical flow vector at position $(x, y)$ in the image plane, $\mathbf{v}(x, y) = (v_x(x, y), v_y(x, y))$, and the relative motion of the point on an object surface projected to $(x, y)$ can easily be derived using the equations for perspective projection.
We assume that the object has a rigid-body translational motion $\mathbf{V} = (V_X, V_Y, V_Z)$ relative to a camera-centred co-ordinate system $(X, Y, Z)$.
The equations for perspective projection are:

$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{f}{Z(x, y)}\begin{pmatrix} X \\ Y \end{pmatrix}$$
We can differentiate this equation with respect to $t$:

$$\frac{d}{dt}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{f}{Z^2}\begin{pmatrix} ZV_X - XV_Z \\ ZV_Y - YV_Z \end{pmatrix} = \begin{pmatrix} \dfrac{fV_X}{Z} - \dfrac{fXV_Z}{Z^2} \\ \dfrac{fV_Y}{Z} - \dfrac{fYV_Z}{Z^2} \end{pmatrix}$$

where $(V_X, V_Y, V_Z) = \left(\dfrac{dX}{dt}, \dfrac{dY}{dt}, \dfrac{dZ}{dt}\right)$.
Substituting in the perspective projection equations, this simplifies to:

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} fV_X - xV_Z \\ fV_Y - yV_Z \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} f & 0 & -x \\ 0 & f & -y \end{pmatrix}\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix}$$
We can invert this equation by solving for $(V_X, V_Y, V_Z)$:

$$\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} = \frac{Z}{f}\begin{pmatrix} v_x \\ v_y \\ 0 \end{pmatrix} + \frac{V_Z}{f}\begin{pmatrix} x \\ y \\ f \end{pmatrix}$$
This consists of a component parallel to the image plane, $\frac{Z}{f}(v_x, v_y, 0)$, and an unobservable component along the line of sight, $\frac{V_Z}{f}(x, y, f)$.

[Figure: camera at origin $O$ with axes $X$, $Y$, $Z$ and focal length $f$; the velocity $\mathbf{V}$ decomposed into the component $\frac{Z}{f}(v_x, v_y, 0)$ parallel to the image plane and the line-of-sight component $\frac{V_Z}{f}(x, y, f)$]
Focus of expansion
From the expression for the optical flow, we can determine a simple structure for the flow vectors in an image corresponding to a rigid-body translation:

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} fV_X - xV_Z \\ fV_Y - yV_Z \end{pmatrix} = \frac{V_Z}{Z}\begin{pmatrix} \dfrac{fV_X}{V_Z} - x \\ \dfrac{fV_Y}{V_Z} - y \end{pmatrix} = \frac{V_Z}{Z}\begin{pmatrix} x_0 - x \\ y_0 - y \end{pmatrix}$$
where $(x_0, y_0) = \left(\dfrac{fV_X}{V_Z}, \dfrac{fV_Y}{V_Z}\right)$ is called the focus of expansion (FOE). For $V_Z$ towards the camera (negative), the flow vectors point away from the FOE (expansion), and for $V_Z$ away from the camera, the flow vectors point towards the FOE (contraction).
[Figure: flow vector $\mathbf{v}(x, y)$ at image point $(x, y)$ lying along the line through the FOE $(x_0, y_0)$]
Example
Diverging tree test sequence (40-frame sequence). The first and last frames in the sequence are shown, together with the optical flow; the flow vectors point away from the FOE, indicating an expanding flow field.
What 3D information does the FOE provide?
$$(x_0, y_0, f) = \left(\frac{fV_X}{V_Z}, \frac{fV_Y}{V_Z}, f\right) = \frac{f}{V_Z}(V_X, V_Y, V_Z)$$

Thus, the direction of translational motion $\mathbf{V}/|\mathbf{V}|$ can be determined from the FOE position.
[Figure: camera at origin $O$ with axes $X$, $Y$, $Z$ and focal length $f$; the translation direction $\mathbf{V}$ passes through the FOE $(x_0, y_0, f)$ on the image plane]
We can also determine the time to impact from the optical flow measurements close to the FOE:

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{V_Z}{Z}\begin{pmatrix} x_0 - x \\ y_0 - y \end{pmatrix}$$

$$\tau = \frac{Z}{V_Z}$$

is the time to impact, an important quantity for both mobile robots and biological vision systems!
The position of the FOE and the time to impact can be found using a least-squares technique based on measurements of the optical flow $(v_{x_i}, v_{y_i})$ at a number of image points $(x_i, y_i)$ (see Haralick & Shapiro, pp. 191).
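One way to realise that least-squares computation is sketched below. This is our own illustration, not the notes' algorithm: it uses the fact that under pure translation each flow vector lies along the line through the FOE, and then fits the radial flow model assuming roughly constant depth over the sampled points; names are ours.

```python
import numpy as np

def foe_and_tti(xs, ys, vx, vy):
    """Least-squares FOE and time-to-impact from flow samples (a sketch)."""
    # Each flow vector points along the line through the FOE:
    #   -v_y*x0 + v_x*y0 = v_x*y - v_y*x   (linear in the unknown (x0, y0))
    A = np.column_stack([-vy, vx])
    b = vx * ys - vy * xs
    (x0, y0), *_ = np.linalg.lstsq(A, b, rcond=None)

    # Radial model v = s*(x - x0, y - y0); for an approaching object
    # s = -V_Z/Z > 0 and tau = 1/s is the time to impact
    rx, ry = xs - x0, ys - y0
    s = np.sum(vx * rx + vy * ry) / np.sum(rx * rx + ry * ry)
    return (x0, y0), 1.0 / s
```

The FOE fit uses only flow directions, so it is unaffected by depth variation; the time-to-impact fit additionally assumes the depth is roughly constant across the window.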
The structure from motion problem can be stated as follows:

Given an observed optical flow field $\mathbf{v}(x_i, y_i)$ measured at $N$ image locations $(x_i, y_i),\ i = 1..N$, which are the projected points of $N$ points $(X_i, Y_i, Z_i)$ on a rigid body surface moving with velocity $\mathbf{V} = (V_X, V_Y, V_Z)$, determine, up to a multiplicative scale constant, the positions $(X_i, Y_i, Z_i)$ and velocity $\mathbf{V}$ from the flow field.
Note that this is a rather incomplete statement of the full problem since, in most camera systems attached to mobile robots, the camera can pan and tilt inducing an angular velocity about the camera axes. This significantly complicates the solution.
An algorithm for determining structure from motion
An elegant solution exists for the simple case of pure translation, i.e. zero angular velocity $(\Omega_X, \Omega_Y, \Omega_Z) = \mathbf{0}$ (Haralick & Shapiro, pp. 188-191). The general case is more complex, and a lot of research is still going on investigating this general problem and the stability of the algorithms developed.
[Figure: camera at origin $O$ with axes $X$, $Y$, $Z$ and focal length $f$, observing the flow $\mathbf{v}(x_i, y_i)$ of scene points $(X_i, Y_i, Z_i)$ on a surface moving with velocity $\mathbf{V}$]
The algorithm starting point is the optical flow equation:

$$\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} = \frac{Z}{f}\begin{pmatrix} v_x \\ v_y \\ 0 \end{pmatrix} + \frac{V_Z}{f}\begin{pmatrix} x \\ y \\ f \end{pmatrix}$$
Thus, since $\mathbf{V} = (V_X, V_Y, V_Z)^T$ is the vector sum of scaled multiples of $(v_x, v_y, 0)^T$ and $(x, y, f)^T$, the vector product of these 2 vectors is orthogonal to $\mathbf{V}$. But:

$$\begin{pmatrix} v_x \\ v_y \\ 0 \end{pmatrix} \times \begin{pmatrix} x \\ y \\ f \end{pmatrix} = \begin{pmatrix} v_y f \\ -v_x f \\ v_x y - v_y x \end{pmatrix}$$
Thus:

$$\begin{pmatrix} v_y f & -v_x f & v_x y - v_y x \end{pmatrix}\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} = 0$$
This equation applies at all points $(x_i, y_i),\ i = 1..N$. Obviously a trivial solution to this equation would be $\mathbf{V} = \mathbf{0}$. Also, if some non-zero vector $\mathbf{V}$ is a solution then so is the vector $c\mathbf{V}$ for any scalar constant $c$. This confirms that we cannot determine the absolute magnitude of the velocity vector; we can only determine it up to a multiplicative scale constant.
We want to solve the above equation, in a least-squares sense, over all points $(x_i, y_i),\ i = 1..N$, subject to the condition that:

$$\mathbf{V}^T\mathbf{V} = k^2$$

This constrains the squared magnitude of the velocity vector to be some arbitrary value $k^2$.
We can rewrite the above orthogonality condition for all points $(x_i, y_i),\ i = 1..N$ in matrix-vector notation:

$$A\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} = \mathbf{0}$$

where:

$$A = \begin{pmatrix} v_{y_1} f & -v_{x_1} f & v_{x_1} y_1 - v_{y_1} x_1 \\ \vdots & \vdots & \vdots \\ v_{y_N} f & -v_{x_N} f & v_{x_N} y_N - v_{y_N} x_N \end{pmatrix}$$
The problem is thus stated as:

$$\min_{\mathbf{V}}\ (A\mathbf{V})^T(A\mathbf{V}) \quad \text{subject to} \quad \mathbf{V}^T\mathbf{V} = k^2$$

This is a classic problem in optimization theory and the solution is that the optimum value $\hat{\mathbf{V}}$ is given by the eigenvector of $A^T A$ corresponding to the minimum eigenvalue. $A^T A$ is given by:

$$A^T A = \begin{pmatrix} f^2\sum_{i=1}^{N} v_{y_i}^2 & -f^2\sum_{i=1}^{N} v_{x_i} v_{y_i} & f\sum_{i=1}^{N} v_{y_i}(v_{x_i} y_i - v_{y_i} x_i) \\ -f^2\sum_{i=1}^{N} v_{x_i} v_{y_i} & f^2\sum_{i=1}^{N} v_{x_i}^2 & -f\sum_{i=1}^{N} v_{x_i}(v_{x_i} y_i - v_{y_i} x_i) \\ f\sum_{i=1}^{N} v_{y_i}(v_{x_i} y_i - v_{y_i} x_i) & -f\sum_{i=1}^{N} v_{x_i}(v_{x_i} y_i - v_{y_i} x_i) & \sum_{i=1}^{N} (v_{x_i} y_i - v_{y_i} x_i)^2 \end{pmatrix}$$
Once we have determined our estimate $\hat{\mathbf{V}}$, we can then compute the depths $Z_i,\ i = 1..N$ of our scene points since, from our original optical flow equation:

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} f & 0 & -x \\ 0 & f & -y \end{pmatrix}\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} \quad\Rightarrow\quad Z v_x = fV_X - xV_Z, \qquad Z v_y = fV_Y - yV_Z$$

We can compute the least-squares estimate of each $Z_i$ from the above 2 equations. The solution is:

$$\hat{Z}_i = \frac{v_{x_i}(f\hat{V}_X - x_i\hat{V}_Z) + v_{y_i}(f\hat{V}_Y - y_i\hat{V}_Z)}{v_{x_i}^2 + v_{y_i}^2}$$
Finally, the scene co-ordinates $(X_i, Y_i)$ can be found by applying the perspective projection equations.
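The complete recipe — build $A$, take the eigenvector of $A^T A$ with the smallest eigenvalue, then solve for the depths — can be sketched as follows. This is illustrative code of ours, not from the notes; it assumes a pure translation and returns $\hat{\mathbf{V}}$ as a unit vector, so the depths are recovered on the corresponding scale.

```python
import numpy as np

def structure_from_flow(xs, ys, vx, vy, f):
    """Recover V (up to scale and sign) and depths Z_i from a purely
    translational optical flow field."""
    # Orthogonality rows: (v_y*f, -v_x*f, v_x*y - v_y*x) . V = 0
    A = np.column_stack([vy * f, -vx * f, vx * ys - vy * xs])
    # V is the eigenvector of A^T A with the smallest eigenvalue
    # (eigh returns eigenvalues in ascending order)
    evals, evecs = np.linalg.eigh(A.T @ A)
    V = evecs[:, 0]

    # Depths from Z*v_x = f*V_X - x*V_Z and Z*v_y = f*V_Y - y*V_Z,
    # solved per point in the least-squares sense
    num = vx * (f * V[0] - xs * V[2]) + vy * (f * V[1] - ys * V[2])
    den = vx * vx + vy * vy
    return V, num / den
```

Because the eigenvector is only defined up to scale and sign, the recovered depths carry the same ambiguity, exactly as the problem statement anticipates.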
Introduction
The problem of structure from stereo is to reconstruct the co-ordinates of the 3D scene points from the disparity measurements. For the simple case of the cameras in normal position, we can derive a relationship between the scene point depth and the disparity using simple geometry :
Structure from stereo
[Figure: stereo cameras in normal position with optical centres $L$ and $R$, baseline $b$, focal length $f$ and origin $O$; a scene point $(X, Y, Z)$ at depth $Z$ projects to $x_L$ and $x_R$]
Take $O$ to be the origin of the $(X, Y, Z)$ co-ordinate system. Using similar triangles:

$$\frac{X}{Z} = \frac{x_L}{f} \qquad \frac{X - b}{Z} = \frac{x_R}{f}$$

$$\frac{x_L}{f} - \frac{x_R}{f} = \frac{b}{Z} \quad\Rightarrow\quad Z = \frac{bf}{x_L - x_R}$$
$x_L - x_R$ is known as the disparity between the projections of the same point in the 2 images. By measuring the disparity of corresponding points (conjugate points), we can infer the depth of the scene point (given knowledge of the camera focal length $f$ and baseline $b$) and hence build up a depth map of the object surface.
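The depth-from-disparity relation is a single line of code. A minimal sketch (units are whatever consistent units $f$, $b$ and the image coordinates are measured in):

```python
def stereo_depth(x_left, x_right, f, b):
    """Depth for cameras in normal position: Z = b*f / (xL - xR).

    x_left, x_right : image x-coordinates of conjugate points (e.g. pixels)
    f               : focal length (same units as the image coordinates)
    b               : camera baseline (e.g. metres)
    """
    disparity = x_left - x_right
    return b * f / disparity
```

For example, with $f = 700$ pixels, $b = 0.1$ m and a measured disparity of 10 pixels, the scene point lies at $Z = 7$ m.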
We can also derive this relationship using simple vector algebra. Relative to the origin $O$ of our co-ordinate system, the left and right camera centres are at $(0, 0, 0)^T$ and $(b, 0, 0)^T$, where $b$ is the camera baseline. For some general scene point $(X, Y, Z)$ with left and right image projections $(x_L, y_L)$ and $(x_R, y_R)$:

$$\begin{pmatrix} x_L \\ y_L \end{pmatrix} = \frac{f}{Z}\begin{pmatrix} X \\ Y \end{pmatrix} \qquad \begin{pmatrix} x_R \\ y_R \end{pmatrix} = \frac{f}{Z}\begin{pmatrix} X - b \\ Y \end{pmatrix}$$

$$x_L - x_R = \frac{f}{Z}\left(X - (X - b)\right) = \frac{fb}{Z} \quad\Rightarrow\quad Z = \frac{fb}{x_L - x_R}$$
Because the cameras are in normal position, the $y$ disparity $y_L - y_R$ is zero.
Once the scene point depth is known, the X and Y co-ordinates of the scene point can be found using the perspective projection equations.
When the cameras are in some general orientation with respect to each other, the situation is a little more complex and we have to use epipolar geometry in order to reconstruct our scene points.
Epipolar geometry
Epipolar geometry reduces the 2D image search space for matching feature points to a 1D search along the epipolar line. The epipolar line $e_R(p_L)$ in the right image plane is determined by the projection $p_L$ of a point $P$ in the left image plane. There is a corresponding epipolar line $e_L(p_R)$ determined by the projection $p_R$ of $P$ in the right image plane.
The key to understanding epipolar geometry is to recognise that $O_L$, $P$ and $O_R$ lie in a plane (the epipolar plane). The epipolar line $e_R(p_L)$ is the intersection of the epipolar plane with the right image plane. The point in the right image plane corresponding to the point $p_L$ in the left image plane can be the projection of any point along the ray $O_L P$, thus defining the epipolar line $e_R(p_L)$.
[Figure: camera centres $O_L$ and $O_R$, scene point $P$, its left-image projection $p_L$, and the epipolar lines $e_R(p_L)$ and $e_L(p_R)$]
Given a feature point with position vector $\overrightarrow{O_R p_R}$ in the right-hand image plane, the epipolar line $e_L(p_R)$ in the left-hand image plane can easily be found.
Any vector $\lambda\,\overrightarrow{O_R p_R}$, for $\lambda \ge 1$, projects to the epipolar line. With respect to the co-ordinate system centred on the left-hand camera, $\lambda\,\overrightarrow{O_R p_R}$ becomes $R\left(\lambda\,\overrightarrow{O_R p_R}\right) + \overrightarrow{O_L O_R}$, where $R$ is a rotation matrix aligning the left and right camera co-ordinate systems. The rotation matrix and the baseline vector $\overrightarrow{O_L O_R}$ can be determined using a camera calibration algorithm.
Thus, for any value $\lambda \ge 1$, the corresponding position $(x, y)$ on the epipolar line can be found using perspective projection. By varying $\lambda$, the epipolar line is swept out. Of course, only two values of $\lambda$ are required to determine the line.
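Sweeping out the epipolar line can be sketched as below. This is our own illustrative code under the stated assumptions: $R$ and the baseline vector come from calibration, and the right-image feature is represented by its ray direction $(x_R, y_R, f)$ in right-camera coordinates.

```python
import numpy as np

def epipolar_point(p_r, R, t, f, lam):
    """Project lambda * O_R p_R into the left image to trace out the
    epipolar line e_L(p_R).

    p_r : ray direction (x_R, y_R, f) in right-camera coordinates
    R   : rotation aligning right-camera axes with the left camera's
    t   : baseline vector O_L O_R in left-camera coordinates
    """
    ray = lam * np.asarray(p_r, dtype=float)   # point along the right ray
    P = R @ ray + t                            # same point, left coordinates
    return f * P[0] / P[2], f * P[1] / P[2]    # perspective projection

# Two values of lambda suffice to determine the line, e.g.:
#   p1 = epipolar_point((xR, yR, f), R, t, f, 1.0)
#   p2 = epipolar_point((xR, yR, f), R, t, f, 2.0)
```

With aligned cameras ($R = I$) and a purely horizontal baseline, the swept points share the same $y$, recovering the image-row epipolar lines mentioned below.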
For cameras not in alignment, we can define a surface of zero disparity relative to the vergence angle, which is the angle that the cameras' optical axes make with each other. On this surface the disparity $d = x_L - x_R$ is zero. Disparities are defined as negative ($d < 0$) for objects lying outside the zero-disparity surface, and as positive ($d > 0$) for objects lying inside it.

[Figure: verging cameras and the zero-disparity surface, with $d = x_L - x_R = 0$ on the surface, $d < 0$ outside it and $d > 0$ inside it]
A particularly simple case arises when the image planes are parallel and perpendicular to the optical axes: the epipolar lines are then just the image rows.
We have looked at how we can interpret 2D information, specifically optical flow and disparity measurements, in order to determine 3D motion and structure information about the imaged scene.
We have described algorithms for :
Optical flow estimation
Feature corner point location estimation
Correspondence matching for feature points
3D velocity and structure estimation from optical flow
Summary