TRANSCRIPT
Introduction
We have seen that, in order to compute disparity in a pair of stereo images, we need to determine the local shift (disparity) of one image with respect to the other image.
In order to perform the matching which allows us to compute the disparity, we have two problems to solve:
Which points do we select for matching? We require distinct 'feature points'.
How do we match feature points in one image with feature points in the second image?
[Figure: left and right stereo images, each containing 5 feature points]
Feature matching for stereo image analysis
There are some fairly obvious guidelines in solving both problems.
Feature points must be :
Local (extended line segments are no good, we require local disparity)
Distinct (markedly 'different' from neighbouring points)
Invariant (to rotation, scale, illumination)
The matching process must be :
Local (thus limiting the search area)
Consistent (leading to ‘smooth’ disparity estimates)
Approaches to feature point selection
Previous approaches to feature point selection have been :
Moravec interest operator - based on thresholding local grey-level squared differences
Symmetric features - circular features, spirals
Line segment endpoints
Corner points
A feature extraction algorithm based on corner point detection is described in Haralick & Shapiro, pp. 334, which allows corner point locations to be accurately estimated along with the covariance matrix of the location estimate. The approach uses least-squares estimation and leads to the same covariance matrix as was obtained for the optical flow estimation algorithm!
Assume that the corner point lies at location $(x_c, y_c)$. We require an estimate $(\hat{x}_c, \hat{y}_c)$ of the location together with a covariance matrix, which we obtain by considering the intersection of line segments within an $m$-point window spanning the corner point location.
Suppose we observe 2 'strong' edges in the window. These edges will meet at the corner point estimate $(\hat{x}_c, \hat{y}_c)$ (unless they are parallel).
An algorithm for corner point estimation
What if we observe 3 strong edges?
The location estimate $(\hat{x}_c, \hat{y}_c)$ minimises the sum of squared perpendicular distances to each line segment. Thus, in this case, $(\hat{x}_c, \hat{y}_c)$ is selected to minimise:

$$n_1^2 + n_2^2 + n_3^2$$

[Figure: three edge segments through $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$ with perpendicular distances $n_1$, $n_2$, $n_3$ to the corner estimate $(\hat{x}_c, \hat{y}_c)$]
How do we characterise an edge segment?
Each edge passes through some point $(x_i, y_i)$ and has a gradient $\nabla f(x_i, y_i) = \nabla f_i$ at that point. The edge segment is characterised by its perpendicular distance $l_i$ to the origin and the gradient vector direction $\theta_i$:

$$(\cos\theta_i, \sin\theta_i) = \frac{\nabla f_i}{|\nabla f_i|}$$

$$l_i = (x_i, y_i)(\cos\theta_i, \sin\theta_i)^T = x_i\cos\theta_i + y_i\sin\theta_i$$

Finally, we can quantify the certainty with which we believe an edge to be present passing through $(x_i, y_i)$ by the gradient vector squared-magnitude $w_i = |\nabla f_i|^2$.

In order to establish a mathematical model of the corner location, we assume that the perpendicular distance of the edge segment $(l_i, \theta_i)$ from the corner is modelled as a random variable $n_i$ with variance $\sigma_n^2$:

$$l_i = x_c\cos\theta_i + y_c\sin\theta_i + n_i$$
Assuming our observations are the quantities $(l_i, \theta_i)$ for each location $i$ in the $m$-point window, we can employ a weighted least-squares procedure to estimate $(x_c, y_c)$:

$$(\hat{x}_c, \hat{y}_c) = \arg\min_{(x_c, y_c)} \sum_{i=1}^{m} w_i n_i^2$$

where $w_i = |\nabla f_i|^2$.
[Figure: edge segment through $(x_i, y_i)$ with gradient $\nabla f_i$, perpendicular distance $l_i$ to the origin, and perpendicular distance $n_i$ to the corner estimate $(\hat{x}_c, \hat{y}_c)$]
Defining:

$$\epsilon(x_c, y_c) = \sum_{i=1}^{m} w_i n_i^2 = \sum_{i=1}^{m} w_i (l_i - x_c\cos\theta_i - y_c\sin\theta_i)^2$$

the estimate is obtained by setting the partial derivatives to zero:

$$\left.\frac{\partial\epsilon(x_c, y_c)}{\partial x_c}\right|_{x_c=\hat{x}_c} = 0 \qquad \left.\frac{\partial\epsilon(x_c, y_c)}{\partial y_c}\right|_{y_c=\hat{y}_c} = 0$$
These equations straightforwardly lead to a matrix-vector equation:

$$\begin{pmatrix} \sum_{i=1}^{m} w_i\cos^2\theta_i & \sum_{i=1}^{m} w_i\cos\theta_i\sin\theta_i \\ \sum_{i=1}^{m} w_i\cos\theta_i\sin\theta_i & \sum_{i=1}^{m} w_i\sin^2\theta_i \end{pmatrix} \begin{pmatrix} \hat{x}_c \\ \hat{y}_c \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{m} l_i w_i\cos\theta_i \\ \sum_{i=1}^{m} l_i w_i\sin\theta_i \end{pmatrix}$$
Substituting:

$$l_i = x_i\cos\theta_i + y_i\sin\theta_i \qquad |\nabla f_i|\cos\theta_i = \frac{\partial f_i}{\partial x} \qquad |\nabla f_i|\sin\theta_i = \frac{\partial f_i}{\partial y}$$

and writing $f_{x_i} = \partial f_i/\partial x$, $f_{y_i} = \partial f_i/\partial y$ (so that $w_i\cos^2\theta_i = f_{x_i}^2$ and so on), the matrix-vector equation becomes:

$$\begin{pmatrix} \sum_{i=1}^{m} f_{x_i}^2 & \sum_{i=1}^{m} f_{x_i}f_{y_i} \\ \sum_{i=1}^{m} f_{x_i}f_{y_i} & \sum_{i=1}^{m} f_{y_i}^2 \end{pmatrix} \begin{pmatrix} \hat{x}_c \\ \hat{y}_c \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{m} \left( x_i f_{x_i}^2 + y_i f_{x_i}f_{y_i} \right) \\ \sum_{i=1}^{m} \left( x_i f_{x_i}f_{y_i} + y_i f_{y_i}^2 \right) \end{pmatrix}$$

so that:

$$\begin{pmatrix} \hat{x}_c \\ \hat{y}_c \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{m} f_{x_i}^2 & \sum_{i=1}^{m} f_{x_i}f_{y_i} \\ \sum_{i=1}^{m} f_{x_i}f_{y_i} & \sum_{i=1}^{m} f_{y_i}^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{i=1}^{m} \left( x_i f_{x_i}^2 + y_i f_{x_i}f_{y_i} \right) \\ \sum_{i=1}^{m} \left( x_i f_{x_i}f_{y_i} + y_i f_{y_i}^2 \right) \end{pmatrix}$$
We can estimate $\sigma_n^2$ by:

$$\hat{\sigma}_n^2 = \frac{1}{m-2}\sum_{i=1}^{m} w_i (l_i - \hat{x}_c\cos\theta_i - \hat{y}_c\sin\theta_i)^2$$
The covariance matrix is then given by:

$$C(\hat{x}_c, \hat{y}_c) = \hat{\sigma}_n^2 \begin{pmatrix} \sum_{i=1}^{m} \left(\frac{\partial f_i}{\partial x}\right)^2 & \sum_{i=1}^{m} \frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y} \\ \sum_{i=1}^{m} \frac{\partial f_i}{\partial x}\frac{\partial f_i}{\partial y} & \sum_{i=1}^{m} \left(\frac{\partial f_i}{\partial y}\right)^2 \end{pmatrix}^{-1}$$
This is the same covariance matrix we obtained for the optical flow estimation algorithm.
Window selection and confidence measure for the corner location estimate
How do we know our window contains a corner point? We require a confidence estimate based on the covariance matrix $C(\hat{x}_c, \hat{y}_c)$. As in the case of the optical flow estimate, we focus on the eigenvalues $\lambda_1 \ge \lambda_2$ of $C(\hat{x}_c, \hat{y}_c)$ and define a confidence ellipse:
We require 2 criteria to be fulfilled in order to accept the corner point estimate :
The maximum eigenvalue $\lambda_1$ should be smaller than a threshold value
The confidence ellipse should not be too elongated, which would imply that the corner location estimate is much more precise in one direction than the other
In order to satisfy the second criterion, we can quantify the circularity of the confidence ellipse by the form factor $q$:

$$q = 1 - \left(\frac{\lambda_1 - \lambda_2}{\lambda_1 + \lambda_2}\right)^2 = \frac{4\lambda_1\lambda_2}{(\lambda_1 + \lambda_2)^2} = \frac{4\det(C)}{\mathrm{tr}^2(C)}$$
It's easy to check that $0 \le q \le 1$, with $q \to 0$ for an elongated ellipse and $q = 1$ for a circular ellipse. Thus we can threshold the value of $q$ in order to decide whether to accept the corner location estimate or not.
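The corner estimator and its confidence measure can be sketched in a few lines of NumPy. This is our own illustrative code, not from the notes: it assumes the window's gradient components and pixel coordinates are supplied as flat arrays, and it divides the residual sum by $m-2$ ($m$ observations, two parameters), which is an assumption on our part.

```python
import numpy as np

def corner_estimate(fx, fy, xs, ys):
    """Weighted least-squares corner location from window gradients.

    fx, fy : gradient components at each of the m window pixels
    xs, ys : pixel coordinates of the window pixels
    Returns the estimate (xc, yc), its covariance C and form factor q.
    """
    # Normal-equation matrix: sums of squared gradient products
    M = np.array([[np.sum(fx * fx), np.sum(fx * fy)],
                  [np.sum(fx * fy), np.sum(fy * fy)]])
    # Right-hand side: l_i w_i (cos, sin) rewritten in terms of gradients
    b = np.array([np.sum(xs * fx * fx + ys * fx * fy),
                  np.sum(xs * fx * fy + ys * fy * fy)])
    xc, yc = np.linalg.solve(M, b)

    # Weighted residuals: sqrt(w_i)*n_i = (x_i-xc)*fx_i + (y_i-yc)*fy_i
    r = (xs - xc) * fx + (ys - yc) * fy
    m = fx.size
    sigma2 = np.sum(r * r) / (m - 2)      # residual variance (assumed m-2 dof)

    C = sigma2 * np.linalg.inv(M)         # covariance of (xc, yc)
    lam = np.linalg.eigvalsh(C)           # eigenvalues in ascending order
    tr2 = (lam[0] + lam[1]) ** 2
    q = 4.0 * lam[0] * lam[1] / tr2 if tr2 > 0 else 0.0  # form factor
    return (xc, yc), C, q
```

For noise-free data the covariance collapses to zero; in practice the window gradients come from a smoothed image and both thresholds on $\lambda_1$ and $q$ are applied before accepting the corner.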
The general problem is to pair each corner point estimate in one stereo image with a corresponding estimate in the other stereo image. The disparity estimate is then the displacement vector between the matched points.
We can represent each possible match $i \leftrightarrow j$ for feature points $i, j = 1..m$ by a bipartite graph with each graph node being a feature point. The correspondence matching problem is then to prune the graph so that one node in the left image is only matched with one node in the right image.
There are a number of feature correspondence algorithms that have been developed, ranging from relaxation-labelling type algorithms to neural network techniques. Typically these algorithms work on the basis of:
Feature similarity
Smoothness of the underlying disparity field
Correspondence matching for disparity computation
We can define feature point $f_i$, corresponding to node $i$ in the left stereo image. In our case, $f_i$ is a corner feature with image location $(x_{c_i}, y_{c_i})$.

We need to match $f_i$ with some feature $f'_j$ in the right stereo image. Such a match would produce a disparity estimate $\mathbf{d}_{ij} = (x_{c_i} - x'_{c_j},\ y_{c_i} - y'_{c_j})$ at position $(x_{c_i}, y_{c_i})$.
We will outline a matching algorithm based on relaxation labelling which makes the following assumptions :
We can define a similarity function $w_{ij}$ between matching feature points $f_i$ and $f'_j$. Matching feature points would be expected to have a large value of $w_{ij}$. One possibility might be:

$$w_{ij} = \frac{1}{1 + \alpha s_{ij}}$$

where $\alpha$ is a constant and $s_{ij}$ is the normalised sum-of-squared grey-level differences between pixels in the $m$-point windows centred on the two feature points.
Disparity values for neighbouring feature points are similar. This is based on the fact that most object surfaces are smooth and large jumps in the depth values at neighbouring points on the object surface, corresponding to large jumps in disparity values at neighbouring feature points, are unlikely.
Our relaxation labelling algorithm is based on the probability $p_i(\mathbf{d}_{ij})$ that feature point $f_i$ is matched to feature point $f'_j$, resulting in a disparity value of $\mathbf{d}_{ij}$. We can initialise this probability using our similarity function:

$$p_i^{(0)}(\mathbf{d}_{ij}) = \frac{w_{ij}}{\sum_j w_{ij}}$$
We next compute the increment in $p_i(\mathbf{d}_{ij})$ by considering the contribution from all of the neighbouring feature points of $f_i$ which, themselves, have disparities close to $\mathbf{d}_{ij}$:

$$q_i^{(r)}(\mathbf{d}_{ij}) = \sum_{k:\ f_k\ \text{neighbour of}\ f_i}\ \ \sum_{l:\ \mathbf{d}_{kl}\ \text{close to}\ \mathbf{d}_{ij}} p_k^{(r)}(\mathbf{d}_{kl})$$
Finally, we update our probabilities according to:

$$p_i^{(r+1)}(\mathbf{d}_{ij}) = \frac{p_i^{(r)}(\mathbf{d}_{ij})\, q_i^{(r)}(\mathbf{d}_{ij})}{\sum_j p_i^{(r)}(\mathbf{d}_{ij})\, q_i^{(r)}(\mathbf{d}_{ij})}$$
This is an iterative algorithm, whereby all feature points are updated simultaneously. The algorithm converges to a state where none of the probabilities associated with each feature point changes by more than some pre-defined threshold value. At this point the matches are defined such that:
$$f_i \leftrightarrow f'_j \quad \text{if} \quad p_i^{(R)}(\mathbf{d}_{ij}) = \max_k\, p_i^{(R)}(\mathbf{d}_{ik})$$
Example
Stereo pair (left and right images) generated by a ray tracing program.
A selection of successfully matched corner features (using the above corner detection and matching algorithms) and the final image disparity map, which is inversely proportional to the depth at each image point.
Introduction
This area of computer vision research attempts to reconstruct the structure of the imaged 3D environment and the 3D motion of objects in the scene from optical flow measurements made on a sequence of images.
Applications include autonomous vehicle navigation and robot assembly. Typically a video camera (or more than one camera for stereo measurements) is attached to a mobile robot. As the robot moves, it can build up a 3D model of the objects in its environment.
Motion and 3D structure from optical flow
The fundamental relation between the optical flow vector at position $(x, y)$ in the image plane, $\mathbf{v}(x, y) = (v_x(x, y), v_y(x, y))$, and the relative motion of the point on an object surface projected to $(x, y)$ can easily be derived using the equations for perspective projection.
We assume that the object has a rigid-body translational motion $\mathbf{V} = (V_X, V_Y, V_Z)$ relative to a camera-centred co-ordinate system $(X, Y, Z)$.
The equations for perspective projection are:

$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{f}{Z(x, y)}\begin{pmatrix} X \\ Y \end{pmatrix}$$
We can differentiate this equation with respect to $t$:

$$\frac{d}{dt}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{f}{Z^2}\begin{pmatrix} ZV_X - XV_Z \\ ZV_Y - YV_Z \end{pmatrix} = \begin{pmatrix} \dfrac{fV_X}{Z} - \dfrac{fXV_Z}{Z^2} \\ \dfrac{fV_Y}{Z} - \dfrac{fYV_Z}{Z^2} \end{pmatrix}$$

where $(V_X, V_Y, V_Z) = \left(\dfrac{dX}{dt}, \dfrac{dY}{dt}, \dfrac{dZ}{dt}\right)$.
Substituting in the perspective projection equations, this simplifies to:

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} fV_X - xV_Z \\ fV_Y - yV_Z \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} f & 0 & -x \\ 0 & f & -y \end{pmatrix}\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix}$$
We can invert this equation by solving for $(V_X, V_Y, V_Z)$:

$$\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} = \frac{Z}{f}\begin{pmatrix} v_x \\ v_y \\ 0 \end{pmatrix} + \frac{V_Z}{f}\begin{pmatrix} x \\ y \\ f \end{pmatrix}$$
This consists of a component parallel to the image plane, $\frac{Z}{f}(v_x, v_y, 0)$, and an unobservable component along the line of sight, $\frac{V_Z}{f}(x, y, f)$.

[Figure: camera at origin $O$ with axes $X$, $Y$, $Z$ and focal length $f$; the velocity $\mathbf{V}$ decomposed into the component $\frac{Z}{f}(v_x, v_y, 0)$ parallel to the image plane and the line-of-sight component $\frac{V_Z}{f}(x, y, f)$]
Focus of expansion
From the expression for the optical flow, we can determine a simple structure for the flow vectors in an image corresponding to a rigid-body translation:

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} fV_X - xV_Z \\ fV_Y - yV_Z \end{pmatrix} = \frac{V_Z}{Z}\begin{pmatrix} \dfrac{fV_X}{V_Z} - x \\ \dfrac{fV_Y}{V_Z} - y \end{pmatrix} = \frac{V_Z}{Z}\begin{pmatrix} x_0 - x \\ y_0 - y \end{pmatrix}$$
where $(x_0, y_0) = \left(\dfrac{fV_X}{V_Z}, \dfrac{fV_Y}{V_Z}\right)$ is called the focus of expansion (FOE). For $V_Z$ towards the camera (negative), the flow vectors point away from the FOE (expansion), and for $V_Z$ away from the camera, the flow vectors point towards the FOE (contraction).
[Figure: flow vector $\mathbf{v}(x, y)$ at image point $(x, y)$ lying along the line through the FOE $(x_0, y_0)$]
Example
Diverging tree test sequence (40-frame sequence). The first and last frames in the sequence are shown, together with the optical flow; the flow vectors point away from the FOE, indicating an expanding flow field.
What 3D information does the FOE provide?
$$(x_0, y_0, f) = \left(\frac{fV_X}{V_Z}, \frac{fV_Y}{V_Z}, f\right) = \frac{f}{V_Z}(V_X, V_Y, V_Z)$$

Thus, the direction of translational motion $\mathbf{V}/|\mathbf{V}|$ can be determined from the FOE position.
[Figure: camera at origin $O$ with axes $X$, $Y$, $Z$ and focal length $f$; the translation direction $\mathbf{V}$ passes through the FOE $(x_0, y_0, f)$ on the image plane]
We can also determine the time to impact from the optical flow measurements close to the FOE:

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{V_Z}{Z}\begin{pmatrix} x_0 - x \\ y_0 - y \end{pmatrix}$$

$$\tau = \frac{Z}{V_Z}$$

is the time to impact, an important quantity for both mobile robots and biological vision systems!
The position of the FOE and the time to impact can be found using a least-squares technique based on measurements of the optical flow $(v_{x_i}, v_{y_i})$ at a number of image points $(x_i, y_i)$ (see Haralick & Shapiro, pp. 191).
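One way to realise that least-squares computation is sketched below. This is our own illustration, not the notes' algorithm: it uses the fact that under pure translation each flow vector lies along the line through the FOE, and then fits the radial flow model assuming roughly constant depth over the sampled points; names are ours.

```python
import numpy as np

def foe_and_tti(xs, ys, vx, vy):
    """Least-squares FOE and time-to-impact from flow samples (a sketch)."""
    # Each flow vector points along the line through the FOE:
    #   -v_y*x0 + v_x*y0 = v_x*y - v_y*x   (linear in the unknown (x0, y0))
    A = np.column_stack([-vy, vx])
    b = vx * ys - vy * xs
    (x0, y0), *_ = np.linalg.lstsq(A, b, rcond=None)

    # Radial model v = s*(x - x0, y - y0); for an approaching object
    # s = -V_Z/Z > 0 and tau = 1/s is the time to impact
    rx, ry = xs - x0, ys - y0
    s = np.sum(vx * rx + vy * ry) / np.sum(rx * rx + ry * ry)
    return (x0, y0), 1.0 / s
```

The FOE fit uses only flow directions, so it is unaffected by depth variation; the time-to-impact fit additionally assumes the depth is roughly constant across the window.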
The structure from motion problem can be stated as follows:

Given an observed optical flow field $\mathbf{v}(x_i, y_i)$ measured at $N$ image locations $(x_i, y_i),\ i = 1..N$, which are the projected points of $N$ points $(X_i, Y_i, Z_i)$ on a rigid body surface moving with velocity $\mathbf{V} = (V_X, V_Y, V_Z)$, determine, up to a multiplicative scale constant, the positions $(X_i, Y_i, Z_i)$ and velocity $\mathbf{V}$ from the flow field.
Note that this is a rather incomplete statement of the full problem since, in most camera systems attached to mobile robots, the camera can pan and tilt inducing an angular velocity about the camera axes. This significantly complicates the solution.
An algorithm for determining structure from motion
An elegant solution exists for the simple case of pure translation, i.e. zero angular velocity $(\Omega_X, \Omega_Y, \Omega_Z) = \mathbf{0}$ (Haralick & Shapiro, pp. 188-191). The general case is more complex, and a lot of research is still going on investigating this general problem and the stability of the algorithms developed.
[Figure: camera at origin $O$ with axes $X$, $Y$, $Z$ and focal length $f$, observing the flow $\mathbf{v}(x_i, y_i)$ of scene points $(X_i, Y_i, Z_i)$ on a surface moving with velocity $\mathbf{V}$]
The algorithm starting point is the optical flow equation:

$$\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} = \frac{Z}{f}\begin{pmatrix} v_x \\ v_y \\ 0 \end{pmatrix} + \frac{V_Z}{f}\begin{pmatrix} x \\ y \\ f \end{pmatrix}$$
Thus, since $\mathbf{V} = (V_X, V_Y, V_Z)^T$ is the vector sum of scaled multiples of $(v_x, v_y, 0)^T$ and $(x, y, f)^T$, the vector product of these 2 vectors is orthogonal to $\mathbf{V}$. But:

$$\begin{pmatrix} v_x \\ v_y \\ 0 \end{pmatrix} \times \begin{pmatrix} x \\ y \\ f \end{pmatrix} = \begin{pmatrix} v_y f \\ -v_x f \\ v_x y - v_y x \end{pmatrix}$$
Thus:

$$\begin{pmatrix} v_y f & -v_x f & v_x y - v_y x \end{pmatrix}\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} = 0$$
This equation applies at all points $(x_i, y_i),\ i = 1..N$. Obviously a trivial solution to this equation would be $\mathbf{V} = \mathbf{0}$. Also, if some non-zero vector $\mathbf{V}$ is a solution then so is the vector $c\mathbf{V}$ for any scalar constant $c$. This confirms that we cannot determine the absolute magnitude of the velocity vector; we can only determine it up to a multiplicative scale constant.
We want to solve the above equation, in a least-squares sense, over all points $(x_i, y_i),\ i = 1..N$, subject to the condition that:

$$\mathbf{V}^T\mathbf{V} = k^2$$

This constrains the squared magnitude of the velocity vector to be some arbitrary value $k^2$.
We can rewrite the above orthogonality condition for all points $(x_i, y_i),\ i = 1..N$ in matrix-vector notation:

$$A\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} = \mathbf{0}$$

where:

$$A = \begin{pmatrix} v_{y_1} f & -v_{x_1} f & v_{x_1} y_1 - v_{y_1} x_1 \\ \vdots & \vdots & \vdots \\ v_{y_N} f & -v_{x_N} f & v_{x_N} y_N - v_{y_N} x_N \end{pmatrix}$$
The problem is thus stated as:

$$\min_{\mathbf{V}}\ (A\mathbf{V})^T(A\mathbf{V}) \quad \text{subject to} \quad \mathbf{V}^T\mathbf{V} = k^2$$

This is a classic problem in optimization theory and the solution is that the optimum value $\hat{\mathbf{V}}$ is given by the eigenvector of $A^T A$ corresponding to the minimum eigenvalue. $A^T A$ is given by:

$$A^T A = \begin{pmatrix} f^2\sum_{i=1}^{N} v_{y_i}^2 & -f^2\sum_{i=1}^{N} v_{x_i} v_{y_i} & f\sum_{i=1}^{N} v_{y_i}(v_{x_i} y_i - v_{y_i} x_i) \\ -f^2\sum_{i=1}^{N} v_{x_i} v_{y_i} & f^2\sum_{i=1}^{N} v_{x_i}^2 & -f\sum_{i=1}^{N} v_{x_i}(v_{x_i} y_i - v_{y_i} x_i) \\ f\sum_{i=1}^{N} v_{y_i}(v_{x_i} y_i - v_{y_i} x_i) & -f\sum_{i=1}^{N} v_{x_i}(v_{x_i} y_i - v_{y_i} x_i) & \sum_{i=1}^{N} (v_{x_i} y_i - v_{y_i} x_i)^2 \end{pmatrix}$$
Once we have determined our estimate $\hat{\mathbf{V}}$, we can then compute the depths $Z_i,\ i = 1..N$ of our scene points since, from our original optical flow equation:

$$\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{Z}\begin{pmatrix} f & 0 & -x \\ 0 & f & -y \end{pmatrix}\begin{pmatrix} V_X \\ V_Y \\ V_Z \end{pmatrix} \quad\Rightarrow\quad Z v_x = fV_X - xV_Z, \qquad Z v_y = fV_Y - yV_Z$$

We can compute the least-squares estimate of each $Z_i$ from the above 2 equations. The solution is:

$$\hat{Z}_i = \frac{v_{x_i}(f\hat{V}_X - x_i\hat{V}_Z) + v_{y_i}(f\hat{V}_Y - y_i\hat{V}_Z)}{v_{x_i}^2 + v_{y_i}^2}$$
Finally, the scene co-ordinates $(X_i, Y_i)$ can be found by applying the perspective projection equations.
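The complete recipe — build $A$, take the eigenvector of $A^T A$ with the smallest eigenvalue, then solve for the depths — can be sketched as follows. This is illustrative code of ours, not from the notes; it assumes a pure translation and returns $\hat{\mathbf{V}}$ as a unit vector, so the depths are recovered on the corresponding scale.

```python
import numpy as np

def structure_from_flow(xs, ys, vx, vy, f):
    """Recover V (up to scale and sign) and depths Z_i from a purely
    translational optical flow field."""
    # Orthogonality rows: (v_y*f, -v_x*f, v_x*y - v_y*x) . V = 0
    A = np.column_stack([vy * f, -vx * f, vx * ys - vy * xs])
    # V is the eigenvector of A^T A with the smallest eigenvalue
    # (eigh returns eigenvalues in ascending order)
    evals, evecs = np.linalg.eigh(A.T @ A)
    V = evecs[:, 0]

    # Depths from Z*v_x = f*V_X - x*V_Z and Z*v_y = f*V_Y - y*V_Z,
    # solved per point in the least-squares sense
    num = vx * (f * V[0] - xs * V[2]) + vy * (f * V[1] - ys * V[2])
    den = vx * vx + vy * vy
    return V, num / den
```

Because the eigenvector is only defined up to scale and sign, the recovered depths carry the same ambiguity, exactly as the problem statement anticipates.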
Introduction
The problem of structure from stereo is to reconstruct the co-ordinates of the 3D scene points from the disparity measurements. For the simple case of the cameras in normal position, we can derive a relationship between the scene point depth and the disparity using simple geometry :
Structure from stereo
[Figure: stereo cameras in normal position with optical centres $L$ and $R$, baseline $b$, focal length $f$ and origin $O$; a scene point $(X, Y, Z)$ at depth $Z$ projects to $x_L$ and $x_R$]
Take $O$ to be the origin of the $(X, Y, Z)$ co-ordinate system. Using similar triangles:

$$\frac{X}{Z} = \frac{x_L}{f} \qquad \frac{X - b}{Z} = \frac{x_R}{f}$$

$$\frac{x_L}{f} - \frac{x_R}{f} = \frac{b}{Z} \quad\Rightarrow\quad Z = \frac{bf}{x_L - x_R}$$
$x_L - x_R$ is known as the disparity between the projections of the same point in the 2 images. By measuring the disparity of corresponding points (conjugate points), we can infer the depth of the scene point (given knowledge of the camera focal length $f$ and baseline $b$) and hence build up a depth map of the object surface.
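The depth-from-disparity relation is a single line of code. A minimal sketch (units are whatever consistent units $f$, $b$ and the image coordinates are measured in):

```python
def stereo_depth(x_left, x_right, f, b):
    """Depth for cameras in normal position: Z = b*f / (xL - xR).

    x_left, x_right : image x-coordinates of conjugate points (e.g. pixels)
    f               : focal length (same units as the image coordinates)
    b               : camera baseline (e.g. metres)
    """
    disparity = x_left - x_right
    return b * f / disparity
```

For example, with $f = 700$ pixels, $b = 0.1$ m and a measured disparity of 10 pixels, the scene point lies at $Z = 7$ m.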
We can also derive this relationship using simple vector algebra. Relative to the origin $O$ of our co-ordinate system, the left and right camera centres are at $(0, 0, 0)^T$ and $(b, 0, 0)^T$, where $b$ is the camera baseline. For some general scene point $(X, Y, Z)$ with left and right image projections $(x_L, y_L)$ and $(x_R, y_R)$:

$$\begin{pmatrix} x_L \\ y_L \end{pmatrix} = \frac{f}{Z}\begin{pmatrix} X \\ Y \end{pmatrix} \qquad \begin{pmatrix} x_R \\ y_R \end{pmatrix} = \frac{f}{Z}\begin{pmatrix} X - b \\ Y \end{pmatrix}$$

$$x_L - x_R = \frac{f}{Z}\left(X - (X - b)\right) = \frac{fb}{Z} \quad\Rightarrow\quad Z = \frac{fb}{x_L - x_R}$$
Because the cameras are in normal position, the $y$ disparity $y_L - y_R$ is zero.
Once the scene point depth is known, the X and Y co-ordinates of the scene point can be found using the perspective projection equations.
When the cameras are in some general orientation with respect to each other, the situation is a little more complex and we have to use epipolar geometry in order to reconstruct our scene points.
Epipolar geometry
Epipolar geometry reduces the 2D image search space for matching feature points to a 1D search along the epipolar line. The epipolar line $e_R(p_L)$ in the right image plane is determined by the projection $p_L$ of a point $P$ in the left image plane. There is a corresponding epipolar line $e_L(p_R)$ determined by the projection $p_R$ of $P$ in the right image plane.
The key to understanding epipolar geometry is to recognise that $O_L$, $P$ and $O_R$ lie in a plane (the epipolar plane). The epipolar line $e_R(p_L)$ is the intersection of the epipolar plane with the right image plane. The point in the right image plane corresponding to the point $p_L$ in the left image plane can be the projection of any point along the ray $O_L P$, thus defining the epipolar line $e_R(p_L)$.
[Figure: camera centres $O_L$ and $O_R$, scene point $P$, its left-image projection $p_L$, and the epipolar lines $e_R(p_L)$ and $e_L(p_R)$]
Given a feature point with position vector $\overrightarrow{O_R p_R}$ in the right-hand image plane, the epipolar line $e_L(p_R)$ in the left-hand image plane can easily be found.
Any vector $\lambda\,\overrightarrow{O_R p_R}$, for $\lambda \ge 1$, projects to the epipolar line. With respect to the co-ordinate system centred on the left-hand camera, $\lambda\,\overrightarrow{O_R p_R}$ becomes $R\left(\lambda\,\overrightarrow{O_R p_R}\right) + \overrightarrow{O_L O_R}$, where $R$ is a rotation matrix aligning the left and right camera co-ordinate systems. The rotation matrix and the baseline vector $\overrightarrow{O_L O_R}$ can be determined using a camera calibration algorithm.
Thus, for any value $\lambda \ge 1$, the corresponding position $(x, y)$ on the epipolar line can be found using perspective projection. By varying $\lambda$, the epipolar line is swept out. Of course, only two values of $\lambda$ are required to determine the line.
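Sweeping out the epipolar line can be sketched as below. This is our own illustrative code under the stated assumptions: $R$ and the baseline vector come from calibration, and the right-image feature is represented by its ray direction $(x_R, y_R, f)$ in right-camera coordinates.

```python
import numpy as np

def epipolar_point(p_r, R, t, f, lam):
    """Project lambda * O_R p_R into the left image to trace out the
    epipolar line e_L(p_R).

    p_r : ray direction (x_R, y_R, f) in right-camera coordinates
    R   : rotation aligning right-camera axes with the left camera's
    t   : baseline vector O_L O_R in left-camera coordinates
    """
    ray = lam * np.asarray(p_r, dtype=float)   # point along the right ray
    P = R @ ray + t                            # same point, left coordinates
    return f * P[0] / P[2], f * P[1] / P[2]    # perspective projection

# Two values of lambda suffice to determine the line, e.g.:
#   p1 = epipolar_point((xR, yR, f), R, t, f, 1.0)
#   p2 = epipolar_point((xR, yR, f), R, t, f, 2.0)
```

With aligned cameras ($R = I$) and a purely horizontal baseline, the swept points share the same $y$, recovering the image-row epipolar lines mentioned below.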
For cameras not in alignment, we can define a surface of zero disparity relative to the vergence angle, which is the angle that the cameras' optical axes make with each other. On this surface the disparity $d = x_L - x_R$ is zero. Disparities are defined as negative ($d < 0$) for objects lying outside the zero-disparity surface, and as positive ($d > 0$) for objects lying inside it.

[Figure: verging cameras and the zero-disparity surface, with $d = x_L - x_R = 0$ on the surface, $d < 0$ outside it and $d > 0$ inside it]
A particularly simple case arises when the image planes are parallel and perpendicular to the optical axes: the epipolar lines are then just the image rows.
We have looked at how we can interpret 2D information, specifically optical flow and disparity measurements, in order to determine 3D motion and structure information about the imaged scene.
We have described algorithms for :
Optical flow estimation
Feature corner point location estimation
Correspondence matching for feature points
3D velocity and structure estimation from optical flow
Summary