
1

Formation et Analyse d’Images Session 7

Daniela Hall

25 November 2004

2

Course Overview

• Session 1:
– Homogeneous coordinates and tensor notation
– Image transformations
– Camera models

• Session 2:
– Camera models
– Reflection models
– Color spaces

• Session 3:
– Review color spaces
– Pixel based image analysis

• Session 4:
– Gaussian filter operators
– Scale space

3

Course Overview

• Session 5:
– Contrast description
– Hough transform

• Session 6:
– Kalman filter
– Tracking of regions, pixels, and lines

• Session 7:
– Stereo vision
– Epipolar geometry

• Session 8: exam

4

Session overview

1. Stereo vision

2. Epipolar geometry

3. 3d point position from two views using epipolar geometry

4. 3d point position from two views when camera models are known.

5

Human stereo vision

• Two Eyes = Three Dimensions (3D)! Each eye captures its own view, and the two separate images are sent on to the brain for processing.

• When the two images arrive simultaneously in the back of the brain, they are united into one picture. The mind combines the two images by matching up the similarities and adding in the small differences.

• The small differences between the two images add up to a big difference in the final picture! The combined image is more than the sum of its parts. It is a three-dimensional stereo picture.

• The word "stereo" comes from the Greek word "stereos" which means firm or solid. With stereo vision you see an object as solid in three spatial dimensions--width, height and depth--or x, y and z.

6

Computer stereo vision

• Stereo vision allows estimating the 3D position of a scene point X from its positions x, x’ in 2 images taken from different camera positions.

• The two views can be acquired simultaneously with two cameras or sequentially with one camera in motion.

• Each view has an associated camera matrix P, P’.
• The 3d point X is imaged as x=PX in the first view and x’=P’X in the second view.
• x and x’ correspond because they are the image of the same point in 3d.
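The two projections x = PX and x’ = P’X can be checked numerically. A minimal sketch, assuming made-up camera matrices (an identity camera and a camera translated along x; the values are illustrative, not from the course):

```python
import numpy as np

# First camera: P = [I | 0]; second camera translated along x (made-up values).
P = np.hstack([np.eye(3), np.zeros((3, 1))])
t = np.array([[-1.0], [0.0], [0.0]])
P2 = np.hstack([np.eye(3), t])            # P' = [R | t] with R = I

X = np.array([2.0, 1.0, 4.0, 1.0])        # 3d point X in homogeneous coordinates

x = P @ X                                 # x = PX  -> (2, 1, 4)
x2 = P2 @ X                               # x' = P'X -> (1, 1, 4)

# Perspective division gives the 2d image positions; they differ (disparity).
print(x[:2] / x[2])                       # [0.5  0.25]
print(x2[:2] / x2[2])                     # [0.25 0.25]
```

The difference between the two 2d positions is exactly the disparity that makes depth recoverable.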

Source: Hartley, Zisserman: Multiple view geometry in computer vision, Cambridge, 2000. http://www.robots.ox.ac.uk/~vgg/hzbook/

7

Topics in stereo vision

• Correspondence geometry (epipolar geometry):
– given an image point x in the first view, how does it constrain the corresponding point x’ in the second view?

• Camera geometry (motion):
– given a set of corresponding points {xi, x’i}, what are the cameras P, P’ of the two views?

• Scene geometry:
– given corresponding image points {xi, x’i} and cameras P, P’, what is the position of X in 3d?

8

Session overview

1. Stereo vision

2. Epipolar geometry

3. 3d point position from two views using epipolar geometry

4. 3d point position from two views when camera models are known.

9

Epipolar geometry

• A point in one view defines an epipolar line in the other view on which the corresponding point lies.

• The epipolar geometry depends only on the cameras: their relative position and their internal parameters.

• The epipolar geometry is represented by a 3x3 matrix called the fundamental matrix F.

10

Epipolar geometry

Thanks to Andrew Zisserman and Richard Hartley for all figures.

11

Notations

• X: 3d point
• C, C’: 3d positions of the camera centers
• I, I’: image planes
• x, x’: 2d positions of the 3d point X in images I, I’ of cameras C, C’
• pi: epipolar plane. C, x, e, e’, C’, X all lie on pi.
• e, e’: epipoles (e’ is the 2d image of the camera center C in image I’, and e the image of C’ in I). C, e, e’, C’ lie on the baseline.
• l, l’: epipolar lines. l’ is the intersection with I’ of the epipolar plane pi spanned by the baseline CC’ and the ray Cx. The corresponding point x’ must lie on l’.

12

Epipolar geometry

• For any two fixed cameras we have one baseline.
• For any 3d point X we have a different epipolar plane pi.
• All epipolar planes intersect at the baseline.

13

Epipolar line

• Suppose we know only x and the baseline.
• How is the corresponding point x’ in the other image constrained?
• pi is defined by the baseline and the ray Cx.
• The epipolar line l’ is the image of this ray in the other image. x’ must lie on l’.
• The benefit of the epipolar line is that the correspondence search can be restricted to l’ instead of searching the entire image.
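A minimal sketch of this constraint, assuming a made-up pure-translation camera pair. For P = (I|0) and P’ = (I|t) the fundamental matrix introduced on the following slides reduces to [t]x, which lets us build l’ directly:

```python
import numpy as np

def skew(v):
    # skew-symmetric matrix [v]x such that [v]x @ b == np.cross(v, b)
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

t = np.array([1.0, 0.0, 0.0])     # made-up translation between the cameras
F = skew(t)                       # F = [e']x for this pure-translation pair

X = np.array([2.0, 1.0, 4.0, 1.0])  # a 3d point (homogeneous)
x = X[:3]                            # x  = PX  (homogeneous image point)
x2 = X[:3] + t                       # x' = P'X

l2 = F @ x                        # epipolar line l' = Fx in the second image
# x' lies on l' (x'^T l' = 0), so the search for x' can be restricted to l'.
assert abs(x2 @ l2) < 1e-12
```

A candidate point off the line gives a nonzero residual x’^T l’, which is exactly the quantity a correspondence search along l’ keeps at zero.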

14

Example

15

Epipolar terminology

• Epipole:
– intersection of the line joining the camera centers (baseline) and the image plane.
– the image of the other camera center in the image plane.
– intersection of the epipolar lines.

• Epipolar plane:
– a plane containing the baseline. There is a one-parameter family of epipolar planes for a fixed camera pair.

• Epipolar line:
– intersection of the epipolar plane with the image plane.
– all epipolar lines intersect in the epipole.
– an epipolar plane intersects both image planes and defines correspondences between the lines.

16

The fundamental matrix

• The fundamental matrix is the algebraic representation of the epipolar geometry.
• Derivation of F:
– map point x to some point x’ in the other image (via a homography H)
– l’ is obtained as the line joining x’ and the epipole e’
– F can be computed from these elements

Relation of x and the epipolar line:   l’ = [e’]x x’ = [e’]x H x = F x
Equation for F:                        F = [e’]x H

where e’ is the epipole and [e’]x a skew-symmetric matrix.

Relation between the cross product and the skew-symmetric matrix:

a × b = [a]x b

For e = (e1, e2, e3)^T:

       (  0   -e3   e2 )
[e]x = (  e3   0   -e1 )
       ( -e2   e1   0  )
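The cross-product relation above is easy to check numerically (a and b are arbitrary test vectors, not values from the course):

```python
import numpy as np

def skew(e):
    # [e]x from the slide: skew-symmetric matrix built from e = (e1, e2, e3)
    return np.array([[0.0, -e[2], e[1]],
                     [e[2], 0.0, -e[0]],
                     [-e[1], e[0], 0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# The cross product a x b equals the matrix-vector product [a]x b.
assert np.allclose(skew(a) @ b, np.cross(a, b))
```

This is why the line l’ through e’ and x’, which is the cross product e’ × x’, can be written as the matrix product [e’]x x’.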

17

Fundamental matrix

18

Correspondence condition

• The fundamental matrix satisfies the condition that for any pair of corresponding points x, x’ in the two images:

x’^T F x = 0

• This is true because, if x and x’ correspond, then x’ lies on the epipolar line l’. And since we know l’ = Fx we can write:

0 = x’^T l’ = x’^T F x

• The importance of this relation is that the fundamental matrix can be computed from point correspondences alone. We need at least 7 point correspondences (details chap 10, Hartley, Zisserman book).

19

Computing the fundamental matrix

• Given sufficiently many point matches xi, xi’, the equation x’^T F x = 0 can be used to compute F.

• Writing x = (x, y, 1)^T and x’ = (x’, y’, 1)^T, each point match gives rise to one equation in the unknown entries of F.

• Writing the 9 unknowns of F as a vector f, we get the two equations below.

• Using the last equation, SVD provides a direct solution for F.

x’x f11 + x’y f12 + x’ f13 + y’x f21 + y’y f22 + y’ f23 + x f31 + y f32 + f33 = 0

(x’x, x’y, x’, y’x, y’y, y’, x, y, 1) f = 0

For n matches:

       ( x’1 x1   x’1 y1   x’1   y’1 x1   y’1 y1   y’1   x1   y1   1 )
A f =  (  ...       ...     ...    ...      ...     ...   ...  ...  . ) f = 0
       ( x’n xn   x’n yn   x’n   y’n xn   y’n yn   y’n   xn   yn   1 )
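The SVD solution above can be sketched as follows. This is our own minimal implementation, not course code; the coordinate normalisation recommended by Hartley and Zisserman is omitted for brevity:

```python
import numpy as np

def fundamental_from_matches(pts1, pts2):
    """Linear estimate of F from n >= 8 matches (sketch, unnormalised).

    pts1, pts2: (n, 2) arrays of corresponding image points.
    """
    rows = []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # one equation x'^T F x = 0 per match, written in the unknowns f
        rows.append([xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, 1.0])
    A = np.asarray(rows)
    # f is the right singular vector of A with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # enforce rank 2 so that all epipolar lines meet in the epipole
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt
```

The final SVD step projects the linear solution onto the rank-2 matrices, since a valid fundamental matrix maps every point to a line through the epipole.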

20

The fundamental matrix

• Allows computing the epipolar line l’ in I’ for a point x in I. x’ lies on l’.

• Allows computing the line l in I for x’ in I’. We can verify the point correspondence x, x’, because x must lie on l.

• In the course 3d vision, you will see that F is used to compute the camera projection matrix P for each camera. With the camera matrices you can estimate the 3d position of a point without calibrating the cameras (self-calibration).

21

Session overview

1. Stereo vision

2. Epipolar geometry

3. 3d point position from two views using epipolar geometry

4. 3d point position from two views when camera models are known.

22

Computing the 3d position of a point

1. Compute the fundamental matrix from at least 7 point correspondences.

2. Determine two camera projection matrices.

3. Determine a point correspondence x, x’ in the two views.

4. The 3d position X of the image points x and x’ can be computed directly as the intersection of the two rays defined by Cx and C’x’.

23

Computing the 3d position of a point

(figure: the two rays L and U, from the camera centers C and C’, intersect at the 3d point X)

24

Camera projection matrices P, P’

• We set the origin of the world to the camera center C of the first camera. Then the projection matrix P of the first camera is

• The projection matrix of the second camera has the form

• It can be computed by solving F = [e’]x P’P+ for P’
• C is the null vector of P:

            ( 1 0 0 0 )        ( 1 0 0 )
P = (I|0) = ( 0 1 0 0 ),  P+ = ( 0 1 0 )   (pseudo-inverse, P P+ = I)
            ( 0 0 1 0 )        ( 0 0 1 )
                               ( 0 0 0 )

P’ = (R | t), with R a 3D rotation and t a translation

P C = 0
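A quick numerical check of these forms (the rotation angle and translation below are made-up example values):

```python
import numpy as np

# First camera at the world origin: P = [I | 0].
P = np.hstack([np.eye(3), np.zeros((3, 1))])

# Second camera P' = [R | t]: an illustrative small rotation about the
# y axis plus a translation along x.
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([[1.0], [0.0], [0.0]])
P2 = np.hstack([R, t])

# The camera center C = (0, 0, 0, 1)^T is the null vector of P: PC = 0.
C = np.array([0.0, 0.0, 0.0, 1.0])
assert np.allclose(P @ C, 0.0)
```

The assertion confirms the slide's last equation: the world origin, which is the first camera center, projects to the zero vector under P.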

25

Defining the rays

• The ray backprojected from x by P is obtained
– by solving PX = x
– by using 2 points on the ray and computing the line tensor (see Session 1)

• C is on the ray, and so is the point P+x (a 3d point whose image is x).

• Line equations using tensor notation:

• line equation using tensor notation

kjijki

kjijki

kjijki

ULEX

CxPEU

CxPEL

)'()''(

)()(line defined by x and Cline defined by x’ and C’3d point as intersection of L and U

26

Intersecting the rays

• In real world applications x’^T F x = 0 may not hold exactly due to imprecise measurements.

• This means that the rays L and U may not intersect (they are skew).

• In these cases you need to find the point with minimum distance to both L and U. You can solve this by SVD.
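A common way to do this is the midpoint method. The sketch below (our own function names) solves the minimum-distance problem with a least-squares solve, which numpy implements via the SVD mentioned on the slide:

```python
import numpy as np

def midpoint_of_rays(c1, d1, c2, d2):
    """Point closest to both rays c1 + s*d1 and c2 + u*d2.

    With noisy measurements the backprojected rays L and U are skew, so we
    return the midpoint of the segment of minimum distance between them.
    """
    # Minimise |c1 + s*d1 - (c2 + u*d2)|^2 over (s, u):
    A = np.stack([d1, -d2], axis=1)                   # 3x2 system matrix
    (s, u), *_ = np.linalg.lstsq(A, c2 - c1, rcond=None)
    p1 = c1 + s * d1                                  # closest point on ray L
    p2 = c2 + u * d2                                  # closest point on ray U
    return 0.5 * (p1 + p2)

# Two rays (made-up example) that intersect exactly at (1, 1, 4):
X = midpoint_of_rays(np.zeros(3), np.array([1.0, 1.0, 4.0]),
                     np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 4.0]))
# X == [1, 1, 4]
```

When the rays truly intersect, the midpoint coincides with the intersection; when they are skew, it is the natural compromise between the two views.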

27

Direct computation of 3d point position

• Choose a calibration object whose 3d position is known.

• Calibrate the cameras (compute the camera models M^I_S and N^I_S from at least 5 ½ points each).

• Then, from a correspondence P, Q, the 3d position of R can be computed directly.

28

Session overview

1. Stereo vision

2. Epipolar geometry

3. 3d point position from two views using epipolar geometry

4. 3d point position from two views when camera models are known.

29

Direct computation of 3d point position

• R is at the intersection of 3 planes

30

Camera model

P^I = M^I_S P^S,  with  M^I_S = C^I_R M^R_C T^C_S

Equation:

( wi )           ( xs )
( wj ) = M^I_S * ( ys )
( w  )           ( zs )
                 ( 1  )

Image coordinates:

i = wi/w = ((M^I_S)_1 P^S) / ((M^I_S)_3 P^S) = (m11 xs + m12 ys + m13 zs + m14) / (m31 xs + m32 ys + m33 zs + m34)

j = wj/w = ((M^I_S)_2 P^S) / ((M^I_S)_3 P^S) = (m21 xs + m22 ys + m23 zs + m24) / (m31 xs + m32 ys + m33 zs + m34)
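With made-up values for M^I_S (the focal length and principal point below are illustrative, not from the course), the projection and perspective division look like:

```python
import numpy as np

# Hypothetical 3x4 camera model M^I_S: focal length 500, principal
# point (320, 240), identity pose.
M = np.array([[500.0, 0.0, 320.0, 0.0],
              [0.0, 500.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

Ps = np.array([0.2, -0.1, 2.0, 1.0])   # scene point P^S = (xs, ys, zs, 1)

wi, wj, w = M @ Ps                     # (wi, wj, w)^T = M^I_S P^S
i, j = wi / w, wj / w                  # image coordinates, here (370, 215)
```

The division by w is exactly the ratio of rows 1 and 2 of M^I_S P^S to row 3 written on the slide.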

31

Transformation image-scene

• Problem: we need to know the depth zs for each image position. Since zs can change, a general inverse M^S_I cannot exist.

• Any point in I is the image of all the points on a ray:

P^S = T^S_C M^C_R C^R_I P^I

32

Calibration

1. Construct a calibration object whose 3D position is known.

2. Measure image coordinates

3. Determine correspondences between 3d points R^S_k and image points P^I_k.

4. We have 11 DoF. We need at least 5 ½ correspondences.

33

Calibration

• For each correspondence of scene point R^S_k and image point P^I_k we have

i_k = (w i_k) / w = ((M^I_S)_1 R^S_kh) / ((M^I_S)_3 R^S_kh)

j_k = (w j_k) / w = ((M^I_S)_2 R^S_kh) / ((M^I_S)_3 R^S_kh)

(R^S_kh denotes R^S_k in homogeneous coordinates)

• which gives the following equations for k = 1, ..., 6

((M^I_S)_1 R^S_kh) - i_k ((M^I_S)_3 R^S_kh) = 0
((M^I_S)_2 R^S_kh) - j_k ((M^I_S)_3 R^S_kh) = 0

• from which M^I_S can be computed

34

Properties of M^I_S

• The first equation defines a plane that goes through the camera center and the image plane in x direction

• The second equation defines a plane that goes through the camera center and the image plane in y direction.

((M^I_S)_1 R^S_kh) - i_k ((M^I_S)_3 R^S_kh) = 0
((M^I_S)_2 R^S_kh) - j_k ((M^I_S)_3 R^S_kh) = 0

35

Calibration using many points

• For k = 5 ½, M has one solution.
– The solution depends on precise measurements of the 3d and 2d points.
– If you use another 5 ½ points, you will get a different solution.

• A more stable solution is found by using a large number of points and performing an optimisation.

36

Calibration using many points

• For each point correspondence we know (i, j) and R.

• We want to know M^I_S.

• Solve the equation system with your favorite algorithm (least squares, Levenberg-Marquardt, SVD, ...).

((M^I_S)_1 R^S_kh) - i_k ((M^I_S)_3 R^S_kh) = 0
((M^I_S)_2 R^S_kh) - j_k ((M^I_S)_3 R^S_kh) = 0

Written out, each correspondence contributes two rows of a linear system in the 12 entries of M^I_S:

                                                ( m11 )
( x  y  z  1  0  0  0  0  -ix  -iy  -iz  -i )   ( m12 )
( 0  0  0  0  x  y  z  1  -jx  -jy  -jz  -j ) * ( ... ) = 0
                                                ( m34 )
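Stacking these rows for all correspondences and taking the SVD null vector gives the classic direct linear calibration. A sketch with our own function names (not course code):

```python
import numpy as np

def calibrate(points_3d, points_2d):
    """Estimate a 3x4 camera model M from >= 6 correspondences.

    points_3d: sequence of scene points (x, y, z);
    points_2d: sequence of matching image points (i, j).
    Each correspondence contributes the two rows of the linear system
    above; the SVD null vector gives the 12 entries of M (up to scale).
    """
    rows = []
    for (x, y, z), (i, j) in zip(points_3d, points_2d):
        rows.append([x, y, z, 1, 0, 0, 0, 0, -i * x, -i * y, -i * z, -i])
        rows.append([0, 0, 0, 0, x, y, z, 1, -j * x, -j * y, -j * z, -j])
    # m = right singular vector with the smallest singular value: the
    # least-squares solution of the homogeneous system rows * m = 0.
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 4)
```

With more than 5 ½ points the SVD returns the least-squares compromise over all rows, which is the stable solution the slide recommends.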

37

Computation of 3d point position

• We have the camera model M^I_S of camera 1 and the camera model N^I_S of camera 2.

• We have a point P^I in camera 1 and a point Q^I in camera 2 which correspond (that means they are the images of the same scene point R^S).

• The position of R^S can be computed as the intersection of 3 planes.

38

Computation of 3d point position

• P^I = M^I_S R^S with P^I = (i, j);  Q^I = N^I_S R^S with Q^I = (u, v)

• We have the following equations

((M^I_S)_1 R^S_h) - i ((M^I_S)_3 R^S_h) = 0
((M^I_S)_2 R^S_h) - j ((M^I_S)_3 R^S_h) = 0
((N^I_S)_1 R^S_h) - u ((N^I_S)_3 R^S_h) = 0
((N^I_S)_2 R^S_h) - v ((N^I_S)_3 R^S_h) = 0

• R^S can be found by using 3 of those 4 equations.

39

Computation of 3d point position

The point R^S = (x, y, z, 1) is computed as follows, with m_kl the entries of M^I_S and n_kl the entries of N^I_S:

( m11 - i m31   m12 - i m32   m13 - i m33 )   ( x )   ( i m34 - m14 )
( m21 - j m31   m22 - j m32   m23 - j m33 ) * ( y ) = ( j m34 - m24 )
( n21 - v n31   n22 - v n32   n23 - v n33 )   ( z )   ( v n34 - n24 )

so that

( x )   ( m11 - i m31   m12 - i m32   m13 - i m33 )^-1   ( i m34 - m14 )
( y ) = ( m21 - j m31   m22 - j m32   m23 - j m33 )    * ( j m34 - m24 )
( z )   ( n21 - v n31   n22 - v n32   n23 - v n33 )      ( v n34 - n24 )
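The 3x3 solve can be sketched as follows. The camera models below are made-up example values, and the choice of the i, j equations of M plus the v equation of N follows the system above; with other geometries a different choice of 3 of the 4 equations may be better conditioned:

```python
import numpy as np

def triangulate(M, N, ij, uv):
    """Solve the 3x3 system above for R^S = (x, y, z).

    M, N: 3x4 camera models; (i, j), (u, v): the corresponding image points.
    """
    i, j = ij
    u, v = uv
    A = np.array([M[0, :3] - i * M[2, :3],
                  M[1, :3] - j * M[2, :3],
                  N[1, :3] - v * N[2, :3]])
    b = np.array([i * M[2, 3] - M[0, 3],
                  j * M[2, 3] - M[1, 3],
                  v * N[2, 3] - N[1, 3]])
    return np.linalg.solve(A, b)

# Two hypothetical calibrated cameras, the second displaced in x and y:
M = np.array([[500.0, 0.0, 320.0, 0.0],
              [0.0, 500.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
N = np.array([[500.0, 0.0, 320.0, -500.0],
              [0.0, 500.0, 240.0, -300.0],
              [0.0, 0.0, 1.0, 0.0]])

R = np.array([0.2, -0.1, 2.0, 1.0])       # ground-truth scene point
p, q = M @ R, N @ R                        # project into both cameras
X = triangulate(M, N, p[:2] / p[2], q[:2] / q[2])
# X recovers (0.2, -0.1, 2.0)
```

Each of the three rows of A is one of the planes through a camera center mentioned on slide 34; their intersection is the scene point.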

40

Exam

• Tuesday, Nov 30, 2004, 9:00, amphi D, duration 3h.

• Documents needed for the exam:
– Class notes
– Pocket calculator
– Kalman tutorial