Introduction to 3D Reconstruction and Stereo Vision

Francesco Isgro

3D Reconstruction and Stereo – p.1/37

What are we going to see today?

Shape from X

Range data

3D sensors

Active triangulation

Stereo vision
    epipolar constraint
    fundamental matrix
    more and more next classes . . .


Range Data

Range images are a special class of digital images

Each pixel expresses the distance of a visible point in the scene from a known reference frame

A range image reproduces the 3D structure of a scene

It is best thought of as a sampled surface


Representation of range data

Range data can be represented in two basic forms:

xyz form, or cloud of points
    a list of 3D coordinates in a given reference frame
    no specific order is required

rij form
    a matrix of depth values of points along the directions of the xy image axes
    the points follow a specific order, given by the xs and ys


Range sensors

A range sensor is a device used to acquire range data. Range sensors may measure:

depth at one point only

shape or surface profiles

full surfaces


We distinguish between

active range sensors: project energy (light, sonar, pulse) on the scene and detect its position to perform the measurement; or exploit the effects of controlled changes of some sensor parameters (e.g. focus)

passive range sensors: rely only on image intensities to perform the measurement


Active
    Radar and sonar
    Moiré interferometry
    Focusing/defocusing
    Zoom
    Active triangulation
    Motion

Passive
    Stereopsis
    Motion
    Contours
    Texture
    Shading

Active triangulation

A beam of light strikes the surface of the scene, and some of the light bounces toward a sensor (camera). The centre of the imaged reflection is triangulated against the laser line of sight.


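[Figure: active triangulation setup, showing the light projector, the plane of light, the camera, the quantities b, θ and f, and the scene point P(X,Y,Z) with image coordinate x.]

$$
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
= \frac{b}{f \cot\theta - x}
\begin{bmatrix} x \\ y \\ f \end{bmatrix}
$$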
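The triangulation formula above can be evaluated directly. The following is a minimal sketch: the function name and the example values are illustrative assumptions, with b, f and θ taken to be known from the setup and θ given in radians.

```python
import math

def triangulate_stripe_point(x, y, f, b, theta):
    """Return (X, Y, Z) for image point (x, y) of the light stripe.

    Implements [X, Y, Z] = b / (f cot(theta) - x) * [x, y, f].
    The name and argument order are illustrative assumptions.
    """
    denom = f / math.tan(theta) - x  # f cot(theta) - x
    if abs(denom) < 1e-12:
        raise ValueError("degenerate configuration: f cot(theta) - x is zero")
    scale = b / denom
    return (scale * x, scale * y, scale * f)

# Made-up example values: f = 1, b = 0.5, theta = 45 degrees.
X, Y, Z = triangulate_stripe_point(0.5, 0.25, f=1.0, b=0.5, theta=math.pi / 4)
```

The denominator vanishes when x = f cot θ, so the sketch rejects that case instead of dividing by zero.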


The stripe points in the images must be easy to identify. Typical solutions:

projecting laser light, making the stripe brighter than the rest of the image (concavities may create reflections)

projecting a black line onto a matte white or grey object (location may be confused by other dark patches, e.g. shadows)


The IMPACT system


Stereopsis

Stereo vision refers to the ability to infer information on the 3D structure and distance of a scene from two (or more) images taken from different viewpoints. A stereo system must solve two main problems:

Finding correspondences: which parts in the left and right images are projections of the same scene element

Reconstruction: determining the 3D structure from the correspondences


A simple stereo system

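[Figure: a simple stereo system. (a) Two cameras with centres Ol and Or and image planes Il and Ir; scene points P, Q, P' and Q' with projections pl, ql, pr and qr. (b) Top view showing P, the centres Ol and Or separated by T, the focal length f, the image centres cl and cr, the image coordinates xl and xr of pl and pr, and the depth Z.]

$$
Z = f\,\frac{T}{d}, \qquad d = x_r - x_l
$$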
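As a small illustration of the relation Z = fT/d, the sketch below computes depth from a pair of corresponding image coordinates. The function name and the numeric values are assumptions made for the example; the disparity follows the slide's convention d = xr − xl.

```python
def depth_from_disparity(x_l, x_r, f, T):
    """Depth Z = f * T / d with disparity d = x_r - x_l (slide convention).

    f and the image coordinates must share the same unit (e.g. pixels);
    Z is returned in the unit of T.
    """
    d = x_r - x_l
    if d == 0:
        raise ValueError("zero disparity: the point is at infinity")
    return f * T / d

# Made-up values: f = 700 pixels, T = 0.12 m.
Z_near = depth_from_disparity(300.0, 321.0, f=700.0, T=0.12)  # d = 21
Z_far = depth_from_disparity(300.0, 307.0, f=700.0, T=0.12)   # d = 7
```

Depth is inversely proportional to disparity, so the smaller disparity in the second call corresponds to a point three times farther away.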


Parameters of a stereo system

Intrinsic parameters
    characterise the mapping of an image point from camera to pixel coordinates in each camera
    two full-rank 3 × 3 matrices A and A′

Extrinsic parameters
    describe the relative position and orientation of the two cameras
    a rotation matrix R and a translation vector T


The two projection matrices can be written as

Q = A[I;0]

Q′ = A′[R;T].

In the uncalibrated case as

Q = [I;0]

Q′ = [Q′;q′].


Homography of a plane

Given a plane τ in space

H : τ −→ π

H′ : τ −→ π′

Hτ : π −→ π′


In the case of Euclidean cameras, if τ = (n, d)

$$
H_\tau = A' \left( R - \frac{t\,n^{t}}{d} \right) A^{-1}
$$

The homography of the plane at infinity

$$
H_\infty = \lim_{d \to \infty} A' \left( R - \frac{t\,n^{t}}{d} \right) A^{-1} = A' R A^{-1}
$$
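The plane-induced homography can be checked numerically. This is a sketch under stated assumptions: the second camera maps a point X to RX + t (matching Q′ = A′[R;T]), and τ = (n, d) denotes the plane n·X + d = 0. All numeric values and names are made up for the example.

```python
import numpy as np

def plane_homography(A, A_prime, R, t, n, d):
    """H_tau = A' (R - t n^t / d) A^{-1} for the plane n.X + d = 0."""
    return A_prime @ (R - np.outer(t, n) / d) @ np.linalg.inv(A)

# Hypothetical calibration and relative pose.
A = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
A_prime = A.copy()
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.0, 0.1])

# The plane Z = 5, written as n.X + d = 0.
n = np.array([0.0, 0.0, 1.0])
d = -5.0

# A point on the plane and its two projections (homogeneous, normalised).
X = np.array([1.0, -0.5, 5.0])
p = A @ X
p_prime = A_prime @ (R @ X + t)
p_prime = p_prime / p_prime[2]

# Transfer p to the second image through the homography.
mapped = plane_homography(A, A_prime, R, t, n, d) @ p
mapped = mapped / mapped[2]

# A very distant plane approaches the homography of the plane at infinity.
H_far = plane_homography(A, A_prime, R, t, n, -1e12)
H_inf = A_prime @ R @ np.linalg.inv(A)
```

With these conventions `mapped` coincides with `p_prime`, and `H_far` agrees with `H_inf` up to a negligible difference.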


Epipolar geometry

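[Figure: epipolar geometry. Camera centres O and O', image planes π and π', scene point P with projections p and p', epipoles e and e', the epipolar plane and the two epipolar lines.]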

epipolar line is the image of an optical ray

epipole is the image in one camera of the centre of the other camera

all epipolar lines intersect in the epipole


Fundamental matrix

The relation F associating each point p on π with its corresponding epipolar line λp is projective linear. The matrix governing this relation

$$
F = [e']_\times Q' Q^{-1}
$$

is called the fundamental matrix


Given two corresponding points p and p′:

the epipolar line λp is Fp

p′ is on λp, therefore

$$
p^{t} F^{t} p' = 0
$$

or, equivalently, p′^t F p = 0


Properties of the fundamental matrix

F is rank-2

it has 7 degrees of freedom: F encodes the geometry of two cameras

F = [e′]× Hτ

λp′ = F^t p′

Fe = 0 and F^t e′ = 0


The fundamental matrix in the calibrated case

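$$
F = [A' t]_\times A' R A^{-1} \simeq A'^{-t} [t]_\times R A^{-1}
$$

(the two expressions differ by the scale factor det A′, which is immaterial because F is defined up to scale)

Introducing the essential matrix

$$
E = [t]_\times R
$$

$$
F = A'^{-t} E A^{-1}
$$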
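A quick numerical check of the calibrated-case formulas. This is only a sketch: the intrinsic matrices, the pose and the scene points are made-up values, and the second camera is assumed to map X to RX + t.

```python
import numpy as np

def skew(v):
    """Matrix [v]_x such that skew(v) @ u equals the cross product v x u."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def fundamental_from_calibration(A, A_prime, R, t):
    """Return E = [t]_x R and F = A'^{-t} E A^{-1}."""
    E = skew(t) @ R
    F = np.linalg.inv(A_prime).T @ E @ np.linalg.inv(A)
    return E, F

# Hypothetical intrinsics and relative pose.
A = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
A_prime = np.array([[750.0, 0.0, 300.0], [0.0, 760.0, 250.0], [0.0, 0.0, 1.0]])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.05, 0.1])
E, F = fundamental_from_calibration(A, A_prime, R, t)

# Project a few scene points in both cameras.
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(10, 3))
p = X @ A.T
p = p / p[:, 2:3]
p_prime = (X @ R.T + t) @ A_prime.T
p_prime = p_prime / p_prime[:, 2:3]

# Epipolar constraint p'^t F p for each pair, and singular values of F.
constraint = np.einsum("ij,jk,ik->i", p_prime, F, p)
singular_values = np.linalg.svd(F, compute_uv=False)
```

Every entry of `constraint` is zero up to rounding, and the third singular value of F is numerically zero, in agreement with the rank-2 property.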


Estimation of the fundamental matrix

The fundamental matrix can be computed once a set of corresponding points (pi, p′i) among the images has been determined

Linear solution

Least Squares

Non-linear estimation

Robust estimation


Linear solution

F is 3 × 3, determined up to a scale factor. We know that for corresponding points

$$
{p'_i}^{t} F p_i = 0
$$

8 correspondences are enough to build a linear system

$$
A f = 0,
$$

of 8 equations in 9 unknowns, where f is the vector of the nine entries of F taken row by row. Each row of A is

$$
[\,x_i x'_i,\; y_i x'_i,\; x'_i,\; x_i y'_i,\; y_i y'_i,\; y'_i,\; x_i,\; y_i,\; 1\,]
$$


Coordinates of corresponding points are not exact. We use more than 8 points in order to minimise the effect of the error in the coordinates. We solve

Af = 0,

where A is n × 9

It is solved by using Singular Value Decomposition of A


Singular Value Decomposition

Any m × n matrix A (with m ≥ n) can be written as

$$
A = U W V^{t}
$$

U is m × n, with orthonormal columns

W is an n × n non-negative diagonal matrix

V is n × n and orthogonal

the entries wj on the diagonal of W are called singular values

The columns of V corresponding to wj = 0 generate the null space of A


Solution by SVD

A must be rank-8, as the null space must be of dimension 1. Because of the noise corrupting the coordinates of the points, in general A is full rank.

The solution is the column vector vj of V corresponding to the smallest singular value wj
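The linear solution can be sketched in a few lines of NumPy. The function names and the synthetic, noise-free test data are assumptions made for the example; the row layout of A follows the slide on the linear solution, with f holding the entries of F row by row.

```python
import numpy as np

def build_constraint_matrix(p, p_prime):
    """One row [x x', y x', x', x y', y y', y', x, y, 1] per correspondence."""
    x, y = p[:, 0], p[:, 1]
    xp, yp = p_prime[:, 0], p_prime[:, 1]
    return np.column_stack([x * xp, y * xp, xp, x * yp, y * yp, yp,
                            x, y, np.ones_like(x)])

def estimate_F_linear(p, p_prime):
    """Return F (unit Frobenius norm) and the singular values of A."""
    A = build_constraint_matrix(p, p_prime)
    _, w, Vt = np.linalg.svd(A)
    f = Vt[-1]  # right singular vector of the smallest singular value
    return f.reshape(3, 3), w

def project(K, R, t, X):
    """Pinhole projection of the rows of X to pixel coordinates."""
    x = (X @ R.T + t) @ K.T
    return x[:, :2] / x[:, 2:3]

# Synthetic, noise-free correspondences from a hypothetical camera pair.
rng = np.random.default_rng(0)
X = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.05, 0.1])
p = project(K, np.eye(3), np.zeros(3), X)
p_prime = project(K, R, t, X)

F, w = estimate_F_linear(p, p_prime)

# Algebraic residuals p'^t F p for every correspondence.
p_h = np.column_stack([p, np.ones(len(p))])
p_prime_h = np.column_stack([p_prime, np.ones(len(p_prime))])
residuals = np.abs(np.einsum("ij,jk,ik->i", p_prime_h, F, p_h))
```

With exact data the smallest singular value of A is numerically zero and the residuals vanish up to rounding. With noisy coordinates the same vector minimises ||Af|| subject to ||f|| = 1.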


Data normalisation

The linear algorithm performs badly when pixel coordinates are used, because A is badly conditioned. Performance is improved by normalising the data in both images:

Translate data so that centroid is the origin

Scale coordinates so that the average distance from the origin is √2

In this way the "average" point is of the form [1, 1, 1]^t, so the entries of A have comparable magnitude
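The two normalisation steps can be written as a single 3 × 3 similarity transform per image. The sketch below uses assumed function names and random pixel coordinates as stand-in data.

```python
import numpy as np

def normalisation_transform(pts):
    """3x3 transform moving the centroid of pts (n x 2) to the origin and
    scaling so that the average distance from the origin is sqrt(2)."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    return np.array([[s, 0.0, -s * centroid[0]],
                     [0.0, s, -s * centroid[1]],
                     [0.0, 0.0, 1.0]])

def apply_transform(T, pts):
    """Apply a 3x3 transform to n x 2 points (homogeneous coordinate 1)."""
    h = np.column_stack([pts, np.ones(len(pts))])
    return (h @ T.T)[:, :2]

# Stand-in pixel coordinates in a 640 x 480 image.
rng = np.random.default_rng(1)
pixels = rng.uniform([0.0, 0.0], [640.0, 480.0], size=(50, 2))
T = normalisation_transform(pixels)
normalised = apply_transform(T, pixels)
```

A step not spelled out on the slide: if F̂ is estimated from points normalised with T in the first image and T′ in the second, the matrix for the original pixel coordinates is F = T′^t F̂ T.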


Correcting F

When F is computed linearly, the rank-2 condition does not hold in general. The closest rank-2 matrix F′ can be obtained using SVD

$$
F = U \,\mathrm{diag}(r, s, t)\, V^{t}, \qquad r \ge s \ge t
$$

Set

$$
F' = U \,\mathrm{diag}(r, s, 0)\, V^{t}
$$
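The rank-2 correction takes only a few lines. This is a sketch with an assumed function name, applied to an arbitrary full-rank matrix standing in for a linearly estimated F.

```python
import numpy as np

def enforce_rank2(F):
    """Closest rank-2 matrix to F in the Frobenius norm: zero the smallest
    singular value and recompose."""
    U, s, Vt = np.linalg.svd(F)  # s is sorted in decreasing order
    s_corrected = s.copy()
    s_corrected[2] = 0.0
    return U @ np.diag(s_corrected) @ Vt

# Arbitrary full-rank 3x3 matrix, used only as a stand-in.
F_linear = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
F_rank2 = enforce_rank2(F_linear)

smallest_before = np.linalg.svd(F_linear, compute_uv=False)[2]
smallest_after = np.linalg.svd(F_rank2, compute_uv=False)[2]
distance = np.linalg.norm(F_linear - F_rank2)  # Frobenius norm
```

The Frobenius distance between the two matrices equals the discarded singular value, which is why this is the closest rank-2 matrix in that norm.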


Linear regression

Let us suppose that the following set of n observations is given

$$
\begin{array}{cccccc}
y_1 & x_{11} & x_{12} & \cdots & x_{1\,p-1} & x_{1p} \\
y_2 & x_{21} & x_{22} & \cdots & x_{2\,p-1} & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
y_n & x_{n1} & x_{n2} & \cdots & x_{n\,p-1} & x_{np}
\end{array}
$$

We assume that, for each i = 1, · · · , n

$$
y_i = x_{i1}\theta_1 + \cdots + x_{ip}\theta_p + e_i
$$

where ei is the error term, normally distributed with mean zero and unknown standard deviation σ

An estimator tries to determine the vector θ of the model parameters. The estimated values θ̂ are called regression coefficients. The values

$$
\hat{y}_i = x_{i1}\hat{\theta}_1 + \cdots + x_{ip}\hat{\theta}_p
$$

are called predicted values of yi, and the values

$$
r_i = y_i - \hat{y}_i
$$

are called residuals

Usually the estimator determines the vector θ̂ which minimises a function of the residuals.


Maximum likelihood estimator

We want to find the parameters that maximise the probability of generating the data. Assuming Gaussian noise, the probability is

$$
\prod_i \exp\left( -\frac{1}{2}\,\frac{r_i^2}{\sigma^2} \right) \Delta y
$$


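Maximising it is equivalent to minimising the negative of its logarithm

$$
\sum_i \frac{r_i^2}{2\sigma^2} - n \log \Delta y
$$

Since n, Δy and σ are constants, this amounts to minimising

$$
\sum_i \frac{r_i^2}{\sigma^2}
$$

This method is called Least Squares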

This method is called Least Squares


Least Squares and SVD

In general, SVD can be used to solve overdetermined systems

$$
A x = b
$$

It can be shown that with SVD we find the vector x solving

$$
\min_x \|A x - b\|^2
$$

SVD gives an optimal solution in a Least Squares sense
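A minimal sketch of least squares through SVD, assuming A has full column rank so that no singular value is zero. The function name and the line-fitting data are made up; NumPy's own solver is used only as a cross-check.

```python
import numpy as np

def lstsq_svd(A, b):
    """Least-squares solution of A x = b via the SVD A = U W V^t.

    Assumes full column rank: x = V W^{-1} U^t b.
    """
    U, w, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T @ ((U.T @ b) / w)

# Overdetermined system: fit y = theta_1 * x + theta_2 to six noisy samples.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0 + np.array([0.05, -0.02, 0.01, -0.04, 0.03, -0.01])
A = np.column_stack([x, np.ones_like(x)])

theta_svd = lstsq_svd(A, y)
theta_ref = np.linalg.lstsq(A, y, rcond=None)[0]
```

Both solutions coincide and lie close to the generating values (2, 1).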


Non-linear methods

Better results can be obtained by using the linear method as an initial estimate for an iterative process with a different cost function

Minimising distance from epipolar lines

$$
\min_F \sum_i \left( d^2(p'_i, F p_i) + d^2(p_i, F^{t} p'_i) \right),
$$

Minimising distance from reprojected points

$$
\min_F \sum_i \left( \| p_i - Q_F P_{F_i} \|^2 + \| p'_i - Q'_F P_{F_i} \|^2 \right),
$$
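The first cost function, the sum of squared distances from the epipolar lines, can be evaluated as follows. This is a sketch with assumed function names; the iterative minimisation over F is not shown, and the toy F corresponds to a pure horizontal translation, for which epipolar lines are image rows.

```python
import numpy as np

def point_line_distance_sq(pts_h, lines):
    """Squared distance of points (x, y, 1) from lines (a, b, c), row-wise."""
    num = np.sum(pts_h * lines, axis=1) ** 2
    den = lines[:, 0] ** 2 + lines[:, 1] ** 2
    return num / den

def symmetric_epipolar_cost(F, p_h, p_prime_h):
    """Sum over i of d^2(p'_i, F p_i) + d^2(p_i, F^t p'_i)."""
    lines_in_second = p_h @ F.T      # row i is (F p_i)^t
    lines_in_first = p_prime_h @ F   # row i is (F^t p'_i)^t
    return np.sum(point_line_distance_sq(p_prime_h, lines_in_second)
                  + point_line_distance_sq(p_h, lines_in_first))

# Toy F = [t]_x with t = (1, 0, 0): corresponding points share the same y.
F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
p_h = np.array([[10.0, 20.0, 1.0]])

# A perfect match on the same row, and a match 3 pixels off the line.
cost_exact = symmetric_epipolar_cost(F, p_h, np.array([[15.0, 20.0, 1.0]]))
cost_off = symmetric_epipolar_cost(F, p_h, np.array([[15.0, 23.0, 1.0]]))
```

In the second case each of the two distances is 3 pixels, so the cost is 9 + 9 = 18.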


Is Least Squares enough?

Given a set of data, we call outliers those points which deviate from the distribution followed by the majority of the data

For these points the distribution of the residuals is different from the one assumed for the error term

Least Squares assumes the noise is Gaussian

Even a single data point that does not follow this assumption can drive the estimate far from the correct value

In our case of point correspondences we can have two types of outliers, due to
    bad locations
    false matches

Methods based on robust statistics do exist.
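The sensitivity of Least Squares to a single outlier is easy to reproduce. The sketch below uses made-up data: a line fitted to ten exact samples, then refitted after corrupting one sample.

```python
import numpy as np

# Ten exact samples of the line y = 2x + 1.
x = np.arange(10, dtype=float)
y_clean = 2.0 * x + 1.0
A = np.column_stack([x, np.ones_like(x)])

# Least-squares fit on clean data recovers slope 2 and intercept 1.
theta_clean = np.linalg.lstsq(A, y_clean, rcond=None)[0]

# Corrupt a single observation (e.g. a false match) and fit again.
y_outlier = y_clean.copy()
y_outlier[-1] += 100.0
theta_outlier = np.linalg.lstsq(A, y_outlier, rcond=None)[0]
```

One corrupted sample out of ten moves the estimated slope from 2 to roughly 7.45, which is the behaviour that motivates robust estimators.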
