Introduction to 3D Reconstruction and Stereo Vision
Francesco Isgro
3D Reconstruction and Stereo – p.1/37
What are we going to see today?
Shape from X
Range data
3D sensors
Active triangulation
Stereo vision: epipolar constraint, fundamental matrix, and more in the next classes . . .
Range Data
Range images are a special class of digital images
Each pixel expresses the distance of a visible point in the scene from a known reference frame
A range image reproduces the 3D structure of a scene
It is best thought of as a sampled surface
Representation of range data
Range data can be represented in two basic forms:
xyz form, or cloud of points: a list of 3D coordinates in a given reference frame; no specific order is required
rij form: a matrix of depth values of points along the directions of the xy image axes; the points follow a specific order, given by the xs and ys
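The two forms above can be related by a small sketch. It assumes, purely for illustration, an orthographic sensor whose samples are spaced dx and dy apart along the image axes; the function name rij_to_xyz and the use of None for missing samples are hypothetical choices, not part of the slides.

```python
def rij_to_xyz(r, dx=1.0, dy=1.0):
    """Convert an r_ij range image (list of rows of depths) into an xyz list.

    Assumption: orthographic sampling with spacing dx, dy along the image
    x and y axes; None marks a pixel with no valid range measurement.
    """
    points = []
    for i, row in enumerate(r):            # i indexes rows (y direction)
        for j, depth in enumerate(row):    # j indexes columns (x direction)
            if depth is not None:
                points.append((j * dx, i * dy, depth))
    return points


range_image = [[2.0, 2.1],
               [None, 2.3]]
cloud = rij_to_xyz(range_image, dx=0.5, dy=0.5)
```

The resulting list no longer depends on the pixel order, which is the defining property of the xyz form.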
Range sensors
A range sensor is a device used to acquire range data.
Range sensors may measure:
depth at one point only
shape or surface profiles
full surfaces
We distinguish between
active range sensors: project energy (light, sonar, pulse) on the scene and detect its position to perform the measurement; or exploit the effects of controlled changes of some sensor parameters (e.g. focus)
passive range sensors: rely only on image intensities to perform the measurement
Active: radar and sonar, Moiré interferometry, focusing/defocusing, zoom, active triangulation, motion
Passive: stereopsis, motion, contours, texture, shading
Active triangulation
A beam of light strikes the surface of the scene, and some of the light bounces toward a sensor (camera).
The centre of the imaged reflection is triangulated against the laser line of sight.
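[Figure: active triangulation. A light projector casts a plane of light at angle θ onto the scene point P(X,Y,Z); the camera, with focal length f and at distance b from the projector, images P at (x, y)]
\[
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
= \frac{b}{f \cot\theta - x}
\begin{bmatrix} x \\ y \\ f \end{bmatrix}
\]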
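The triangulation formula above can be evaluated directly. This is a minimal sketch: the function name and the sample values are assumptions, and x, y, f are taken to be in the same units, with theta in radians.

```python
import math


def triangulate_stripe(x, y, f, b, theta):
    """Return (X, Y, Z) = b / (f cot(theta) - x) * (x, y, f)."""
    denom = f / math.tan(theta) - x        # f cot(theta) - x
    if abs(denom) < 1e-12:
        raise ValueError("f cot(theta) - x is zero: no finite intersection")
    scale = b / denom
    return (scale * x, scale * y, scale * f)


# Hypothetical setup: unit focal length, baseline 2, light plane at 45 degrees.
X, Y, Z = triangulate_stripe(x=0.0, y=0.0, f=1.0, b=2.0, theta=math.pi / 4)
```

For a stripe point imaged at the image centre with θ = 45°, the depth Z comes out equal to the baseline b, as the formula predicts.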
The stripe points in the images must be easy to identify. Typical solutions:
projecting laser light, making the stripe brighter than the rest of the image (concavities may create reflections)
projecting a black line onto a matte white or grey object (location may be confused by other dark patches, e.g. shadows)
The IMPACT system
Stereopsis
Stereo vision refers to the ability to infer information on the 3D structure and distance of a scene from two (or more) images taken from different viewpoints.
A stereo system must solve two main problems:
Finding correspondences: which parts in the left and right images are projections of the same scene element
Reconstruction: determining the 3D structure from the correspondences
A simple stereo system
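[Figure: (a) two cameras with centres Ol, Or and image planes Il, Ir, scene points P, Q, P′, Q′ and image points pl, ql, pr, qr; (b) triangulation of P from image points pl, pr with coordinates xl, xr, baseline T, focal length f, image centres cl, cr and depth Z]
\[
Z = f\,\frac{T}{d}, \qquad d = x_r - x_l
\]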
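The depth relation Z = fT/d can be sketched in a few lines. The function name and the numbers are illustrative assumptions; f and the image coordinates are taken in pixels and T in metres, so Z is in metres.

```python
def depth_from_disparity(x_l, x_r, f, T):
    """Depth Z = f * T / d with disparity d = x_r - x_l (slide convention)."""
    d = x_r - x_l
    if d == 0:
        raise ValueError("zero disparity: the point is at infinity")
    return f * T / d


# Hypothetical rig: focal length 500 px, baseline 0.1 m.
Z_near = depth_from_disparity(x_l=20.0, x_r=30.0, f=500.0, T=0.1)   # d = 10 px
Z_far = depth_from_disparity(x_l=20.0, x_r=25.0, f=500.0, T=0.1)    # d = 5 px
```

Halving the disparity doubles the depth, so depth resolution degrades for distant points.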
Parameters of a stereo system
Intrinsic parameters: characterise the mapping of an image point from camera to pixel coordinates in each camera; two full-rank 3 × 3 matrices A and A′
Extrinsic parameters: describe the relative position and orientation of the two cameras; a rotation matrix R and a translation vector T
The two projection matrices can be written as
Q = A[I;0]
Q′ = A′[R;T].
In the uncalibrated case as
Q = [I;0]
Q′ = [Q′;q′].
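As an illustration of the calibrated case, the sketch below builds Q = A[I;0] and Q′ = A′[R;T] with NumPy and projects one 3D point into both images. The intrinsic values, the translation, and the helper names are assumptions chosen only for the example.

```python
import numpy as np


def projection_matrices(A, A_prime, R, T):
    """Return Q = A [I | 0] and Q' = A' [R | T] as 3x4 arrays."""
    Q = A @ np.hstack([np.eye(3), np.zeros((3, 1))])
    Q_prime = A_prime @ np.hstack([R, T.reshape(3, 1)])
    return Q, Q_prime


def project(Q, X):
    """Project a 3D point X and return inhomogeneous pixel coordinates."""
    p = Q @ np.append(X, 1.0)
    return p[:2] / p[2]


# Hypothetical intrinsics (focal length 800 px, principal point (320, 240)).
A = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                       # parallel cameras
T = np.array([-0.1, 0.0, 0.0])      # second camera shifted along x

Q, Q_prime = projection_matrices(A, A, R, T)
X = np.array([0.0, 0.0, 2.0])
p = project(Q, X)                   # image of X in the first camera
p_prime = project(Q_prime, X)       # image of X in the second camera
```

With these values the point projects to (320, 240) in the first image and (280, 240) in the second: the same row, shifted horizontally.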
Homography of a plane
Given a plane τ in space
H : τ −→ π
H′ : τ −→ π′
Hτ : π −→ π′
In the case of Euclidean cameras, if τ = (n, d)
\[
H_\tau = A' \left( R - \frac{t\,n^t}{d} \right) A^{-1}
\]
The homography of the plane at infinity
\[
H_\infty = \lim_{d \to \infty} A' \left( R - \frac{t\,n^t}{d} \right) A^{-1} = A' R A^{-1}
\]
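A numerical check of the plane homography may help. The sketch assumes a convention the slides do not state explicitly: the plane is n^t X + d = 0 in the first camera frame and the second camera sees X′ = RX + t. All numeric values and names are illustrative.

```python
import numpy as np


def plane_homography(A, A_prime, R, t, n, d):
    """H_tau = A' (R - t n^t / d) A^{-1}, assuming the plane n^t X + d = 0."""
    return A_prime @ (R - np.outer(t, n) / d) @ np.linalg.inv(A)


A = np.array([[2.0, 0.0, 0.5],
              [0.0, 2.0, 0.5],
              [0.0, 0.0, 1.0]])
A_prime = A.copy()
angle = 0.1                                   # small rotation about the y axis
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([-0.2, 0.0, 0.05])
n, d = np.array([0.0, 0.0, 1.0]), -2.0        # the plane Z = 2

X = np.array([0.3, -0.2, 2.0])                # a point on the plane
p = A @ X                                     # homogeneous point, first image
p_prime = A_prime @ (R @ X + t)               # homogeneous point, second image

H = plane_homography(A, A_prime, R, t, n, d)
q = H @ p
transfer_error = np.linalg.norm(q / q[2] - p_prime / p_prime[2])

# Pushing the plane far away approaches the homography of the plane at infinity.
H_far = plane_homography(A, A_prime, R, t, n, 1e12)
H_inf = A_prime @ R @ np.linalg.inv(A)
```

The transfer error is zero up to rounding for points on the plane, and H_far agrees with A′RA⁻¹ as d grows.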
Epipolar geometry
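[Figure: epipolar geometry. Camera centres O and O′, image planes π and π′, scene point P and its projections p and p′, epipoles e and e′, the epipolar plane and the two epipolar lines]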
epipolar line is the image of an optical ray
epipole is the image in one camera of the centre of the other camera
all epipolar lines intersect in the epipole
Fundamental matrix
The relation F associating each point p on π with its corresponding epipolar line λp is projective linear.
The matrix governing this relation
\[
F = [e']_\times Q' Q^{-1}
\]
is called the fundamental matrix
Given two corresponding points p and p′:
the epipolar line λp is
\[
\lambda_p = F p
\]
p′ is on λp, therefore
\[
p^t F^t p' = 0, \quad \text{or equivalently} \quad p'^t F p = 0
\]
Properties of the fundamental matrix
F is rank-2
it has 7 degrees of freedom: F encodes the geometry of two cameras
\[ F = [e']_\times H_\tau \]
\[ \lambda_{p'} = F^t p' \]
\[ F e = 0 \quad \text{and} \quad F^t e' = 0 \]
The fundamental matrix in the calibrated case
\[
F = [A' t]_\times A' R A^{-1} = A'^{-t} [t]_\times R A^{-1}
\]
where the second equality holds up to a scale factor.
Introducing the essential matrix
\[
E = [t]_\times R
\]
\[
F = A'^{-t} E A^{-1}
\]
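The calibrated-case formulas can be verified numerically. The sketch below builds E = [t]×R and F = A′⁻ᵗEA⁻¹ and checks the epipolar constraint, the epipoles, and the rank. It assumes the second camera sees X′ = RX + t; the camera values and helper names are illustrative, not taken from the slides.

```python
import numpy as np


def skew(v):
    """Matrix [v]_x such that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def fundamental_from_calibration(A, A_prime, R, t):
    """F = A'^{-t} E A^{-1} with the essential matrix E = [t]_x R."""
    E = skew(t) @ R
    return np.linalg.inv(A_prime).T @ E @ np.linalg.inv(A)


A = np.array([[2.0, 0.0, 0.5],
              [0.0, 2.0, 0.5],
              [0.0, 0.0, 1.0]])
A_prime = A.copy()
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([-0.2, 0.0, 0.05])
F = fundamental_from_calibration(A, A_prime, R, t)

X = np.array([0.3, -0.2, 2.0])              # any scene point
p = A @ X                                   # homogeneous image points
p_prime = A_prime @ (R @ X + t)
epipolar_residual = p_prime @ F @ p         # p'^t F p, zero up to rounding

e = A @ (-R.T @ t)                          # image of the second centre in camera 1
e_prime = A_prime @ t                       # image of the first centre in camera 2
singular_values = np.linalg.svd(F, compute_uv=False)   # third one is ~0 (rank 2)
```

Both F e and Fᵗ e′ vanish, and only two singular values are non-zero, in line with the properties listed above.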
Estimation of the fundamental matrix
The fundamental matrix can be computed once a set of corresponding points (p_i, p′_i) among the images has been determined
Linear solution
Least Squares
Non-linear estimation
Robust estimation
Linear solution
F is 3 × 3, determined up to a scale factor.
We know that for corresponding points
\[
p_i'^t F p_i = 0
\]
8 correspondences are enough to build a linear system
\[
A f = 0,
\]
of 8 equations in 9 unknowns, where f stacks the entries of F row by row.
Each row of A is
\[
[\,x_i x'_i,\; y_i x'_i,\; x'_i,\; x_i y'_i,\; y_i y'_i,\; y'_i,\; x_i,\; y_i,\; 1\,]
\]
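The construction of A can be sketched as follows. The function name is hypothetical; points are assumed to be given as (n, 2) NumPy arrays, and f is taken to be F flattened row by row, which is the ordering implied by the row above.

```python
import numpy as np


def epipolar_design_matrix(pts, pts_prime):
    """Stack one row [x x', y x', x', x y', y y', y', x, y, 1] per match."""
    x, y = pts[:, 0], pts[:, 1]
    xp, yp = pts_prime[:, 0], pts_prime[:, 1]
    ones = np.ones_like(x)
    return np.column_stack([x * xp, y * xp, xp,
                            x * yp, y * yp, yp,
                            x, y, ones])


# One hypothetical correspondence p = (2, 3), p' = (5, 7).
pts = np.array([[2.0, 3.0]])
pts_prime = np.array([[5.0, 7.0]])
A = epipolar_design_matrix(pts, pts_prime)
```

For any 3 × 3 matrix F, the product of a row of A with F.ravel() equals p′ᵗFp for that correspondence, which is why Af = 0 encodes the epipolar constraint.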
Coordinates of corresponding points are not exact.
We use more than 8 points in order to minimise the effect of the error in the coordinates.
We solve
Af = 0,
where A is n × 9
It is solved by using the Singular Value Decomposition of A
Singular Value Decomposition
Any m × n matrix A (with m ≥ n) can be written as
\[
A = U W V^t
\]
U is m × n and has orthonormal columns
W is an n × n non-negative diagonal matrix
V is n × n and orthogonal
the entries wj on the diagonal of W are called singular values
The columns of V corresponding to wj = 0 generate the null space of A
Solution by SVD
A must be rank-8, as the null space must be of dimension 1.
Because of the noise corrupting the coordinates of the points, in general A is full rank.
The solution is the column vector vj of V corresponding to the smallest singular value wj
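A minimal sketch of this step with NumPy, on synthetic data rather than real correspondences: a matrix with a known null vector is perturbed by small noise, and the right singular vector of the smallest singular value recovers that vector. The function name and the synthetic setup are assumptions made for the example.

```python
import numpy as np


def smallest_singular_vector(A):
    """Return the unit vector f minimising ||A f|| and the singular values.

    np.linalg.svd returns V^t with singular values in decreasing order,
    so the last row of V^t belongs to the smallest singular value.
    """
    _, w, Vt = np.linalg.svd(A)
    return Vt[-1], w


rng = np.random.default_rng(0)
f_true = rng.normal(size=9)
f_true /= np.linalg.norm(f_true)

B = rng.normal(size=(20, 9))
A_exact = B - np.outer(B @ f_true, f_true)               # rank 8, A_exact f_true = 0
A_noisy = A_exact + 1e-6 * rng.normal(size=(20, 9))      # full rank because of noise

f_est, w = smallest_singular_vector(A_noisy)
alignment = abs(f_est @ f_true)      # close to 1; the sign of f is arbitrary
```

The smallest singular value is tiny but not zero, which is the full-rank situation described above; reshaping f_est to 3 × 3 would give the estimate of F.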
Data normalisation
The linear algorithm performs badly when pixel coordinates are used, because A is badly conditioned.
Performance is improved by normalising the data in both images:
Translate the data so that the centroid is the origin
Scale the coordinates so that the average distance from the origin is √2
In this way the average point is [1, 1, 1]^t
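The two normalisation steps can be sketched as a single similarity transform. The function name is hypothetical, points are assumed to be an (n, 2) array, and the sketch does not guard against the degenerate case of all points coinciding.

```python
import numpy as np


def normalise_points(pts):
    """Return normalised points and the 3x3 transform T that produces them."""
    centroid = pts.mean(axis=0)
    mean_dist = np.linalg.norm(pts - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    return (pts - centroid) * s, T


pts = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 2.0], [0.0, 2.0]])
pts_n, T = normalise_points(pts)
mean_after = pts_n.mean(axis=0)                           # ~ (0, 0)
dist_after = np.linalg.norm(pts_n, axis=1).mean()         # ~ sqrt(2)
```

If the points of the two images are normalised with transforms T and T′ and the linear algorithm returns F_n, the matrix for the original pixel coordinates is F = T′ᵗ F_n T.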
Correcting F
When F is computed linearly, the rank-2 condition does not hold in general.
The closest rank-2 matrix F′ can be obtained using SVD
F = U diag(r, s, t) V^t, with r ≥ s ≥ t
Set F′ = U diag(r, s, 0) V^t
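The correction is a few lines with NumPy. The function name and the test matrix are illustrative; "closest" is meant in the Frobenius norm.

```python
import numpy as np


def enforce_rank2(F):
    """Zero the smallest singular value: F' = U diag(r, s, 0) V^t."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt


F = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])          # arbitrary full-rank matrix
F_rank2 = enforce_rank2(F)

smallest_before = np.linalg.svd(F, compute_uv=False)[2]
smallest_after = np.linalg.svd(F_rank2, compute_uv=False)[2]
change = np.linalg.norm(F - F_rank2)      # Frobenius norm of the correction
```

The size of the correction equals the singular value that was discarded, so a well-estimated F is changed only slightly.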
Linear regression
Let us suppose that the following set of n observations is given
\[
\begin{array}{cccccc}
y_1 & x_{11} & x_{12} & \cdots & x_{1\,p-1} & x_{1p} \\
y_2 & x_{21} & x_{22} & \cdots & x_{2\,p-1} & x_{2p} \\
\vdots & & & & & \\
y_n & x_{n1} & x_{n2} & \cdots & x_{n\,p-1} & x_{np}
\end{array}
\]
We assume that, for each i = 1, · · · , n
\[
y_i = x_{i1}\theta_1 + \cdots + x_{ip}\theta_p + e_i
\]
where e_i is the error term, normally distributed with mean zero and unknown standard deviation σ
An estimator tries to determine the vector θ of the model parameters.
The estimated values $\hat{\theta}$ are called regression coefficients.
The values
\[
\hat{y}_i = x_{i1}\hat{\theta}_1 + \cdots + x_{ip}\hat{\theta}_p
\]
are called predicted values of y_i, and the values
\[
r_i = y_i - \hat{y}_i
\]
are called residuals.
Usually the estimator determines the vector $\hat{\theta}$ which minimises a function of the residuals.
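A small worked example of these definitions, using a straight-line model with p = 2 parameters. The data are synthetic, and the error terms are chosen by hand (not drawn at random) so that the result is easy to verify; the estimator used is ordinary least squares via np.linalg.lstsq.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
e = np.array([0.1, -0.1, -0.1, 0.1])          # hand-picked error terms
y = 2.0 * x + 1.0 + e                         # observations, true theta = (2, 1)

X = np.column_stack([x, np.ones_like(x)])     # row i holds (x_i1, x_i2)
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # regression coefficients

y_hat = X @ theta_hat                         # predicted values
r = y - y_hat                                 # residuals
sum_sq = float(r @ r)                         # the function being minimised
```

Because the chosen errors are orthogonal to both columns of X, the estimate recovers (2, 1) and the residuals coincide with the error terms.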
Maximum likelihood estimator
We want to maximise the probability of a set of parameters to generate the data.
Assuming Gaussian noise the probability is
\[
\prod_i \exp\left( -\frac{1}{2}\,\frac{r_i^2}{\sigma^2} \right) \Delta y
\]
Maximising it is equivalent to minimising the negative of its logarithm
\[
\sum_i \frac{r_i^2}{2\sigma^2} - n \log \Delta y
\]
Since n, σ and Δy are constant, this amounts to minimising
\[
\sum_i \frac{r_i^2}{\sigma^2}
\]
This method is called Least Squares
Least Squares and SVD
In general, SVD can be used to solve overdetermined systems
\[
A x = b
\]
It can be shown that with SVD we find the vector x solving
\[
\min_x \| A x - b \|^2
\]
SVD gives an optimal solution in a Least Squares sense
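One way to obtain this solution is through the pseudo-inverse built from the SVD, x = V diag(1/wj) Uᵗ b, discarding singular values that are numerically zero. The sketch below is an illustration under that assumption; the function name, the tolerance, and the small system are hypothetical.

```python
import numpy as np


def lstsq_svd(A, b, rel_tol=1e-12):
    """Least-squares solution of A x = b through the SVD of A."""
    U, w, Vt = np.linalg.svd(A, full_matrices=False)
    keep = w > rel_tol * w.max()          # ignore numerically zero singular values
    w_inv = np.zeros_like(w)
    w_inv[keep] = 1.0 / w[keep]
    return Vt.T @ (w_inv * (U.T @ b))


# Three equations, two unknowns: no exact solution exists.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])
x = lstsq_svd(A, b)
residual_norm = np.linalg.norm(A @ x - b)
```

For this system the minimiser is x = (4/3, 4/3), which can be checked by hand from the normal equations AᵗAx = Aᵗb.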
Non-linear methods
Better results can be obtained by using the linear method as an initial estimate for an iterative process with a different cost function
Minimising distance from epipolar lines
\[
\min_F \sum_i \left( d^2(p'_i, F p_i) + d^2(p_i, F^t p'_i) \right)
\]
Minimising distance from reprojected points
\[
\min_F \sum_i \left( \| p_i - Q_F P_{Fi} \|^2 + \| p'_i - Q'_F P_{Fi} \|^2 \right)
\]
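The first cost function can be evaluated as sketched below; an iterative optimiser would then adjust F to reduce it. The function names are hypothetical, and the example F corresponds to a pure translation along x with identity intrinsics, chosen because its epipolar lines are simply the image rows.

```python
import numpy as np


def point_line_distance_sq(p, line):
    """Squared distance from the 2D point p to the line a x + b y + c = 0."""
    a, b, c = line
    return (a * p[0] + b * p[1] + c) ** 2 / (a * a + b * b)


def symmetric_epipolar_cost(F, pts, pts_prime):
    """Sum over i of d^2(p'_i, F p_i) + d^2(p_i, F^t p'_i)."""
    total = 0.0
    for p, pp in zip(pts, pts_prime):
        ph = np.array([p[0], p[1], 1.0])
        pph = np.array([pp[0], pp[1], 1.0])
        total += point_line_distance_sq(pp, F @ ph)
        total += point_line_distance_sq(p, F.T @ pph)
    return total


# F = [t]_x for t = (1, 0, 0): corresponding points must share the same y.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
cost_exact = symmetric_epipolar_cost(F, [(1.0, 2.0)], [(3.0, 2.0)])
cost_off = symmetric_epipolar_cost(F, [(1.0, 2.0)], [(3.0, 2.5)])
```

A match on the same row costs nothing, while a vertical offset of 0.5 contributes 0.25 in each image, for a total of 0.5.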
Is Least Squares enough?
Given a set of data we call outliers those points which deviate from the distribution followed by the majority of data
For these points the distribution of the residuals is different from the one supposed for the error term
Least Squares assumes the noise is Gaussian
Even if only one data point does not follow this assumption, the estimate can be far from the correct value
In our case of point correspondences we can have two types of outliers, due to
bad locations
false matches
Methods based on robust statistics do exist.
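The sensitivity of Least Squares to a single outlier is easy to reproduce. The sketch fits a line to ten synthetic points lying exactly on y = 2x + 1, then repeats the fit after corrupting one observation; the data and the size of the corruption are made up for the example.

```python
import numpy as np

x = np.arange(10.0)
y_clean = 2.0 * x + 1.0                 # points exactly on the line

y_outlier = y_clean.copy()
y_outlier[-1] += 100.0                  # a single grossly wrong observation

# np.polyfit with degree 1 returns the least-squares (slope, intercept).
slope_clean, intercept_clean = np.polyfit(x, y_clean, 1)
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, 1)
```

The clean fit returns slope 2, whereas the corrupted fit returns a slope of about 7.45: one bad point out of ten is enough to make the estimate useless.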