2461059 computer vision solution manual


Upload: choi-seongjun

Post on 07-Apr-2015



C H A P T E R 1

Cameras

PROBLEMS

1.1. Derive the perspective projection equations for a virtual image located at a distance f′ in front of the pinhole.

Solution We write again $\overrightarrow{OP'} = \lambda\,\overrightarrow{OP}$, but this time impose z′ = −f′ (since the image plane is in front of the pinhole, it has negative depth). The perspective projection equations become

$$x' = -f'\,\frac{x}{z}, \qquad y' = -f'\,\frac{y}{z}.$$

Note that the magnification is positive in this case, since z is always negative.

1.2. Prove geometrically that the projections of two parallel lines lying in some plane Π appear to converge on a horizon line H formed by the intersection of the image plane with the plane parallel to Π and passing through the pinhole.

Solution Let us consider two parallel lines ∆1 and ∆2 lying in the plane Π, and define ∆0 as the line passing through the pinhole that is parallel to ∆1 and ∆2. The lines ∆0 and ∆1 define a plane Π1, and the lines ∆0 and ∆2 define a second plane Π2. Clearly, ∆1 and ∆2 project onto the lines δ1 and δ2 where Π1 and Π2 intersect the image plane Π′. These two lines intersect at the point p0 where ∆0 intersects Π′. This point is the vanishing point associated with the family of lines parallel to ∆0, and the projection of any line in the family appears to converge on it. (This is true even for lines parallel to ∆0 that do not lie in Π.)

Now let us consider two other parallel lines ∆′1 and ∆′2 in Π, and define as before the corresponding line ∆′0 and vanishing point p′0. The lines ∆0 and ∆′0 lie in a plane parallel to Π that intersects the image plane along a line H passing through p0 and p′0. This is the horizon line, and any two parallel lines in Π appear to intersect on it. They appear to converge there since any image point above the horizon is associated with a ray issuing from the pinhole and pointing away from Π. Horizon points correspond to rays parallel to Π, i.e., to points of that plane located at an infinite distance from the pinhole.

1.3. Prove the same result algebraically using the perspective projection Eq. (1.1). You can assume for simplicity that the plane Π is orthogonal to the image plane.

Solution Let us define the plane Π by y = c and consider a line ∆ in this plane with equation ax + bz = d. According to Eq. (1.1), a point on this line projects onto the image point defined by

$$x' = f'\,\frac{x}{z} = f'\,\frac{d - bz}{az}, \qquad y' = f'\,\frac{y}{z} = f'\,\frac{c}{z}.$$


This is a parametric representation of the image δ of the line ∆, with z as the parameter. This image is in fact only a half-line: as z → −∞, it stops at the point (x′, y′) = (−f′b/a, 0) on the x′ axis of the image plane. This is the vanishing point associated with all parallel lines with slope −b/a in the plane Π. All such vanishing points lie on the x′ axis, which is the horizon line in this case.
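As a quick numeric sanity check (a sketch, not part of the original text; f′, a, b, c and the choice of lines below are arbitrary), points on two parallel lines of the plane y = c can be projected and watched converging to the vanishing point (−f′b/a, 0):

```python
# Illustrative check: points on two parallel lines of the plane y = c project
# ever closer to the vanishing point (-f'b/a, 0). All values are arbitrary.
f_prime = 1.0
a, b, c = 2.0, 3.0, 0.5

def project(x, y, z):
    """Perspective projection of Eq. (1.1): (x', y') = (f' x/z, f' y/z)."""
    return f_prime * x / z, f_prime * y / z

def line_point(d, z):
    """Point at depth z on the line ax + bz = d inside the plane y = c."""
    return (d - b * z) / a, c, z

vanishing = (-f_prime * b / a, 0.0)
for d in (1.0, 5.0):                 # two parallel lines (same slope -b/a)
    x_img, y_img = project(*line_point(d, -1e9))   # z -> -infinity
    assert abs(x_img - vanishing[0]) < 1e-6
    assert abs(y_img - vanishing[1]) < 1e-6
```

Both lines share the slope −b/a, so they share the vanishing point, as the geometric argument of Problem 1.2 predicts.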

1.4. Derive the thin lens equation. Hint: consider a ray r0 passing through the point P, and construct the rays r1 and r2 obtained respectively by the refraction of r0 by the right boundary of the lens and the refraction of r1 by its left boundary.

Solution Consider a point P located at (negative) depth z and distance y from the optical axis, and let r0 denote a ray passing through P, intersecting the optical axis in P0 at (negative) depth z0 and the lens in Q at a distance h from the optical axis.

[Figure: the ray r0 through P and P0 is refracted at the right boundary into r1 (crossing the axis at P1, depth z1), then at the left boundary into r′0 (crossing the axis at P′0, depth z′0); the ray r through the center O meets r′0 at the image P′, at depth z′ and distance −y′ from the axis; h is the height at which r0 meets the lens in Q.]

Before constructing the image of P, let us first determine the image P′0 of P0 on the optical axis: after refraction at the right circular boundary of the lens, r0 is transformed into a new ray r1 intersecting the optical axis at the point P1, whose depth z1 verifies, according to (1.5),

$$\frac{1}{-z_0} + \frac{n}{z_1} = \frac{n-1}{R}.$$

The ray r1 is immediately refracted at the left boundary of the lens, yielding a new ray r′0 that intersects the optical axis in P′0. The paraxial refraction equation can be rewritten in this case as

$$\frac{n}{-z_1} + \frac{1}{z'_0} = \frac{1-n}{-R},$$

and adding these two equations yields

$$\frac{1}{z'_0} - \frac{1}{z_0} = \frac{1}{f}, \quad\text{where}\quad f = \frac{R}{2(n-1)}. \tag{1.1}$$

Let r denote the ray passing through P and the center O of the lens, and let P′ denote the intersection of r and r′0, located at depth z′ and at a distance −y′ from the optical axis. We have the following relations among the sides of similar triangles:

$$\frac{y}{h} = \frac{z - z_0}{-z_0} = 1 - \frac{z}{z_0}, \qquad -\frac{y'}{h} = \frac{z' - z'_0}{z'_0} = -\Bigl(1 - \frac{z'}{z'_0}\Bigr), \qquad \frac{y'}{z'} = \frac{y}{z}. \tag{1.2}$$

Combining Eqs. (1.2) and (1.1) to eliminate h, y, and y′ finally yields

$$\frac{1}{z'} - \frac{1}{z} = \frac{1}{f}.$$
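The chain of the two refraction equations can be checked numerically. This is an illustrative sketch, not from the manual; the values of n, R, and z0 are arbitrary:

```python
# Chain the two paraxial refraction equations of the derivation above and
# confirm 1/z0' - 1/z0 = 1/f with f = R / (2(n - 1)). Values are arbitrary.
n, R = 1.5, 0.1          # glass index and boundary radius
z0 = -0.5                # depth of P0, negative by convention

# First refraction: 1/(-z0) + n/z1 = (n - 1)/R  ->  solve for z1.
z1 = n / ((n - 1) / R - 1 / (-z0))
# Second refraction: n/(-z1) + 1/z0' = (1 - n)/(-R)  ->  solve for z0'.
z0_prime = 1 / ((1 - n) / (-R) + n / z1)

f = R / (2 * (n - 1))
assert abs((1 / z0_prime - 1 / z0) - 1 / f) < 1e-12
```

Any other choice of z0 gives the same focal length, which is exactly what Eq. (1.1) asserts.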

1.5. Consider a camera equipped with a thin lens, with its image plane at position z′ and the plane of scene points in focus at position z. Now suppose that the image plane is moved to z̄′. Show that the diameter of the corresponding blur circle is

$$\frac{d\,|z' - \bar z'|}{z'},$$

where d is the lens diameter. Use this result to show that the depth of field (i.e., the distance between the near and far planes that will keep the diameter of the blur circles below some threshold ε) is given by

$$D = 2\varepsilon f z (z + f)\,\frac{d}{f^2 d^2 - \varepsilon^2 z^2},$$

and conclude that, for a fixed focal length, the depth of field increases as the lens diameter decreases, and thus as the f-number increases.
Hint: Solve for the depth z̄ of a point whose image is focused on the image plane at position z̄′, considering both the case where z̄′ is larger than z′ and the case where it is smaller.

Solution If ε denotes the diameter of the blur circle, using similar triangles immediately shows that

$$\varepsilon = \frac{d\,|z' - \bar z'|}{z'}.$$

Now let us assume that z̄′ < z′. Using the thin lens equation to solve for the depth z̄ of a point focused on the plane z̄′ yields

$$\bar z = \frac{f z (d - \varepsilon)}{d f + \varepsilon z}.$$

By the same token, taking z̄′ > z′ yields

$$\bar z = \frac{f z (d + \varepsilon)}{d f - \varepsilon z}.$$

Finally, taking D to be the difference of these two depths yields

$$D = \frac{2\varepsilon d f z (z + f)}{f^2 d^2 - \varepsilon^2 z^2}.$$

Now we can write D = kd/(f²d² − ε²z²), where k = 2εfz(z + f) > 0, since z′ = fz/(f + z) > 0 implies z(z + f) > 0. Differentiating D with respect to d for a fixed depth z and focal length f yields

$$\frac{\partial D}{\partial d} = -k\,\frac{f^2 d^2 + \varepsilon^2 z^2}{(f^2 d^2 - \varepsilon^2 z^2)^2} < 0,$$

which immediately shows that D decreases when d increases, or equivalently, that D increases with the f-number of the lens.
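The closed-form expression for D can be checked against a direct inversion of the thin lens equation. This is a sketch with arbitrary values (not from the manual):

```python
# Numeric check of the depth-of-field formula; all values arbitrary, in meters.
f, d, eps = 0.05, 0.010, 1e-4   # focal length, lens diameter, blur threshold
z = -1.0                        # depth of the in-focus plane (negative)

z_img = 1 / (1 / f + 1 / z)     # in-focus image plane: 1/z' - 1/z = 1/f

def depth_for_image_plane(zb_img):
    """Invert the thin lens equation: the depth focused on the plane zb_img."""
    return 1 / (1 / zb_img - 1 / f)

# Moving the image plane to z'(d +/- eps)/d keeps the blur diameter at eps.
z_far = depth_for_image_plane(z_img * (d - eps) / d)
z_near = depth_for_image_plane(z_img * (d + eps) / d)

D_direct = z_near - z_far
D_formula = 2 * eps * d * f * z * (z + f) / (f**2 * d**2 - eps**2 * z**2)
assert abs(D_direct - D_formula) < 1e-9
```

Shrinking d (with everything else fixed) makes D_formula grow, matching the monotonicity argument above.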


1.6. Give a geometric construction of the image P′ of a point P given the two focal points F and F′ of a thin lens.

Solution Let us assume that the point P is off the optical axis of the lens. Draw the ray r passing through P and F. After being refracted by the lens, r emerges parallel to the optical axis. Now draw the ray r′ passing through P and parallel to the optical axis. After being refracted by the lens, r′ must pass through F′. Draw the two refracted rays. They intersect at the image P′ of P. For a point P on the optical axis, just construct the image of a point off the axis with the same depth to determine the depth of the image of P. It is easy to derive the thin lens equation from this geometric construction.

1.7. Derive the thick lens equations in the case where both spherical boundaries of the lens have the same radius.

Solution The diagram below will help set up the notation. The thickness of the lens is denoted by t. All distances here are taken positive; if some of the points changed side in a different setting, all formulas derived below would still be valid, with possibly negative distance values for the points having changed side.

[Figure: a point A at distance a from the right boundary of the thick lens; its intermediate image B at distance b from that boundary; the final image C at distance c from the left boundary; t is the lens thickness.]

Let us consider a point A located on the optical axis of the thick lens at a distance a from its right boundary. A ray passing through A is refracted at the boundary, and the secondary ray intersects the optical axis in a point B located at a distance b from the boundary (here B is on the right of the boundary; recall that we take b > 0, and if B were on the left, we would use −b for the distance). The secondary ray is then refracted at the left boundary of the lens, and the ternary ray finally intersects the optical axis in a point C located at (positive) distance c from the left lens boundary. Applying the paraxial refraction Eq. (1.5) yields

$$\frac{1}{a} - \frac{n}{b} = \frac{n-1}{R}, \qquad \frac{n}{t+b} + \frac{1}{c} = \frac{n-1}{R}.$$

To establish the thick lens equation, let us first postulate the existence of the focal and principal points of the lens and compute their positions. Consider the diagram below. This time A is the right focal point F of the lens, and any ray passing through this point emerges from the lens parallel to its optical axis.


[Figure: a ray through the right focal point F enters the lens at height h and emerges parallel to the axis at height h′; V is the vertex of the right boundary, H the right principal point at distance d from it, and f = d + a the focal length.]

This corresponds to 1/c = 0, or

$$b = \frac{nR}{n-1} - t = \frac{nR - (n-1)t}{n-1}.$$

We can now write

$$\frac{1}{a} = \frac{n}{b} + \frac{n-1}{R} = \frac{n-1}{R}\Bigl[1 + \frac{nR}{nR - (n-1)t}\Bigr],$$

or

$$a = \frac{R}{n-1}\Bigl[1 - \frac{nR}{2nR - (n-1)t}\Bigr].$$

Now let us assume that the right principal point H of the lens is located on the left of its right boundary, at a (positive) distance d from it. If h is the distance from the optical axis to the point where the first ray enters the lens, and if h′ is the distance between the optical axis and the emerging ray, using similar triangles shows that

$$\frac{h'}{d+a} = \frac{h}{a}, \quad \frac{h'}{t+b} = \frac{h}{b} \;\Longrightarrow\; \frac{d+a}{t+b} = \frac{a}{b} \;\Longrightarrow\; d = t\,\frac{a}{b}.$$

Substituting the values of a and b obtained earlier in this equation shows that

$$d = \frac{Rt}{2nR - (n-1)t}.$$

The focal length is the distance between H and F, and it is thus given by

$$f = d + a = \frac{nR^2}{(n-1)\,[2nR - (n-1)t]},$$

or

$$\frac{1}{f} = 2\,\frac{n-1}{R} - \frac{(n-1)^2}{n}\,\frac{t}{R^2}.$$

For these values of a, b, c, d, and f, it is now clear that any ray passing through F emerges parallel to the optical axis, and that the emerging ray can be constructed by pretending that the primary ray goes undeflected until it intersects the principal plane passing through H and perpendicular to the optical axis, where refraction turns it into a secondary ray parallel to the axis.
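The algebra relating a, b, d, and f can be verified numerically. This sketch uses arbitrary values of n, R, and t (not taken from the text):

```python
# Check that the positions a, b, d derived above combine into
# f = d + a = n R^2 / ((n - 1)(2nR - (n - 1)t)). Values are arbitrary.
n, R, t = 1.5, 0.1, 0.02
b = (n * R - (n - 1) * t) / (n - 1)
a = (R / (n - 1)) * (1 - n * R / (2 * n * R - (n - 1) * t))
d = R * t / (2 * n * R - (n - 1) * t)
f = n * R**2 / ((n - 1) * (2 * n * R - (n - 1) * t))

assert abs((d + a) - f) < 1e-12                       # f = d + a
assert abs(d - t * a / b) < 1e-12                     # d = t a / b
assert abs(1 / f - (2 * (n - 1) / R - (n - 1)**2 * t / (n * R**2))) < 1e-9
```

Setting t = 0 recovers the thin lens focal length f = R/(2(n − 1)) of Problem 1.4, a useful consistency check on the formula.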


An identical argument allows the construction of the left focal and principal planes. By symmetry, these are located at distances a and d from the left boundary, and the left focal length is the same as the right one. We gave in Ex. 1.6 a geometric construction of the image P′ of a point P given the two focal points F and F′ of a thin lens. The same procedure can be used for a thick lens, except for the fact that the ray going through the points P and F (resp. P′ and F′) is "refracted" into a ray parallel to the optical axis when it crosses the right (resp. left) principal plane instead of the right (resp. left) boundary of the lens (Figure 1.11). It follows immediately that the thin lens equation holds for thick lenses as well, i.e.,

$$\frac{1}{z'} - \frac{1}{z} = \frac{1}{f},$$

where the origin used to measure z is in the right principal plane instead of at the optical center, and the origin used to measure z′ is in the left principal plane.


C H A P T E R 2

Geometric Camera Models

PROBLEMS

2.1. Write formulas for the matrices ${}^A_B R$ when (B) is deduced from (A) via a rotation of angle θ about the axes iA, jA, and kA respectively.

Solution The expressions for the rotations are obtained by writing the coordinates of the vectors iB, jB and kB in the coordinate frame (A). When (B) is deduced from (A) by a rotation of angle θ about the axis kA, we obtain

$${}^A_B R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Note that ${}^A_B R$ is of course the inverse of the matrix ${}^B_A R$ given by Eq. (2.4). When (B) is deduced from (A) by a rotation of angle θ about the axis iA, we have

$${}^A_B R = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}.$$

Finally, when (B) is deduced from (A) by a rotation of angle θ about the axis jA, we have

$${}^A_B R = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}.$$
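The three elementary rotations can be written down and checked numerically; this is an illustrative sketch (the convention assumed here is that the columns of each matrix hold the coordinates of iB, jB, kB in frame (A)):

```python
# Build the three elementary rotation matrices and verify R^T R = Id.
import math

def rot_k(theta):   # rotation about k_A
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rot_i(theta):   # rotation about i_A
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_j(theta):   # rotation about j_A
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [list(row) for row in zip(*A)]

theta = 0.3
for R in (rot_i(theta), rot_j(theta), rot_k(theta)):
    RtR = matmul(transpose(R), R)
    for i in range(3):
        for j in range(3):
            assert abs(RtR[i][j] - (1 if i == j else 0)) < 1e-12
```

The orthogonality checked here is exactly the "necessary" condition established in Problem 2.2 below.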

2.2. Show that rotation matrices are characterized by the following properties: (a) the inverse of a rotation matrix is its transpose, and (b) its determinant is 1.

Solution Let us first show that the two properties are necessary. Consider the rotation matrix

$${}^B_A R = \bigl({}^B i_A \;\; {}^B j_A \;\; {}^B k_A\bigr).$$

Clearly,

$$R^T R = \begin{pmatrix} {}^B i_A \cdot {}^B i_A & {}^B i_A \cdot {}^B j_A & {}^B i_A \cdot {}^B k_A \\ {}^B j_A \cdot {}^B i_A & {}^B j_A \cdot {}^B j_A & {}^B j_A \cdot {}^B k_A \\ {}^B k_A \cdot {}^B i_A & {}^B k_A \cdot {}^B j_A & {}^B k_A \cdot {}^B k_A \end{pmatrix} = \mathrm{Id},$$

thus $R^T$ is the inverse of $R$. Now $\mathrm{Det}(R) = ({}^B i_A \times {}^B j_A) \cdot {}^B k_A = 1$, since the vectors iA, jA, and kA form a right-handed orthonormal basis.

Conversely, suppose that the matrix $R = \bigl(u \;\; v \;\; w\bigr)$ verifies the properties $R^T R = \mathrm{Id}$ and $\mathrm{Det}(R) = 1$. Then, reversing the argument used in the "necessary" part of the proof shows that the vectors u, v, and w form a right-handed orthonormal basis of R³. If we write $u = {}^B i_A$, $v = {}^B j_A$, and $w = {}^B k_A$ for some right-handed orthonormal basis (B) = (iB, jB, kB) of E³, it is clear that the dot product of any two of the vectors iA, jA, and kA is the same as the dot product of the corresponding coordinate vectors, e.g., iA · iA = u · u = 1 and jA · kA = v · w = 0. By the same token, (iA × jA) · kA = (u × v) · w = 1. It follows that the vectors iA, jA, and kA form a right-handed orthonormal basis of E³, and that R is a rotation matrix.


the same token, (iA × jA) · kA = (u × v) ·w = 1. It follows that the vectors iA,jA, and kA form a right-handed orthonormal basis of E3, and that R is a rotationmatrix.

2.3. Show that the set of matrices associated with rigid transformations and equipped with the matrix product forms a group.

Solution Since the set of invertible 4 × 4 matrices equipped with the matrix product already forms a group, it is sufficient to show that (a) the product of two rigid transformations is also a rigid transformation, and (b) any rigid transformation admits an inverse, and this inverse is also a rigid transformation. Let us first prove (a) by considering two rigid transformation matrices

$$T = \begin{pmatrix} R & t \\ 0^T & 1 \end{pmatrix}, \qquad T' = \begin{pmatrix} R' & t' \\ 0^T & 1 \end{pmatrix},$$

and their product

$$T'' = \begin{pmatrix} R'' & t'' \\ 0^T & 1 \end{pmatrix} = \begin{pmatrix} RR' & Rt' + t \\ 0^T & 1 \end{pmatrix}.$$

The matrix $R''$ is a rotation matrix since $R''^T R'' = R'^T R^T R R' = R'^T R' = \mathrm{Id}$ and $\mathrm{Det}(R'') = \mathrm{Det}(R)\,\mathrm{Det}(R') = 1$. Thus $T''$ is a rigid transformation matrix. To prove (b), note that $T'' = \mathrm{Id}$ when $R' = R^T$ and $t' = -R^T t$. This shows that any rigid transformation admits an inverse, and that this inverse is given by these two equations.
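Both group properties can be exercised numerically; this is a sketch with arbitrary rotation angles and translations (a single rotation axis suffices for illustration):

```python
# Compose two rigid transforms (4x4 homogeneous matrices) and verify that the
# product is rigid and that (R, t)^-1 = (R^T, -R^T t). Values are arbitrary.
import math

def make_T(theta, t):
    """Rigid transform: rotation by theta about k, then translation t."""
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0, 0, 0, 1]]

def matmul4(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

T1 = make_T(0.4, (1.0, -2.0, 0.5))
T2 = make_T(-1.1, (0.3, 0.0, 2.0))
T12 = matmul4(T1, T2)
assert T12[3] == [0.0, 0.0, 0.0, 1.0]      # product keeps the rigid shape

# Inverse: R' = R^T and t' = -R^T t give T1 * T1^-1 = Id.
c, s = math.cos(0.4), math.sin(0.4)
R = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
Rt = [list(row) for row in zip(*R)]
t = (1.0, -2.0, 0.5)
t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
T1_inv = [Rt[0] + [t_inv[0]], Rt[1] + [t_inv[1]],
          Rt[2] + [t_inv[2]], [0, 0, 0, 1]]
I4 = matmul4(T1, T1_inv)
for i in range(4):
    for j in range(4):
        assert abs(I4[i][j] - (1 if i == j else 0)) < 1e-12
```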

2.4. Let ${}^A T$ denote the matrix associated with a rigid transformation T in the coordinate system (A), with

$${}^A T = \begin{pmatrix} {}^A R & {}^A t \\ 0^T & 1 \end{pmatrix}.$$

Construct the matrix ${}^B T$ associated with T in the coordinate system (B) as a function of ${}^A T$ and the rigid transformation separating (A) and (B).

Solution Let $P' = T P$ denote the image of the point P under the mapping T. Rewriting this equation in the frame (A) yields

$${}^A P' = {}^A T\, {}^A P,$$

or

$${}^A_B T\, {}^B P' = {}^A T\, {}^A_B T\, {}^B P.$$

In turn, this can be rewritten as

$${}^B P' = {}^A_B T^{-1}\, {}^A T\, {}^A_B T\, {}^B P = {}^B T\, {}^B P,$$

or, since ${}^A_B T^{-1} = {}^B_A T$,

$${}^B T = {}^B_A T\, {}^A T\, {}^A_B T.$$

It follows that an explicit expression for ${}^B T$ is

$${}^B T = \begin{pmatrix} {}^B_A R\, {}^A R\, {}^A_B R & \;{}^B_A R\, {}^A R\, {}^A O_B + {}^B_A R\, {}^A t + {}^B O_A \\ 0^T & 1 \end{pmatrix}.$$


2.5. Show that if the coordinate system (B) is obtained by applying to the coordinate system (A) the rigid transformation T, then ${}^B P = {}^A T^{-1}\, {}^A P$, where ${}^A T$ denotes the matrix representing T in the coordinate frame (A).

Solution We write

$${}^A_B T = \begin{pmatrix} {}^A i_B & {}^A j_B & {}^A k_B & {}^A O_B \\ 0 & 0 & 0 & 1 \end{pmatrix} = {}^A T \begin{pmatrix} {}^A i_A & {}^A j_A & {}^A k_A & {}^A O_A \\ 0 & 0 & 0 & 1 \end{pmatrix} = {}^A T\, \mathrm{Id} = {}^A T,$$

which proves the desired result, since ${}^B P = {}^A_B T^{-1}\, {}^A P$.

2.6. Show that the rotation of angle θ about the k axis of the frame (F) can be represented by ${}^F P' = R\, {}^F P$, where

$$R = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Solution Let us write ${}^F P = (x, y, z)^T$ and ${}^F P' = (x', y', z')^T$. Obviously we must have z = z′, the angle between the two vectors (x, y) and (x′, y′) must be equal to θ, and the norms of these two vectors must be equal. Note that the vector (x, y) is mapped onto the vector (−y, x) by a 90° counterclockwise rotation. Thus we have

$$\cos\theta = \frac{xx' + yy'}{\sqrt{x^2 + y^2}\sqrt{x'^2 + y'^2}} = \frac{xx' + yy'}{x^2 + y^2}, \qquad \sin\theta = \frac{-yx' + xy'}{\sqrt{x^2 + y^2}\sqrt{x'^2 + y'^2}} = \frac{-yx' + xy'}{x^2 + y^2}.$$

Solving this system of linear equations in x′ and y′ immediately yields x′ = x cos θ − y sin θ and y′ = x sin θ + y cos θ, which proves that we have indeed ${}^F P' = R\, {}^F P$.

2.7. Show that the change of coordinates associated with a rigid transformation preserves distances and angles.

Solution Let us consider a fixed coordinate system and identify the points of E³ with their coordinate vectors. Let us also consider three points A, B, and C, and their images A′, B′, and C′ under the rigid transformation defined by the rotation matrix R and the translation vector t. The squared distance between A′ and B′ is

$$|B' - A'|^2 = |R(B - A)|^2 = (B - A)^T R^T R (B - A) = |B - A|^2,$$

and it follows that rigid transformations preserve distances. Likewise, if θ′ denotes the angle between the vectors joining the point A′ to the points B′ and C′, we have

$$\cos\theta' = \frac{(B' - A') \cdot (C' - A')}{|B' - A'|\,|C' - A'|} = \frac{[R(B - A)] \cdot [R(C - A)]}{|B - A|\,|C - A|} = \frac{(B - A)^T R^T R (C - A)}{|B - A|\,|C - A|} = \frac{(B - A) \cdot (C - A)}{|B - A|\,|C - A|} = \cos\theta,$$

where θ is the angle between the vectors joining the point A to the points B and C. It follows that rigid transformations also preserve angles (to be rigorous, we should also show that the sine of θ is preserved). Note that the translation part of the rigid transformation is irrelevant in both cases.
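Distance and angle preservation can be spot-checked numerically; this sketch uses an arbitrary rotation about k and an arbitrary translation:

```python
# Spot-check: a rigid transformation preserves |B - A| and the angle at A.
import math

theta = 0.7
c, s = math.cos(theta), math.sin(theta)
R = [[c, -s, 0], [s, c, 0], [0, 0, 1]]
t = [0.2, -1.0, 3.0]

def apply(P):
    """Rigid map P -> R P + t."""
    return [sum(R[i][k] * P[k] for k in range(3)) + t[i] for i in range(3)]

def sub(u, v): return [a - b for a, b in zip(u, v)]
def dot(u, v): return sum(a * b for a, b in zip(u, v))
def norm(u):   return math.sqrt(dot(u, u))

A, B, C = [0.0, 0.0, 0.0], [1.0, 2.0, 0.0], [3.0, 0.0, 1.0]
A2, B2, C2 = apply(A), apply(B), apply(C)

assert abs(norm(sub(B2, A2)) - norm(sub(B, A))) < 1e-12   # distance kept
cos_before = dot(sub(B, A), sub(C, A)) / (norm(sub(B, A)) * norm(sub(C, A)))
cos_after = dot(sub(B2, A2), sub(C2, A2)) / (norm(sub(B2, A2)) * norm(sub(C2, A2)))
assert abs(cos_before - cos_after) < 1e-12                # angle kept
```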

2.8. Show that when the camera coordinate system is skewed and the angle θ between the two image axes is not equal to 90 degrees, then Eq. (2.11) transforms into Eq. (2.12).

Solution Let us denote by (u, v) the normalized coordinate system for the image plane centered in the projection C0 of the optical center (see Figure 2.8), and by (ū, v̄) a skew coordinate system centered in C0, with unit basis vectors and a skew angle equal to θ. Overlaying the orthogonal and skew coordinate systems immediately reveals that v/v̄ = sin θ and u = ū + v̄ cos θ, or

$$\bar u = u - v\cot\theta = \frac{x}{z} - \cot\theta\,\frac{y}{z}, \qquad \bar v = \frac{1}{\sin\theta}\,v = \frac{1}{\sin\theta}\,\frac{y}{z}.$$

Taking into account the actual position of the image center and the camera magnifications yields

$$u = \alpha \bar u + u_0 = \alpha\,\frac{x}{z} - \alpha\cot\theta\,\frac{y}{z} + u_0, \qquad v = \beta \bar v + v_0 = \frac{\beta}{\sin\theta}\,\frac{y}{z} + v_0,$$

which is the desired result.

2.9. Let O denote the homogeneous coordinate vector of the optical center of a camera in some reference frame, and let M denote the corresponding perspective projection matrix. Show that MO = 0.

Solution As shown by Eq. (2.15), the most general form of the perspective projection matrix in some world coordinate system (W) is

$$M = K \begin{pmatrix} {}^C_W R & {}^C O_W \end{pmatrix}.$$

Here, ${}^C O_W$ is the non-homogeneous coordinate vector of the origin of (W) in the normalized coordinate system (C) attached to the camera. On the other hand, O is by definition the homogeneous coordinate vector of the origin of (C)—that is, the camera's optical center—in the world coordinate system, so $O^T = ({}^W O_C^T, 1)$. Thus

$$M O = K \begin{pmatrix} {}^C_W R & {}^C O_W \end{pmatrix} \begin{pmatrix} {}^W O_C \\ 1 \end{pmatrix} = K({}^C_W R\, {}^W O_C + {}^C O_W) = K\, {}^C O_C = K\,0 = 0.$$

2.10. Show that the conditions of Theorem 1 are necessary.

Solution As noted in the chapter itself, according to Eq. (2.15) we have A = KR; thus the determinants of A and K are the same and A is nonsingular. Now, according to Eq. (2.17), we have

$$a_1 = \alpha r_1 - \alpha\cot\theta\, r_2 + u_0 r_3, \qquad a_2 = \frac{\beta}{\sin\theta}\, r_2 + v_0 r_3, \qquad a_3 = r_3,$$

where $r_1^T$, $r_2^T$ and $r_3^T$ denote the row vectors of the rotation matrix R. Since $r_1 \times r_3 = -r_2$ and $r_2 \times r_3 = r_1$, it follows that

$$(a_1 \times a_3) \cdot (a_2 \times a_3) = (-\alpha r_2 - \alpha\cot\theta\, r_1) \cdot \Bigl(\frac{\beta}{\sin\theta}\, r_1\Bigr) = -\frac{\alpha\beta\cos\theta}{\sin^2\theta},$$

thus $(a_1 \times a_3) \cdot (a_2 \times a_3) = 0$ implies that cos θ = 0, and the camera has zero skew. Finally, we have

$$|a_1 \times a_3|^2 - |a_2 \times a_3|^2 = |-\alpha r_2 - \alpha\cot\theta\, r_1|^2 - \Bigl|\frac{\beta}{\sin\theta}\, r_1\Bigr|^2 = \alpha^2(1 + \cot^2\theta) - \frac{\beta^2}{\sin^2\theta} = \frac{\alpha^2 - \beta^2}{\sin^2\theta}.$$

Thus $|a_1 \times a_3|^2 = |a_2 \times a_3|^2$ implies that α² = β², i.e., that the camera has unit aspect ratio.
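These identities can be verified numerically. The sketch below (arbitrary intrinsics; r1, r2, r3 are the rows of an arbitrary rotation) checks that the dot product vanishes exactly when θ = 90°, and that the norm difference reduces to α² − β² in that case:

```python
# Check the cross-product conditions of Theorem 1 on synthetic values.
import math

alpha, beta = 2.0, 3.0
u0, v0 = 0.1, -0.2
phi = 0.8
c, s = math.cos(phi), math.sin(phi)
r1, r2, r3 = [c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]   # rows of R

def cross(u, v):
    return [u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0]]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def lin(c1, u, c2, v, c3, w):
    return [c1*a + c2*b + c3*d for a, b, d in zip(u, v, w)]

def terms(theta):
    """(a1 x a3).(a2 x a3) and |a1 x a3|^2 - |a2 x a3|^2 for skew theta."""
    a1 = lin(alpha, r1, -alpha / math.tan(theta), r2, u0, r3)
    a2 = lin(0.0, r1, beta / math.sin(theta), r2, v0, r3)
    a3 = r3
    x13, x23 = cross(a1, a3), cross(a2, a3)
    return dot(x13, x23), dot(x13, x13) - dot(x23, x23)

skew_dot, _ = terms(1.2)                  # skewed camera: nonzero
zero_skew_dot, norm_diff = terms(math.pi / 2)
assert abs(skew_dot) > 1e-6
assert abs(zero_skew_dot) < 1e-9          # theta = 90 degrees kills it
assert abs(norm_diff - (alpha**2 - beta**2)) < 1e-9   # sin(theta) = 1 here
```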

2.11. Show that the conditions of Theorem 1 are sufficient. Note that the statement of this theorem is a bit different from the corresponding theorems in Faugeras (1993) and Heyden (1995), where the condition Det(A) ≠ 0 is replaced by a3 ≠ 0. Of course, Det(A) ≠ 0 implies a3 ≠ 0.

Solution We follow here the procedure for the recovery of a camera's intrinsic and extrinsic parameters given in Section 3.2.1 of the next chapter. The conditions of Theorem 1 ensure via Eq. (3.13) that this procedure succeeds and yields the correct intrinsic parameters. In particular, when the determinant of M is nonzero, the vectors ai are linearly independent, and their pairwise cross products are nonzero, ensuring that all terms in Eq. (3.13) are well defined. Adding the condition (a1 × a3) · (a2 × a3) = 0 gives cos θ a value of zero in that equation, yielding a zero-skew camera. Finally, adding the condition |a1 × a3|² = |a2 × a3|² gives the two magnifications equal values, corresponding to a unit aspect ratio.

2.12. If ${}^A\Pi$ denotes the homogeneous coordinate vector of a plane Π in the coordinate frame (A), what is the homogeneous coordinate vector ${}^B\Pi$ of Π in the frame (B)?

Solution Let ${}^A_B T$ denote the matrix representing the change of coordinates between the frames (B) and (A), such that ${}^A P = {}^A_B T\, {}^B P$. We have, for any point P in the plane Π,

$$0 = {}^A\Pi^T\, {}^A P = {}^A\Pi^T\, {}^A_B T\, {}^B P = [{}^A_B T^T\, {}^A\Pi]^T\, {}^B P = {}^B\Pi^T\, {}^B P.$$

Thus ${}^B\Pi = {}^A_B T^T\, {}^A\Pi$.
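A short numeric check (arbitrary rigid transform, plane, and point) shows that transforming the plane vector by the transpose preserves the incidence value:

```python
# The plane vector Pi_B = T^T Pi_A keeps incidence: Pi_A . (T P_B) = Pi_B . P_B.
import math

theta = 0.9
c, s = math.cos(theta), math.sin(theta)
T = [[c, -s, 0, 1.0], [s, c, 0, -2.0], [0, 0, 1, 0.5], [0, 0, 0, 1]]

def mv(M, v):
    return [sum(M[i][j] * v[j] for j in range(4)) for i in range(4)]

def transpose(M):
    return [list(row) for row in zip(*M)]

Pi_A = [0.0, 0.0, 1.0, -3.0]        # plane z = 3, written in frame (A)
P_B = [1.0, 2.0, 0.0, 1.0]          # a point, frame (B) coordinates
P_A = mv(T, P_B)                    # the same point in frame (A)
Pi_B = mv(transpose(T), Pi_A)       # transformed plane vector

lhs = sum(a * b for a, b in zip(Pi_A, P_A))
rhs = sum(a * b for a, b in zip(Pi_B, P_B))
assert abs(lhs - rhs) < 1e-12       # incidence value is frame-independent
```

In particular a point on the plane (incidence value zero) stays on the transformed plane, which is the content of the derivation above.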

2.13. If ${}^A Q$ denotes the symmetric matrix associated with a quadric surface in the coordinate frame (A), what is the symmetric matrix ${}^B Q$ associated with this surface in the frame (B)?

Solution Let ${}^A_B T$ denote the matrix representing the change of coordinates between the frames (B) and (A), such that ${}^A P = {}^A_B T\, {}^B P$. We have, for any point P on the quadric surface,

$$0 = {}^A P^T\, {}^A Q\, {}^A P = ({}^B P^T\, {}^A_B T^T)\, {}^A Q\, ({}^A_B T\, {}^B P) = {}^B P^T\, {}^B Q\, {}^B P.$$

Thus ${}^B Q = {}^A_B T^T\, {}^A Q\, {}^A_B T$. This matrix is symmetric by construction.


2.14. Prove Theorem 2.

Solution Let us consider an affine projection matrix $M = \bigl(A \;\; b\bigr)$, and let us first show that it can be written as a general weak-perspective projection matrix as defined by Eq. (2.20), i.e.,

$$M = \frac{1}{z_r}\begin{pmatrix} k & s \\ 0 & 1 \end{pmatrix}\begin{pmatrix} R_2 & t_2 \end{pmatrix}.$$

Writing the entries of the two matrices explicitly yields

$$\begin{pmatrix} a_1^T & b_1 \\ a_2^T & b_2 \end{pmatrix} = \frac{1}{z_r}\begin{pmatrix} k r_1^T + s r_2^T & k t_x + s t_y \\ r_2^T & t_y \end{pmatrix}.$$

Assuming that $z_r$ is positive, it follows that $z_r = 1/|a_2|$, $r_2 = z_r a_2$, and $t_y = z_r b_2$. In addition, $s = z_r^2(a_1 \cdot a_2)$, so $k r_1 = z_r(a_1 - s a_2)$. Assuming that k is positive, we obtain $k = z_r|a_1 - s a_2|$ and $r_1 = (z_r/k)(a_1 - s a_2)$. Finally, $t_x = (z_r b_1 - s t_y)/k$. Picking a negative value for $z_r$ and/or k yields a total of four solutions.

Let us now show that M can be written as a paraperspective projection matrix as defined by Eq. (2.21) with k = 1 and s = 0, i.e.,

$$M = \frac{1}{z_r}\begin{pmatrix}\begin{pmatrix} 1 & 0 & -x_r/z_r \\ 0 & 1 & -y_r/z_r \end{pmatrix} R & \; t_2 \end{pmatrix}.$$

We start by showing that a paraperspective projection matrix can indeed always be written in this form, before showing that any affine projection matrix can be written in this form as well. Recall that $x_r$, $y_r$ and $z_r$ denote the coordinates of the reference point R in the normalized camera coordinate system. The two elementary projection stages $P \to P' \to p$ can be written in this frame as

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \longrightarrow \begin{pmatrix} x' \\ y' \\ z_r \end{pmatrix} \longrightarrow \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} x'/z_r \\ y'/z_r \\ 1 \end{pmatrix},$$

where x′ and y′ are the coordinates of the point P′. Writing that $\overrightarrow{PP'}$ is parallel to $\overrightarrow{OQ}$, or equivalently that $\overrightarrow{PP'} \times \overrightarrow{OQ} = 0$, yields

$$x' = x - \frac{x_r}{z_r}\,z + x_r, \qquad y' = y - \frac{y_r}{z_r}\,z + y_r.$$

The normalized image coordinates of the point p can thus be written as

$$\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \frac{1}{z_r}\begin{pmatrix} 1 & 0 & -x_r/z_r & x_r \\ 0 & 1 & -y_r/z_r & y_r \\ 0 & 0 & 0 & z_r \end{pmatrix}\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.$$

Introducing once again the intrinsic and extrinsic parameters $K_2$, $p_0$, R and t of the camera, and using the fact that $z_r$ is independent of the point P, gives the general form of the projection equations, i.e.,

$$p = \frac{1}{z_r}\begin{pmatrix} K_2 & p_0 \end{pmatrix}\begin{pmatrix} 1 & 0 & -x_r/z_r & x_r \\ 0 & 1 & -y_r/z_r & y_r \\ 0 & 0 & 0 & z_r \end{pmatrix}\begin{pmatrix} R & t \\ 0^T & 1 \end{pmatrix}\begin{pmatrix} P \\ 1 \end{pmatrix},$$


which can indeed be written as an instance of the general affine projection equation (2.19) with

$$A = \frac{1}{z_r} K_2 \begin{pmatrix} 1 & 0 & -x_r/z_r \\ 0 & 1 & -y_r/z_r \end{pmatrix} R, \qquad b = \frac{1}{z_r} K_2 \Bigl[t_2 + \Bigl(1 - \frac{t_z}{z_r}\Bigr)\begin{pmatrix} x_r \\ y_r \end{pmatrix}\Bigr] + p_0.$$

The translation parameters are coupled to the camera intrinsic parameters and the position of the reference point in the expression of b. As in the weak-perspective case, we are free to change the position of the camera relative to the origin of the world coordinate system to simplify this expression. In particular, we can choose $t_z = z_r$ and reset the value of $t_2$ to $t_2 - z_r K_2^{-1} p_0$, so the value of b becomes $\frac{1}{z_r} K_2 t_2$. Now, observing that the value of A does not change when $K_2$ and $(x_r, y_r, z_r)$ are respectively replaced by $\lambda K_2$ and $\lambda(x_r, y_r, z_r)$ allows us to rewrite the projection matrix $M = \bigl(A \;\; b\bigr)$ as

$$M = \frac{1}{z_r}\begin{pmatrix} k & s \\ 0 & 1 \end{pmatrix}\begin{pmatrix}\begin{pmatrix} 1 & 0 & -x_r/z_r \\ 0 & 1 & -y_r/z_r \end{pmatrix} R & \; t_2 \end{pmatrix}.$$

Let us now show that any affine projection matrix can be written in this form. We can rewrite explicitly the entries of the two matrices of interest as

$$\begin{pmatrix} a_1^T & b_1 \\ a_2^T & b_2 \end{pmatrix} = \frac{1}{z_r}\begin{pmatrix} r_1^T - \dfrac{x_r}{z_r}\, r_3^T & t_x \\[2mm] r_2^T - \dfrac{y_r}{z_r}\, r_3^T & t_y \end{pmatrix}.$$

Since $r_1^T$, $r_2^T$, and $r_3^T$ are the rows of a rotation matrix, these vectors are orthogonal and have unit norm, and it follows that

$$1 + \lambda^2 = z_r^2 |a_1|^2, \qquad 1 + \mu^2 = z_r^2 |a_2|^2, \qquad \lambda\mu = z_r^2 (a_1 \cdot a_2),$$

where $\lambda = x_r/z_r$ and $\mu = y_r/z_r$. Eliminating $z_r$ among these equations yields

$$\frac{\lambda\mu}{1 + \lambda^2} = c_1, \qquad \frac{\lambda\mu}{1 + \mu^2} = c_2,$$

where $c_1 = (a_1 \cdot a_2)/|a_1|^2$ and $c_2 = (a_1 \cdot a_2)/|a_2|^2$. It follows from the first equation that $\mu = c_1(1 + \lambda^2)/\lambda$, and substituting in the second equation yields, after some simple algebraic manipulation,

$$(1 - c_1 c_2)\lambda^4 + \Bigl[1 - c_1 c_2 - \frac{c_2}{c_1}(1 + c_1^2)\Bigr]\lambda^2 - c_1 c_2 = 0.$$

This is a quadratic equation in $\lambda^2$. Note that $c_1 c_2$ is the squared cosine of the angle between $a_1$ and $a_2$, so the constant and quadratic terms have opposite signs. In particular, our quadratic equation always admits two real roots of opposite signs. The positive root is the only root of interest, and it yields two opposite values for λ. Once λ is known, μ is of course determined uniquely, and so is $z_r^2$. It follows that there are four possible solutions for the triple $(x_r, y_r, z_r)$. For each of these solutions, we have $t_x = z_r b_1$ and $t_y = z_r b_2$.

We finally determine $r_1$, $r_2$, and $r_3$ by defining $a_3 = a_1 \times a_2$ and noting that $\lambda r_1 + \mu r_2 + r_3 = z_r^2 a_3$. In particular, we can write

$$z_r\begin{pmatrix} a_1 & a_2 & z_r a_3 \end{pmatrix} = \begin{pmatrix} r_1 & r_2 & r_3 \end{pmatrix}\begin{pmatrix} 1 & 0 & \lambda \\ 0 & 1 & \mu \\ -\lambda & -\mu & 1 \end{pmatrix}.$$

Multiplying both sides of this equation on the right by the inverse of the rightmost matrix yields

$$\begin{pmatrix} r_1 & r_2 & r_3 \end{pmatrix} = \frac{z_r}{1 + \lambda^2 + \mu^2}\begin{pmatrix} a_1 & a_2 & z_r a_3 \end{pmatrix}\begin{pmatrix} 1 + \mu^2 & -\lambda\mu & -\lambda \\ -\lambda\mu & 1 + \lambda^2 & -\mu \\ \lambda & \mu & 1 \end{pmatrix},$$

or

$$r_1 = \frac{z_r}{1 + \lambda^2 + \mu^2}\,[(1 + \mu^2)\, a_1 - \lambda\mu\, a_2 + \lambda z_r a_3],$$
$$r_2 = \frac{z_r}{1 + \lambda^2 + \mu^2}\,[-\lambda\mu\, a_1 + (1 + \lambda^2)\, a_2 + \mu z_r a_3],$$
$$r_3 = \frac{z_r}{1 + \lambda^2 + \mu^2}\,[-\lambda\, a_1 - \mu\, a_2 + z_r a_3].$$
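The quadratic in λ² can be exercised on synthetic data: build a1 and a2 from a known rotation and reference point, then verify that the positive root recovers λ². This is a sketch with arbitrary values, not from the text:

```python
# Build a1, a2 from known (r1, r2, r3), reference point (xr, yr, zr), then
# check that the positive root of the quadratic in lambda^2 recovers lambda^2.
import math

phi = 0.6
c, s = math.cos(phi), math.sin(phi)
r1, r2, r3 = [c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]   # rows of a rotation
xr, yr, zr = 0.4, -0.6, 2.0
lam, mu = xr / zr, yr / zr

a1 = [(r1[i] - lam * r3[i]) / zr for i in range(3)]
a2 = [(r2[i] - mu * r3[i]) / zr for i in range(3)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

c1 = dot(a1, a2) / dot(a1, a1)
c2 = dot(a1, a2) / dot(a2, a2)

# (1 - c1 c2) x^2 + [1 - c1 c2 - (c2/c1)(1 + c1^2)] x - c1 c2 = 0,  x = lambda^2
A = 1 - c1 * c2
B = 1 - c1 * c2 - (c2 / c1) * (1 + c1**2)
C = -c1 * c2
x = (-B + math.sqrt(B * B - 4 * A * C)) / (2 * A)   # the positive root
assert abs(x - lam**2) < 1e-9
```

The product of the two roots is −c1c2/(1 − c1c2) ≤ 0, which is the "opposite signs" observation made above.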

2.15. Line Plücker coordinates. The exterior product of two vectors u and v in R⁴ is defined by

$$u \wedge v \stackrel{\text{def}}{=} \begin{pmatrix} u_1 v_2 - u_2 v_1 \\ u_1 v_3 - u_3 v_1 \\ u_1 v_4 - u_4 v_1 \\ u_2 v_3 - u_3 v_2 \\ u_2 v_4 - u_4 v_2 \\ u_3 v_4 - u_4 v_3 \end{pmatrix}.$$

Given a fixed coordinate system and the (homogeneous) coordinate vectors A and B associated with two points A and B in E³, the vector $L = A \wedge B$ is called the vector of Plücker coordinates of the line joining A to B.

(a) Let us write $L = (L_1, L_2, L_3, L_4, L_5, L_6)^T$, and denote by O the origin of the coordinate system and by H its projection onto L. Let us also identify the vectors $\overrightarrow{OA}$ and $\overrightarrow{OB}$ with their non-homogeneous coordinate vectors. Show that $\overrightarrow{AB} = -(L_3, L_5, L_6)^T$ and $\overrightarrow{OA} \times \overrightarrow{OB} = \overrightarrow{OH} \times \overrightarrow{AB} = (L_4, -L_2, L_1)^T$. Conclude that the Plücker coordinates of a line obey the quadratic constraint $L_1 L_6 - L_2 L_5 + L_3 L_4 = 0$.

(b) Show that changing the position of the points A and B along the line L only changes the overall scale of the vector L. Conclude that Plücker coordinates are homogeneous coordinates.

(c) Prove that the following identity holds for any vectors x, y, z, and t in R⁴:

$$(x \wedge y) \cdot (z \wedge t) = (x \cdot z)(y \cdot t) - (x \cdot t)(y \cdot z).$$


(d) Use this identity to show that the mapping between a line with Plücker coordinate vector L and its image l with homogeneous coordinates l can be represented by

$$\rho l = M L, \quad \text{where} \quad M \stackrel{\text{def}}{=} \begin{pmatrix} (m_2 \wedge m_3)^T \\ (m_3 \wedge m_1)^T \\ (m_1 \wedge m_2)^T \end{pmatrix}, \tag{2.1}$$

$m_1^T$, $m_2^T$, and $m_3^T$ denote as before the rows of M, and ρ is an appropriate scale factor.

Hint: Consider a line L joining two points A and B, and denote by a and b the projections of these two points, with homogeneous coordinates a and b. Use the fact that the points a and b lie on l; thus, if l denotes the homogeneous coordinate vector of this line, we must have $l \cdot a = l \cdot b = 0$.

(e) Given a line L with Plücker coordinate vector $L = (L_1, L_2, L_3, L_4, L_5, L_6)^T$ and a point P with homogeneous coordinate vector P, show that a necessary and sufficient condition for P to lie on L is that

$$\mathcal{L} P = 0, \quad \text{where} \quad \mathcal{L} \stackrel{\text{def}}{=} \begin{pmatrix} 0 & L_6 & -L_5 & L_4 \\ -L_6 & 0 & L_3 & -L_2 \\ L_5 & -L_3 & 0 & L_1 \\ -L_4 & L_2 & -L_1 & 0 \end{pmatrix}.$$

(f) Show that a necessary and sufficient condition for the line L to lie in the plane Π with homogeneous coordinate vector Π is that

$$L^* \Pi = 0, \quad \text{where} \quad L^* \stackrel{\text{def}}{=} \begin{pmatrix} 0 & L_1 & L_2 & L_3 \\ -L_1 & 0 & L_4 & L_5 \\ -L_2 & -L_4 & 0 & L_6 \\ -L_3 & -L_5 & -L_6 & 0 \end{pmatrix}.$$

Solution

(a) If $A = (a_1, a_2, a_3, 1)^T$ and $B = (b_1, b_2, b_3, 1)^T$, we have

$$L = \begin{pmatrix} L_1 \\ L_2 \\ L_3 \\ L_4 \\ L_5 \\ L_6 \end{pmatrix} = A \wedge B = \begin{pmatrix} a_1 b_2 - a_2 b_1 \\ a_1 b_3 - a_3 b_1 \\ a_1 - b_1 \\ a_2 b_3 - a_3 b_2 \\ a_2 - b_2 \\ a_3 - b_3 \end{pmatrix},$$

thus $\overrightarrow{AB} = -(L_3, L_5, L_6)^T$ and $\overrightarrow{OA} \times \overrightarrow{OB} = (L_4, -L_2, L_1)^T$. In addition, we have

$$\overrightarrow{OA} \times \overrightarrow{OB} = (\overrightarrow{OH} + \overrightarrow{HA}) \times (\overrightarrow{OH} + \overrightarrow{HB}) = \overrightarrow{OH} \times \overrightarrow{AB},$$

since $\overrightarrow{HA}$ and $\overrightarrow{HB}$ are parallel. Since the vectors $\overrightarrow{AB}$ and $\overrightarrow{OA} \times \overrightarrow{OB}$ are orthogonal, it follows immediately that their dot product $L_1 L_6 - L_2 L_5 + L_3 L_4$ is equal to zero.
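The exterior product and the quadratic constraint can be checked directly on sample points (arbitrary values, following the problem's definition of ∧):

```python
# Plucker coordinates of the line through two sample points, and the
# constraint L1 L6 - L2 L5 + L3 L4 = 0.
def wedge(u, v):
    """Exterior product of two vectors of R^4, as defined in the problem."""
    return [u[0]*v[1] - u[1]*v[0],
            u[0]*v[2] - u[2]*v[0],
            u[0]*v[3] - u[3]*v[0],
            u[1]*v[2] - u[2]*v[1],
            u[1]*v[3] - u[3]*v[1],
            u[2]*v[3] - u[3]*v[2]]

A = [1.0, 2.0, 3.0, 1.0]    # homogeneous coordinates of A = (1, 2, 3)
B = [-2.0, 0.5, 4.0, 1.0]   # homogeneous coordinates of B = (-2, 0.5, 4)
L = wedge(A, B)

# AB = -(L3, L5, L6) and the quadratic (Klein) constraint holds exactly.
AB = [b - a for a, b in zip(A[:3], B[:3])]
assert AB == [-L[2], -L[4], -L[5]]
assert L[0]*L[5] - L[1]*L[4] + L[2]*L[3] == 0
```

Part (b) can be checked the same way: wedge of any two other points on the line gives a scalar multiple of L.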

(b) Replacing A and B by any other two points C and D on the same line only scales the Plücker coordinates of this line, since we can always write $\overrightarrow{CD} = \lambda \overrightarrow{AB}$ and, according to (a), this means that $C \wedge D = \lambda\, A \wedge B$. Thus the Plücker coordinates of a line are uniquely defined up to scale, independently of the choice of the points used to represent this line. In other words, they are homogeneous coordinates for this line.

(c) Writing

    (x ∧ y) · (z ∧ t) = (x1y2 − x2y1)(z1t2 − z2t1) + (x1y3 − x3y1)(z1t3 − z3t1)
                      + (x1y4 − x4y1)(z1t4 − z4t1) + (x2y3 − x3y2)(z2t3 − z3t2)
                      + (x2y4 − x4y2)(z2t4 − z4t2) + (x3y4 − x4y3)(z3t4 − z4t3)
                      = (x1z1 + x2z2 + x3z3 + x4z4)(y1t1 + y2t2 + y3t3 + y4t4)
                      − (x1t1 + x2t2 + x3t3 + x4t4)(y1z1 + y2z2 + y3z3 + y4z4)
                      = (x · z)(y · t) − (x · t)(y · z)

proves the identity.

(d) Let us consider the line L passing through the points A and B, with homogeneous coordinate vectors L, A, and B, and denote by l the line's projection and by a and b the images of A and B, with homogeneous coordinate vectors l, a, and b. The identity proven in (c) allows us to write

    M̃L = ( (m2 ∧ m3)^T )           ( (m2 ∧ m3) · (A ∧ B) )   ( (m2 · A)(m3 · B) − (m2 · B)(m3 · A) )
          ( (m3 ∧ m1)^T ) (A ∧ B) = ( (m3 ∧ m1) · (A ∧ B) ) = ( (m3 · A)(m1 · B) − (m3 · B)(m1 · A) ) = a × b.
          ( (m1 ∧ m2)^T )           ( (m1 ∧ m2) · (A ∧ B) )   ( (m1 · A)(m2 · B) − (m1 · B)(m2 · A) )

Since l must be orthogonal to both a and b, we can thus write ρl = M̃L.

(e) Let us assume that L = A ∧ B. Since P^T = (→OP^T, 1), we can write

    LP = (       0          −(→AB)3        (→AB)2      (→OA × →OB)1 )
         (    (→AB)3            0         −(→AB)1      (→OA × →OB)2 )  ( →OP )
         (   −(→AB)2         (→AB)1           0        (→OA × →OB)3 )  (  1  )
         ( −(→OA × →OB)1  −(→OA × →OB)2  −(→OA × →OB)3       0      )

       = ( →AB × →OP + →OA × →OB )   (      →AB × →BP      )
         (  −(→OA × →OB) · →OP   ) = ( −(→OH × →AB) · →OP  ).

Thus a necessary and sufficient condition for LP to be equal to the zero vector is that →AB be parallel to →BP, that is, that the three points A, B, and P be collinear, or equivalently, that P lie on the line L. (When this is the case, the last component −(→OH × →AB) · →OP vanishes as well.)

(f) Let us write Π^T = (n^T, −d), where n is the unit normal to the plane Π and d is the distance between the origin and the plane. We have

    L*Π = (       0        (→OA × →OB)3  −(→OA × →OB)2   −(→AB)1 )
          ( −(→OA × →OB)3        0        (→OA × →OB)1   −(→AB)2 )  (  n )
          (  (→OA × →OB)2  −(→OA × →OB)1        0        −(→AB)3 )  ( −d )
          (     (→AB)1         (→AB)2        (→AB)3          0   )

        = ( n × (→OA × →OB) + d→AB )   ( n × (→OH × →AB) + d→AB )
          (        n · →AB         ) = (        n · →AB         ).


Now, it is easy to show that a × (b × c) = (a · c)b − (a · b)c for any vectors a, b, and c of R³. Hence,

    L*Π = ( (n · →AB)→OH + (d − n · →OH)→AB )
          (              n · →AB            ),

and a necessary and sufficient condition for L*Π to be the zero vector is that the vector →AB lie in a plane parallel to Π (condition n · →AB = 0) located at a distance d from the origin (given n · →AB = 0, the condition (n · →AB)→OH + (d − n · →OH)→AB = 0 reduces to d = n · →OH), or equivalently, that L lie in the plane Π.
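As a quick numerical sanity check of parts (a) and (e), the sketch below builds the Plücker vector of the line through two hypothetical points and verifies both the quadratic constraint and the point-incidence matrix (Python with NumPy; the point values are made up for illustration):

```python
import numpy as np

def plucker(A, B):
    """Plucker coordinates L = A ^ B of the line through homogeneous points
    A and B, in the (12, 13, 14, 23, 24, 34) component order used above."""
    idx = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    return np.array([A[i] * B[j] - A[j] * B[i] for i, j in idx])

def point_matrix(L):
    """The 4x4 matrix of part (e): point_matrix(L) @ P = 0 iff P lies on L."""
    L1, L2, L3, L4, L5, L6 = L
    return np.array([[0, L6, -L5, L4],
                     [-L6, 0, L3, -L2],
                     [L5, -L3, 0, L1],
                     [-L4, L2, -L1, 0]])

A = np.array([1.0, 2.0, 3.0, 1.0])      # hypothetical points
B = np.array([4.0, 0.0, -1.0, 1.0])
L = plucker(A, B)

# Part (a): the quadratic constraint L1 L6 - L2 L5 + L3 L4 = 0.
assert np.isclose(L[0]*L[5] - L[1]*L[4] + L[2]*L[3], 0.0)

# Part (e): A, B, and their midpoint lie on the line; the origin does not.
M = point_matrix(L)
assert np.allclose(M @ A, 0) and np.allclose(M @ B, 0)
assert np.allclose(M @ ((A + B) / 2), 0)
assert not np.allclose(M @ np.array([0.0, 0.0, 0.0, 1.0]), 0)
```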


CHAPTER 3

Geometric Camera Calibration

PROBLEMS

3.1. Show that the vector x that minimizes |Ux|² under the constraint |Vx|² = 1 is the (appropriately scaled) generalized eigenvector associated with the minimum generalized eigenvalue of the symmetric matrices U^T U and V^T V.
Hint: First show that the minimum sought is reached at x = x0, where x0 is the (unconstrained) minimum of the error E(x) = |Ux|²/|Vx|² such that |Vx0|² = 1. (Note that since E(x) is obviously invariant under scale changes, so are its extrema, and we are free to fix the norm |Vx0|² arbitrarily. Note also that the minimum must be taken over all values of x such that Vx ≠ 0.)

Solution By definition of x0, we have

    ∀x, Vx ≠ 0 ⟹ |Ux|²/|Vx|² ≥ |Ux0|²/|Vx0|² = |Ux0|².

In particular,

    ∀x, |Vx|² = 1 ⟹ |Ux|² ≥ |Ux0|².

Since, by definition, |Vx0|² = 1, it follows that x0 is indeed the minimum of the constrained minimization problem.
Let us now compute x0. The gradient of E(x) must vanish at its extrema. Its value is

    ∇E = (1/|Vx|²) ∂(x^T U^T U x)/∂x − (|Ux|²/|Vx|⁴) ∂(x^T V^T V x)/∂x = (2/|Vx|²) [U^T U − (|Ux|²/|Vx|²) V^T V] x,

therefore the minimum x0 of E must be a generalized eigenvector of the symmetric matrices U^T U and V^T V.
Now consider any generalized eigenvector x of the symmetric matrices U^T U and V^T V, and let λ denote the corresponding generalized eigenvalue. The value of E at x is obviously equal to λ. The generalized eigenvalue λ0 associated with x0 is necessarily the smallest one, since any smaller generalized eigenvalue would yield a smaller value for E. We can pick the vector x0 such that |Vx0|² = 1 as the solution of our problem since, as mentioned earlier, the value of E is invariant under scale changes.
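A minimal numerical sketch of this recipe (the data matrices U and V are hypothetical). The generalized problem is reduced to an ordinary symmetric eigenproblem by a Cholesky factorization of V^T V:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((10, 4))   # hypothetical data matrices
V = rng.standard_normal((6, 4))

# Solve U^T U x = lambda V^T V x by whitening with V^T V = C C^T:
# (C^-1 A C^-T) y = lambda y, with x = C^-T y.
A, B = U.T @ U, V.T @ V
C = np.linalg.cholesky(B)
G = np.linalg.solve(C, np.linalg.solve(C, A).T).T    # C^-1 A C^-T (symmetric)
w, Y = np.linalg.eigh((G + G.T) / 2)                 # eigenvalues ascending
x0 = np.linalg.solve(C.T, Y[:, 0])                   # minimum generalized eigenvector
x0 /= np.linalg.norm(V @ x0)                         # enforce |Vx|^2 = 1

# No other vector satisfying the constraint does better.
for _ in range(1000):
    x = rng.standard_normal(4)
    x /= np.linalg.norm(V @ x)
    assert np.linalg.norm(U @ x0) <= np.linalg.norm(U @ x) + 1e-9
```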

3.2. Show that the 2 × 2 matrix U^T U involved in the line-fitting example from Section 3.1.1 is the matrix of second moments of inertia of the points pi (i = 1, . . . , n).

Solution Recall that

    U = ( x1 − x̄   y1 − ȳ )
        (  ...       ...   )
        ( xn − x̄   yn − ȳ ),

where x̄ and ȳ denote the means of the xi and of the yi.


Thus, omitting the summation indexes for conciseness, we have

    U^T U = ( Σ(xi − x̄)²            Σ(xi − x̄)(yi − ȳ) )
            ( Σ(xi − x̄)(yi − ȳ)     Σ(yi − ȳ)²         )

          = ( Σ(xi² − 2x̄xi + x̄²)           Σ(xiyi − x̄yi − ȳxi + x̄ȳ) )
            ( Σ(xiyi − x̄yi − ȳxi + x̄ȳ)    Σ(yi² − 2ȳyi + ȳ²)         )

          = ( Σxi² − 2x̄Σxi + nx̄²                Σxiyi − x̄Σyi − ȳΣxi + nx̄ȳ )
            ( Σxiyi − x̄Σyi − ȳΣxi + nx̄ȳ        Σyi² − 2ȳΣyi + nȳ²          )

          = ( Σxi² − nx̄²       Σxiyi − nx̄ȳ )
            ( Σxiyi − nx̄ȳ      Σyi² − nȳ²   ),

which is, indeed, the matrix of second moments of inertia of the points pi.

3.3. Extend the line-fitting method presented in Section 3.1.1 to the problem of fitting a plane to n points in E³.

Solution We consider n points Pi (i = 1, . . . , n) with coordinates (xi, yi, zi)^T in some fixed coordinate system, and find the plane with equation ax + by + cz − d = 0 and unit normal n = (a, b, c)^T that best fits these points. This amounts to minimizing

    E(a, b, c, d) = Σ_{i=1}^{n} (axi + byi + czi − d)²

with respect to a, b, c, and d under the constraint a² + b² + c² = 1. Differentiating E with respect to d shows that, at a minimum of this function, we must have 0 = ∂E/∂d = −2 Σ_{i=1}^{n} (axi + byi + czi − d), thus

    d = ax̄ + bȳ + cz̄,  where  x̄ = (1/n) Σ_{i=1}^{n} xi,  ȳ = (1/n) Σ_{i=1}^{n} yi,  and  z̄ = (1/n) Σ_{i=1}^{n} zi.   (3.1)

Substituting this expression for d in the definition of E yields

    E = Σ_{i=1}^{n} [a(xi − x̄) + b(yi − ȳ) + c(zi − z̄)]² = |Un|²,

where

    U = ( x1 − x̄   y1 − ȳ   z1 − z̄ )
        (  ...       ...       ...  )
        ( xn − x̄   yn − ȳ   zn − z̄ ),

and our original problem finally reduces to minimizing |Un|² with respect to n under the constraint |n|² = 1. We recognize a homogeneous linear least-squares problem, whose solution is the unit eigenvector associated with the minimum eigenvalue of the 3 × 3 matrix U^T U. Once a, b, and c have been computed, the value of d is immediately obtained from Eq. (3.1). Similar to the line-fitting case, we have

    U^T U = ( Σxi² − nx̄²       Σxiyi − nx̄ȳ      Σxizi − nx̄z̄ )
            ( Σxiyi − nx̄ȳ      Σyi² − nȳ²       Σyizi − nȳz̄ )
            ( Σxizi − nx̄z̄      Σyizi − nȳz̄      Σzi² − nz̄²   ),

that is, the matrix of second moments of inertia of the points Pi.
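A minimal sketch of this plane-fitting recipe (the sample points and the plane are hypothetical test data):

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane a*x + b*y + c*z - d = 0 for an (n, 3) array,
    following the eigenvector recipe derived above."""
    centroid = points.mean(axis=0)
    U = points - centroid                   # rows (P_i - centroid)
    # Unit eigenvector of U^T U with the smallest eigenvalue.
    w, V = np.linalg.eigh(U.T @ U)          # eigenvalues ascending
    normal = V[:, 0]
    d = normal @ centroid                   # Eq. (3.1)
    return normal, d

# Exact samples from a known plane n.P = 0.5 (hypothetical test data).
rng = np.random.default_rng(2)
n_true = np.array([1.0, 2.0, -2.0]) / 3.0   # unit normal
xy = rng.uniform(-1, 1, size=(100, 2))
z = (0.5 - n_true[0] * xy[:, 0] - n_true[1] * xy[:, 1]) / n_true[2]
pts = np.column_stack([xy, z])

normal, d = fit_plane(pts)
# Up to an overall sign, the recovered normal matches the true one.
assert min(np.linalg.norm(normal - n_true), np.linalg.norm(normal + n_true)) < 1e-6
```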


3.4. Derive an expression for the Hessian of the functions f_{2i−1}(ξ) = ui(ξ) − ui and f_{2i}(ξ) = vi(ξ) − vi (i = 1, . . . , n) introduced in Section 3.4.

Solution Recall that the first-order partial derivatives of the components of f are given by

    ( ∂f_{2i−1}^T/∂ξ )            ( Pi^T   0^T   −ui Pi^T )
    (                 ) = (1/zi)  (                        ) Jm,
    ( ∂f_{2i}^T/∂ξ    )           ( 0^T   Pi^T   −vi Pi^T )

where Jm = ∂m/∂ξ is the Jacobian of the vector m with respect to ξ.

We just differentiate these expressions once more to get the Hessian:

    ∂²f_{2i−1}/∂ξj∂ξk = ∂/∂ξk [ (1/zi) ( Pi^T  0^T  −ui Pi^T ) ∂m/∂ξj ]

    = [ −(1/zi²)(∂zi/∂ξk)( Pi^T  0^T  −ui Pi^T ) + (1/zi)( 0^T  0^T  −(∂ui/∂ξk) Pi^T ) ] ∂m/∂ξj
      + (1/zi)( Pi^T  0^T  −ui Pi^T ) ∂²m/∂ξj∂ξk

    = [ −(1/zi²)((∂m3/∂ξk)^T Pi)( Pi^T  0^T  −ui Pi^T )
        + (1/zi)( 0^T  0^T  −[(1/zi)( Pi^T  0^T  −ui Pi^T ) ∂m/∂ξk] Pi^T ) ] ∂m/∂ξj
      + (1/zi)( Pi^T  0^T  −ui Pi^T ) ∂²m/∂ξj∂ξk.

Rearranging the terms finally yields

    ∂²f_{2i−1}/∂ξj∂ξk = (1/zi)( Pi^T  0^T  −ui Pi^T ) ∂²m/∂ξj∂ξk

                                                (    0         0        Pi Pi^T     )
                        − (1/zi²) (∂m/∂ξj)^T    (    0         0           0        )  (∂m/∂ξk),
                                                ( Pi Pi^T      0     −2ui Pi Pi^T   )

where each block of the 12 × 12 matrix is 4 × 4.

The same line of reasoning can be used to show that

    ∂²f_{2i}/∂ξj∂ξk = (1/zi)( 0^T  Pi^T  −vi Pi^T ) ∂²m/∂ξj∂ξk

                                              (    0         0           0        )
                      − (1/zi²) (∂m/∂ξj)^T    (    0         0        Pi Pi^T     )  (∂m/∂ξk).
                                              (    0      Pi Pi^T  −2vi Pi Pi^T   )

3.5. Euler angles. Show that the rotation obtained by first rotating about the z axis of some coordinate frame by an angle α, then rotating about the y axis of the new coordinate frame by an angle β, and finally rotating about the z axis of the resulting frame by an angle γ can be represented in the original coordinate system by

    ( cosα cosβ cosγ − sinα sinγ   −cosα cosβ sinγ − sinα cosγ   cosα sinβ )
    ( sinα cosβ cosγ + cosα sinγ   −sinα cosβ sinγ + cosα cosγ   sinα sinβ )
    (        −sinβ cosγ                    sinβ sinγ                cosβ   ).

Solution Let us denote by (A), (B), (C), and (D) the consecutive coordinate systems. If Rx(θ), Ry(θ), and Rz(θ) denote the rotation matrices about the axes x, y, and z, and R_{AB} denotes the change of coordinates from (B) to (A), we have R_{AB} = Rz(α), R_{BC} = Ry(β), and R_{CD} = Rz(γ). Thus

    R_{AD} = R_{AB} R_{BC} R_{CD}

           = ( cosα  −sinα  0 ) ( cosβ   0  sinβ ) ( cosγ  −sinγ  0 )
             ( sinα   cosα  0 ) (   0    1    0  ) ( sinγ   cosγ  0 )
             (   0      0   1 ) ( −sinβ  0  cosβ ) (   0      0   1 )

           = ( cosα cosβ  −sinα  cosα sinβ ) ( cosγ  −sinγ  0 )
             ( sinα cosβ   cosα  sinα sinβ ) ( sinγ   cosγ  0 )
             (  −sinβ        0     cosβ    ) (   0      0   1 )

           = ( cosα cosβ cosγ − sinα sinγ   −cosα cosβ sinγ − sinα cosγ   cosα sinβ )
             ( sinα cosβ cosγ + cosα sinγ   −sinα cosβ sinγ + cosα cosγ   sinα sinβ )
             (        −sinβ cosγ                    sinβ sinγ                cosβ   ).

Now the rotation that maps (A) onto (D) maps a point P with position vector P_A in the coordinate system (A) onto the point P′ with the same position vector P′_D = P_A in the coordinate system (D). We have P′_A = R_{AD} P′_D = R_{AD} P_A, which proves the desired result.
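The composition can be checked numerically; the sketch below compares Rz(α)Ry(β)Rz(γ) against the closed-form matrix above (the angles are arbitrary test values):

```python
import numpy as np

def Rz(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def Ry(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def euler_zyz(a, b, g):
    """Closed-form ZYZ matrix from the solution above."""
    ca, sa = np.cos(a), np.sin(a)
    cb, sb = np.cos(b), np.sin(b)
    cg, sg = np.cos(g), np.sin(g)
    return np.array([
        [ca*cb*cg - sa*sg, -ca*cb*sg - sa*cg, ca*sb],
        [sa*cb*cg + ca*sg, -sa*cb*sg + ca*cg, sa*sb],
        [-sb*cg,            sb*sg,            cb],
    ])

a, b, g = 0.3, 1.1, -0.7   # arbitrary test angles
assert np.allclose(Rz(a) @ Ry(b) @ Rz(g), euler_zyz(a, b, g))
```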

3.6. The Rodrigues formula. Consider a rotation R of angle θ about the axis u (a unit vector). Show that Rx = cos θ x + sin θ u × x + (1 − cos θ)(u · x)u.
Hint: A rotation does not change the projection of a vector x onto the direction u of its axis, and it applies a planar rotation of angle θ to the projection of x into the plane orthogonal to u.

Solution Let a denote the orthogonal projection of x onto u, b = x − a denote its orthogonal projection onto the plane perpendicular to u, and c = u × b. By construction, c is perpendicular to both u and b, and, according to the property of rotation matrices mentioned in the hint, we must have Rx = a + cos θ b + sin θ c. Obviously, we also have

    a = (u · x)u,
    b = x − (u · x)u,
    c = u × x,

and it follows that Rx = cos θ x + sin θ u × x + (1 − cos θ)(u · x)u.

3.7. Use the Rodrigues formula to show that the matrix R associated with a rotation of angle θ about the unit vector u = (u, v, w)^T is

    ( u²(1 − c) + c     uv(1 − c) − ws    uw(1 − c) + vs )
    ( uv(1 − c) + ws    v²(1 − c) + c     vw(1 − c) − us )
    ( uw(1 − c) − vs    vw(1 − c) + us    w²(1 − c) + c  ),

where c = cos θ and s = sin θ.

Solution With the notation u = (u, v, w)^T, c = cos θ, and s = sin θ, the Rodrigues formula is easily rewritten as

    Rx = ( c Id + s[u×] + (1 − c)uu^T ) x,

and it follows that

    R = c ( 1  0  0 )     (  0  −w   v )             ( u²  uv  uw )
          ( 0  1  0 ) + s (  w   0  −u ) + (1 − c)   ( uv  v²  vw )
          ( 0  0  1 )     ( −v   u   0 )             ( uw  vw  w² ),

and the result follows immediately.
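A small numerical sketch of the formula R = c Id + s[u×] + (1 − c)uu^T (the axis and angle are arbitrrary test values chosen for illustration):

```python
import numpy as np

def rodrigues(axis, theta):
    """R = c I + s [u]_x + (1 - c) u u^T, as derived above."""
    u = np.asarray(axis, dtype=float)
    u /= np.linalg.norm(u)
    c, s = np.cos(theta), np.sin(theta)
    ux = np.array([[0, -u[2], u[1]],
                   [u[2], 0, -u[0]],
                   [-u[1], u[0], 0]])
    return c * np.eye(3) + s * ux + (1 - c) * np.outer(u, u)

R = rodrigues([1.0, 2.0, 2.0], 0.8)
# R is a proper rotation (orthogonal, determinant +1) that fixes its axis.
assert np.allclose(R @ R.T, np.eye(3))
assert np.isclose(np.linalg.det(R), 1.0)
u = np.array([1.0, 2.0, 2.0]) / 3.0
assert np.allclose(R @ u, u)
```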


3.8. Assuming that the intrinsic parameters of a camera are known, show how to compute its extrinsic parameters once the vector n′ defined in Section 3.5 is known.
Hint: Use the fact that the rows of a rotation matrix form an orthonormal family.

Solution Recall that the vector n′ = (m11, m12, m14, m21, m22, m24)^T can only be recovered up to scale. With the intrinsic parameters known, this means that we can write the projection matrix as M = (A  b) = ρ(R  t), where R and t are the rotation matrix and translation vector associated with the camera's extrinsic parameters. Let a1^T and a2^T denote as usual the two rows of the matrix A. Since the rows of a rotation matrix have unit norm and are orthogonal to each other, we have |a1|² = |a2|² = ρ² and a1 · a2 = 0. These two constraints can be seen as quadratic equations in the unknowns m13 and m23, namely

    m23² − m13² = |b1|² − |b2|²,
    m13 m23 = −b1 · b2,

where b1 = (m11, m12)^T and b2 = (m21, m22)^T. Squaring the second equation and substituting the value of m23² from the first equation into it yields

    m13² [m13² + |b1|² − |b2|²] = (b1 · b2)²,

or equivalently

    m13⁴ + (|b1|² − |b2|²) m13² − (b1 · b2)² = 0.

This is a quadratic equation in m13². Since the constant term and the quadratic term have opposite signs, it always admits two real solutions with opposite signs. Only the positive one is valid of course, and it yields two opposite solutions for m13. The remaining unknown is then determined as m23 = −(b1 · b2)/m13. At this point, there are four valid values for the triple (a1, a2, ρ), since m13 and m23 are determined up to a single sign ambiguity, and the value of ρ is determined up to a second sign ambiguity by ρ² = |a1|². In turn, this determines four valid values for the rows r1^T and r2^T of R and the coordinates tx and ty of t. For each of these solutions, the last row of R is computed as r3 = r1 × r2, which gives in turn a3 = ρr3. Finally, an initial value of tz can be computed using linear least squares by setting λ = 1 in Eq. (3.23). The correct solution among the four found can be identified by (a) using the sign of tz (when it is known) to discard obviously incorrect solutions, and (b) picking among the remaining ones the solution that yields the smallest residual in the least-squares estimation process.

3.9. Assume that n fiducial lines with known Plücker coordinates are observed by a camera.
(a) Show that the line projection matrix M̃ introduced in the exercises of Chapter 2 can be recovered using linear least squares when n ≥ 9.
(b) Show that once M̃ is known, the projection matrix M can also be recovered using linear least squares.
Hint: Consider the rows mi of M as the coordinate vectors of three planes Πi and the rows of M̃ as the coordinate vectors of three lines, and use the incidence relationships between these planes and these lines to derive linear constraints on the vectors mi.

Solution

(a) We saw in Exercise 2.15 that the Plücker coordinate vector of a line Δ and the homogeneous coordinate vector of its image δ are related by

    ρδ = M̃Δ,  where  M̃ def= ( (m2 ∧ m3)^T )
                             ( (m3 ∧ m1)^T )
                             ( (m1 ∧ m2)^T ).

We can eliminate the unknown scale factor ρ by using the fact that the cross product of two parallel vectors is zero, thus δ × M̃Δ = 0. This linear vector equation in the components of M̃ is equivalent to two independent scalar equations. Since the 3 × 6 matrix M̃ is only defined up to scale, its 17 independent coefficients can thus be estimated as before via linear least squares (ignoring the nonlinear constraints imposed by the fact that the rows of M̃ are Plücker coordinate vectors) when n ≥ 9.

(b) Once M̃ is known, we can recover M as well through linear least squares. Indeed, the vectors mi (i = 1, 2, 3) can be thought of as the homogeneous coordinate vectors of three projection planes Πi (see diagram below). These planes intersect at the optical center O of the camera, since the homogeneous coordinate vector of this point satisfies the equation MO = 0. Likewise, it is easy to show that Π3 is parallel to the image plane, that Π3 and Π1 intersect along a line L31 parallel to the u = 0 coordinate axis of the image plane, that Π2 and Π3 intersect along a line L23 parallel to its v = 0 coordinate axis, and that the line L12 formed by the intersection of Π1 and Π2 is simply the optical axis.

[Figure: the three projection planes Π1, Π2, and Π3 meeting at the optical center O, their pairwise intersection lines L12, L23, and L31, and the u and v axes of the image plane Π.]

According to Exercise 2.15, if L12 = m1 ∧ m2, we can write the fact that L12 lies in the plane Π1 as L*12 m1 = 0, where L*12 is the 4 × 4 matrix associated with the vector L12 in part (f) of that exercise. Five more homogeneous constraints on the vectors mi (i = 1, 2, 3) are obtained by appropriate permutations of the indices. In addition, we must have

    ρ12 L*12 m3 = ρ23 L*23 m1 = ρ31 L*31 m2

for some nonzero scalars ρ12, ρ23, and ρ31, since each matrix product in this equation gives the homogeneous coordinate vector of the optical center. In fact, it is easy to show that the three scale factors can be taken equal to each other, which yields three homogeneous vector equations in the mi (i = 1, 2, 3):

    L*12 m3 = L*23 m1 = L*31 m2.

Putting it all together, we obtain 6 × 3 + 3 × 4 = 30 homogeneous (scalar) linear equations in the coefficients of M, whose solution can once again be found via linear least squares (at most 11 of the 30 equations are independent in the noise-free case). Once M is known, the intrinsic and extrinsic parameters can be computed as before. We leave to the reader the task of characterizing the degenerate line configurations for which the proposed method fails.

Programming Assignments

3.10. Use linear least squares to fit a plane to n points (xi, yi, zi)^T (i = 1, . . . , n) in R³.
3.11. Use linear least squares to fit a conic section defined by ax² + bxy + cy² + dx + ey + f = 0 to n points (xi, yi)^T (i = 1, . . . , n) in R².
3.12. Implement the linear calibration algorithm presented in Section 3.2.
3.13. Implement the calibration algorithm that takes into account radial distortion and that was presented in Section 3.3.
3.14. Implement the nonlinear calibration algorithm from Section 3.4.


CHAPTER 4

Radiometry—Measuring Light

PROBLEMS

4.1. How many steradians in a hemisphere?

Solution 2π.

4.2. We have proved that radiance does not go down along a straight line in a non-absorbing medium, which makes it a useful unit. Show that if we were to use power per square meter of foreshortened area (which is irradiance), the unit must change with distance along a straight line. How significant is this difference?

Solution Assume we have a source and two receivers that look exactly the same from the source. One is large and far away; the other is small and nearby. Because they look the same from the source, exactly the same rays leaving the source pass through each receiver. If our unit is power per square meter of foreshortened area, the amount of power arriving at a receiver is given by integrating this over the area of the receiver. But the distant one is bigger, and so if the value of power per square meter of foreshortened area didn't go down with distance, then it would receive more power than the nearby receiver, which is impossible (how does the source know which one should get more power?).

4.3. An absorbing medium: Assume that the world is filled with an isotropic absorbing medium. A good, simple model of such a medium is obtained by considering a line along which radiance travels. If the radiance along the line is N at x, it is N − (α dx)N at x + dx.
(a) Write an expression for the radiance transferred from one surface patch to another in the presence of this medium.
(b) Now qualitatively describe the distribution of light in a room filled with this medium for α small and large positive numbers. The room is a cube, and the light is a single small patch in the center of the ceiling. Keep in mind that if α is large and positive, little light actually reaches the walls of the room.

Solution

(a) dN/dx = −αN, so N(x) = N(0)e^{−αx}; the radiance transferred between two patches a distance d apart is therefore attenuated by a factor e^{−αd}.

(b) Radiance goes down exponentially with distance. Assume the largest distance in the room is d. If α is small enough (much less than 1/d), the room looks as usual. As α gets bigger, interreflections are quenched; for large α, only objects that view the light directly and are close to the light will be bright.
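A quick numerical sketch of part (a): marching the difference equation N(x + dx) = N − (α dx)N reproduces the exponential solution (α, N0, and the step size are arbitrary test values):

```python
import numpy as np

# Radiance along a line in the absorbing medium: dN/dx = -alpha * N,
# whose solution is N(x) = N0 * exp(-alpha * x).
def radiance(N0, alpha, x):
    return N0 * np.exp(-alpha * x)

alpha, N0, dx = 2.0, 1.0, 1e-5
N = N0
for _ in range(int(1.0 / dx)):       # march from x = 0 to x = 1
    N -= alpha * dx * N              # N(x + dx) = N - (alpha dx) N
assert abs(N - radiance(N0, alpha, 1.0)) < 1e-4
```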

4.4. Identify common surfaces that are neither Lambertian nor specular, using the underside of a CD as a working example. There are a variety of important biological examples, which are often blue in color. Give at least two different reasons that it could be advantageous to an organism to have a non-Lambertian surface.

Solution There are lots. Many possible advantages; for example, an animal that looks small to a predator approaching in one direction (because it looks dark from this direction) could turn quickly and look big (because it looks bright from this direction).


4.5. Show that for an ideal diffuse surface the directional hemispheric reflectance is constant; now show that if a surface has constant directional hemispheric reflectance, it is ideal diffuse.

Solution In an ideal diffuse surface, the BRDF is constant; the DHR is an integral of the BRDF over the outgoing angles, and so must be constant too. The other direction is false (sorry!—DAF).

4.6. Show that the BRDF of an ideal specular surface is

    ρbd(θo, φo, θi, φi) = ρs(θi){2δ(sin²θo − sin²θi)}{δ(φo − φi − π)},

where ρs(θi) is the fraction of radiation that leaves.

Solution See the book Radiosity and Global Illumination, F. Sillion and C. Puech, Morgan Kaufmann, 1994; this is worked on page 16.

4.7. Why are specularities brighter than diffuse reflection?

Solution Because the reflected light is concentrated in a smaller range of angles.

4.8. A surface has constant BRDF. What is the maximum possible value of this constant? Now assume that the surface is known to absorb 20% of the radiation incident on it (the rest is reflected); what is the value of the BRDF?

Solution 1/π; 0.8/π.

4.9. The eye responds to radiance. Explain why Lambertian surfaces are often referred to as having a brightness independent of viewing angle.

Solution The radiance leaving an ideal diffuse surface is independent of exit angle.

4.10. Show that the solid angle subtended by a sphere of radius ε at a point a distance r away from the center of the sphere is approximately π(ε/r)², for r ≫ ε.

Solution If the sphere is far enough away, the rays from the sphere to the point are approximately parallel, and the cone of rays intersects the sphere in a circle of radius ε. This circle points toward the point, and the rest is simple calculation.
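The approximation can be compared against the exact solid angle of the tangent cone, 2π(1 − √(1 − (ε/r)²)), a standard spherical-cap formula not derived in the text:

```python
import math

def exact_solid_angle(eps, r):
    # Tangent cone half-angle t has sin(t) = eps/r; cap formula
    # Omega = 2*pi*(1 - cos(t)).
    return 2 * math.pi * (1 - math.sqrt(1 - (eps / r) ** 2))

def approx_solid_angle(eps, r):
    return math.pi * (eps / r) ** 2

# The relative error shrinks like (eps/r)^2 as r grows.
for r in [10.0, 100.0, 1000.0]:
    exact = exact_solid_angle(1.0, r)
    approx = approx_solid_angle(1.0, r)
    assert abs(exact - approx) / exact < (1.0 / r) ** 2
```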


CHAPTER 5

Sources, Shadows and Shading

PROBLEMS

5.1. What shapes can the shadow of a sphere take if it is cast on a plane and the source is a point source?

Solution Any conic section. The vertex of the cone is the point source; the rays tangent to the sphere form a right circular cone, and this cone is sliced by a plane. It's not possible to get both parts of a hyperbola.

5.2. We have a square area source and a square occluder, both parallel to a plane. The source is the same size as the occluder, and they are vertically above one another with their centers aligned.
(a) What is the shape of the umbra?
(b) What is the shape of the outside boundary of the penumbra?

Solution

(a) Square.

(b) Construct with a drawing, to get an eight-sided polygon.

5.3. We have a square area source and a square occluder, both parallel to a plane. The edge length of the source is now twice that of the occluder, and they are vertically above one another with their centers aligned.
(a) What is the shape of the umbra?
(b) What is the shape of the outside boundary of the penumbra?

Solution

(a) Depends on how far the source is above the occluder; it could be absent or square.

(b) Construct with a drawing, to get an eight-sided polygon.

5.4. We have a square area source and a square occluder, both parallel to a plane. The edge length of the source is now half that of the occluder, and they are vertically above one another with their centers aligned.
(a) What is the shape of the umbra?
(b) What is the shape of the outside boundary of the penumbra?

Solution (a) Square. (b) Construct with a drawing, to get an eight-sided polygon.

5.5. A small sphere casts a shadow on a larger sphere. Describe the possible shadow boundaries that occur.

Solution Very complex, given by the intersection of a right circular cone and a sphere. In the simplest case, the two centers are aligned with the point source, and the shadow is a circle.

5.6. Explain why it is difficult to use shadow boundaries to infer shape, particularly if the shadow is cast onto a curved surface.

27

Page 28: 2461059 Computer Vision Solution Manual

28 Chapter 5 Sources, Shadows and Shading

Solution As the example above suggests, the boundary is a complicated 3D curve. Typically, we see this curve projected into an image. This means it is very difficult to infer anything except in quite special cases (e.g., projection of the shadow boundary onto a plane).

5.7. An infinitesimal patch views a circular area source of constant exitance frontally, along the axis of symmetry of the source. Compute the radiosity of the patch due to the source exitance E(u) as a function of the area of the source and the distance between the center of the source and the patch. You may have to look the integral up in tables — if you don't, you're entitled to feel pleased with yourself — but this is one of few cases that can be done in closed form. It is easier to look up if you transform it to get rid of the cosine terms.

Solution Radiosity is proportional to A/(A + πd²), where A is the area of the source and d is the distance.
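Assuming the source is Lambertian with exitance E (radiance E/π), the irradiance at the patch can be evaluated numerically over the view angles subtended by the disk and compared with the A/(A + πd²) form (all numbers are hypothetical test values):

```python
import math

def disk_irradiance_numeric(E, R, d, n=200000):
    # Integrate (E/pi) * cos(t) * 2*pi*sin(t) dt from t = 0 to atan(R/d),
    # using the midpoint rule; t is the angle away from the axis.
    tmax = math.atan2(R, d)
    h = tmax / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += (E / math.pi) * math.cos(t) * 2 * math.pi * math.sin(t) * h
    return total

E, R, d = 1.0, 0.5, 2.0
A = math.pi * R * R
closed_form = E * A / (A + math.pi * d * d)   # = E sin^2(tmax)
assert abs(disk_irradiance_numeric(E, R, d) - closed_form) < 1e-6
```

The patch radiosity is this irradiance scaled by the patch albedo, hence "proportional to A/(A + πd²)".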

5.8. As in Figure 5.17, a small patch views an infinite plane at unit distance. The patch is sufficiently small that it reflects a trivial quantity of light onto the plane. The plane has radiosity B(x, y) = 1 + sin ax. The patch and the plane are parallel to one another. We move the patch around parallel to the plane, and consider its radiosity at various points.
(a) Show that if one translates the patch, its radiosity varies periodically with its position in x.
(b) Fix the patch's center at (0, 0); determine a closed form expression for the radiosity of the patch at this point as a function of a. You'll need a table of integrals for this (if you do not, you are entitled to feel very pleased with yourself).

Solution

(a) Obvious symmetry in the geometry.

(b) This integral is in "Shading primitives," J. Haddon and D.A. Forsyth, Proc. Int. Conf. Computer Vision, 1997.

5.9. If one looks across a large bay in the daytime, it is often hard to distinguish the mountains on the opposite side; near sunset, they are clearly visible. This phenomenon has to do with scattering of light by air — a large volume of air is actually a source. Explain what is happening. We have modeled air as a vacuum and asserted that no energy is lost along a straight line in a vacuum. Use your explanation to give an estimate of the kind of scales over which that model is acceptable.

Solution In the day, the air between you and the other side is illuminated by the sun; some light scatters toward your eye. This has the effect of reducing contrast, meaning that the other side of the bay is hard to see because it's about as bright as the air. By evening, the air is less strongly illuminated and the contrast goes up. This suggests that assuming air doesn't interact with light is probably dubious at scales of multiple kilometers (considerably less close to a city).

5.10. Read the book Colour and Light in Nature, by Lynch and Livingston, published by Cambridge University Press, 1995.

Programming Assignments

5.11. An area source can be approximated as a grid of point sources. The weakness of this approximation is that the penumbra contains quantization errors, which can be quite offensive to the eye.
(a) Explain.


(b) Render this effect for a square source and a single occluder casting a shadow onto an infinite plane. For a fixed geometry, you should find that as the number of point sources goes up, the quantization error goes down.

(c) This approximation has the unpleasant property that it is possible to produce arbitrarily large quantization errors with any finite grid by changing the geometry. This is because there are configurations of source and occluder that produce large penumbrae. Use a square source and a single occluder, casting a shadow onto an infinite plane, to explain this effect.

5.12. Make a world of black objects and another of white objects (paper, glue, and spray paint are useful here) and observe the effects of interreflections. Can you come up with a criterion that reliably tells, from an image, which is which? (If you can, publish it; the problem looks easy, but isn't.)

5.13. (This exercise requires some knowledge of numerical analysis.) Do the numerical integrals required to reproduce Figure 5.17. These integrals aren't particularly easy: If one uses coordinates on the infinite plane, the size of the domain is a nuisance; if one converts to coordinates on the view hemisphere of the patch, the frequency of the radiance becomes infinite at the boundary of the hemisphere. The best way to estimate these integrals is using a Monte Carlo method on the hemisphere. You should use importance sampling, because the boundary contributes rather less to the integral than the top.

5.14. Set up and solve the linear equations for an interreflection solution for the interior of a cube with a small square source in the center of the ceiling.

5.15. Implement a photometric stereo system.
(a) How accurate are its measurements (i.e., how well do they compare with known shape information)? Do interreflections affect the accuracy?
(b) How repeatable are its measurements (i.e., if you obtain another set of images, perhaps under different illuminants, and recover shape from those, how does the new shape compare with the old)?
(c) Compare the minimization approach to reconstruction with the integration approach; which is more accurate or more repeatable, and why? Does this difference appear in experiment?
(d) One possible way to improve the integration approach is to obtain depths by integrating over many different paths and then average these depths (you need to be a little careful about constants here). Does this improve the accuracy or repeatability of the method?


CHAPTER 6

Color

PROBLEMS

6.1. Sit down with a friend and a packet of colored papers, and compare the color names that you use. You need a large packet of papers — one can very often get collections of colored swatches for paint, or for the Pantone color system, very cheaply. The best names to try are basic color names — the terms red, pink, orange, yellow, green, blue, purple, brown, white, gray, and black, which (with a small number of other terms) have remarkable canonical properties that apply widely across different languages (the papers in ? give a good summary of current thought on this issue). You will find it surprisingly easy to disagree on which colors should be called blue and which green, for example.

Solution Students should do the experiment; there's no right answer, but if two people agree on all color names for all papers with a large range of color, something funny is going on.

6.2. Derive the equations for transforming from RGB to CIE XYZ and back. This is a linear transformation. It is sufficient to write out the expressions for the elements of the linear transformation — you don't have to look up the actual numerical values of the color matching functions.

Solution Write the RGB primaries as pr(λ), pg(λ), pb(λ). If a color has RGB coordinates (a, b, c), that means it matches a pr(λ) + b pg(λ) + c pb(λ). What are the XYZ coordinates (d, e, f) of this color? We compute them with the XYZ color matching functions x̄(λ), ȳ(λ), and z̄(λ), to get

    ( d )   ( ∫x̄(λ)pr(λ)dλ   ∫x̄(λ)pg(λ)dλ   ∫x̄(λ)pb(λ)dλ ) ( a )
    ( e ) = ( ∫ȳ(λ)pr(λ)dλ   ∫ȳ(λ)pg(λ)dλ   ∫ȳ(λ)pb(λ)dλ ) ( b )
    ( f )   ( ∫z̄(λ)pr(λ)dλ   ∫z̄(λ)pg(λ)dλ   ∫z̄(λ)pb(λ)dλ ) ( c ).

Inverting this 3 × 3 matrix gives the transformation from XYZ back to RGB.

6.3. Linear color spaces are obtained by choosing primaries and then constructing color matching functions for those primaries. Show that there is a linear transformation that takes the coordinates of a color in one linear color space to those in another; the easiest way to do this is to write out the transformation in terms of the color matching functions.

Solution Look at the previous answer.

6.4. Exercise 6.3 means that, in setting up a linear color space, it is possible to choose primaries arbitrarily, but there are constraints on the choice of color matching functions. Why? What are these constraints?

Solution Assume I have some linear color space and know its color matching functions. Then the color matching functions for any other linear color space are a linear combination of the color matching functions for this space. Arbitrary functions don't have this property, so we can't choose color matching functions arbitrarily.


6.5. Two surfaces that have the same color under one light and different colors under another are often referred to as metamers. An optimal color is a spectral reflectance or radiance that has value 0 at some wavelengths and 1 at others. Although optimal colors don't occur in practice, they are a useful device (due to Ostwald) for explaining various effects.
(a) Use optimal colors to explain how metamerism occurs.
(b) Given a particular spectral albedo, show that there are an infinite number of metameric spectral albedoes.
(c) Use optimal colors to construct an example of surfaces that look different under one light (say, red and green) and the same under another.
(d) Use optimal colors to construct an example of surfaces that swap apparent color when the light is changed (i.e., surface one looks red and surface two looks green under light one, and surface one looks green and surface two looks red under light two).

Solution

(a) See Figure 6.1.

(b) You can either do this graphically, by extending the reasoning of Figure 6.1, or analytically.

(c,d) This follows directly from (a) and (b).

6.6. You have to map the gamut for a printer to that of a monitor. There are colors in each gamut that do not appear in the other. Given a monitor color that can't be reproduced exactly, you could choose the printer color that is closest. Why is this a bad idea for reproducing images? Would it work for reproducing "business graphics" (bar charts, pie charts, and the like, which all consist of many different large blocks of a single color)?

Solution Some regions that, in the original picture, had smooth gradients will now have a constant color. Yes: business graphics consist of large blocks of constant color, so mapping each block to its nearest reproducible color leaves the blocks uniform.

6.7. Volume color is a phenomenon associated with translucent materials that are colored — the most attractive example is a glass of wine. The coloring comes from different absorption coefficients at different wavelengths. Explain (a) why a small glass of sufficiently deeply colored red wine (a good Cahors or Gigondas) looks black (b) why a big glass of lightly colored red wine also looks black. Experimental work is optional.

Solution Absorption is exponential with distance; if the rate of absorption is sufficiently large for each wavelength, within a relatively short distance all will be absorbed and the wine will look black. If the coefficients are smaller, you need a larger glass to get the light absorbed.

6.8. (This exercise requires some knowledge of numerical analysis.) In Section 6.5.2, we set up the problem of recovering the log albedo for a set of surfaces as one of minimizing

\[ | M_x l - p |^2 + | M_y l - q |^2, \]

where M_x forms the x derivative of l and M_y forms the y derivative (i.e., M_x l is the x-derivative).
(a) We asserted that M_x and M_y existed. Use the expression for forward differences (or central differences, or any other difference approximation to the derivative) to form these matrices. Almost every element is zero.


32 Chapter 6 Color

[Plot omitted in this extraction: the r, g, and b curves against wavelength in nm.]

FIGURE 6.1: The figure shows the RGB color matching functions. The reflectance given by the two narrow bars is metameric to that given by the single, slightly thicker bar under uniform illumination, because under uniform illumination either reflectance will cause no response in B and about the same response in R and G. However, if the illuminant has high energy at about the center wavelength of the thicker bar and no energy elsewhere, the surface with this reflectance will look the same as it does under a uniform illuminant but the other one will be dark. It's worth trying to do a few other examples with this sort of graphical reasoning because it will give you a more visceral sense of what is going on than mere algebraic manipulation.

(b) The minimization problem can be written in the form

choose l to minimize (Al + b)^T (Al + b).

Determine the values of A and b, and show how to solve this general problem. You will need to keep in mind that A does not have full rank, so you can't go inverting it.

Solution

(a) Straightforward detail.

(b) The difficulty is the constant of integration. The problem is

choose l to minimize (Al + b)^T (Al + b)

or, equivalently,

choose l so that (A^T A) l = −A^T b

(which is guaranteed to have at least one solution). Actually, it must have at least a 2D space of solutions; one chooses an element of this space such that (say) the sum of values is a constant.
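Part (a) and the least-squares solve can be checked with a small numerical sketch, assuming NumPy is available. The setup is hypothetical (a tiny 4 × 4 log-albedo image, dense difference matrices; a real implementation would use sparse storage), and `numpy.linalg.lstsq` is used precisely because A is rank deficient:

```python
import numpy as np

rng = np.random.default_rng(0)
h = w = 4
n = h * w
l_true = rng.standard_normal(n)            # log albedo, flattened row-major

def idx(i, j):                             # pixel (row i, col j) -> flat index
    return i * w + j

# Forward-difference matrices: almost every entry is zero.
Mx = np.zeros(((h - 1) * w, n))            # x derivative: l[i+1,j] - l[i,j]
for r, (i, j) in enumerate((i, j) for i in range(h - 1) for j in range(w)):
    Mx[r, idx(i + 1, j)] = 1.0
    Mx[r, idx(i, j)] = -1.0
My = np.zeros((h * (w - 1), n))            # y derivative: l[i,j+1] - l[i,j]
for r, (i, j) in enumerate((i, j) for i in range(h) for j in range(w - 1)):
    My[r, idx(i, j + 1)] = 1.0
    My[r, idx(i, j)] = -1.0

p, q = Mx @ l_true, My @ l_true            # observed derivatives
A = np.vstack([Mx, My])
b = np.concatenate([p, q])

# A kills constants, so it is rank deficient; lstsq copes with that.
l_hat = np.linalg.lstsq(A, b, rcond=None)[0]
l_hat += l_true.mean() - l_hat.mean()      # fix the constant of integration
print(np.abs(l_hat - l_true).max())        # essentially zero, up to rounding
```

The last line is exactly the point of the exercise: the derivatives determine the log albedo only up to a constant, which must be pinned down separately.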

6.9. In Section 6.5.2, we mentioned two assumptions that would yield a constant of integration.
(a) Show how to use these assumptions to recover an albedo map.
(b) For each assumption, describe a situation where it fails, and describe the nature of the failure. Your examples should work for cases where there are many different albedoes in view.

Solution Assuming the brightest patch is white yields an albedo map because we know the difference of the logs of albedos for every pair of albedos; so we can set the brightest patch to have the value 1, and recover all others from these known ratios. Assuming the average lightness is a fixed constant is equally easy — if we choose some patch's albedo to be an unknown constant c, each other albedo will be some known factor times c, so the average lightness will be some known term (a spatial average of the factors) times c. But this is assumed to be a known constant, so we get c. If the lightest albedo in view is not white, everything will be reported as being lighter than it actually is. If the spatial average albedo is different from the constant it is assumed to be, everything will be lighter or darker by the ratio of the constant to the actual value.

6.10. Read the book Colour: Art and Science, by Lamb and Bourriau, Cambridge University Press, 1995.

Programming Assignments

6.11. Spectra for illuminants and for surfaces are available on the web (try http://www.it.lut.fi/research/color/lutcs database.html). Fit a finite-dimensional linear model to a set of illuminants and surface reflectances using principal components analysis, render the resulting models, and compare your rendering with an exact rendering. Where do you get the most significant errors? Why?

6.12. Print a colored image on a color inkjet printer using different papers and compare the result. It is particularly informative to (a) ensure that the driver knows what paper the printer will be printing on, and compare the variations in colors (which are ideally imperceptible), and (b) deceive the driver about what paper it is printing on (i.e., print on plain paper and tell the driver it is printing on photographic paper). Can you explain the variations you see? Why is photographic paper glossy?

6.13. Fitting a finite-dimensional linear model to illuminants and reflectances separately is somewhat ill-advised because there is no guarantee that the interactions will be represented well (they're not accounted for in the fitting error). It turns out that one can obtain g_{ijk} by a fitting process that sidesteps the use of basis functions. Implement this procedure (which is described in detail in [?]), and compare the results with those obtained from the previous assignment.

6.14. Build a color constancy algorithm that uses the assumption that the spatial average of reflectance is constant. Use finite-dimensional linear models. You can get values of g_{ijk} from your solution to Exercise 3.

6.15. We ignore color interreflections in our surface color model. Do an experiment to get some idea of the size of color shifts possible from color interreflections (which are astonishingly big). Humans seldom interpret color interreflections as surface color. Speculate as to why this might be the case, using the discussion of the lightness algorithm as a guide.

6.16. Build a specularity finder along the lines described in Section 6.4.3.


CHAPTER 7

Linear Filters

PROBLEMS

7.1. Show that forming unweighted local averages, which yields an operation of the form

\[ R_{ij} = \frac{1}{(2k+1)^2} \sum_{u=i-k}^{u=i+k}\; \sum_{v=j-k}^{v=j+k} F_{uv}, \]

is a convolution. What is the kernel of this convolution?

Solution Get this by pattern matching between formulae: the kernel H satisfies H_{uv} = 1/(2k+1)^2 for −k ≤ u, v ≤ k, and is zero elsewhere.
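A quick check of the claim on a small, hypothetical image: the direct local average and the convolution with the constant kernel give the same number at an interior pixel.

```python
# Direct unweighted local average vs. convolution with a constant kernel,
# on a small 6x6 image with k = 1 (interior pixels only).
k = 1
F = [[(3 * i + 7 * j) % 11 for j in range(6)] for i in range(6)]
H = [[1.0 / (2 * k + 1) ** 2] * (2 * k + 1) for _ in range(2 * k + 1)]

def average(F, i, j, k):
    s = sum(F[u][v] for u in range(i - k, i + k + 1)
                    for v in range(j - k, j + k + 1))
    return s / (2 * k + 1) ** 2

def convolve_at(F, H, i, j, k):
    # (H * F)(i, j); flipping H is invisible here because the box kernel
    # is symmetric.
    return sum(H[u][v] * F[i - (u - k)][j - (v - k)]
               for u in range(2 * k + 1) for v in range(2 * k + 1))

i, j = 3, 2
print(average(F, i, j, k), convolve_at(F, H, i, j, k))
```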

7.2. Write E_0 for an image that consists of all zeros with a single one at the center. Show that convolving this image with the kernel

\[ H_{ij} = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(i-k-1)^2 + (j-k-1)^2}{2\sigma^2} \right) \]

(which is a discretised Gaussian) yields a circularly symmetric fuzzy blob.

Solution Convolving this image with any kernel reproduces the kernel; the current kernel is a circularly symmetric fuzzy blob.

7.3. Show that convolving an image with a discrete, separable 2D filter kernel is equivalent to convolving with two 1D filter kernels. Estimate the number of operations saved for an N × N image and a (2k+1) × (2k+1) kernel.

Solution A separable kernel can be written H_{uv} = h_u g_v, so the 2D sum factors into a 1D convolution of each row with g followed by a 1D convolution of each column of the result with h. Direct 2D convolution costs about N²(2k+1)² operations; the two 1D passes cost about 2N²(2k+1), a saving of roughly a factor of (2k+1)/2.
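A sketch of the equivalence on random data (hypothetical kernel factors h and g; interior pixels only; written as correlation, since flipping the kernel does not change the separability argument):

```python
import random

k, N = 2, 8
random.seed(1)
F = [[random.random() for _ in range(N)] for _ in range(N)]
h = [random.random() for _ in range(2 * k + 1)]     # 1D factors
g = [random.random() for _ in range(2 * k + 1)]
H = [[hu * gv for gv in g] for hu in h]             # separable 2D kernel

def conv2(F, H):
    # full 2D pass, interior ("valid") pixels only
    return [[sum(H[u][v] * F[i + u - k][j + v - k]
                 for u in range(2 * k + 1) for v in range(2 * k + 1))
             for j in range(k, N - k)] for i in range(k, N - k)]

def conv_rows(F, g):
    # filter every row with g, trimming the columns
    return [[sum(g[v] * F[i][j + v - k] for v in range(2 * k + 1))
             for j in range(k, N - k)] for i in range(N)]

direct = conv2(F, H)
tmp = conv_rows(F, g)
# now filter the columns of the intermediate result with h
two_pass = [[sum(h[u] * tmp[i + u - k][j] for u in range(2 * k + 1))
             for j in range(N - 2 * k)] for i in range(k, N - k)]
err = max(abs(a - b) for ra, rb in zip(direct, two_pass)
                     for a, b in zip(ra, rb))
print(err)   # zero up to rounding
```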

7.4. Show that convolving a function with a δ function simply reproduces the original function. Now show that convolving a function with a shifted δ function shifts the function.

Solution Simple index jockeying.

7.5. We said that convolving the image with a kernel of the form (sin x sin y)/(xy) is impossible because this function has infinite support. Why would it be impossible to Fourier transform the image, multiply the Fourier transform by a box function, and then inverse-Fourier transform the result? Hint: Think support.

Solution You'd need an infinite image: multiplying by a box in the frequency domain is convolving with a sinc in the spatial domain, and the sinc has infinite support.

7.6. Aliasing takes high spatial frequencies to low spatial frequencies. Explain why the following effects occur:
(a) In old cowboy films that show wagons moving, the wheel often seems to be stationary or moving in the wrong direction (i.e., the wagon moves from left to right and the wheel seems to be turning counterclockwise).
(b) White shirts with thin dark pinstripes often generate a shimmering array of colors on television.
(c) In ray-traced pictures, soft shadows generated by area sources look blocky.


Solution

(a) The wheel has a symmetry, and has rotated just enough to look like itself, and so is stationary. It moves the wrong way when it rotates just too little to look like itself.

(b) Typically, color images are obtained by using three different sites on the same imaging grid, each sensitive to a different range of wavelengths. If the blue site in the camera sees a stripe and the nearby red and green sites see the shirt, the pixel reports yellow; but a small movement may mean (say) the green sees the stripe and the red and blue see the shirt, and we get purple.

(c) The source has been subdivided into a grid with point sources at the vertices; each block boundary occurs when one of these elements disappears behind, or reappears from behind, an occluder.

Programming Assignments

7.7. One way to obtain a Gaussian kernel is to convolve a constant kernel with itself many times. Compare this strategy with evaluating a Gaussian kernel.
(a) How many repeated convolutions do you need to get a reasonable approximation? (You need to establish what a reasonable approximation is; you might plot the quality of the approximation against the number of repeated convolutions.)
(b) Are there any benefits that can be obtained like this? (Hint: Not every computer comes with an FPU.)

7.8. Write a program that produces a Gaussian pyramid from an image.

7.9. A sampled Gaussian kernel must alias because the kernel contains components at arbitrarily high spatial frequencies. Assume that the kernel is sampled on an infinite grid. As the standard deviation gets smaller, the aliased energy must increase. Plot the energy that aliases against the standard deviation of the Gaussian kernel in pixels. Now assume that the Gaussian kernel is given on a 7 × 7 grid. If the aliased energy must be of the same order of magnitude as the error due to truncating the Gaussian, what is the smallest standard deviation that can be expressed on this grid?


CHAPTER 8

Edge Detection

PROBLEMS

8.1. Each pixel value in a 500 × 500 pixel image I is an independent, normally distributed random variable with zero mean and standard deviation one. Estimate the number of pixels where the absolute value of the x derivative, estimated by forward differences (i.e., |I_{i+1,j} − I_{i,j}|), is greater than 3.

Solution The signed difference has mean 0 and standard deviation √2. There are 500 rows and 499 differences per row, so a total of 500 × 499 differences. The probability that the absolute value of a difference is larger than 3 is

\[ P(\mathrm{diff} > 3) = \int_3^{+\infty} \frac{1}{\sqrt{2}\sqrt{2\pi}}\, e^{-x^2/4}\, dx + \int_{-\infty}^{-3} \frac{1}{\sqrt{2}\sqrt{2\pi}}\, e^{-x^2/4}\, dx, \]

and the answer is 500 × 499 × P(diff > 3). P(diff > 3) can be obtained from tables for the complementary error function, defined by

\[ \mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-u^2}\, du. \]

Notice that

\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_a^{\infty} e^{-u^2/(2\sigma^2)}\, du = \frac{1}{\sqrt{\pi}} \int_{a/(\sqrt{2}\sigma)}^{\infty} e^{-w^2}\, dw \]

by a change of variables in the integral, so that

\[ P(\mathrm{diff} > 3) = \mathrm{erfc}\!\left(\frac{3}{2}\right), \]

which can be looked up in tables.

8.2. Each pixel value in a 500 × 500 pixel image I is an independent, normally distributed random variable with zero mean and standard deviation one. I is convolved with the (2k+1) × (2k+1) kernel G. What is the covariance of pixel values in the result? There are two ways to do this: on a case-by-case basis (e.g., at points that are greater than 2k+1 apart in either the x or y direction, the values are clearly independent) or in one fell swoop. Don't worry about the pixel values at the boundary.

Solution The value of each pixel in the result is a weighted sum of pixels from the input. Each pixel in the input is independent. For two pixels in the output to have non-zero covariance, they must share some elements in their sum. The covariance of two pixels with shared elements is the expected value of a product of sums, that is

\[ R_{ij} = \sum_{lm} G_{lm} I_{i-l,j-m} \quad\text{and}\quad R_{uv} = \sum_{st} G_{st} I_{u-s,v-t}. \]

Now some elements of these sums are shared, and it is the shared values that produce covariance. In particular, the shared terms occur when i − l = u − s and j − m = v − t. The covariance will be the variance times the weights with which these shared terms appear. Hence

\[ E(R_{ij} R_{uv}) = \sum_{i-l=u-s,\; j-m=v-t} G_{lm} G_{st}. \]

8.3. We have a camera that can produce output values that are integers in the range from 0 to 255. Its spatial resolution is 1024 by 768 pixels, and it produces 30 frames a second. We point it at a scene that, in the absence of noise, would produce the constant value 128. The output of the camera is subject to noise that we model as zero mean stationary additive Gaussian noise with a standard deviation of 1. How long must we wait before the noise model predicts that we should see a pixel with a negative value? (Hint: You may find it helpful to use logarithms to compute the answer, as a straightforward evaluation of exp(−128²/2) will yield 0; the trick is to get the large positive and large negative logarithms to cancel.)

Solution The hint is unhelpful; DAF apologizes. The most important issue here is P(value of noise < −128). This is

\[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{-128} e^{-x^2/2}\, dx, \]

which can be looked up in tables for the complementary error function, as above. There are 30 × 1024 × 768 samples per second, each of which has probability

P(value of noise < −128) = p

of having a negative value. The probability of obtaining a run of samples that is N long and contains no negative value is (1 − p)^N. Assume we would like a run that has a 0.9 probability of having a negative value in it; it must have at least log(0.1)/log(1 − p) samples in it.
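The log-domain bookkeeping can be sketched as follows, using the standard tail asymptotic P(Z < −x) ≈ φ(x)/x for large x (accurate here to a relative error of order 1/x²):

```python
import math

# Work with logarithms throughout, as the hint suggests:
# log p ~ -x^2/2 - log(x * sqrt(2*pi)) for x = 128.
x = 128.0
log_p = -x * x / 2.0 - math.log(x * math.sqrt(2.0 * math.pi))

# (1 - p)^N <= 0.1  =>  N >= log(0.1)/log(1 - p) ~ log(10)/p for tiny p.
log10_N = (math.log(math.log(10.0)) - log_p) / math.log(10.0)

samples_per_second = 30 * 1024 * 768
log10_seconds = log10_N - math.log10(samples_per_second)
log10_years = log10_seconds - math.log10(3600 * 24 * 365.25)
print(log10_years)   # roughly 3546: wait on the order of 10^3546 years
```

The large positive and large negative logarithms cancel exactly as the hint promises; nothing here ever underflows.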

8.4. We said in Section 8.3.1 that a sensible 2D analogue to the 1D second derivative must be rotationally invariant. Why is this true?

Solution This depends on whether we are looking for directed or undirected edges. If we look for maxima of the magnitude of the gradient, this says nothing about the direction of the edge — we have to look at the gradient direction for that — and so we can mark edge points by looking at local maxima without worrying about the direction of the edge. To do this with a second derivative operator, we need one that will be zero whatever the orientation of the edge; i.e., rotating the operator will not affect the response. This means it must be rotationally invariant.

Programming Assignments

8.5. Why is it necessary to check that the gradient magnitude is large at zero crossings of the Laplacian of an image? Demonstrate a series of edges for which this test is significant.

8.6. The Laplacian of a Gaussian looks similar to the difference between two Gaussians at different scales. Compare these two kernels for various values of the two scales. Which choices give a good approximation? How significant is the approximation error in edge finding using a zero-crossing approach?


8.7. Obtain an implementation of Canny's edge detector (you could try the vision home page; MATLAB has an implementation in the image processing toolbox, too) and make a series of images indicating the effects of scale and contrast thresholds on the edges that are detected. How easy is it to set up the edge detector to mark only object boundaries? Can you think of applications where this would be easy?

8.8. It is quite easy to defeat hysteresis in edge detectors that implement it — essentially, one sets the lower and higher thresholds to have the same value. Use this trick to compare the behavior of an edge detector with and without hysteresis. There are a variety of issues to look at:
(a) What are you trying to do with the edge detector output? It is sometimes helpful to have linked chains of edge points. Does hysteresis help significantly here?
(b) Noise suppression: We often wish to force edge detectors to ignore some edge points and mark others. One diagnostic that an edge is useful is high contrast (it is by no means reliable). How reliably can you use hysteresis to suppress low-contrast edges without breaking high-contrast edges?


CHAPTER 9

Texture

PROBLEMS

9.1. Show that a circle appears as an ellipse in an orthographic view, and that the minor axis of this ellipse is the tilt direction. What is the aspect ratio of this ellipse?

Solution The circle lies on a plane. An orthographic view of the plane is obtained by projecting along some family of parallel rays onto another plane. Now on the image plane there will be some direction that is parallel to the object plane — call this T. Choose another direction on the image plane that is perpendicular to this one, and call it B. Now I can rotate the coordinate system on the object plane without problems (it's a circle!), so I rotate it so that the x direction is parallel to T. The y-coordinate projects onto the B direction (because the image plane is rotated about T with respect to the object plane) but is foreshortened. This means that the point (x, y) in the object plane projects to the point (x, αy) in the T, B coordinate system on the image plane (0 ≤ α ≤ 1 is a constant determined by the relative orientation of the planes). This means that the curve (cos θ, sin θ) on the object plane goes to (cos θ, α sin θ) on the image plane, which is an ellipse whose major axis lies along T and whose minor axis lies along B, the tilt direction; its aspect ratio is α.
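The computation above is easy to check numerically; the foreshortening value alpha below is an arbitrary choice.

```python
import math

# Orthographic foreshortening by alpha along B sends (cos t, sin t) on the
# object plane to (cos t, alpha * sin t) on the image plane.
alpha = 0.6                      # arbitrary value, 0 <= alpha <= 1
points = [(math.cos(t), alpha * math.sin(t))
          for t in (2 * math.pi * k / 100 for k in range(100))]

# Every image point satisfies the ellipse equation x^2 + (y/alpha)^2 = 1:
# semi-axes 1 (along T) and alpha (along B, the tilt direction).
worst = max(abs(x ** 2 + (y / alpha) ** 2 - 1.0) for x, y in points)
print("max deviation from ellipse:", worst, "aspect ratio:", alpha)
```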

9.2. We will study measuring the orientation of a plane in an orthographic view, given that the texture consists of points laid down by a homogeneous Poisson point process. Recall that one way to generate points according to such a process is to sample the x and y coordinates of the point uniformly and at random. We assume that the points from our process lie within a unit square.
(a) Show that the probability that a point will land in a particular set is proportional to the area of that set.
(b) Assume we partition the area into disjoint sets. Show that the number of points in each set has a multinomial probability distribution.
We will now use these observations to recover the orientation of the plane. We partition the image texture into a collection of disjoint sets.
(c) Show that the area of each set, backprojected onto the textured plane, is a function of the orientation of the plane.
(d) Use this function to suggest a method for obtaining the plane's orientation.

Solution The answer to (d) is no. The rest is straightforward.

Programming Assignments

9.3. Texture synthesis: Implement the non-parametric texture synthesis algorithm of Section 9.3.2. Use your implementation to study:
(a) the effect of window size on the synthesized texture;
(b) the effect of window shape on the synthesized texture;
(c) the effect of the matching criterion on the synthesized texture (i.e., using weighted sum of squares instead of sum of squares, etc.).

9.4. Texture representation: Implement a texture classifier that can distinguish between at least six types of texture; use the scale selection mechanism of Section 9.1.2, and compute statistics of filter outputs. We recommend that you use at least the mean and covariance of the outputs of about six oriented bar filters and


a spot filter. You may need to read up on classification in Chapter 22; use a simple classifier (nearest neighbor using Mahalanobis distance should do the trick).


CHAPTER 10

The Geometry of Multiple Views

PROBLEMS

10.1. Show that one of the singular values of an essential matrix is 0 and the other two are equal. (Huang and Faugeras [1989] have shown that the converse is also true — that is, any 3 × 3 matrix with one singular value equal to 0 and the other two equal to each other is an essential matrix.)
Hint: The singular values of E are the eigenvalues of EE^T.

Solution We have E = [t_×]R, thus EE^T = [t_×][t_×]^T = [t_×]^T[t_×]. If a is an eigenvector of EE^T associated with the eigenvalue λ then, for any vector b,

\[ \lambda\, b \cdot a = b^T([t_\times]^T[t_\times] a) = (t \times b) \cdot (t \times a). \]

Choosing a = b = t shows that λ = 0 is an eigenvalue of EE^T. Choosing b = t shows that if λ ≠ 0 then a is orthogonal to t. But then choosing a = b shows that

\[ \lambda |a|^2 = |t \times a|^2 = |t|^2 |a|^2. \]

It follows that all non-zero singular values of E must be equal. Note that the singular values of E cannot all be zero since this matrix has rank 2.
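The statement is easy to verify numerically for a particular essential matrix; the rotation (about the z axis) and translation below are arbitrary choices, and NumPy is assumed to be available.

```python
import numpy as np

def cross_matrix(t):
    # the matrix [t_x] such that cross_matrix(t) @ a == np.cross(t, a)
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

theta = 0.7                                   # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([1.0, -2.0, 0.5])

E = cross_matrix(t) @ R                       # an essential matrix
s = np.linalg.svd(E, compute_uv=False)        # sorted in decreasing order
print(s)   # two equal values (= |t|) and one zero
```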

10.2. Exponential representation of rotation matrices. The matrix associated with the rotation whose axis is the unit vector a and whose angle is θ can be shown to be equal to

\[ e^{\theta[a_\times]} \stackrel{\mathrm{def}}{=} \sum_{i=0}^{+\infty} \frac{1}{i!} (\theta[a_\times])^i. \]

Use this representation to derive Eq. (10.3).

Solution Let us consider a small motion with translational velocity v and rotational velocity ω. If the two camera frames are separated by the small time interval δt, the translation separating them is obviously (to first order) t = δt v. The corresponding rotation is a rotation of angle δt|ω| about the axis (1/|ω|)ω, i.e.,

\[ R = e^{\delta t[\omega_\times]} = \sum_{i=0}^{+\infty} \frac{1}{i!} (\delta t[\omega_\times])^i = \mathrm{Id} + \delta t\, [\omega_\times] + \text{higher-order terms}. \]

Neglecting all terms of order two or higher yields Eq. (10.3).

10.3. The infinitesimal epipolar constraint of Eq. (10.4) was derived by assuming that the observed scene was static and the camera was moving. Show that when the camera is fixed and the scene is moving with translational velocity v and rotational velocity ω, the epipolar constraint can be rewritten as p^T([v_×][ω_×])p + (p × ṗ) · v = 0. Note that this equation is now the sum of the two terms appearing in Eq. (10.4) instead of their difference.
Hint: If R and t denote the rotation matrix and translation vector appearing in the definition of the essential matrix for a moving camera, show that the object displacement that yields the same motion field for a static camera is given by the rotation matrix R^T and the translation vector −R^T t.


Solution Let us consider first a moving camera and a static scene, use the coordinate system attached to the camera in its initial position as the world coordinate system, and identify scene points with their positions in this coordinate system and image points with their position in the corresponding camera coordinate system. We have seen that the projection matrix associated with this camera can be taken equal to M = (Id 0) before the motion and to M = (R_c^T −R_c^T t_c) after the camera has undergone a rotation R_c and a translation t_c. Using non-homogeneous coordinates for scene points and homogeneous ones for image points, the two images of a point P are thus p = P and p′ = R_c^T P − R_c^T t_c.

Let us now consider a static camera and a moving object. Suppose this object undergoes the (finite) motion defined by P′ = R_o P + t_o in the coordinate system attached to this camera. Since the projection matrix is (Id 0) in this coordinate system, the image of P before the object displacement is p = P. The image after the displacement is p′ = R_o P + t_o, and it follows immediately that taking R_o = R_c^T and t_o = −R_c^T t_c yields the same motion field as before.

For small motions, we have

\[ R_o = \mathrm{Id} + \delta t\,[\omega_{o\times}] = R_c^T = \mathrm{Id} - \delta t\,[\omega_{c\times}], \]

and it follows that ω_o = −ω_c. Likewise,

\[ t_o = \delta t\, v_o = -R_c^T t_c = -(\mathrm{Id} - \delta t\,[\omega_{c\times}])(\delta t\, v_c) = -\delta t\, v_c \]

when second-order terms are neglected. Thus v_c = −v_o. Recall that Eq. (10.4) can be written as

\[ p^T([v_{c\times}][\omega_{c\times}])p - (p \times \dot{p}) \cdot v_c = 0. \]

Substituting v_c = −v_o and ω_c = −ω_o in this equation finally yields

\[ p^T([v_{o\times}][\omega_{o\times}])p + (p \times \dot{p}) \cdot v_o = 0. \]

10.4. Show that when the 8 × 8 matrix associated with the eight-point algorithm is singular, the eight points and the two optical centers lie on a quadric surface (Faugeras, 1993).
Hint: Use the fact that when a matrix is singular, there exists some nontrivial linear combination of its columns that is equal to zero. Also take advantage of the fact that the matrices representing the two projections in the coordinate system of the first camera are in this case (Id 0) and (R^T −R^T t).

Solution We follow the proof in Faugeras (1993): Each row of the 8 × 8 matrix associated with the eight-point algorithm can be written as

\[ (uu', uv', u, vu', vv', v, u', v') = \frac{1}{zz'}(xx', xy', xz', yx', yy', yz', zx', zy'), \]

where P = (x, y, z)^T and P′ = (x′, y′, z′)^T denote the positions of the scene point projecting onto (u, v)^T and (u′, v′)^T in the corresponding camera coordinate systems (C) and (C′). For the matrix to be singular, there must exist some nontrivial linear combination of its columns that is equal to zero — that is, there must exist eight scalars λ_i (i = 1, ..., 8) such that

\[ \lambda_1 xx' + \lambda_2 xy' + \lambda_3 xz' + \lambda_4 yx' + \lambda_5 yy' + \lambda_6 yz' + \lambda_7 zx' + \lambda_8 zy' = 0, \]

Page 43: 2461059 Computer Vision Solution Manual

43

or, in matrix form,

\[ P^T Q P' = 0, \quad\text{where}\quad Q = \begin{pmatrix} \lambda_1 & \lambda_2 & \lambda_3 \\ \lambda_4 & \lambda_5 & \lambda_6 \\ \lambda_7 & \lambda_8 & 0 \end{pmatrix}. \]

Now, with the conventions used to define R and t, we have R = {}^{C}_{C'}R and t = {}^{C}O' = {}^{C}\overrightarrow{OO'}, where O and O′ denote the optical centers of the two cameras. It follows that

\[ P' = {}^{C'}P = {}^{C'}_{C}R\,{}^{C}P + {}^{C'}\overrightarrow{O'O} = {}^{C'}_{C}R\,{}^{C}P - {}^{C'}\overrightarrow{OO'} = {}^{C'}_{C}R\,{}^{C}P - {}^{C'}_{C}R\,{}^{C}\overrightarrow{OO'} = R^T(P - t). \]

The degeneracy condition derived earlier can therefore be written as

\[ P^T Q R^T (P - t) = 0. \]

This equation is quadratic in the coordinates of P and defines a quadric surface in the coordinate system attached to the first camera. This quadric passes through the two optical centers since its equation is obviously satisfied by P = 0 and P = t. When the 8 × 8 matrix involved in the eight-point algorithm is singular, the eight points must therefore lie on a quadric passing through these two points.

10.5. Show that three of the determinants of the 3 × 3 minors of

\[ \mathcal{L} = \begin{pmatrix} l_1^T & 0 \\ l_2^T R_2 & l_2^T t_2 \\ l_3^T R_3 & l_3^T t_3 \end{pmatrix} \]

can be written as

\[ l_1 \times \begin{pmatrix} l_2^T G_1^1 l_3 \\ l_2^T G_1^2 l_3 \\ l_2^T G_1^3 l_3 \end{pmatrix} = 0. \]

Show that the fourth determinant can be written as a linear combination of these.

Solution Let us first note that

\[ \begin{pmatrix} l_2^T G_1^1 l_3 \\ l_2^T G_1^2 l_3 \\ l_2^T G_1^3 l_3 \end{pmatrix} = \begin{pmatrix} (l_2^T t_2)(R_3^{1T} l_3) - (l_2^T R_2^1)(t_3^T l_3) \\ (l_2^T t_2)(R_3^{2T} l_3) - (l_2^T R_2^2)(t_3^T l_3) \\ (l_2^T t_2)(R_3^{3T} l_3) - (l_2^T R_2^3)(t_3^T l_3) \end{pmatrix} = (l_2^T t_2)(R_3^T l_3) - (l_3^T t_3)(R_2^T l_2). \]

To simplify notation, we now introduce the three vectors a = l_1, b = R_2^T l_2, and c = R_3^T l_3, and the two scalars d = t_2^T l_2 and e = t_3^T l_3. With this notation, we can now rewrite the trifocal constraints as

\[ a \times [d\,c - e\,b] = d(a \times c) - e(a \times b) = 0. \]

With the same notation, we have

\[ \mathcal{L}^T = \begin{pmatrix} a & b & c \\ 0 & d & e \end{pmatrix}. \]

Let us now compute the determinants of the 3 × 3 minors of L^T (instead of those of L; this is just to avoid transposing too many vectors and does not change the result of our derivation). Three of these determinants can be written as

\[ D_{12} = \begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ 0 & d & e \end{vmatrix}, \quad D_{23} = \begin{vmatrix} a_2 & b_2 & c_2 \\ a_3 & b_3 & c_3 \\ 0 & d & e \end{vmatrix}, \quad D_{31} = \begin{vmatrix} a_3 & b_3 & c_3 \\ a_1 & b_1 & c_1 \\ 0 & d & e \end{vmatrix}. \]

Page 44: 2461059 Computer Vision Solution Manual

44 Chapter 10 The Geometry of Multiple Views

In particular,

\[ \begin{pmatrix} D_{23} \\ D_{31} \\ D_{12} \end{pmatrix} = \begin{pmatrix} (a_2 b_3 - a_3 b_2)e - (a_2 c_3 - a_3 c_2)d \\ (a_3 b_1 - a_1 b_3)e - (a_3 c_1 - a_1 c_3)d \\ (a_1 b_2 - a_2 b_1)e - (a_1 c_2 - a_2 c_1)d \end{pmatrix} = e(a \times b) - d(a \times c), \]

which, except for a change in sign, is identical to the expression derived earlier. Thus the fact that three of the 3 × 3 minors of L are zero can indeed be expressed by the trifocal tensor.

Let us conclude by showing that the fourth determinant is a linear combination of the other three. This determinant is

\[ D = \begin{vmatrix} a & b & c \end{vmatrix} = (a \times b) \cdot c. \]

But (a × c) · c = 0, thus we can write

\[ D = \frac{1}{e}\left[ e(a \times b) - d(a \times c) \right] \cdot c = \frac{c_1}{e} D_{23} + \frac{c_2}{e} D_{31} + \frac{c_3}{e} D_{12}, \]

which shows that D can indeed be written as a linear combination of D_{12}, D_{23}, and D_{31}.
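The two identities just derived (the stacked minors equal e(a × b) − d(a × c), and the fourth determinant is a combination of the other three) can be checked on random data, assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, c = rng.standard_normal(3), rng.standard_normal(3), rng.standard_normal(3)
d, e = rng.standard_normal(), rng.standard_normal()

# L^T: first three rows stack a, b, c componentwise; last row is (0, d, e).
LT = np.vstack([np.column_stack([a, b, c]), [0.0, d, e]])   # 4 x 3

def minor(rows):
    return np.linalg.det(LT[rows, :])

D23 = minor([1, 2, 3]); D31 = minor([2, 0, 3]); D12 = minor([0, 1, 3])
lhs = np.array([D23, D31, D12])
rhs = e * np.cross(a, b) - d * np.cross(a, c)
print(np.abs(lhs - rhs).max())          # zero up to rounding

D = minor([0, 1, 2])                    # the fourth determinant
combo = (c[0] * D23 + c[1] * D31 + c[2] * D12) / e
print(abs(D - combo))                   # also zero up to rounding
```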

10.6. Show that Eq. (10.18) reduces to Eq. (10.2) when M_1 = (Id 0) and M_2 = (R^T −R^T t).

Solution Recall that Eq. (10.18) expresses the bilinear constraints associated with two cameras as D = 0, where D is the determinant

\[ D = \begin{vmatrix} u_1 M_1^3 - M_1^1 \\ v_1 M_1^3 - M_1^2 \\ u_2 M_2^3 - M_2^1 \\ v_2 M_2^3 - M_2^2 \end{vmatrix}, \]

and M_i^j denotes row number j of camera number i. When

\[ M_1 = \begin{pmatrix} \mathrm{Id} & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \quad\text{and}\quad M_2 = \begin{pmatrix} R^T & -R^T t \end{pmatrix} = \begin{pmatrix} c_1^T & -c_1 \cdot t \\ c_2^T & -c_2 \cdot t \\ c_3^T & -c_3 \cdot t \end{pmatrix}, \]

where c_1, c_2, and c_3 denote the three columns of R, we can rewrite the determinant as

\[ D = \begin{vmatrix} -1 & 0 & u_1 & 0 \\ 0 & -1 & v_1 & 0 \\ \multicolumn{3}{c}{u_2 c_3^T - c_1^T} & -(u_2 c_3 - c_1) \cdot t \\ \multicolumn{3}{c}{v_2 c_3^T - c_2^T} & -(v_2 c_3 - c_2) \cdot t \end{vmatrix}. \]

Expanding along the last column yields

\[ D = [p_1 \cdot (v_2 c_3 - c_2)][(u_2 c_3 - c_1) \cdot t] - [p_1 \cdot (u_2 c_3 - c_1)][(v_2 c_3 - c_2) \cdot t] = p_1^T \begin{pmatrix} (c_2 \cdot t)c_3 - (c_3 \cdot t)c_2 & (c_3 \cdot t)c_1 - (c_1 \cdot t)c_3 & (c_1 \cdot t)c_2 - (c_2 \cdot t)c_1 \end{pmatrix} p_2, \]

where p_1 = (u_1, v_1, 1)^T and p_2 = (u_2, v_2, 1)^T. Now, let (t_1, t_2, t_3) be the coordinates of the vector t in the right-handed orthonormal basis formed by the columns of R; we have

\[ D = p_1^T \begin{pmatrix} t_2 c_3 - t_3 c_2 & t_3 c_1 - t_1 c_3 & t_1 c_2 - t_2 c_1 \end{pmatrix} p_2 = p_1^T \begin{pmatrix} t \times c_1 & t \times c_2 & t \times c_3 \end{pmatrix} p_2 = p_1^T [t_\times] R\, p_2, \]

which is indeed an instance of Eq. (10.2).
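The reduction can be checked numerically on random data (NumPy assumed; the rotation comes from Rodrigues' formula). A comparison up to an overall sign is all that matters for a constraint that equals zero:

```python
import numpy as np

rng = np.random.default_rng(4)

def cross_matrix(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Arbitrary rotation (Rodrigues), translation, and image points.
axis = rng.standard_normal(3); axis /= np.linalg.norm(axis)
th = 0.9
K = cross_matrix(axis)
R = np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)
t = rng.standard_normal(3)
u1, v1, u2, v2 = rng.standard_normal(4)
p1 = np.array([u1, v1, 1.0]); p2 = np.array([u2, v2, 1.0])

M1 = np.hstack([np.eye(3), np.zeros((3, 1))])      # (Id 0)
M2 = np.hstack([R.T, (-R.T @ t)[:, None]])         # (R^T  -R^T t)

# The determinant of Eq. (10.18) ...
D = np.linalg.det(np.vstack([u1 * M1[2] - M1[0],
                             v1 * M1[2] - M1[1],
                             u2 * M2[2] - M2[0],
                             v2 * M2[2] - M2[1]]))
# ... against the epipolar form of Eq. (10.2).
epi = p1 @ cross_matrix(t) @ R @ p2
print(D, epi)   # equal in magnitude
```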

10.7. Show that Eq. (10.19) reduces to Eq. (10.15) when M_1 = (Id 0).

Solution Recall that Eq. (10.19) expresses the trilinear constraints associated with three cameras as D = 0, where D is the determinant

\[ D = \begin{vmatrix} u_1 M_1^3 - M_1^1 \\ v_1 M_1^3 - M_1^2 \\ u_2 M_2^3 - M_2^1 \\ v_3 M_3^3 - M_3^2 \end{vmatrix}, \]

and M_i^j denotes row number j of camera number i. When M_1 = (Id 0),

\[ D = \begin{vmatrix} (-1, 0, u_1, 0) \\ (0, -1, v_1, 0) \\ u_2 M_2^3 - M_2^1 \\ v_3 M_3^3 - M_3^2 \end{vmatrix}. \]

Introducing l_1 = (0, −1, v_1)^T, l_2 = (−1, 0, u_2)^T, and l_3 = (0, −1, v_3)^T allows us to rewrite this equation as

\[ D = \begin{vmatrix} (-1, 0, u_1, 0) \\ l_1^T M_1 \\ l_2^T M_2 \\ l_3^T M_3 \end{vmatrix}, \]

or, since the determinant of a matrix is equal to the determinant of its transpose (expanding along the first column),

\[ D = \begin{vmatrix} -1 & a_1 & b_1 & c_1 \\ 0 & a_2 & b_2 & c_2 \\ u_1 & a_3 & b_3 & c_3 \\ 0 & 0 & d & e \end{vmatrix} = (-1, 0, u_1) \begin{pmatrix} D_{23} \\ D_{31} \\ D_{12} \end{pmatrix}, \]

where we use the same notation as in Ex. 10.5. According to that exercise, we thus have

\[ D = -(-1, 0, u_1) \left[ l_1 \times \begin{pmatrix} l_2^T G_1^1 l_3 \\ l_2^T G_1^2 l_3 \\ l_2^T G_1^3 l_3 \end{pmatrix} \right] = -\left[ \begin{pmatrix} -1 \\ 0 \\ u_1 \end{pmatrix} \times \begin{pmatrix} 0 \\ -1 \\ v_1 \end{pmatrix} \right] \cdot \begin{pmatrix} l_2^T G_1^1 l_3 \\ l_2^T G_1^2 l_3 \\ l_2^T G_1^3 l_3 \end{pmatrix} = -p_1^T \begin{pmatrix} l_2^T G_1^1 l_3 \\ l_2^T G_1^2 l_3 \\ l_2^T G_1^3 l_3 \end{pmatrix}, \]

which is the result we aimed to prove.

Page 46: 2461059 Computer Vision Solution Manual

46 Chapter 10 The Geometry of Multiple Views

10.8. Develop Eq. (10.20) with respect to the image coordinates, and verify that thecoefficients can indeed be written in the form of Eq. (10.21).

Solution This follows directly from the multilinear nature of determinants. In-deed, Eq. (10.20) can be written as

0 =
| v1M31 − M21 |
| u2M32 − M12 |
| v3M33 − M23 |
| v4M34 − M24 |

= v1
| M31         |
| u2M32 − M12 |
| v3M33 − M23 |
| v4M34 − M24 |
−
| M21         |
| u2M32 − M12 |
| v3M33 − M23 |
| v4M34 − M24 |

= v1u2
| M31         |
| M32         |
| v3M33 − M23 |
| v4M34 − M24 |
− v1
| M31         |
| M12         |
| v3M33 − M23 |
| v4M34 − M24 |
− u2
| M21         |
| M32         |
| v3M33 − M23 |
| v4M34 − M24 |
+
| M21         |
| M12         |
| v3M33 − M23 |
| v4M34 − M24 |, etc.

It is thus clear that all the coefficients of the quadrilinear tensor can be written in the form of Eq. (10.21).

10.9. Use Eq. (10.23) to calculate the unknowns zi, λi, and zi1 in terms of p1, pi, Ri, and ti (i = 2, 3). Show that the value of λi is directly related to the epipolar constraint, and characterize the degree of the dependency of z21 − z31 on the data points.

Solution We rewrite Eq. (10.23) as

zi1 p1 = ti + zi Ripi + λi(p1 × Ripi)  ⇐⇒  zi1 p1 = ti + zi qi + λi ri,

where qi = Ripi, ri = p1 × qi, and i = 2, 3. Forming the dot product of this equation with ri yields

ti · [p1 × Ripi] + λi|ri|^2 = 0,

or, rearranging the terms in the triple product,

λi|ri|^2 = p1 · [ti × Ripi],

which can be rewritten as

λi = p1^T Ei pi / |ri|^2,

where Ei is the essential matrix associated with views number 1 and i. So the value of λi is indeed directly related to the epipolar constraint. In particular, if this constraint is exactly satisfied, λi = 0. Now forming the dot product of Eq. (10.23) with qi × ri yields

zi1 p1 · (qi × ri) = ti · (qi × ri).

Let us denote by [a, b, c] = a · (b × c) the triple product of vectors in R3. Noting that the value of the triple product is invariant under circular permutations of the vectors reduces the above equation to

zi1 ri · (p1 × qi) = ri · (ti × qi),

or

zi1 = ri^T Ei pi / |ri|^2.


Likewise, forming the dot product of Eq. (10.23) with p1 × ri yields

[ti, p1, ri] + zi[qi, p1, ri] = 0,

or

zi = [ti, p1, ri] / |ri|^2.

The scale-restraint condition can be written as z21 − z31 = 0, or

|r2|^2 (r3^T E3 p3) = |r3|^2 (r2^T E2 p2),

which is an algebraic condition of degree 7 in p1, p2, and p3.
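These closed-form values can be validated on synthetic, noise-free data, where λi must vanish and zi1 must recover the depth along p1. A sketch assuming NumPy; the camera displacement and point below are illustrative, with i = 2:

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t x]."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

# Displacement of camera 2 relative to camera 1 (illustrative values).
a = 0.2
R2 = np.array([[np.cos(a), 0.0, np.sin(a)],
               [0.0, 1.0, 0.0],
               [-np.sin(a), 0.0, np.cos(a)]])
t2 = np.array([0.8, -0.1, 0.3])

# A point at depth z1 along the ray p1 of the first camera.
z1 = 5.0
p1 = np.array([0.2, -0.1, 1.0])
P = z1 * p1

# Its depth and normalized coordinates in camera 2: P = t2 + z2 R2 p2.
Pc2 = R2.T @ (P - t2)
z2_true, p2 = Pc2[2], Pc2 / Pc2[2]

q2 = R2 @ p2
r2 = np.cross(p1, q2)
E2 = skew(t2) @ R2                              # essential matrix for views 1, 2

lam2 = p1 @ E2 @ p2 / (r2 @ r2)                 # epipolar residual (0 here)
z21 = r2 @ E2 @ p2 / (r2 @ r2)                  # recovers z1
z2 = np.dot(t2, np.cross(p1, r2)) / (r2 @ r2)   # [t2, p1, r2] / |r2|^2
```

With noisy correspondences, lam2 becomes nonzero and measures exactly the epipolar residual, as stated above.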

Programming Assignments

10.10. Implement the eight-point algorithm for weak calibration from binocular point correspondences.

10.11. Implement the linear least-squares version of that algorithm with and without Hartley's preconditioning step.

10.12. Implement an algorithm for estimating the trifocal tensor from point correspondences.

10.13. Implement an algorithm for estimating the trifocal tensor from line correspondences.


CHAPTER 11

Stereopsis

PROBLEMS

11.1. Show that, in the case of a rectified pair of images, the depth of a point P in the normalized coordinate system attached to the first camera is z = −B/d, where B is the baseline and d is the disparity.

Solution Note that for rectified cameras, the v and v′ axes of the two image coordinate systems are parallel to each other and to the y axis of the coordinate system attached to the first camera. In addition, the images q and q′ of any point Q in the plane y = 0 verify v = v′ = 0. As shown by the diagram below, if H, C0, and C′0 denote respectively the orthogonal projection of Q onto the baseline and the principal points of the two cameras, the triangles OHQ and qC0O are similar, thus −b/z = −u/1. Likewise, the triangles HO′Q and C′0q′O′ are similar, thus −b′/z = u′/1, where b and b′ denote respectively the lengths of the line segments OH and HO′. It follows that u′ − u = −B/z, or z = −B/d.

[Figure: rectified stereo pair seen from above. O and O′ are the optical centers, C0 and C′0 the principal points, Q projects to q and q′ with abscissas u and u′, H is the orthogonal projection of Q onto the baseline, b and b′ are the lengths of OH and HO′, the focal length is 1, and the depth is −z.]

Let us now consider a point P with nonzero y coordinate and its orthogonal projection Q onto the plane y = 0. The points P and Q have the same depth since the line PQ joining them is parallel to the y axis. The lines pq and p′q′ joining the projections of the two points in the two images are also obviously parallel to PQ and to the v and v′ axes. It follows that the u coordinates of p and q are the same, and that the u′ coordinates of p′ and q′ are also the same. In other words, the disparity and depths for the points P and Q are the same, and the formula z = −B/d holds in general.

11.2. Use the definition of disparity to characterize the accuracy of stereo reconstruction as a function of baseline and depth.

Solution Let us assume that the cameras have been rectified. In this case, as in Ex. 11.1, we have z = −B/d. Let us assume the disparity has been measured with some error ε. A first-order Taylor expansion of the depth shows that

z(d + ε) − z(d) ≈ εz′(d) = εB/d^2 = (ε/B) z^2.

In other words, the error is proportional to the squared depth and inversely proportional to the baseline.
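A quick numerical check of this error model (plain Python; the baseline, disparity, and error values are illustrative):

```python
B, d = 0.5, -0.02            # baseline and disparity, so z = -B/d = 25
eps = 1e-4                   # disparity measurement error

z = lambda disp: -B / disp   # depth from disparity for a rectified pair

exact_err = z(d + eps) - z(d)
approx_err = eps * B / d ** 2          # first-order term eps * z'(d)
same_thing = (eps / B) * z(d) ** 2     # the same quantity written as (eps/B) z^2
```

The first-order estimate agrees with the exact depth difference to within a fraction of a percent at this disparity, and the two closed forms coincide exactly.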

11.3. Give reconstruction formulas for verging eyes in the plane.

Solution Let us define a Cyclopean coordinate system with origin at the midpoint between the two eyes, x axis in the direction of the baseline, and z axis oriented so that z > 0 for points in front of the two eyes (note that this contradicts our usual conventions, but allows us to use a right-handed (x, z) coordinate system). Now consider a point P with coordinates (x, z). As shown by the diagram below, if the corresponding projection rays make angles θl and θr with the z axis, we must have

(x + B/2)/z = cot θl,  (x − B/2)/z = cot θr

⇐⇒

x = (B/2) (cot θl + cot θr)/(cot θl − cot θr),  z = B/(cot θl − cot θr).

[Figure: verging eyes in the plane. The two eyes are separated by the baseline B, F is the fixated point, and the projection rays to a point P = (x, z) make angles θl and θr with the z axis; φ is the vergence angle at F.]

Now, if the fixated point has angular coordinates (φl, φr) and some other point P has absolute angular coordinates (θl, θr), Cartesian coordinates (x, z), and retinal angular coordinates (ψl, ψr), we must have θl = φl + ψl and θr = φr + ψr, which gives reconstruction formulas for given values of (φl, φr) and (ψl, ψr).
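The formulas can be checked by projecting a known point and reconstructing it. A small sketch (plain Python; the baseline and point are illustrative, and the angles are built so that cot θ equals the offset-over-depth ratios above):

```python
import math

B = 0.065                     # interocular baseline (illustrative)
x, z = 0.1, 2.0               # ground-truth point in the Cyclopean frame

# Ray angles chosen so that cot(theta_l) = (x + B/2)/z, cot(theta_r) = (x - B/2)/z.
theta_l = math.atan2(z, x + B / 2)
theta_r = math.atan2(z, x - B / 2)

cl, cr = 1 / math.tan(theta_l), 1 / math.tan(theta_r)
x_rec = (B / 2) * (cl + cr) / (cl - cr)
z_rec = B / (cl - cr)
```

The reconstruction recovers (x, z) up to rounding, since cl − cr = B/z by construction.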

11.4. Give an algorithm for generating an ambiguous random dot stereogram that can depict two different planes hovering over a third one.

Solution We display two squares hovering at different heights over a larger background square. The background images can be synthesized by spraying random black dots on a white background plate after (virtually) covering the area corresponding to the hovering squares. For the other two squares, the dots are generated as follows: On a given scanline, intersect a ray issued from the left eye with the first plane and the second one, and paint black the resulting dots P1 and P2. Then paint a black dot on the first plane at the point P3 where the ray joining the right eye to P2 intersects the first plane. Now intersect the ray joining the left eye to P3 with the second plane. Continue this process as long as desired. It is clear that this will generate a deterministic, but completely ambiguous pattern. Limiting this process to a few iterations and repeating it at many random locations will achieve the desired random effect.

11.5. Show that the correlation function reaches its maximum value of 1 when the image brightnesses of the two windows are related by the affine transform I′ = λI + µ for some constants λ and µ with λ > 0.

Solution Let us consider two images represented by the vectors w = (w1, . . . , wp)^T and w′ = (w′1, . . . , w′p)^T of R^p (typically, p = (2m+1) × (2n+1) for some positive values of m and n). As noted earlier, the corresponding normalized correlation value is the cosine of the angle θ between the vectors w − w̄ and w′ − w̄′, where ā denotes the vector whose coordinates are all equal to the mean of the coordinates of a.
The correlation function reaches its maximum value of 1 when the angle θ is zero. In this case, we must have w′ − w̄′ = λ(w − w̄) for some λ > 0, or, for i = 1, . . . , p,

w′i = λwi + (w̄′ − λw̄) = λwi + µ,

where µ = w̄′ − λw̄.
Conversely, suppose that w′i = λwi + µ for some λ, µ with λ > 0. Clearly, w̄′ = λw̄ + µ as scalars, and w̄′ = λw̄ + µ̄ as vectors, where µ̄ denotes this time the vector with all coordinates equal to µ. Thus w′ − w̄′ = λ(w − w̄), and the angle θ is equal to zero, yielding the maximum possible value of the correlation function.

11.6. Prove the equivalence of correlation and sum of squared differences for images with zero mean and unit Frobenius norm.

Solution Let w and w′ denote the vectors associated with two image windows. If these windows have zero mean and unit Frobenius norm, we have by definition |w|^2 = |w′|^2 = 1 and w̄ = w̄′ = 0. In this case, the sum of squared differences is

|w′ − w|^2 = |w|^2 − 2w · w′ + |w′|^2 = 2 − 2w · w′ = 2 − 2C,

where C is the normalized correlation of the two windows. Thus minimizing the sum of squared differences is equivalent to maximizing the normalized correlation.
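A numeric sanity check of the identity SSD = 2 − 2C (NumPy; the random windows are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
w, wp = rng.random(25), rng.random(25)

# Enforce zero mean and unit (Frobenius) norm on both windows.
w = w - w.mean();  w = w / np.linalg.norm(w)
wp = wp - wp.mean();  wp = wp / np.linalg.norm(wp)

ssd = np.sum((wp - w) ** 2)
C = w @ wp                      # normalized correlation of the two windows
```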

11.7. Recursive computation of the correlation function.
(a) Show that (w − w̄) · (w′ − w̄′) = w · w′ − (2m + 1)(2n + 1) Ī Ī′.
(b) Show that the average intensity Ī can be computed recursively, and estimate the cost of the incremental computation.
(c) Generalize the prior calculations to all elements involved in the construction of the correlation function, and estimate the overall cost of correlation over a pair of images.

Solution

(a) First note that for any two vectors of size p, we have ā · b = p ā b̄, where ā and b̄ denote respectively the average values of the coordinates of a and b. For vectors w and w′ representing images of size (2m+1) × (2n+1) with average intensities Ī and Ī′, we have therefore

(w − w̄) · (w′ − w̄′) = w · w′ − w̄ · w′ − w · w̄′ + w̄ · w̄′ = w · w′ − (2m + 1)(2n + 1) Ī Ī′.

(b) Let Ī(i, j) and Ī(i + 1, j) denote the average intensities computed for windows respectively centered in (i, j) and (i + 1, j). If p = (2m + 1) × (2n + 1), we have

Ī(i+1, j) = (1/p) Σ(k=−m..m) Σ(l=−n..n) I(i+k+1, j+l)
= (1/p) Σ(k=−m..m−1) Σ(l=−n..n) I(i+k+1, j+l) + (1/p) Σ(l=−n..n) I(i+m+1, j+l)
= (1/p) Σ(k′=−m+1..m) Σ(l=−n..n) I(i+k′, j+l) + (1/p) Σ(l=−n..n) I(i+m+1, j+l)
= Ī(i, j) − (1/p) Σ(l=−n..n) I(i−m, j+l) + (1/p) Σ(l=−n..n) I(i+m+1, j+l).

Thus the average intensity can be updated in 4(n + 1) operations when moving from one pixel to the one below it. The update for moving one column to the right costs 4(m + 1) operations. This is to compare to the (2m + 1)(2n + 1) operations necessary to compute the average from scratch.

(c) It is not possible to compute the dot product incrementally during column shifts associated with successive disparities. However, it is possible to compute the dot product associated with elementary row shifts since 2m of the rows are shared by consecutive windows. Indeed, let w(i, j) and w′(i, j) denote the vectors w and w′ associated with windows of size (2m + 1) × (2n + 1) centered in (i, j). We have

w(i+1, j) · w′(i+1, j) = Σ(k=−m..m) Σ(l=−n..n) I(i+k+1, j+l) I′(i+k+1, j+l),

and the exact same line of reasoning as in (b) can be used to show that

w(i+1, j) · w′(i+1, j) = w(i, j) · w′(i, j)
− Σ(l=−n..n) I(i−m, j+l) I′(i−m, j+l)
+ Σ(l=−n..n) I(i+m+1, j+l) I′(i+m+1, j+l).

Thus the dot product can be updated in 4(2n + 1) operations when moving from one pixel to the one below it. This is to compare to the 2(2m + 1)(2n + 1) − 1 operations necessary to compute the dot product from scratch.

To complete the computation of the correlation function, one must also compute the norms |w − w̄| and |w′ − w̄′|. This computation also reduces to the evaluation of a dot product and an average, but it can be done recursively for both row and column shifts.

Suppose that images are matched by searching, for each pixel in the left image, its match in the same scanline of the right image, within some disparity range [−D, D]. Suppose also that the two images have size M × N and that the windows being compared have, as before, size (2m+1) × (2n+1). By assuming if necessary that the two images have been obtained by removing the outer layer of a (M + 2m + 2D) × (N + 2n + 2D) image, we can ignore boundary effects.


Processing the first scan line requires computing and storing (a) 2N dot products of the form w · w or w′ · w′, (b) 2N averages of the form Ī or Ī′, and (c) (2D + 1)N dot products of the form w · w′. The total storage required is (2D + 5)N, which is certainly reasonable for, say, 1000 × 1000 images and disparity ranges of [−100, 100]. The computation is dominated by the w · w′ dot products, and its cost is on the order of 2(2m + 1)(2n + 1)(2D + 1)N. The incremental computations for the next scan line amount to updating all averages and dot products, with a total cost of 4(2n + 1)(2D + 1)N. Assuming M ≫ m, the overall cost of the correlation is therefore, after M updates, 4(2n + 1)(2D + 1)MN operations. Note that a naive implementation would require instead 2(2m + 1)(2n + 1)(2D + 1)MN operations.

11.8. Show how a first-order expansion of the disparity function for rectified images can be used to warp the window of the right image corresponding to a rectangular region of the left one. Show how to compute correlation in this case using interpolation to estimate right-image values at the locations corresponding to the centers of the left window's pixels.

Solution Let us set up local coordinate systems whose origins are at the two points of interest, so the two matched points have coordinates (0, 0) in these coordinate systems. If d(u, v) denotes the disparity function in the neighborhood of the first point, and α and β denote its derivatives at (0, 0), we can write the coordinates of a match (u′, v′) for the point (u, v) in the first image as

(u′, v′)^T = (u + d(u, v), v)^T,

or approximating d by its first-order Taylor expansion,

(u′, v′)^T = [ 1 + α  β ;  0  1 ] (u, v)^T.

It follows that, to first order, a small rectangular region in the first image maps onto a parallelogram in the second image, and that the corresponding affine transformation is completely determined by the derivatives of the disparity function. To exploit this property in stereo matching, one can map the centers of the pixels contained in the left window onto their right images, calculate the corresponding intensity values via bilinear interpolation of neighboring pixels in the right image, and finally compute the correlation function from these values. This is essentially the method described in Devernay and Faugeras (1994).
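A sketch of the warped sampling (NumPy; the disparity derivatives, window size, and matched centers are illustrative, and the row/column axes simply stand in for the (u, v) axes of the text):

```python
import numpy as np

def bilinear(img, u, v):
    """Bilinear interpolation of img at the real-valued location (u, v)."""
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    a, b = u - u0, v - v0
    return ((1 - a) * (1 - b) * img[u0, v0] + (1 - a) * b * img[u0, v0 + 1]
            + a * (1 - b) * img[u0 + 1, v0] + a * b * img[u0 + 1, v0 + 1])

alpha, beta = 0.1, 0.05                 # disparity derivatives at the match
A = np.array([[1 + alpha, beta],        # first-order warp: rectangle -> parallelogram
              [0.0, 1.0]])

rng = np.random.default_rng(3)
right = rng.random((32, 32))
cu, cv = 16, 16                         # matched point in the right image

# Sample the right image at the warped positions of the left window's pixel centers.
warped = np.empty((5, 5))
for r, du in enumerate(range(-2, 3)):
    for c, dv in enumerate(range(-2, 3)):
        off = A @ np.array([du, dv], dtype=float)
        warped[r, c] = bilinear(right, cu + off[0], cv + off[1])
```

The correlation of `warped` with the corresponding 5 × 5 left window then proceeds exactly as in Ex. 11.5.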

11.9. Show how to use the trifocal tensor to predict the tangent line along an image curve from tangent line measurements in two other pictures.

Solution Let us assume we have estimated the trifocal tensor associated with three images of a curve Γ. Let us denote by pi the projection of a point P of Γ onto image number i (i = 1, 2, 3). The tangent line T to Γ at P projects onto the tangent line ti to γi. Given the coordinate vectors t2 and t3 of t2 and t3, we can predict the coordinate vector t1 of t1 as t1 = (t2^T G11 t3, t2^T G21 t3, t2^T G31 t3)^T. This is another method for transfer, this time applied to lines instead of points.

Programming Assignments

11.10. Implement the rectification process.
11.11. Implement a correlation-based approach to stereopsis.
11.12. Implement a multiscale approach to stereopsis.


11.13. Implement a dynamic-programming approach to stereopsis.
11.14. Implement a trinocular approach to stereopsis.


CHAPTER 12

Affine Structure from Motion

PROBLEMS

12.1. Explain why any definition of the "addition" of two points or of the "multiplication" of a point by a scalar necessarily depends on the choice of some origin.

Solution Any such definition should be compatible with vector addition. In particular, when R = P + Q, we should also have →OR = →OP + →OQ and →O′R = →O′P + →O′Q for any choice of the origins O and O′. Subtracting the two expressions shows that →OO′ = 2→OO′, or O′ = O. Thus any definition of point addition would have to be relative to a fixed origin. Likewise, if Q = λP, we should also have →OQ = λ→OP and →O′Q = λ→O′P for any choice of origins O and O′. This implies that →OO′ = λ→OO′, or O = O′ when λ ≠ 1. Thus any definition of point multiplication by a scalar would have to be relative to a fixed origin.

12.2. Show that the definition of a barycentric combination as

Σ(i=0..m) αi Ai := Aj + Σ(i=0..m, i≠j) αi(Ai − Aj)

is independent of the choice of j.

Solution Let us define

P = Aj + Σ(i=0..m, i≠j) αi(Ai − Aj) = Ak + (Aj − Ak) + Σ(i=0..m, i≠j) αi(Ai − Aj).

Noting that the summation defining P can be taken over the whole 0..m range without changing its result, and using the fact that Σ(i=0..m) αi = 1, we can write

P = Ak + Σ(i=0..m) αi[(Ai − Aj) + (Aj − Ak)].

Now, we can write Ai = Ak + (Ai − Ak), but also Ai = Aj + (Ai − Aj) = Ak + (Aj − Ak) + (Ai − Aj). Thus, by definition of an affine space, we must have (Ai − Aj) + (Aj − Ak) = (Ai − Ak) (which is intuitively obvious). It follows that

P = Ak + Σ(i=0..m) αi(Ai − Ak).

As before, omitting the term corresponding to i = k does not change the result of the summation, which proves the result.


12.3. Given the two affine coordinate systems (A) = (OA, uA, vA, wA) and (B) = (OB, uB, vB, wB) for the affine space E3, let us define the 3 × 3 matrix

BAC = ( BuA  BvA  BwA ),

where Ba denotes the coordinate vector of the vector a in the (vector) coordinate system (uB, vB, wB). Show that

BP = BAC AP + BOA or, equivalently, (BP, 1)^T = [ BAC  BOA ;  0^T  1 ] (AP, 1)^T.

Solution The proof follows the derivation of the Euclidean change of coordinates in chapter 2. We write

→OBP = ( uB  vB  wB ) BP = →OBOA + ( uA  vA  wA ) AP.

Rewriting this equation in the coordinate frame (B) yields immediately

BP = BAC AP + BOA,

since BBC is obviously the identity. The homogeneous form of this expression follows immediately, exactly as in the Euclidean case.

12.4. Show that the set of barycentric combinations of m + 1 points A0, . . . , Am in X is indeed an affine subspace of X, and show that its dimension is at most m.

Solution Let us denote by Y the set of barycentric combinations of the points A0, A1, . . . , Am, pick some number j between 0 and m, and denote by Uj the vector space defined by all linear combinations of the vectors Ai − Aj for i ≠ j. It is clear that Y = Aj + Uj, since any barycentric combination of the points A0, A1, . . . , Am is by definition in Aj + Uj, and any point in Aj + Uj can be written as a barycentric combination of the points A0, A1, . . . , Am with αj = 1 − Σ(i=0..m, i≠j) αi.
Thus Y is indeed an affine subspace of X, and its dimension is at most m since the vector space Uj is spanned by m vectors. This subspace is of course independent of the choice of j: this follows directly from the fact that the definition of a barycentric combination is also independent of the choice of j.

12.5. Derive the equation of a line defined by two points in R3. (Hint: You actually need two equations.)

Solution We equip R3 with a fixed affine coordinate system and identify points with their (non-homogeneous) coordinate vectors. According to Section 12.1.2, a necessary and sufficient condition for the three points P1 = (x1, y1, z1)^T, P2 = (x2, y2, z2)^T, and P = (x, y, z)^T to define a line (i.e., a one-dimensional affine space) is that the matrix

x1 x2 x
y1 y2 y
z1 z2 z
1  1  1

have rank 2, or equivalently, that all its 3 × 3 minors have zero determinant (we assume that the three points are distinct, so the matrix has at least rank 2). Note


that three of these determinants are

| y1 y2 y ;  z1 z2 z ;  1 1 1 | = y(z1 − z2) − z(y1 − y2) + y1z2 − y2z1,
| z1 z2 z ;  x1 x2 x ;  1 1 1 | = z(x1 − x2) − x(z1 − z2) + z1x2 − z2x1,
| x1 x2 x ;  y1 y2 y ;  1 1 1 | = x(y1 − y2) − y(x1 − x2) + x1y2 − x2y1,

i.e., the coordinates of

P × (P1 − P2) + P1 × P2 = (P − P2) × (P1 − P2).

As could have been expected, writing that these three coordinates are zero is equivalent to writing that P1, P2, and P are collinear. Only two of the equations associated with the three coordinates of the cross product are independent. It is easy to see that the fourth minor is a linear combination of the other three, so the line is defined by any two of the above equations.
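These collinearity equations give a direct membership test for the line; a sketch (NumPy; the points are illustrative):

```python
import numpy as np

P1 = np.array([1.0, 2.0, 3.0])
P2 = np.array([0.0, -1.0, 1.0])

def line_residual(P):
    """The three minors above, i.e. (P - P2) x (P1 - P2); zero iff P is on the line."""
    return np.cross(P - P2, P1 - P2)

on_line = line_residual(P2 + 0.7 * (P1 - P2))      # a point of the line
off_line = line_residual(np.array([5.0, 5.0, 5.0]))
```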

12.6. Show that the intersection of a plane with two parallel planes consists of two parallel lines.

Solution Consider the plane A + U and the two parallel planes B + V and C + V in some affine space X. Here A, B, and C are points in X, and U and V are vector planes in ~X, and we will assume from now on that U and V are distinct (otherwise the three planes are parallel). As shown in Example 12.2, the intersection of two affine subspaces A + U and B + V is an affine subspace associated with the vector subspace W = U ∩ V. The intersection of two distinct planes in a vector space is a line, thus the intersection of A + U and B + V is a line. The same reasoning shows that the intersection of A + U and C + V is also a line associated with W. The two lines are parallel since they are associated with the same vector subspace W.

12.7. Show that an affine transformation ψ: X → Y between two affine subspaces X and Y associated with the vector spaces ~X and ~Y can be written as ψ(P) = ψ(O) + ~ψ(P − O), where O is some arbitrarily chosen origin, and ~ψ: ~X → ~Y is a linear mapping from ~X onto ~Y that is independent of the choice of O.

Solution Let us pick some point O in X and define ~ψ: ~X → ~Y by ~ψ(u) = ψ(O + u) − ψ(O). Clearly, ψ(P) = ψ(O + (P − O)) = ψ(O) + ~ψ(P − O). To show that ~ψ is indeed a linear mapping, let us consider two vectors u and v in ~X, two scalars λ and µ in R, and the points A = O + u and B = O + v. Since ψ is an affine mapping, we have

~ψ(λu + µv) = ψ(O + λu + µv) − ψ(O)
= ψ(O + λ(A − O) + µ(B − O)) − ψ(O)
= ψ((1 − λ − µ)O + λA + µB) − ψ(O)
= (1 − λ − µ)ψ(O) + λψ(A) + µψ(B) − ψ(O)
= ψ(O) + λ(ψ(A) − ψ(O)) + µ(ψ(B) − ψ(O)) − ψ(O)
= λ~ψ(u) + µ~ψ(v).


Thus ~ψ is indeed a linear mapping. Let us conclude by showing that it is independent of the choice of O. We define the mappings ~ψO and ~ψO′ from ~X to ~Y by ~ψO(u) = ψ(O + u) − ψ(O) and ~ψO′(u) = ψ(O′ + u) − ψ(O′). Now,

~ψO′(u) = ψ(O′ + u) − ψ(O′) = ψ(O + (O′ − O) + u) − ψ(O + (O′ − O)) = ~ψO(u),

thus ~ψ is independent of the choice of O.

12.8. Show that affine cameras (and the corresponding epipolar geometry) can be viewed as the limit of a sequence of perspective images with increasing focal length receding away from the scene.

Solution As shown in chapter 2, the projection matrix associated with a pinhole camera can be written as M = K ( R  t ), where K is the matrix of intrinsic parameters, R = CWR, and t = COW. It follows that tz, the third coordinate of t, can be interpreted as the depth of the origin OW of the world coordinate system relative to the camera. Now let us consider a camera moving away from OW along its optical axis while zooming. Its projection matrix can be written as

Mλ,µ = [ λα  −λα cot θ  u0 ;  0  λβ/sin θ  v0 ;  0  0  1 ] ( R   (tx, ty, µtz)^T ),

where λ and µ are the parameters controlling respectively the zoom and the camera motion, with M1,1 = M. We can rewrite this matrix as

Mλ,µ = K [ λ  0  0 ;  0  λ  0 ;  0  0  1 ] ( R   (tx, ty, µtz)^T ) = K [ λr1^T  λtx ;  λr2^T  λty ;  r3^T  µtz ].

Now if we choose µ = λ we can write

Mλ,λ = λK [ r1^T  tx ;  r2^T  ty ;  (1/λ)r3^T  tz ].

When λ → +∞, the projection becomes affine, with affine projection matrix

(1/tz) ( K2R2   K2t2 + p0 ),

where we follow the notation used in Eq. (2.19) of chapter 2.
Note that picking µ = λ ensures that the magnification remains constant for the fronto-parallel plane Π0 that contains OW. Indeed, let us denote by (iC, jC, kC) the camera coordinate system, and consider a point A = OW + x iC + y jC in Π0.

Since R = CWR = (WCR)^T, we have (WiC)^T = r1^T, (WjC)^T = r2^T, and (WkC)^T = r3^T. It follows that

follows that

Mλ,λWA = λK

rT1 (xr1 + yr2) + txrT2 (xr1 + yr2) + ty1λr

T3 (xr1 + yr2) + tz

= λK(

x+ tzy + tytz

)

.

Thus the denominator involved in the perspective projection equation is equal to tz and the same for all points in Π0, which in turn implies that the magnification associated with Π0 is independent of λ.


12.9. Generalize the notion of multilinearities introduced in chapter 10 to the affine case.

Solution Let us consider an affine 2 × 4 projection matrix M with rows M1 and M2. Note that we can write the projection equations, just as in chapter 10, as

( uM3 − M1 ;  vM3 − M2 ) P = 0,

where this time M3 = (0, 0, 0, 1). We can thus construct as in that chapter the 8 × 4 matrix Q, and all its 4 × 4 minors must, as before, have zero determinant. This yields multi-image constraints involving two, three, or four images, but since image coordinates only occur in the fourth column of Q, these constraints are now linear in these coordinates (note the similarity with the affine fundamental matrix). On the other hand, the multi-image relations between lines remain multilinear in the affine case. For example, the derivation of the trifocal tensor for lines in Section 10.2.1 remains unchanged (except for the fact that the third row of M is now equal to (0, 0, 0, 1)), and yields trilinear relationships among the three lines' coordinate vectors. Likewise, the interpretation of the quadrifocal tensor in terms of lines remains valid in the affine case.

12.10. Prove Theorem 3.

Solution Let us write the singular value decomposition of A as A = U W V^T. Since U is column-orthogonal, we have

A^T A = V W^T U^T U W V^T = V W^T W V^T.

Now let ci (i = 1, . . . , n) denote the columns of V. Since V is orthogonal, we have

A^T A ci = V W^T W (c1, . . . , cn)^T ci = V diag(w1^2, . . . , wn^2) (0, . . . , 0, 1, 0, . . . , 0)^T
= ( c1, . . . , cn ) (0, . . . , 0, wi^2, 0, . . . , 0)^T = wi^2 ci,

where the 1 and the wi^2 appear in position i.

It follows that the vectors ci are indeed eigenvectors of A^T A, and that the singular values are the nonnegative square roots of the corresponding eigenvalues.
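This is straightforward to confirm numerically with any SVD routine; a sketch (NumPy; the matrix is random and illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.random((6, 4))

U, w, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T                                  # columns are the vectors c_i

# Each column c_i of V is an eigenvector of A^T A with eigenvalue w_i^2.
residuals = [np.linalg.norm(A.T @ A @ V[:, i] - w[i] ** 2 * V[:, i])
             for i in range(4)]
```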

12.11. Show that a calibrated paraperspective camera is an affine camera that satisfies the constraints

a · b = [urvr / (2(1 + ur^2))] |a|^2 + [urvr / (2(1 + vr^2))] |b|^2   and   (1 + vr^2)|a|^2 = (1 + ur^2)|b|^2,

Page 59: 2461059 Computer Vision Solution Manual

59

where (ur, vr) denote the coordinates of the perspective projection of the point R.

Solution Recall from chapter 2 that the paraperspective projection matrix can be written as

M = (1/zr) ( [ k  s  u0 − ur ;  0  1  v0 − vr ] R    [ k  s ;  0  1 ] t2 ).

For calibrated cameras, we can take k = 1, s = 0, and u0 = v0 = 0. If r1^T, r2^T, and r3^T are the rows of the rotation matrix R, it follows that

a = (1/zr)(r1 − ur r3)  and  b = (1/zr)(r2 − vr r3).

In particular, we have |a|^2 = (1 + ur^2)/zr^2, |b|^2 = (1 + vr^2)/zr^2, and a · b = urvr/zr^2. The result immediately follows.

12.12. What do you expect the RREF of an m × n matrix with random entries to be

when m ≥ n? What do you expect it to be when m < n? Why?

Solution A random m × n matrix A usually has maximal rank. When m ≥ n, this rank is n, all columns are base columns, and the m − n bottom rows of the RREF of A are zero. When m < n, the rank is m, and the first m columns of A are normally independent. It follows that the base columns of the RREF are its first m columns; the n − m rightmost columns of the RREF contain the coordinates of the corresponding columns of A in the basis formed by its first m columns. There are no zero rows in the RREF in this case.
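This expectation can be tested with exact rational arithmetic; a sketch using SymPy's `Matrix.rref` (the sizes, denominator, and random entries are illustrative):

```python
import random
from fractions import Fraction

from sympy import Matrix, eye

random.seed(0)

def random_rref(m, n):
    """RREF of an m x n matrix with random rational entries."""
    A = Matrix(m, n, [Fraction(random.randint(1, 10**6), 999983)
                      for _ in range(m * n)])
    return A.rref()            # (rref matrix, tuple of pivot columns)

R_tall, piv_tall = random_rref(5, 3)   # m > n: identity block over zero rows
R_wide, piv_wide = random_rref(3, 5)   # m < n: (Id | remaining coordinates)
```

With random entries the maximal-rank case occurs with probability 1, so the pivot columns are the first min(m, n) columns in both cases.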

Programming Assignments

12.13. Implement the Koenderink–Van Doorn approach to affine shape from motion.

12.14. Implement the estimation of affine epipolar geometry from image correspondences and the estimation of scene structure from the corresponding projection matrices.

12.15. Implement the Tomasi–Kanade approach to affine shape from motion.

12.16. Add random numbers uniformly distributed in the [0, 0.0001] range to the entries of the matrix U used to illustrate the RREF and compute its RREF (using, e.g., the rref routine in MATLAB); then compute again the RREF using a "robustified" version of the reduction algorithm (using, e.g., rref with a nonzero tolerance). Comment on the results.


CHAPTER 13

Projective Structure from Motion

PROBLEMS

13.1. Use a simple counting argument to determine the minimum number of point correspondences required to solve the projective structure-from-motion problem in the trinocular case.

Solution As shown at the beginning of this chapter, the projective structure-from-motion problem admits a finite number of solutions when 2mn ≥ 11m + 3n − 15, where m denotes the number of input pictures, and n denotes the number of point correspondences. This shows that the minimum number of point correspondences required to solve this problem in the trinocular case (m = 3) is given by 6n ≥ 33 + 3n − 15, or n ≥ 6.
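The counting inequality is simple enough to tabulate; a small check (plain Python; the bound 2mn ≥ 11m + 3n − 15 is the one quoted above):

```python
def finitely_many_solutions(m, n):
    """Unknowns-vs-constraints test: 2mn >= 11m + 3n - 15."""
    return 2 * m * n >= 11 * m + 3 * n - 15

# Minimum number of correspondences in the trinocular case (m = 3).
min_n = next(n for n in range(1, 100) if finitely_many_solutions(3, n))
```

The same test reproduces the classical binocular count as well (m = 2 requires 7 points).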

13.2. Show that the change of coordinates between the two projective frames (A) = (A0, A1, A2, A3, A*) and (B) = (B0, B1, B2, B3, B*) can be represented by ρ BP = BAT AP, where AP and BP denote respectively the coordinate vectors of the point P in the frames (A) and (B), and ρ is an appropriate scale factor.

Solution Let us denote by a0, a1, a2, and a3 the representative vectors associated with the fundamental points of (A). Recall that these vectors are defined uniquely (up to a common scale factor) by the choice of the unit point A*. Likewise, let us denote by b0, b1, b2, and b3 the representative vectors associated with the fundamental points of (B), and let (b) denote the corresponding vector basis of ~X. Given a point P and a representative vector v for this point, we can write

v = λ(Ax0 a0 + Ax1 a1 + Ax2 a2 + Ax3 a3) = µ(Bx0 b0 + Bx1 b1 + Bx2 b2 + Bx3 b3),

or, in matrix form,

λ ( a0  a1  a2  a3 ) AP = µ ( b0  b1  b2  b3 ) BP.

Rewriting this equation in the coordinate frame (b) yields immediately

ρ BP = BAT AP,   where   BAT = ( ba0  ba1  ba2  ba3 ),

and ρ = µ/λ, which proves the desired result. Note that the columns of BAT are related to the coordinate vectors BAi by a priori unknown scale factors. A technique for computing these scale factors is given in Section 13.1.

13.3. Show that any two distinct lines in a projective plane intersect in exactly one point and that two parallel lines ∆ and ∆′ in an affine plane intersect at the point at infinity associated with their common direction v in the projective completion of this plane.
Hint: Use JA to embed the affine plane in its projective closure, and write the vector of Π × R associated with any point in JA(∆) (resp. JA(∆′)) as a linear combination of the vectors (→AB, 1) and (→AB + v, 1) (resp. (→AB′, 1) and (→AB′ + v, 1)), where B and B′ are arbitrary points on ∆ and ∆′.


Solution Consider two distinct lines ∆ and ∆′ in a projective plane, and let (e1, e2) and (e′1, e′2) denote two bases for the associated two-dimensional vector spaces. The intersection of ∆ and ∆′ is the set of points p(u), where u = λe1 + µe2 = λ′e′1 + µ′e′2 for some value of the scalars λ, µ, λ′, µ′. When e′1 can be written as a linear combination of the vectors e1 and e2, we must have µ′ = 0 since otherwise e′2 would also be a (non trivial) linear combination of e1 and e2 and the two lines would be the same. In this case, p(e′1) is the unique intersection point of ∆ and ∆′. Otherwise, the three vectors e1, e2, and e′1 are linearly independent, and the vector e′2 can be written in a unique manner as a linear combination of these vectors, yielding a unique solution (defined up to scale) for the scalars λ, µ, λ′, µ′, and therefore a unique intersection for the lines ∆ and ∆′.
Now let us consider two parallel (and thus distinct) lines ∆ and ∆′ with direction v in the affine plane. The intersection of their images JA(∆) and JA(∆′) is determined by the solutions of the equation λ(→AB, 1) + µ(→AB + v, 1) = λ′(→AB′, 1) + µ′(→AB′ + v, 1). This equation can be rewritten as

λ + µ = λ′ + µ′  and  (λ + µ)→BB′ + (µ′ − µ)v = 0.

Since the lines are not the same, the vectors →BB′ and v are not proportional to each other, thus we must have µ = µ′ and λ + µ = λ′ + µ′ = 0. Thus the two lines JA(∆) and JA(∆′) intersect at the point associated with the vector

((λ + µ)→AB + µv, λ + µ) = (µv, 0),

which is the point at infinity associated with the direction v.

13.4. Show that a perspective projection between two planes of P³ is a projective transformation.

Solution Let us consider two planes Π and Π′ of P³ and a point O in P³ that does not belong to either plane. Now let us define the corresponding perspective projection ψ : Π → Π′ as the mapping that associates with any point P in Π the point P′ where the line passing through O and P intersects Π′. This function is bijective, since any line in P³ that does not belong to a plane intersects this plane in exactly one point, and the inverse of ψ can be defined as the perspective projection from Π′ onto Π. The function ψ obviously maps lines onto lines (the image of a line ∆ in Π is the line ∆′ where the plane defined by O and ∆ intersects Π′). Given four points A, B, C, and D lying on the same line ∆ in Π, the cross-ratio of these four points is equal to the cross-ratio of the lines ∆A, ∆B, ∆C, and ∆D passing through these points and the point O. But this cross-ratio is also equal to the cross-ratio of the image points A′, B′, C′, and D′ that all lie on the image ∆′ of ∆. Thus ψ is a projective transformation. This construction is obviously correct for the finite points of Π. It remains valid for points at infinity using the definition of the cross-ratio extended to the whole projective line.
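The point-at-infinity conclusion of Exercise 13.3 is easy to check numerically in P², where a line is a 3-vector of homogeneous coefficients and the intersection of two lines is their cross product. A minimal sketch (plain Python; the particular line coefficients are arbitrary choices):

```python
def cross(p, q):
    # In P^2, the intersection of two lines (as 3-vectors) is their cross product.
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

# Two distinct parallel lines a x + b y + c = 0 sharing the normal (a, b):
l1 = (2.0, -3.0, 1.0)
l2 = (2.0, -3.0, 5.0)

p = cross(l1, l2)
print(p)  # (-12.0, -8.0, 0.0): last coordinate 0, a point at infinity

# Its direction (p[0], p[1]) = (-12, -8) is parallel to (b, -a) = (-3, -2),
# the common direction of both lines.
```
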

13.5. Given an affine space X and an affine frame (A0, ..., An) for that space, what is the projective basis of X associated with the vectors ei = (→A0Ai, 0) (i = 1, ..., n) and the vector en+1 = (0, 1)? Are the points JA0(Ai) part of that basis?

Solution The fundamental points of this projective basis are the points p(ei) (i = 1, ..., n + 1). All but the last one lie in the hyperplane at infinity. The unit point is p(e1 + ... + en+1). The n points JA0(Ai) (i = 1, ..., n) are all finite and do not belong to the projective basis. Their coordinates are (1, 0, ..., 0, 1)ᵀ, ..., (0, 0, ..., 1, 1)ᵀ.


13.6. In this exercise, you will show that the cross-ratio of four collinear points A, B, C, and D is equal to

{A, B; C, D} = sin(α + β) sin(β + γ) / (sin(α + β + γ) sin β),

where the angles α, β, and γ are defined as in Figure 13.2.

(a) Show that the area of a triangle PQR is

A(P, Q, R) = (1/2) PQ × RH = (1/2) PQ × PR sin θ,

where PQ denotes the distance between the two points P and Q, H is the projection of R onto the line passing through P and Q, and θ is the angle between the lines joining the point P to the points Q and R.

(b) Define the ratio of three collinear points A, B, C as

R(A, B, C) = AB / BC

for some orientation of the line supporting the three points. Show that R(A, B, C) = A(A, B, O)/A(B, C, O), where O is some point not lying on this line.

(c) Conclude that the cross-ratio {A, B; C, D} is indeed given by the formula above.

Solution

(a) The distance between the points H and R is by construction HR = PR sin θ. It is possible to construct a rectangle of dimensions PQ × RH by adding to the triangles PHR and RHQ their mirror images relative to the lines PR and RQ respectively. The area A(P, Q, R) of the triangle PQR is half the area of the rectangle, i.e.,

A(P, Q, R) = (1/2) PQ × RH = (1/2) PQ × PR sin θ.

(b) Let H denote the orthogonal projection of the point O onto the line passing through the points A, B, and C. According to (a), we have A(A, B, O) = (1/2) AB × OH and A(B, C, O) = (1/2) BC × OH. Thus

R(A, B, C) = AB/BC = ε A(A, B, O)/A(B, C, O),

where ε = ±1. Taking the convention that the area A(P, Q, R) is negative when the points P, Q, and R are in clockwise order yields the desired result.

(c) By definition of the cross-ratio,

{A, B; C, D} = (CA/CB)(DB/DA) = (−R(A, C, B))/(−R(A, D, B)) = R(A, C, B)/R(A, D, B).

Now, according to (a) and (b), we have, with the same sign convention as before,

R(A, C, B) = A(A, C, O)/A(C, B, O) = (OA × OC sin(α + β))/(−OB × OC sin β) = −OA sin(α + β)/(OB sin β)

and

R(A, D, B) = A(A, D, O)/A(D, B, O) = (OA × OD sin(α + β + γ))/(−OB × OD sin(β + γ)) = −OA sin(α + β + γ)/(OB sin(β + γ)),

thus

{A, B; C, D} = sin(α + β) sin(β + γ) / (sin(α + β + γ) sin β).
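The cross-ratio formula of Exercise 13.6 can be sanity-checked numerically. In the sketch below the point positions are arbitrary, and α, β, γ are taken to be the angles between the consecutive rays OA, OB, OC, OD, which is how they enter the derivation above:

```python
from math import atan2, sin

# Four collinear points A, B, C, D on the line y = 1 (arbitrary positions):
xA, xB, xC, xD = -1.0, 0.5, 2.0, 4.0

# Cross-ratio {A,B;C,D} = (CA/CB)(DB/DA) from signed positions on the line:
cr_direct = ((xA - xC) / (xB - xC)) * ((xB - xD) / (xA - xD))

# Angles of the rays from O = (0, 0) to each point, and the gaps between
# consecutive rays: alpha = angle(AOB), beta = angle(BOC), gamma = angle(COD).
thA, thB, thC, thD = (atan2(1.0, x) for x in (xA, xB, xC, xD))
alpha, beta, gamma = thA - thB, thB - thC, thC - thD

cr_sines = (sin(alpha + beta) * sin(beta + gamma)
            / (sin(alpha + beta + gamma) * sin(beta)))
print(cr_direct, cr_sines)  # both equal 1.4
```
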

13.7. Show that the homography between two epipolar pencils of lines can be written as

τ → τ′ = (aτ + b)/(cτ + d),

where τ and τ′ are the slopes of the lines.

Solution The coordinate vectors of all lines in the pencil passing through the point (α, β)ᵀ of the first image can be written as a linear combination of the vertical and horizontal lines going through that point, i.e., l = λv + µh, with v = (1, 0, −α)ᵀ and h = (0, 1, −β)ᵀ. Likewise, we can write any line in the pencil of lines passing through the point (α′, β′)ᵀ of the second image as l′ = λ′v′ + µ′h′, with v′ = (1, 0, −α′)ᵀ and h′ = (0, 1, −β′)ᵀ. We can thus write the linear map associated with the epipolar transformation as

( λ′ )   ( A  B ) ( λ )
( µ′ ) = ( C  D ) ( µ ).

Now the slope of the line l = (λ, µ, −λα − µβ)ᵀ is τ = −λ/µ, and the slope of the line l′ is τ′ = −λ′/µ′. It follows that

τ′ = −(Aλ + Bµ)/(Cλ + Dµ) = −(−Aτ + B)/(−Cτ + D) = (aτ + b)/(cτ + d),

where a = −A, b = B, c = C, and d = −D.
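A quick numeric check of this slope homography (the values of A, B, C, D, λ, and µ below are arbitrary):

```python
# Entries of the 2x2 map between the pencils, and one sample line (arbitrary):
A_, B_, C_, D_ = 2.0, -1.0, 0.5, 3.0
lam, mu = 1.5, -2.0

lam2, mu2 = A_ * lam + B_ * mu, C_ * lam + D_ * mu   # mapped line coefficients
tau, tau2 = -lam / mu, -lam2 / mu2                   # slopes before and after

# The claimed fractional-linear form with a = -A, b = B, c = C, d = -D:
a, b, c, d = -A_, B_, C_, -D_
print(tau2, (a * tau + b) / (c * tau + d))  # the two values agree
```
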

13.8. Here we revisit the three-point reconstruction problem in the context of the homogeneous coordinates of the point D in the projective basis formed by the tetrahedron (A, B, C, O′) and the unit point O′′. Note that the ordering of the reference points, and thus the ordering of the coordinates, is different from the one used earlier: This new choice is, like the previous one, made to facilitate the reconstruction. We denote the (unknown) coordinates of the point D by (x, y, z, w), equip the first (resp. second) image plane with the triangle of reference a′, b′, c′ (resp. a′′, b′′, c′′) and the unit point e′ (resp. e′′), and denote by (x′, y′, z′) (resp. (x′′, y′′, z′′)) the coordinates of the point d′ (resp. d′′).
Hint: Drawing a diagram similar to Figure 13.3 helps.
(a) What are the homogeneous projective coordinates of the points D′, D′′, and E where the lines O′D, O′′D, and O′O′′ intersect the plane of the triangle?
(b) Write the coordinates of D as a function of the coordinates of O′ and D′ (resp. O′′ and D′′) and some unknown parameters.
Hint: Use the fact that the points D, O′, and D′ are collinear.
(c) Give a method for computing these unknown parameters and the coordinates of D.


Solution The following diagram helps articulate the successive steps of the solution.

[Diagram: the tetrahedron of reference, with A = (1, 0, 0, 0), B = (0, 1, 0, 0), C = (0, 0, 1, 0), O′ = (0, 0, 0, 1), and unit point O′′ = (1, 1, 1, 1). The point D = (x, y, z, w) projects from O′ to D′ = (x′, y′, z′, 0) and from O′′ to D′′ = (x′′, y′′, z′′, 0) in the plane of the triangle ABC, and the line O′O′′ meets that plane at E = (1, 1, 1, 0).]

(a) Obviously, the coordinates of the points D′ and D′′ are simply (x′, y′, z′, 0) and (x′′, y′′, z′′, 0). The coordinates of the point E are (1, 1, 1, 0).

(b) Since D lies on the line O′D′ and on the line O′′D′′, we can write D = λ′O′ + µ′D′ = λ′′O′′ + µ′′D′′; it remains to compute the coordinates of D as the intersection of the two rays O′D′ and O′′D′′. The equality above yields

x = µ′x′ = λ′′ + µ′′x′′,
y = µ′y′ = λ′′ + µ′′y′′,        (13.1)
z = µ′z′ = λ′′ + µ′′z′′,
w = λ′ = λ′′.

(c) The values of µ′, µ′′, λ′′ are found (up to some scale factor) by solving the following homogeneous system:

( −x′  x′′  1 ) ( µ′  )
( −y′  y′′  1 ) ( µ′′ ) = 0.        (13.2)
( −z′  z′′  1 ) ( λ′′ )

Note that the determinant of this matrix must be zero, which corresponds to D′, D′′, and E being collinear. In practice, (13.2) is solved through linear least squares, and the values of x, y, z, w are then computed using (13.1).
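The recipe of (13.1) and (13.2) can be sketched in a few lines. Since the 3×3 matrix of (13.2) has rank 2, its null vector is simply the cross product of two of its rows; the ground-truth point below is an arbitrary choice used to synthesize the image data:

```python
def cross(p, q):
    # Null vector of a rank-2 3x3 matrix = cross product of two independent rows.
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

# Ground-truth point D = (x, y, z, w), an arbitrary choice:
x, y, z, w = 2.0, 3.0, 5.0, 1.0

# Synthesize its images in the plane of the triangle (cf. part (a)):
xp, yp, zp = x, y, z                    # D'  ~ D - w O'  with O'  = (0,0,0,1)
xpp, ypp, zpp = x - w, y - w, z - w     # D'' ~ D - w O'' with O'' = (1,1,1,1)

# Solve the homogeneous system (13.2) for (mu', mu'', lambda''):
mu1, mu2, lam2 = cross((-xp, xpp, 1.0), (-yp, ypp, 1.0))

# Recover D (up to scale) from (13.1):
D = (mu1 * xp, mu1 * yp, mu1 * zp, lam2)
print(D)  # proportional to (2, 3, 5, 1)
```
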

13.9. Show that if M = (A  b) and M′ = (Id  0) are two projection matrices, and if F denotes the corresponding fundamental matrix, then [b×]A is proportional to F whenever Fᵀb = 0 and

A = −λ[b×]F + ( µb  νb  τb ).

Solution Suppose that Fᵀb = 0 and A = −λ[b×]F + ( µb  νb  τb ). Since, as noted in Section 13.3, [b×]² = bbᵀ − |b|²Id for any vector b, and since Fᵀb = 0 implies bᵀF = 0ᵀ, we have

[b×]A = −λ[b×]²F + [b×]( µb  νb  τb ) = −λbbᵀF + λ|b|²F + 0 = λ|b|²F.

This shows that [b×]A is indeed proportional to F, and there exists a four-parameter family of solutions for the matrix M, defined (up to scale) by the parameters λ, µ, ν, and τ.
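The identity is easy to verify numerically. The sketch below (NumPy; the particular b, R, and parameter values are arbitrary) manufactures a fundamental matrix satisfying Fᵀb = 0 as F = [b×]R:

```python
import numpy as np

def skew(b):
    # [b x], the matrix such that skew(b) @ v == np.cross(b, v)
    return np.array([[0.0, -b[2], b[1]],
                     [b[2], 0.0, -b[0]],
                     [-b[1], b[0], 0.0]])

b = np.array([1.0, -2.0, 0.5])
R = np.array([[0.3, 1.0, -0.7],
              [2.0, 0.1, 0.4],
              [-1.0, 0.6, 0.9]])     # arbitrary full-rank matrix
F = skew(b) @ R                      # then F^T b = -R^T (b x b) = 0

lam, mu, nu, tau = 0.8, -0.3, 1.1, 0.2
A = -lam * skew(b) @ F + np.column_stack([mu * b, nu * b, tau * b])

print(np.allclose(skew(b) @ A, lam * (b @ b) * F))  # True
```
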

13.10. We derive in this exercise a method for computing a minimal parameterization of the fundamental matrix and estimating the corresponding projection matrices. This is similar in spirit to the technique presented in Section 12.2.2 in the affine case.

(a) Show that two projection matrices M and M′ can always be reduced to the following canonical forms by an appropriate projective transformation:

M = ( 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 )  and  M′ = ( a1ᵀ  b1 ; a2ᵀ  b2 ; 0ᵀ  1 )

(rows separated by semicolons). Note: For simplicity, you can assume that all the matrices involved in your solution are nonsingular.

(b) Note that applying this transformation to the projection matrices amounts to applying the inverse transformation to every scene point P. Let us denote by P = (x, y, z)ᵀ the position of the transformed point P in the world coordinate system and by p = (u, v, 1)ᵀ and p′ = (u′, v′, 1)ᵀ the homogeneous coordinate vectors of its images. Show that

(u′ − b1)(a2 · p) = (v′ − b2)(a1 · p).

(c) Derive from this equation an eight-parameter parameterization of the fundamental matrix, and use the fact that F is only defined up to a scale factor to construct a minimal seven-parameter parameterization.

(d) Use this parameterization to derive an algorithm for estimating F from at least seven point correspondences and for estimating the projective shape of the scene.

Solution

(a) Let miᵀ and m′iᵀ (i = 1, 2, 3) denote the rows of the matrices M and M′. We can define the 4 × 4 matrix

N = ( m1ᵀ ; m2ᵀ ; m3ᵀ ; m′3ᵀ )

and choose Q = N⁻¹ when N is not singular.

(b) We can write the corresponding projection equations as

zp = P,
z′p′ = ( a1ᵀ ; a2ᵀ ; 0ᵀ ) P + ( b1 ; b2 ; 1 ) = z ( a1ᵀ ; a2ᵀ ; 0ᵀ ) p + ( b1 ; b2 ; 1 ).

It follows that P = zp, z′ = 1, and

u′ = z a1 · p + b1,
v′ = z a2 · p + b2.        (13.3)

Eliminating z among these equations yields

(u′ − b1)(a2 · p) = (v′ − b2)(a1 · p).

(c) The above equation is easily rewritten in the familiar form pᵀFp′ = 0 of the epipolar constraint, the fundamental matrix being written in this case as

F = ( a2  −a1  b2a1 − b1a2 )

(its three columns). This construction provides a parameterization of the fundamental matrix by 8 independent coefficients (the components of the vectors a1 and a2 and the two scalars b1 and b2) and guarantees that F is singular, since the third column is a linear combination of the first two. Since the fundamental matrix is only defined up to a scale factor, one of the coordinates of, say, the vector a1 can arbitrarily be set to 1, yielding a minimal seven-parameter parameterization.

(d) Using this parameterization, the matrix F can be estimated from at least 7 point matches using nonlinear least squares. Once its parameters are known, we can reconstruct every scene point as P = zp, where z is the least-squares solution of (13.3), i.e.,

z = −[ (a1 · p)(b1 − u′) + (a2 · p)(b2 − v′) ] / [ (a1 · p)² + (a2 · p)² ].
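A small numeric check of this parameterization (NumPy; a1, a2, b1, b2, and the scene point below are arbitrary choices): the constructed F is singular, and a correspondence generated by (13.3) satisfies the epipolar constraint.

```python
import numpy as np

a1 = np.array([0.4, -1.2, 0.7])
a2 = np.array([1.1, 0.3, -0.5])
b1, b2 = 0.9, -0.6                        # arbitrary parameter values

# F has columns a2, -a1, b2 a1 - b1 a2 (part (c)); singular by construction:
F = np.column_stack([a2, -a1, b2 * a1 - b1 * a2])
print(np.linalg.det(F))                   # ~0

# Synthetic correspondence from the projection equations (13.3):
z = 2.5
p = np.array([1.3, -0.2, 1.0])            # p = (u, v, 1)
p2 = np.array([z * (a1 @ p) + b1, z * (a2 @ p) + b2, 1.0])
print(p @ F @ p2)                         # ~0: the epipolar constraint holds
```
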

13.11. We show in this exercise that when two cameras are (internally) calibrated, so that the essential matrix E can be estimated from point correspondences, it is possible to recover the rotation R and translation t such that E = [t×]R without first solving the projective structure-from-motion problem. (This exercise is courtesy of Andrew Zisserman.)

(a) Since the structure of a scene can only be determined up to a similitude, the translation t can only be recovered up to scale. Use this and the fact that Eᵀt = 0 to show that the SVD of the essential matrix can be written as

E = U diag(1, 1, 0) Vᵀ,

and conclude that t can be taken equal to the third column vector of U.

(b) Show that the two matrices

R1 = UWVᵀ  and  R2 = UWᵀVᵀ

satisfy (up to an irrelevant sign change) E = [t×]R, where

W = ( 0 −1 0 ; 1 0 0 ; 0 0 1 ).

Solution

(a) Since an essential matrix is singular with two equal nonzero singular values (see chapter 10), and E and t are only defined up to scale, we can always take the two nonzero singular values equal to 1 and write the SVD of E as

E = U diag(1, 1, 0) Vᵀ.

Writing Eᵀt = 0 now yields

0 = V diag(1, 1, 0) Uᵀ t = V (u1 · t, u2 · t, 0)ᵀ,

where u1 and u2 are the first two columns of U. Since V is an orthogonal matrix, it is nonsingular, and we must have u1 · t = u2 · t = 0. Thus t must be parallel to the third column of the orthogonal matrix U. Since, once again, E is only defined up to scale, we can take t to be the third column (a unit vector) of U.

(b) First note that we can always assume that the orthogonal matrices U and V are rotation matrices. Indeed, since the third singular value of E is zero, we can always replace the third column of either matrix by its opposite to make the corresponding determinant positive. The resulting decomposition of E is still a valid SVD. Since the matrices U, V, and W (and their transposes) are rotation matrices, it follows that both R1 and R2 are also rotation matrices.

Now since t is the third column of U, we have t × u1 = u2 and t × u2 = −u1. In particular,

[t×]R1 = ( u2  −u1  0 ) W Vᵀ = −( u1  u2  0 ) Vᵀ = −U diag(1, 1, 0) Vᵀ = −E.

Likewise, it is easy to show that [t×]R2 = E. Since E is only defined up to scale, both solutions are valid essential matrices.
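The whole recipe is easy to exercise on a synthetic essential matrix (NumPy; the rotation and translation below are arbitrary choices):

```python
import numpy as np

def skew(t):
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Ground truth: an arbitrary rotation about the z axis and a unit translation.
th = 0.4
R_true = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th), np.cos(th), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, -0.5])
t_true /= np.linalg.norm(t_true)
E = skew(t_true) @ R_true

U, S, Vt = np.linalg.svd(E)               # singular values ~ (1, 1, 0)
if np.linalg.det(U) < 0:                  # make U and V rotations; this only
    U[:, 2] *= -1                         # touches the zero-singular-value part
if np.linalg.det(Vt) < 0:
    Vt[2, :] *= -1

W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t = U[:, 2]                               # translation, up to sign and scale
R1, R2 = U @ W @ Vt, U @ W.T @ Vt
print(S)                                  # ~ [1, 1, 0]
```

Each candidate then satisfies [t×]R = ±E, matching the sign ambiguity discussed in the solution.
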

Programming Assignments

13.12. Implement the geometric approach to projective scene estimation introduced in Section 13.2.1.

13.13. Implement the algebraic approach to projective scene estimation introduced in Section 13.2.2.

13.14. Implement the factorization approach to projective scene estimation introduced in Section 13.4.1.


CHAPTER 14

Segmentation by Clustering

PROBLEMS

14.1. We wish to cluster a set of pixels using color and texture differences. The objective function

Φ(clusters, data) = Σ_{i ∈ clusters} Σ_{j ∈ i'th cluster} (xj − ci)ᵀ(xj − ci)

used in Section 14.4.2 may be inappropriate; for example, color differences could be too strongly weighted if color and texture are measured on different scales.

(a) Extend the description of the k-means algorithm to deal with the case of an objective function of the form

Φ(clusters, data) = Σ_{i ∈ clusters} Σ_{j ∈ i'th cluster} (xj − ci)ᵀ S (xj − ci),

where S is a symmetric, positive definite matrix.

(b) For the simpler objective function, we had to ensure that each cluster contained at least one element (otherwise we can't compute the cluster center). How many elements must a cluster contain for the more complicated objective function?

(c) As we remarked in Section 14.4.2, there is no guarantee that k-means gets to a global minimum of the objective function; show that it must always get to a local minimum.

(d) Sketch two possible local minima for a k-means clustering method clustering data points described by a two-dimensional feature vector. Use an example with only two clusters for simplicity. You shouldn't need many data points. You should do this exercise for both objective functions.

Solution

(a) Estimate the covariance matrix for the cluster and use its inverse.

(b) O(d2) where d is the dimension of the feature vector.

(c) The value is bounded below and it goes down at each step unless it is alreadyat a minimum with respect to that step.

(d) Do this with a symmetry; an equilateral triangle with two cluster centers is the easiest.
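Part (a) can be made concrete: for a fixed symmetric positive definite S, the mean still minimizes the within-cluster cost, so only the distance computation changes. A minimal sketch under stated assumptions (synthetic data, hand-picked initial centers; `kmeans_S` is a name invented here, not from the text):

```python
import numpy as np

def kmeans_S(X, centers0, S, iters=20):
    """k-means for the objective sum_j (x_j - c_i)^T S (x_j - c_i).
    For fixed positive definite S the optimal center of a cluster is still
    its mean, so only the assignment step changes."""
    centers = np.array(centers0, dtype=float)
    for _ in range(iters):
        d = X[:, None, :] - centers[None, :, :]        # (n, k, dim) differences
        cost = np.einsum('nkd,de,nke->nk', d, S, d)    # (x - c)^T S (x - c)
        labels = cost.argmin(axis=1)
        for j in range(len(centers)):
            if np.any(labels == j):                    # keep clusters nonempty
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)),          # two synthetic blobs
               rng.normal(3.0, 0.1, (50, 2))])
S = np.array([[2.0, 0.0], [0.0, 0.5]])                 # weight feature 1 more
centers, labels = kmeans_S(X, X[[0, -1]], S)
print(centers)                                          # near (0, 0) and (3, 3)
```
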

14.2. Read Shi and Malik (2000) and follow the proof that the normalized cut criterion leads to the integer programming problem given in the text. Why does the normalized affinity matrix have a null space? Give a vector in its kernel.

Solution Read the paper.


14.3. Show that choosing a real vector that maximises the expression

yᵀ(D − W)y / (yᵀDy)

is the same as solving the eigenvalue problem

D^(−1/2)WWz = µz,

where z = D^(−1/2)y.

Solution DAF suggests not setting this as an exercise, because he got it wrong (sorry!). The correct form would be: Show that choosing a real vector that maximises the expression

yᵀ(D − W)y / (yᵀDy)

is the same as solving the eigenvalue problem

D^(−1/2)WD^(−1/2)z = µz,

where z = D^(1/2)y. Of course, this requires that D have full rank, in which case one could also solve

D⁻¹Wy = λy,

or simply the generalized eigenvalue problem

Wy − λDy = 0,

which Matlab will happily deal with.

14.4. This exercise explores using normalized cuts to obtain more than two clusters. One strategy is to construct a new graph for each component separately and call the algorithm recursively. You should notice a strong similarity between this approach and classical divisive clustering algorithms. The other strategy is to look at eigenvectors corresponding to smaller eigenvalues.

(a) Explain why these strategies are not equivalent.

(b) Now assume that we have a graph that has two connected components. Describe the eigenvector corresponding to the largest eigenvalue.

(c) Now describe the eigenvector corresponding to the second largest eigenvalue.

(d) Turn this information into an argument that the two strategies for generating more clusters should yield quite similar results under appropriate conditions; what are appropriate conditions?

Solution

(a) They would be equivalent if the matrix actually was block diagonal.

(b) It has zeros in the entries corresponding to one connected component.

(c) Could be anything; but there is another eigenvector which has zeros in the entries corresponding to the other connected component. This doesn't have to correspond to the second eigenvalue.

(d) Basically, if the graph is very close to block diagonal, the eigenvectors split into a family corresponding to the eigenvectors of the first block and the eigenvectors of the second block, which implies that for a graph that is close enough to block diagonal (but what does this mean formally? We'll duck this bullet) the two strategies will be the same.


Programming Assignments

14.5. Build a background subtraction algorithm using a moving average and experiment with the filter.

14.6. Build a shot boundary detection system using any two techniques that appeal, and compare performance on different runs of video.

14.7. Implement a segmenter that uses k-means to form segments based on color and position. Describe the effect of different choices of the number of segments and investigate the effects of different local minima.


CHAPTER 15

Segmentation by Fitting a Model

PROBLEMS

15.1. Prove the simple, but extremely useful, result that the perpendicular distance from a point (u, v) to a line (a, b, c) is given by |au + bv + c| if a² + b² = 1.

Solution Work with the squared distance: choose a point (x, y) on the line, and we now wish to minimize (u − x)² + (v − y)² subject to ax + by + c = 0. This gives

2((u − x), (v − y))ᵀ + λ(a, b)ᵀ = 0,

which means that ((u − x), (v − y)) is parallel to the line's normal, so (u, v) = (x, y) + µ(a, b) for some scalar µ. Now if a² + b² = 1, |µ| is the distance, because (a, b) is a unit vector. But ax + by + c = 0, so au + bv + c = µ(a² + b²) = µ, and we are done.
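A numeric check of the result (the line and point below are arbitrary; the foot of the perpendicular is computed explicitly from the same identity):

```python
from math import hypot

a, b, c = 3.0, 4.0, -10.0            # the line 3x + 4y - 10 = 0
n = hypot(a, b)
a, b, c = a / n, b / n, c / n        # normalize so that a^2 + b^2 = 1

u, v = 7.0, 2.0                      # the point
d_formula = abs(a * u + b * v + c)

# The foot of the perpendicular is (u, v) - (a u + b v + c)(a, b):
x = u - (a * u + b * v + c) * a
y = v - (a * u + b * v + c) * b
d_exact = hypot(u - x, v - y)
print(d_formula, d_exact)            # both 3.8
```
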

15.2. Derive the eigenvalue problem

( x̄² − x̄ x̄   x̄y − x̄ ȳ ) ( a )     ( a )
( x̄y − x̄ ȳ   ȳ² − ȳ ȳ ) ( b ) = µ ( b )

(here a bar denotes an average over the data points, so x̄² stands for the average of x², not the square of the average) from the generative model for total least squares. This is a simple exercise — maximum likelihood and a little manipulation will do it — but worth doing right and remembering; the technique is extremely useful.

Solution We wish to minimise Σᵢ (axᵢ + byᵢ + c)² subject to a² + b² = 1. This yields

( x̄²  x̄y  x̄ ) ( a )     ( a )
( x̄y  ȳ²  ȳ ) ( b ) + λ ( b ) = 0,
( x̄   ȳ   1 ) ( c )     ( 0 )

where λ is the Lagrange multiplier. Now substitute back the third row (which is x̄a + ȳb + c = 0) to get the result.
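The resulting total-least-squares recipe (build the scatter matrix, take the eigenvector of its smallest eigenvalue as the normal (a, b), and recover c from the means) can be sketched as follows (NumPy; the synthetic line and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)
y = (x + 1.0) / 2.0 + rng.normal(0.0, 0.01, x.size)  # noisy x - 2y + 1 = 0

# Second-moment (scatter) matrix of the centered data:
sxx = np.mean(x * x) - np.mean(x) ** 2
sxy = np.mean(x * y) - np.mean(x) * np.mean(y)
syy = np.mean(y * y) - np.mean(y) ** 2
M = np.array([[sxx, sxy], [sxy, syy]])

vals, vecs = np.linalg.eigh(M)       # eigenvalues in ascending order
a, b = vecs[:, 0]                    # eigenvector of the smallest eigenvalue
c = -(a * np.mean(x) + b * np.mean(y))
print(a, b, c)                       # ~ +-(1, -2, 1)/sqrt(5)
```
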

15.3. How do we get a curve of edge points from an edge detector that returns orientation? Give a recursive algorithm.

15.4. A slightly more stable variation of incremental fitting cuts the first few pixels and the last few pixels from the line point list when fitting the line, because these pixels may have come from a corner.
(a) Why would this lead to an improvement?
(b) How should one decide how many pixels to omit?

Solution

(a) The first and last few are respectively the end of one corner and the beginningof the next, and tend to bias the fit.

(b) Experiment, though if you knew a lot about the edge detector and the lensyou might be able to derive an estimate.


15.5. A conic section is given by ax² + bxy + cy² + dx + ey + f = 0.

(a) Given a data point (dx, dy), show that the nearest point (u, v) on the conic satisfies two equations:

au² + buv + cv² + du + ev + f = 0

and

2(a − c)uv − (2ady + e)u + (2cdx + d)v + (edx − ddy) = 0.

(b) These are two quadratic equations. Write u for the vector (u, v, 1)ᵀ. Now show that we can write these equations as uᵀM1u = 0 and uᵀM2u = 0, for M1 and M2 symmetric matrices.

(c) Show that there is a transformation T such that TᵀM1T = Id and TᵀM2T is diagonal.

(d) Now show how to use this transformation to obtain a set of solutions to the equations; in particular, show that there can be up to four real solutions.

(e) Show that there are four, two, or zero real solutions to these equations.

(f) Sketch an ellipse and indicate the points for which there are four or two solutions.

Solution All this is straightforward algebra, except for (c), which gives a lot of people trouble. M1 is symmetric, so it can be reduced to diagonal form by its eigenvector matrix and then to the identity using the square roots of the eigenvalues. Now any rotation matrix fixes the identity; so I can use the eigenvector matrix of the transformed M2 to diagonalize M2 while fixing M1 at the identity.

15.6. Show that the curve

( (1 − t²)/(1 + t²), 2t/(1 + t²) )

is a circular arc (the length of the arc depending on the interval for which the parameter is defined).

(a) Write out the equation in t for the closest point on this arc to some data point (dx, dy). What is the degree of this equation? How many solutions in t could there be?

(b) Now substitute s³ = t in the parametric equation, and write out the equation for the closest point on this arc to the same data point. What is the degree of the equation? Why is it so high? What conclusions can you draw?

Solution Do this by showing that

( (1 − t²)/(1 + t²) )² + ( 2t/(1 + t²) )² = 1.

(a) The normal at the point with parameter t is

( (1 − t²)/(1 + t²), 2t/(1 + t²) ),

so our equation is

( x − (1 − t²)/(1 + t²) ) · 2t/(1 + t²) + ( y − 2t/(1 + t²) ) · ( −(1 − t²)/(1 + t²) ) = 0,

and if we clear denominators by multiplying both sides by (1 + t²)², the highest-degree term in t will have degree 4, so the answer is in principle 4. But if you expand the sum out, you'll find that a factor of (1 + t²) divides the resulting polynomial, leaving a quadratic in t, so there are at most two solutions.

(b) It will have degree 6 in s; this is because the parametrisation allows each point on the curve to have three different parameter values (the s value for each t is t^(1/3), and every number has three cube roots; it is very difficult in practice to limit this sort of calculation to real values only).
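Carrying out the expansion in (a) explicitly gives the quadratic y t² + 2x t − y = 0 for a data point (x, y); that closed form is worked out here rather than in the text, so treat it as an assumption checked by the sketch below:

```python
from math import hypot

def point_on_arc(t):
    return ((1 - t * t) / (1 + t * t), 2 * t / (1 + t * t))

x, y = 3.0, 4.0                        # arbitrary data point (y != 0)

# Roots of the closest-point quadratic y t^2 + 2x t - y = 0:
roots = [(-x + s * hypot(x, y)) / y for s in (1.0, -1.0)]

# The closer root should give the nearest point of the unit circle to (x, y),
# which is (x, y)/|(x, y)|:
best = min((point_on_arc(t) for t in roots),
           key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)
print(best)  # (0.6, 0.8)
```
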

15.7. Show that the viewing cone for a cone is a family of planes, all of which pass through the focal point and the vertex of the cone. Now show that the outline of a cone consists of a set of lines passing through a vertex. You should be able to do this by a simple argument without any need for calculations.

Solution The viewing cone for a surface consists of all rays through the focal point and tangent to the surface. Construct a line through the focal point and the vertex of the cone. Now construct any plane through the focal point that does not pass through the vertex of the cone. This second plane slices the cone in some curve. Construct the set of tangents to this curve that pass through the focal point (which is on the plane by construction). Any plane that contains the first line and one of these tangents is tangent to the cone, and the set of such planes exhausts the planes tangent to the cone and passing through the focal point. The outline is obtained by slicing this set of planes with another plane not lying on their shared line, and so must be a set of lines passing through some common point.

Programming Assignments

15.8. Implement an incremental line fitter. Determine how significant a difference results if you leave out the first few pixels and the last few pixels from the line point list (put some care into building this, as it's a useful piece of software to have lying around in our experience).

15.9. Implement a Hough transform line finder.

15.10. Count lines with an HT line finder; how well does it work?


CHAPTER 16

Segmentation and Fitting using Probabilistic Methods

PROBLEMS

16.1. Derive the expressions of Section 16.1 for segmentation. One possible modificationis to use the new mean in the estimate of the covariance matrices. Perform anexperiment to determine whether this makes any difference in practice.

Solution The exercises here can be done in too many different ways to make model solutions helpful. Jeff Bilmes' document, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models," which can be found on CiteSeer, gives a very detailed derivation of EM for Gaussian mixture models (which is the first problem).

16.2. Supply the details for the case of using EM for background subtraction. Would it help to have a more sophisticated foreground model than uniform random noise?

16.3. Describe using leave-one-out cross-validation for selecting the number of segments.

Programming Assignments

16.4. Build an EM background subtraction program. Is it practical to insert a dither term to overcome the difficulty with high spatial frequencies illustrated in Figure 14.11?

16.5. Build an EM segmenter that uses color and position (ideally, use texture too) to segment images; use a model selection term to determine how many segments there should be. How significant a phenomenon is the effect of local minima?

16.6. Build an EM line fitter that works for a fixed number of lines. Investigate the effects of local minima. One way to avoid being distracted by local minima is to start from many different start points and then look at the best fit obtained from that set. How successful is this? How many local minima do you have to search to obtain a good fit for a typical data set? Can you improve things using a Hough transform?

16.7. Expand your EM line fitter to incorporate a model selection term so that the fitter can determine how many lines fit a dataset. Compare the choice of AIC and BIC.

16.8. Insert a noise term in your EM line fitter, so that it is able to perform robust fits. What is the effect on the number of local minima? Notice that, if there is a low probability of a point arising from noise, most points will be allocated to lines, but the fits will often be quite poor. If there is a high probability of a point arising from noise, points will be allocated to lines only if they fit well. What is the effect of this parameter on the number of local minima?

16.9. Construct a RANSAC fitter that can fit an arbitrary (but known) number of lines to a given data set. What is involved in extending your fitter to determine the best number of lines?


CHAPTER 17

Tracking with Linear Dynamic Models

PROBLEMS

17.1. Assume we have a model xi = Dixi−1 and yi = Miᵀxi. Here the measurement yi is a one-dimensional vector (i.e., a single number) for each i, and xi is a k-dimensional vector. We say the model is observable if the state can be reconstructed from any sequence of k measurements.

(a) Show that this requirement is equivalent to the requirement that the matrix

[ Mi   DiᵀMi+1   DiᵀDi+1ᵀMi+2   ...   Diᵀ···Di+k−2ᵀMi+k−1 ]

has full rank.

(b) Show that the point drifting in 3D, where M3k = (0, 0, 1), M3k+1 = (0, 1, 0), and M3k+2 = (1, 0, 0), is observable.

(c) Show that a point moving with constant velocity in any dimension, with the observation matrix reporting position only, is observable.

(d) Show that a point moving with constant acceleration in any dimension, with the observation matrix reporting position only, is observable.

Solution There is a typo in the indices here, for which DAF apologizes. A sequence of k measurements is (yi, yi+1, ..., yi+k−1). This sequence is

(Miᵀxi, Mi+1ᵀxi+1, ..., Mi+k−1ᵀxi+k−1).

Now xi+1 = Di+1xi, etc., so the sequence is

(Miᵀxi, Mi+1ᵀDixi, ..., Mi+k−1ᵀDi+k−2Di+k−3···Dixi),

which is

( Miᵀ ; Mi+1ᵀDi ; ... ; Mi+k−1ᵀDi+k−2···Di ) xi

(rows stacked), and if the rank of this matrix (which is the transpose of the one given, except for the index typo) is k, we are OK. The rest are calculations.
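Part (c) is quick to check numerically for a one-dimensional constant-velocity model with a time-invariant D and M (NumPy; here k = 2):

```python
import numpy as np

dt = 1.0
D = np.array([[1.0, dt], [0.0, 1.0]])   # constant velocity: x_i = D x_{i-1}
M = np.array([1.0, 0.0])                # observe position only: y_i = M^T x_i

# Observability matrix for k = 2 measurements, columns M and D^T M:
O = np.column_stack([M, D.T @ M])
print(np.linalg.matrix_rank(O))          # 2: full rank, so the model is observable
```
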

17.2. A point on the line is moving under the drift dynamic model. In particular, we have xi ∼ N(xi−1, 1). It starts at x0 = 0.

(a) What is its average velocity? (Remember, velocity is signed.)

Solution 0.

(b) What is its average speed? (Remember, speed is unsigned.)

Solution This depends (a) on the timestep and (b) on the number of steps you allow before measuring the speed (sorry - DAF). But the root-mean-square 1-step speed is 1 if the timestep is 1 (the mean absolute 1-step speed is √(2/π)).

(c) How many steps, on average, before its distance from the start point is greater than two (i.e., what is the expected number of steps, etc.)?

Solution This is finicky and should not have been set; don't use it (sorry, DAF).

(d) How many steps, on average, before its distance from the start point is greater than ten (i.e., what is the expected number of steps, etc.)?

Solution This is finicky and should not have been set; don't use it (sorry, DAF).

(e) (This one requires some thought.) Assume we have two nonintersecting intervals, one of length 1 and one of length 2; what is the limit of the ratio (average percentage of time spent in interval one)/(average percentage of time spent in interval two) as the number of steps becomes infinite?

Solution This is finicky and should not have been set; don't use it (sorry, DAF).

(f) You probably guessed the ratio in the previous question; now run a simulation and see how long it takes for this ratio to look like the right answer.

Solution This is finicky and should not have been set (sorry, DAF). The answer is 1/2, and a simulation will produce it, but it will take quite a long time to do so.

17.3. We said that

g(x; a, b) g(x; c, d) = g(x; (ad + cb)/(b + d), bd/(b + d)) f(a, b, c, d).

Show that this is true. The easiest way to do this is to take logs and rearrange the fractions.
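A numeric spot check of this identity, assuming (as the notation suggests) that g(x; m, v) is the normal density with mean m and variance v; the parameter values are arbitrary:

```python
from math import exp, pi, sqrt

def g(x, m, v):
    # Normal density with mean m and variance v.
    return exp(-(x - m) ** 2 / (2.0 * v)) / sqrt(2.0 * pi * v)

a, b = 1.0, 2.0      # mean and variance of the first factor
c, d = -0.5, 0.8     # mean and variance of the second factor
m = (a * d + c * b) / (b + d)
v = (b * d) / (b + d)

# The ratio g(x;a,b) g(x;c,d) / g(x;m,v) should be a constant f(a,b,c,d),
# independent of x:
ratios = [g(x, a, b) * g(x, c, d) / g(x, m, v) for x in (-2.0, 0.0, 1.5, 3.0)]
print(ratios)
```
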

17.4. Assume that we have the dynamics

xi ∼ N(di xi−1, σ²_di),
yi ∼ N(mi xi, σ²_mi).

(a) P(xi | xi−1) is a normal density with mean di xi−1 and variance σ²_di. What is P(xi−1 | xi)?

(b) Now show how we can obtain a representation of P(xi | yi+1, ..., yN) using a Kalman filter.

Solution

(a) We have xi = di xi−1 + ζ, where ζ is Gaussian noise with zero mean and variance σ²_di. This means that xi−1 = (1/di)(xi − ζ) = xi/di + ξ, where ξ is Gaussian noise with zero mean and variance σ²_di/di².

(b) Run time backwards.

Programming Assignments

17.5. Implement a 2D Kalman filter tracker to track something in a simple video sequence. We suggest that you use a background subtraction process and track the foreground blob. The state space should probably involve the position of the blob, its velocity, its orientation (which you can get by computing the matrix of second moments), and its angular velocity.


17.6. If one has an estimate of the background, a Kalman filter can improve background subtraction by tracking illumination variations and camera gain changes. Implement a Kalman filter that does this; how substantial an improvement does this offer? Notice that a reasonable model of illumination variation has the background multiplied by a noise term that is near one; you can turn this into linear dynamics by taking logs.


C H A P T E R 18

Model-Based Vision

PROBLEMS

18.1. Assume that we are viewing objects in a calibrated perspective camera and wish to use a pose consistency algorithm for recognition.
(a) Show that three points is a frame group.
(b) Show that a line and a point is not a frame group.
(c) Explain why it is a good idea to have frame groups composed of different types of feature.
(d) Is a circle and a point not on its axis a frame group?

Solution

(a) We have a calibrated perspective camera, so in the camera frame we can construct the three rays through the focal point corresponding to each image point. We must now slice these three rays with some plane to get a prescribed triangle (the three points on the object). If there is only a discrete set of ways of doing this, we have a frame group, because we can recover the rotation and translation of the camera from any such plane. Now choose a point along ray 1 to be the first object point. There are at most two possible points on ray 2 that could be the second object point — see this by thinking about the 12 edge of the triangle as a link of fixed length, and swinging this around the first point; it forms a sphere, which can intersect a line in at most two points. Choose one of these points. Now we have fixed one edge of our triangle in space — can we get the third point on the object triangle to intersect the third image ray? In general, no, because we can only rotate the triangle about the 12 edge, which means the third point describes a circle; but a circle will not in general intersect a ray in space, so we have to choose a special point along ray 1 to be the object point. It follows that only a discrete set of choices is possible, and we are done.

(b) We use a version of the previous argument. We have a calibrated perspective camera, and so can construct in the camera frame the plane and ray corresponding respectively to the image line and image point. Now choose a line on the plane to be the object line. Can we find a solution for the object point? The object point could lie anywhere on a cylinder whose axis is the chosen line and whose radius is the distance from line to point. There are now two general cases — either there is no solution, or there are two (where the ray intersects the cylinder). But for most lines where there are two solutions, slightly moving the line results in another line for which there are two solutions, so there is a continuous family of available solutions, meaning it can't be a frame group.

(c) Correspondence search is easier.

(d) Yes, for a calibrated perspective camera.


18.2. We have a set of plane points P_j; these are subject to a plane affine transformation. Show that

det[P_i P_j P_k] / det[P_i P_j P_l]

is an affine invariant (as long as no two of i, j, k, and l are the same and no three of these points are collinear).

Solution Write Q_i = M P_i for the affine transform of point P_i. Now

det[Q_i Q_j Q_k] / det[Q_i Q_j Q_l] = det(M [P_i P_j P_k]) / det(M [P_i P_j P_l]) = (det(M) det[P_i P_j P_k]) / (det(M) det[P_i P_j P_l]) = det[P_i P_j P_k] / det[P_i P_j P_l].

18.3. Use the result of the previous exercise to construct an affine invariant for: (a) four lines, (b) three coplanar points, (c) a line and two points (these last two will take some thought).

Solution

(a) Take the intersection of lines 1 and 2 as P_i, etc.

(b) Typo! This can't be done; sorry, DAF.

(c) Construct the line joining the two points; these points, together with the intersection of that line with the given line, give three collinear points. The ratio of the lengths of the segments they bound is an affine invariant. Easiest proof: an affine transformation of the plane restricted to this line is an affine transformation of the line. But this involves only scaling and translation, and the ratio of lengths is invariant to both.
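A numerical check of (c), with the given line written as c + s d (all points, the line, and the affine map are arbitrary; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
p1, p2 = rng.normal(size=2), rng.normal(size=2)   # the two points
c, d = rng.normal(size=2), rng.normal(size=2)     # the line c + s d

def intersect(p1, p2, c, d):
    """Intersection of the line through p1, p2 with the line c + s d.
    Solves p1 + t (p2 - p1) = c + s d for (t, s)."""
    M = np.stack([p2 - p1, -d], axis=1)
    t, s = np.linalg.solve(M, c - p1)
    return p1 + t * (p2 - p1)

def length_ratio(p1, p2, c, d):
    x = intersect(p1, p2, c, d)
    return float(np.linalg.norm(x - p1) / np.linalg.norm(x - p2))

A = rng.normal(size=(2, 2)) + 2 * np.eye(2)   # invertible affine part
t = rng.normal(size=2)
aff = lambda p: A @ p + t

r0 = length_ratio(p1, p2, c, d)
# The image of the line c + s d is the line through aff(c) with direction A d.
r1 = length_ratio(aff(p1), aff(p2), aff(c), A @ d)
```

The two ratios agree, as the restriction-to-a-line argument predicts.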

18.4. In chamfer matching at any step, a pixel can be updated if the distances from some or all of its neighbors to an edge are known. Borgefors counts the distance from a pixel to a vertical or horizontal neighbor as 3 and to a diagonal neighbor as 4 to ensure the pixel values are integers. Why does this mean √2 is approximated as 4/3? Would a better approximation be a good idea?

18.5. One way to improve pose estimates is to take a verification score and then optimize it as a function of pose. We said that this optimization could be hard, particularly if the test to tell whether a backprojected curve was close to an edge point was a threshold on distance. Why would this lead to a hard optimization problem?

Solution Because the error would not be differentiable — as the backprojected outline moved, some points would start or stop contributing.
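The 3-4 weighting of exercise 18.4 can be explored with a small two-pass chamfer distance transform (a sketch, not Borgefors's exact implementation; the grid size and edge map are arbitrary):

```python
import numpy as np

def chamfer_34(edges):
    """Two-pass 3-4 chamfer distance transform.
    edges: boolean array, True at edge pixels.  Returns integer distances
    in units where a horizontal/vertical step costs 3 and a diagonal 4."""
    big = 10 ** 6
    h, w = edges.shape
    d = np.where(edges, 0, big).astype(np.int64)
    # forward pass: look at neighbors above and to the left
    for i in range(h):
        for j in range(w):
            for di, dj, c in ((-1, -1, 4), (-1, 0, 3), (-1, 1, 4), (0, -1, 3)):
                ii, jj = i + di, j + dj
                if 0 <= ii < h and 0 <= jj < w:
                    d[i, j] = min(d[i, j], d[ii, jj] + c)
    # backward pass: look at neighbors below and to the right
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            for di, dj, c in ((1, 1, 4), (1, 0, 3), (1, -1, 4), (0, 1, 3)):
                ii, jj = i + di, j + dj
                if 0 <= ii < h and 0 <= jj < w:
                    d[i, j] = min(d[i, j], d[ii, jj] + c)
    return d

# Single edge pixel in the middle of a small grid.
edges = np.zeros((21, 21), dtype=bool)
edges[10, 10] = True
d = chamfer_34(edges)
diag = int(d[15, 15])       # 5 diagonal steps, each costing 4
straight = int(d[10, 15])   # 5 horizontal steps, each costing 3
```

With unit axis steps costing 3, a diagonal step costs 4, so √2 ≈ 1.414 is effectively approximated by 4/3 ≈ 1.333.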

18.6. We said that for an uncalibrated affine camera viewing a set of plane points, the effect of the camera can be written as an unknown plane affine transformation. Prove this. What if the camera is an uncalibrated perspective camera viewing a set of plane points?

18.7. Prepare a summary of methods for registration in medical imaging other than the geometric hashing idea we discussed. You should keep practical constraints in mind, and you should indicate which methods you favor, and why.

18.8. Prepare a summary of nonmedical applications of registration and pose consistency.

Programming Assignments

18.9. Representing an object as a linear combination of models is often described as abstraction because we can regard adjusting the coefficients as obtaining the same view of different models. Furthermore, we could get a parametric family of models by adding a basis element to the space. Explore these ideas by building a system for matching rectangular buildings where the width, height, and depth of the building are unknown parameters. You should extend the linear combinations idea to handle orthographic cameras; this involves constraining the coefficients to represent rotations.


C H A P T E R 19

Smooth Surfaces and Their Outlines

PROBLEMS

19.1. What is (in general) the shape of the silhouette of a sphere observed by a perspective camera?

Solution The silhouette of a sphere observed by a perspective camera is the intersection of the corresponding viewing cone with the image plane. By symmetry, this cone is circular and grazes the sphere along a circular occluding contour. The silhouette is therefore the intersection of a circular cone with a plane, i.e., a conic section. For most viewing situations this conic section is an ellipse. It is a circle when the image plane is perpendicular to the axis of the cone. It may also be a parabola or even a hyperbola branch for extreme viewing angles.

19.2. What is (in general) the shape of the silhouette of a sphere observed by an orthographic camera?

Solution Under orthographic projection, the viewing cone degenerates into a viewing cylinder. By symmetry this cylinder is circular. Since the image plane is perpendicular to the projection direction under orthographic projection, the silhouette of the sphere is the intersection of a circular cylinder with a plane perpendicular to its axis, i.e., a circle.

19.3. Prove that the curvature κ of a planar curve at a point P is the inverse of the radius of curvature r at this point. Hint: Use the fact that tan u ≈ u for small angles.

Solution As P′ approaches P, the direction of the line PP′ approaches that of the tangent T, and δs is, to first order, equal to the distance between P and P′. It follows that PM ≈ PP′/tan δθ ≈ δs/δθ. Passing to the limit, we obtain that the curvature is the inverse of the radius of curvature.

19.4. Given a fixed coordinate system, let us identify points of E³ with their coordinate vectors and consider a parametric curve x : I ⊂ R → R³, not necessarily parameterized by arc length. Show that its curvature is given by

κ = |x′ × x′′| / |x′|³,   (19.1)

where x′ and x′′ denote, respectively, the first and second derivatives of x with respect to the parameter t defining it. Hint: Reparameterize x by its arc length and reflect the change of parameters in the differentiation.

Solution We can write

x′ = (d/dt) x = (ds/dt)(d/ds) x = (ds/dt) t,


and

x′′ = (d/dt) x′ = (d²s/dt²) t + (ds/dt)² (d/ds) t = (d²s/dt²) t + κ (ds/dt)² n.

It follows that

x′ × x′′ = κ (ds/dt)³ b,

and since t and b have unit norm, we have indeed

κ = |x′ × x′′| / |x′|³.
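Equation (19.1) can be sanity-checked on a circle traversed at non-unit speed, where the curvature must come out as 1/r regardless of the parameterization (a sketch; numpy assumed):

```python
import numpy as np

def curvature(xp, xpp):
    """kappa = |x' x x''| / |x'|^3 for a parametric space curve (Eq. 19.1)."""
    return float(np.linalg.norm(np.cross(xp, xpp)) / np.linalg.norm(xp) ** 3)

# A circle of radius r traversed at double speed: x(t) = (r cos 2t, r sin 2t, 0).
# This is not an arc-length parameterization, but kappa must still be 1/r.
r, t = 3.0, 0.7
xp = np.array([-2 * r * np.sin(2 * t), 2 * r * np.cos(2 * t), 0.0])    # x'
xpp = np.array([-4 * r * np.cos(2 * t), -4 * r * np.sin(2 * t), 0.0])  # x''
kappa = curvature(xp, xpp)
```

Here |x′| = 2r and |x′ × x′′| = 8r², so the formula gives 8r²/(2r)³ = 1/r.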

19.5. Prove that, unless the normal curvature is constant over all possible directions, theprincipal directions are orthogonal to each other.

Solution According to Ex. 19.6 below, the second fundamental form is symmetric. It follows that the tangent plane admits an orthonormal basis formed by the eigenvectors of the associated linear map dN, and that the corresponding eigenvalues are real (this is a general property of symmetric operators). Unless they are equal, the orthonormal basis is essentially unique (except for swapping the two eigenvectors or changing their orientation), and the two eigenvalues are the maximum and minimum values of the second fundamental form (this is another general property of quadratic forms; see chapter 3 for a proof that the maximum value of a quadratic form is the maximum eigenvalue of the corresponding linear map). It follows that the principal curvatures are the two eigenvalues, and the principal directions are the corresponding eigenvectors, which are uniquely defined (in the sense used above) and orthogonal to each other unless the eigenvalues are equal, in which case the normal curvature is constant.

19.6. Prove that the second fundamental form is bilinear and symmetric.

Solution The bilinearity of the second fundamental form follows immediately from the fact that the differential of the Gauss map is linear. We remain quite informal in our proof of its symmetry. Given two directions u and v in the tangent plane of a surface S at some point P_0, we pick a parameterization P : U × V ⊂ R² → W ⊂ S of S in some neighborhood W of P_0 such that P(0, 0) = P_0, and the tangents to the two surface curves α and β respectively defined by P(u, 0) for u ∈ I and P(0, v) for v ∈ J are respectively u and v. We assume that this parameterization is differentiable as many times as desired and abstain from justifying its existence. We omit the parameters from now on and assume that all functions are evaluated at (0, 0). We use subscripts to denote partial derivatives, e.g., P_uv denotes the second partial derivative of P with respect to u and v. The partial derivatives P_u and P_v lie in the tangent plane at any point in W. Differentiating N · P_u = 0 with respect to v yields

Nv · Pu +N · Puv = 0.

Likewise, we have

Nu · Pv +N · Pvu = 0.

But since the cross derivatives are equal, we have

Nu · Pv = Nv · Pu,


or equivalently

v · dN u = u · dN v,

which shows that the second fundamental form is indeed symmetric.

19.7. Let us denote by α the angle between the plane Π and the tangent to a curve Γ, by β the angle between the normal to Π and the binormal to Γ, and by κ the curvature at some point on Γ. Prove that if κ_a denotes the apparent curvature of the image of Γ at the corresponding point, then

κ_a = κ cos β / cos³ α.

(Note: This result can be found in Koenderink, 1990, p. 191.) Hint: Write the coordinates of the vectors t, n, and b in a coordinate system whose z-axis is orthogonal to the image plane, and use Eq. (19.6) to compute κ_a.

Solution Let us consider a particular point P_0 on the curve and pick the coordinate system (i, j, k) so that (i, j) is a basis for the image plane, with i along the projection of the tangent t to the curve at P_0. Given this coordinate system, let us now identify curve points with their coordinate vectors, and parameterize Γ by its arc length s in the neighborhood of P_0. Let us denote by x : I ⊂ R → R³ this parametric curve and by y : I ⊂ R → R³ its orthographic projection. We omit the parameter s from now on and write

y = (x · i) i + (x · j) j,
y′ = (t · i) i + (t · j) j,
y′′ = κ[(n · i) i + (n · j) j].

Recall that the curvature κ_a of y is, according to Ex. 19.4,

κ_a = |y′ × y′′| / |y′|³.

By construction, we have t = (cos α, 0, sin α). Thus y′ = cos α i. Now, if n = (a, b, c), we have b = t × n = (−b sin α, a sin α − c cos α, b cos α), and since the angle between the projection direction and b is β, we have b = cos β / cos α. It follows that

y′ × y′′ = κ[(t · i)(n · j) − (t · j)(n · i)] k = κ b cos α k = κ cos β k.

Putting it all together we finally obtain

κ_a = κ |cos β| / |cos³ α|,

but α can always be taken positive (just pick the appropriate orientation for i), and β can also be taken positive by choosing the orientation of Γ appropriately. The result follows.
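The relation can be checked numerically on a circular helix (arbitrary pitch c), whose orthographic projection along k is the unit circle, so κ_a = 1 (a sketch; numpy assumed):

```python
import numpy as np

# Circular helix x(t) = (cos t, sin t, c t), projected orthographically along k.
c, t = 0.6, 1.2
xp = np.array([-np.sin(t), np.cos(t), c])        # x'
xpp = np.array([-np.cos(t), -np.sin(t), 0.0])    # x''

kappa = float(np.linalg.norm(np.cross(xp, xpp)) / np.linalg.norm(xp) ** 3)
tangent = xp / np.linalg.norm(xp)
binormal = np.cross(xp, xpp) / np.linalg.norm(np.cross(xp, xpp))

cos_alpha = float(np.linalg.norm(tangent[:2]))   # angle between t and image plane
cos_beta = float(abs(binormal[2]))               # angle between b and the normal k

# Apparent curvature of the projection, computed directly with the 2D formula:
yp, ypp = xp[:2], xpp[:2]
kappa_a = float(abs(yp[0] * ypp[1] - yp[1] * ypp[0]) / np.linalg.norm(yp) ** 3)

predicted = kappa * cos_beta / cos_alpha ** 3
```

For this helix, κ = 1/(1 + c²), cos α = cos β = 1/√(1 + c²), and both sides come out equal to 1.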

19.8. Let κ_u and κ_v denote the normal curvatures in conjugated directions u and v at a point P, and let K denote the Gaussian curvature; prove that

K sin² θ = κ_u κ_v,

where θ is the angle between u and v.

Hint: Relate the expressions obtained for the second fundamental form in the bases of the tangent plane respectively formed by the conjugated directions and the principal directions.

Solution Let us assume that u and v are unit vectors and write them in the basis of the tangent plane formed by the (unit) principal directions as u = u_1 e_1 + u_2 e_2 and v = v_1 e_1 + v_2 e_2. We have

II(u, u) = κ_u = κ_1 u_1² + κ_2 u_2²,
II(v, v) = κ_v = κ_1 v_1² + κ_2 v_2²,
II(u, v) = 0 = κ_1 u_1 v_1 + κ_2 u_2 v_2.

According to the third equation, we have κ_2 = −κ_1 u_1 v_1 / (u_2 v_2). Substituting this value in the first equation yields

κ_u = (κ_1 u_1 / v_2)(u_1 v_2 − u_2 v_1) = (κ_1 u_1 / v_2) sin θ,

since e_1 and e_2 form an orthonormal basis of the tangent plane and u has unit norm. A similar line of reasoning shows that

κ_v = (κ_2 v_2 / u_1)(u_1 v_2 − u_2 v_1) = (κ_2 v_2 / u_1) sin θ,

and we finally obtain κ_u κ_v = κ_1 κ_2 sin² θ = K sin² θ.
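The identity is easy to verify numerically by working directly in the principal basis, where II(a, b) = κ_1 a_1 b_1 + κ_2 a_2 b_2 (a sketch with arbitrary principal curvatures at a hyperbolic point; the conjugate of u is found from II(u, v) = 0):

```python
import numpy as np

k1, k2 = 0.8, -1.7      # principal curvatures, so K = k1 * k2
II = lambda a, b: k1 * a[0] * b[0] + k2 * a[1] * b[1]   # second fundamental form

phi = 0.4
u = np.array([np.cos(phi), np.sin(phi)])
# A direction conjugated to u satisfies II(u, v) = 0:
v = np.array([k2 * u[1], -k1 * u[0]])
v = v / np.linalg.norm(v)
assert abs(II(u, v)) < 1e-12    # u and v really are conjugated

kappa_u, kappa_v = float(II(u, u)), float(II(v, v))
sin_theta = float(abs(u[0] * v[1] - u[1] * v[0]))   # |sin| of the angle between u, v
K = k1 * k2
```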

19.9. Show that the occluding contour is a smooth curve that does not intersect itself. Hint: Use the Gauss map.

Solution Suppose that the occluding contour has a tangent discontinuity or a self intersection at some point P. In either case, two branches of the occluding contour meet at P with distinct tangents u_1 and u_2, and the viewing direction v must be conjugated with both directions. But u_1 and u_2 form a basis of the tangent plane, and by linearity v must be conjugated with all directions of the tangent plane. This cannot happen unless the point is planar, in which case dN is zero, or v is an asymptotic direction at a parabolic point, in which case dN v = 0. Neither situation occurs for generic surfaces observed from generic viewpoints.

19.10. Show that the apparent curvature of any surface curve with tangent t is

κ_a = κ_t / cos² α,

where α is the angle between the image plane and t. Hint: Write the coordinates of the vectors t, n, and b in a coordinate system whose z axis is orthogonal to the image plane, and use Eq. (19.2) and Meusnier's theorem.

Solution Let us denote by Γ the surface curve and by γ its projection. We assume of course that the point P that we are considering lies on the occluding contour (even though the curve under consideration may not be the occluding contour). Since κ_t is a signed quantity, it will be necessary to give κ_a a meaningful sign to establish the desired result. Let us first show that, whatever that meaning may be, we have indeed

|κ_a| = |κ_t| / cos² α.

We follow the notation of Ex. 19.7 and use the same coordinate system. Since P is on the occluding contour, the surface normal N at P is also the normal to γ, and we must have N = ∓j. Let φ denote the angle between N and the principal normal n to Γ. We must therefore have b = |cos φ| = cos β / cos α (since we have chosen our coordinate system so cos α ≥ 0 and cos β ≥ 0), and it follows, according to Meusnier's theorem and Ex. 19.7, that

|κ_a| = κ cos β / cos³ α = (|κ_t| / |cos φ|)(cos β / cos³ α) = |κ_t| / cos² α.

Let us now turn to giving a meaningful sign to κ_a and determining this sign. By convention, we take κ_a positive when the principal normal n′ to γ is equal to −N, and negative when n′ = N. It is easy to show that with our choice of coordinate system, we always have n′ = j: Briefly, let us reparameterize y by its arc length s′, noting that, because of the foreshortening induced by the projection, ds′ = ds cos α. Using a line of reasoning similar to Ex. 19.7 but differentiating y with respect to s′, it is easy to show that the tangent t′ = i and the (principal) normal n′ to γ satisfy

t′ × (κ′ n′) = κ (cos β / cos³ α) k,

where κ′ is the (nonnegative) curvature of γ. In particular, the vectors t′ = i, n′ = ∓j, and k must form a right-handed coordinate system, which implies n′ = j. Therefore we must take κ_a > 0 when N = −j, and κ_a < 0 when N = j. Suppose that N = −j; then cos φ = N · n = −j · n = −b must be negative, and by Meusnier's theorem, κ_t must be positive. By the same token, when N = j, cos φ must be positive and κ_t must be negative. It follows that we can indeed take

κ_a = κ_t / cos² α.

Note that when Γ is the occluding contour, the convention we have chosen for the sign of the apparent curvature yields the expected result: κ_a is positive when the contour point is convex (i.e., its principal normal is (locally) inside the region bounded by the image contour), and κ_a is negative when the point is concave.


C H A P T E R 20

Aspect Graphs

PROBLEMS

20.1. Draw the orthographic and spherical perspective aspect graphs of the transparent Flatland object below along with the corresponding aspects.

Solution The visual events for the transparent object, along with the various cells of the perspective and orthographic aspect graph, are shown below.

[Figure: visual event rays and the numbered cells of the orthographic and perspective aspect graphs for the transparent object.]

Note that cells of the perspective aspect graph created by the intersection of visual event rays outside of the box are not shown. Three of the perspective aspects are shown below. Note the change in the order of the contour points between aspects 1 and 13, and the addition of two contour points as one goes from aspect 1 to aspect 12.

20.2. Draw the orthographic and spherical perspective aspect graphs of the opaque object along with the corresponding aspects.


Solution The visual events for the opaque object, along with the various cells of the perspective and orthographic aspect graph, are shown below.

[Figure: visual event rays and the numbered cells of the orthographic and perspective aspect graphs for the opaque object.]

Note that cells of the perspective aspect graph created by the intersection of visual event rays outside of the box are not shown. Three of the perspective aspects are shown below. Note the change in the order of the contour points between aspects 1 and 13, and the addition of a single contour point as one goes from aspect 1 to aspect 12.

20.3. Is it possible for an object with a single parabolic curve (such as a banana) to have no cusp of Gauss at all? Why (or why not)?

Solution No, it is not possible. To see why, consider a nonconvex compact solid. It is easy to see that the surface bounding this solid must have at least one convex point, so there must exist a parabolic curve separating this point from the nonconvex part of the surface. On the Gaussian sphere, as one crosses the image of this curve (the fold) at some point, the multiplicity of the sphere covering goes from k to k + 2 with k ≥ 1, or from k to k − 2 with k ≥ 3. Let us choose the direction of traversal so we go from k to k + 2. Locally, the fold separates layers k + 1 and k + 2 of the sphere covering (see diagram below).

[Diagram: fold of the Gauss map, separating sheets of the covering with multiplicities k, k + 1, and k + 2.]

If there is no cusp, the fold is a smooth closed curve that forms (globally) the boundary between layers k + 1 and k + 2 (the change in multiplicity cannot change along the curve). But layer k + 1 must be connected to layer k by another fold curve. Thus either the surface of a nonconvex compact solid admits cusps of Gauss, or it has at least two distinct parabolic curves.


20.4. Use an equation-counting argument to justify the fact that contact of order six or greater between lines and surfaces does not occur for generic surfaces. (Hint: Count the parameters that define contact.)

Solution A line has contact of order n with a surface when all derivatives of order less than or equal to n − 1 of the surface are zero in the direction of the line. Ordinary tangents have order-two contact with the surface, and there is a three-parameter family of those (all tangent lines in the tangent planes of all surface points); asymptotic tangents have order-three contact and there is a two-parameter family of those (the two asymptotic tangents at each saddle-shaped point); order-four contact occurs for the asymptotic tangents along flecnodal and parabolic curves; there are a finite number of order-five tangents at isolated points of the surface (including gutter points and cusps of Gauss); and finally there is in general no order-six tangent.

20.5. We saw that the asymptotic curve and its spherical image have perpendicular tangents. Lines of curvature are the integral curves of the field of principal directions. Show that these curves and their Gaussian image have parallel tangents.

Solution Let us consider a line of curvature Γ parameterized by its arc length. Its unit tangent at some point P is by definition a principal direction e_i at P. Let κ_i denote the corresponding principal curvature. Since principal directions are the eigenvectors of the differential of the Gauss map, the derivative of the unit surface normal along Γ is dN e_i = κ_i e_i. This is also the tangent to the Gaussian image of the principal curve, and the result follows.

20.6. Use the fact that the Gaussian image of a parabolic curve is the envelope of the asymptotic curves intersecting it to give an alternate proof that a pair of cusps is created (or destroyed) in a lip or beak-to-beak event.

Solution Lip and beak-to-beak events occur when the Gaussian image of the occluding contour becomes tangent to the fold associated with a parabolic point. Let us assume that the fold is convex at this point (a similar reasoning applies when the fold is concave, but the situation becomes more complicated at inflections). There exists some neighborhood of the tangency point such that any great circle intersecting the fold in this neighborhood will intersect it exactly twice. As illustrated by the diagram below, two of the asymptotic curve branches tangent to the fold at the intersections admit a great circle bitangent to them.

[Diagram: fold of the Gauss map, the Gaussian images of the asymptotic curves tangent to it, and the great circles bitangent to them.]

This great circle also intersects the fold exactly twice, and since it is tangent to the asymptotic curves, it is orthogonal to the corresponding asymptotic direction. In other words, the viewing direction is an asymptotic direction at the corresponding points of the occluding contour, yielding two cusps of the image contour.

20.7. Lip and beak-to-beak events of implicit surfaces. It can be shown (Pae and Ponce, 2001) that the parabolic curves of a surface defined implicitly as the zero set of some density function F(x, y, z) = 0 are characterized by this equation and P(x, y, z) = 0, where P is defined by P = ∇F^T A ∇F, ∇F is the gradient of F, and A is the symmetric matrix

A = | F_yy F_zz − F_yz²       F_xz F_yz − F_zz F_xy   F_xy F_yz − F_yy F_xz |
    | F_xz F_yz − F_zz F_xy   F_zz F_xx − F_xz²       F_xy F_xz − F_xx F_yz |
    | F_xy F_yz − F_yy F_xz   F_xy F_xz − F_xx F_yz   F_xx F_yy − F_xy²    |.

It can also be shown that the asymptotic direction at a parabolic point is A∇F.

(a) Show that AH = Det(H) Id, where H denotes the Hessian of F.

(b) Show that cusps of Gauss are parabolic points that satisfy the equation ∇P^T A ∇F = 0. Hint: Use the fact that the asymptotic direction at a cusp of Gauss is tangent to the parabolic curve, and that the vector ∇F is normal to the tangent plane of the surface defined by F = 0.

(c) Sketch an algorithm for tracing the lip and beak-to-beak events of an implicit surface.

Solution

(a) Note that the Hessian can be written as H = (h_1 h_2 h_3), where

h_1 = (F_xx, F_xy, F_xz)^T, h_2 = (F_xy, F_yy, F_yz)^T, h_3 = (F_xz, F_yz, F_zz)^T.

But we can rewrite A as the matrix whose rows are (h_2 × h_3)^T, (h_3 × h_1)^T, and (h_1 × h_2)^T, and it follows that

AH = | (h_2 × h_3)^T |
     | (h_3 × h_1)^T | (h_1 h_2 h_3) = Det(H) Id,
     | (h_1 × h_2)^T |

since the determinant of the matrix formed by three vectors is the dot product of the first vector with the cross product of the other two vectors.

(b) The parabolic curve can be thought of as the intersection of the two surfaces defined by F(x, y, z) = 0 and P(x, y, z) = 0. Its tangent lies in the intersection of the tangent planes of these two surfaces and is therefore orthogonal to the normals ∇F and ∇P. For a point to be a cusp of Gauss, this tangent must be along the asymptotic direction A∇F, and we must therefore have ∇F^T A ∇F = 0, which is automatically satisfied at a parabolic point, and ∇P^T A ∇F = 0, which is the desired condition.

(c) To trace the lip and beak-to-beak events, simply use Algorithm 20.2 to trace the parabolic curve defined in R³ by the equations F(x, y, z) = 0 and P(x, y, z) = 0, computing for each point along this curve the vector A∇F as the corresponding asymptotic direction. The cusps of Gauss can be found by adding ∇P^T A ∇F = 0 to these two equations and solving the corresponding system of three polynomial equations in three unknowns using homotopy continuation.
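Part (a) says that A is the transposed cofactor (adjugate) matrix of H; the identity AH = Det(H) Id can be checked numerically with a generic symmetric matrix standing in for the Hessian (a sketch; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(3, 3))
H = M + M.T                        # a generic symmetric 3x3 "Hessian"
h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]

# Rows of A are (h2 x h3), (h3 x h1), (h1 x h2), as in the solution above.
A = np.stack([np.cross(h2, h3), np.cross(h3, h1), np.cross(h1, h2)])

max_err = float(np.abs(A @ H - np.linalg.det(H) * np.eye(3)).max())
sym_err = float(np.abs(A - A.T).max())   # A should also be symmetric
```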


20.8. Swallowtail events of implicit surfaces. It can be shown that the asymptotic directions a at a hyperbolic point satisfy the two equations ∇F · a = 0 and a^T H a = 0, where H denotes the Hessian of F. These two equations simply indicate that the order of contact between a surface and its asymptotic tangents is at least equal to three. Asymptotic tangents along flecnodal curves have order-four contact with the surface, and this is characterized by a third equation, namely

(a^T H_x a, a^T H_y a, a^T H_z a) · a = 0.

Sketch an algorithm for tracing the swallowtail events of an implicit surface.

Solution To trace the swallowtail events, one can use Algorithm 20.2 to trace the curve defined in R⁶ by F(x, y, z) = 0, the three equations given above, and |a|² = 1. There are of course six unknowns in that case, namely x, y, z, and the three coordinates of a. The corresponding visual events are then given by the values of a along the curve. Alternatively, one can use a computer algebra system to eliminate a among the three equations involving it. This yields an equation S(x, y, z) = 0, and the flecnodal curve can be traced as the zero set of F(x, y, z) and S(x, y, z). The corresponding visual events are then found by computing, for each flecnodal point, the singular asymptotic directions as the solutions of the three equations given above. Since these equations are homogeneous, a can be computed from any two of them. The singular asymptotic direction is the common solution of two of the equation pairs.

20.9. Derive the equations characterizing the multilocal events of implicit surfaces. You can use the fact that, as mentioned in the previous exercise, the asymptotic directions a at a hyperbolic point satisfy the two equations ∇F · a = 0 and a^T H a = 0.

Solution Triple points are characterized by the following equations in the positions of the contact points x_i = (x_i, y_i, z_i)^T (i = 1, 2, 3):

F(x_i) = 0, i = 1, 2, 3,
(x_1 − x_2) · ∇F(x_i) = 0, i = 1, 2, 3,
(x_2 − x_1) × (x_3 − x_1) = 0.

Note that the vector equation involving the cross product is equivalent to two independent scalar equations; thus triple points correspond to curves defined in R⁹ by eight equations in nine unknowns.

Tangent crossings correspond to curves defined in R⁶ by the following five equations in the positions of the contact points x_1 and x_2:

F(x_i) = 0, i = 1, 2,
(x_1 − x_2) · ∇F(x_i) = 0, i = 1, 2,
(∇F(x_1) × ∇F(x_2)) · (x_2 − x_1) = 0.

Finally, cusp crossings correspond to curves defined in R⁶ by the following five equations in the positions of the contact points x_1 and x_2:

F(x_i) = 0, i = 1, 2,
(x_1 − x_2) · ∇F(x_i) = 0, i = 1, 2,
(x_2 − x_1)^T H(x_1) (x_2 − x_1) = 0,


where the last equation simply expresses the fact that the viewing direction is an asymptotic direction of the surface at x_1.

Programming Assignments

20.10. Write a program to explore multilocal visual events: Consider two spheres with different radii and assume orthographic projection. The program should allow you to change viewpoint interactively as well as explore the tangent crossings associated with the limiting bitangent developable.

20.11. Write a similar program to explore cusp points and their projections. You have to trace a plane curve.


C H A P T E R 21

Range Data

PROBLEMS

21.1. Use Eq. (21.1) to show that a necessary and sufficient condition for the coordinate curves of a parameterized surface to be lines of curvature is that f = F = 0.

Solution The principal directions satisfy the differential equation (21.1), reproduced here for completeness:

0 = det | v′²   −u′v′   u′² |
        | E     F       G   |
        | e     f       g   |  = (Fg − fG)v′² − u′v′(Ge − gE) + u′²(Ef − eF).

The coordinate curves are lines of curvature when their tangents are along principal directions. Equivalently, the solutions of Eq. (21.1) must be u′ = 0 and v′ = 0, which is in turn equivalent to Fg − fG = Ef − eF = 0. Clearly, this condition is satisfied when f = F = 0. Conversely, when Fg − fG = Ef − eF = 0, either f = F = 0, or we can write e = λE, f = λF, and g = λG for some scalar λ ≠ 0. In the latter case, the normal curvature in any direction t of the tangent plane is given by

κ_t = II(t, t) / I(t, t) = (e u′² + 2f u′v′ + g v′²) / (E u′² + 2F u′v′ + G v′²) = λ,

i.e., the normal curvature is independent of the direction in which it is measured (we say that such a point is an umbilic). In this case, the principal directions are of course ill defined. It follows that a necessary and sufficient condition for the coordinate curves of a parameterized surface to be lines of curvature is that f = F = 0.

21.2. Show that the lines of curvature of a surface of revolution are its meridians and parallels.

Solution Let us consider a surface of revolution parameterized by x(θ, z) = (r(z) cos θ, r(z) sin θ, z)^T, where r(z_0) denotes the radius of the circle formed by the intersection of the surface with the plane z = z_0. We have x_θ = (−r sin θ, r cos θ, 0)^T and x_z = (r′ cos θ, r′ sin θ, 1)^T, thus F = x_θ · x_z = 0. Now, normalizing the cross product of x_θ and x_z shows that N = (1/√(1 + r′²))(cos θ, sin θ, −r′)^T. Finally, we have x_θz = (−r′ sin θ, r′ cos θ, 0)^T, and it follows that f = −N · x_θz = 0. According to the previous exercise, the lines of curvature of a surface of revolution are thus its coordinate curves — that is, its meridians and its parallels.

21.3. Step model: Compute z_σ(x) = G_σ ∗ z(x), where z(x) is given by Eq. (21.2). Show that z′′_σ is given by Eq. (21.3). Conclude that κ′′_σ/κ′_σ = −2δ/h at the point x_σ where z′′_σ and κ_σ vanish.

Solution Recall that the step model is defined by

z = k_1 x + c when x < 0,   z = k_2 x + c + h when x > 0.


Convolving it with G_σ yields

z_σ = c + (h/(σ√2π)) ∫_{−∞}^x exp(−t²/2σ²) dt + kx + (δx/(σ√2π)) ∫_0^x exp(−t²/2σ²) dt + (δσ/√2π) exp(−x²/2σ²),

and the first and second derivatives of z_σ are respectively

z′_σ = k + (δ/(σ√2π)) ∫_0^x exp(−t²/2σ²) dt + (h/(σ√2π)) exp(−x²/2σ²)

and

z′′_σ = (1/(σ√2π)) (δ − hx/σ²) exp(−x²/2σ²).

The latter is indeed Eq. (21.3). Now, we have

κ′_σ = z′′′_σ / [1 + z′_σ²]^{3/2} − 3 z′_σ z′′_σ² / [1 + z′_σ²]^{5/2}.

Since z′′_σ(x_σ) = 0, the second term in the expression above vanishes at x_σ. Now, the derivatives of the numerator and denominator of this term obviously also vanish at x_σ, since all the terms making them up contain z′′_σ as a factor. Likewise, the derivative of the denominator of the first term vanishes at x_σ, and it follows that κ′′_σ/κ′_σ = z′′′′_σ/z′′′_σ at this point. Now, we can write z′′_σ = a exp(−x²/2σ²), with a = (δ − xh/σ²)/(σ√2π). Therefore,

z′′′_σ = (a′ − ax/σ²) exp(−x²/2σ²)

and

z′′′′_σ = (a′′ − (a/σ²)(1 − x²/σ²) − 2a′x/σ²) exp(−x²/2σ²).

Now, a′′ is identically zero, and a is zero at x_σ. It follows that κ′′_σ/κ′_σ = −2x_σ/σ² = −2δ/h at this point.

21.4. Roof model: Show that κ_σ is given by Eq. (21.4).

Solution Plugging the value $h = 0$ into the expressions for $z'_\sigma$ and $z''_\sigma$ derived in the previous exercise shows immediately that
\[
\kappa_\sigma = \frac{\dfrac{1}{\sigma\sqrt{2\pi}}\,\delta\exp(-x^2/2\sigma^2)}{\left[1 + \left(k + \dfrac{\delta}{\sigma\sqrt{2\pi}}\displaystyle\int_0^x \exp(-t^2/2\sigma^2)\,dt\right)^2\right]^{3/2}},
\]
and using the change of variable $u = t/\sigma$ in the integral finally shows that
\[
\kappa_\sigma = \frac{\dfrac{1}{\sigma\sqrt{2\pi}}\,\delta\exp\left(-\dfrac{x^2}{2\sigma^2}\right)}{\left[1 + \left(k + \dfrac{\delta}{\sqrt{2\pi}}\displaystyle\int_0^{x/\sigma} \exp\left(-\dfrac{u^2}{2}\right)du\right)^2\right]^{3/2}}.
\]

Chapter 21 Range Data

21.5. Show that the quaternion $q = \cos\frac{\theta}{2} + \sin\frac{\theta}{2}u$ represents the rotation $R$ of angle $\theta$ about the unit vector $u$ in the sense of Eq. (21.5). Hint: Use the Rodrigues formula derived in the exercises of chapter 3.

Solution Let us consider some vector $\alpha$ in $\mathbb{R}^3$, define $\beta = q\alpha\bar{q}$, and show that $\beta$ is the vector of $\mathbb{R}^3$ obtained by rotating $\alpha$ about the vector $u$ by an angle $\theta$. Recall that the quaternion product is defined by
\[
(a + \alpha)(b + \beta) \stackrel{\text{def}}{=} (ab - \alpha\cdot\beta) + (a\beta + b\alpha + \alpha\times\beta).
\]
Thus,
\begin{align*}
\beta &= [\cos\tfrac{\theta}{2} + \sin\tfrac{\theta}{2}u]\,\alpha\,[\cos\tfrac{\theta}{2} - \sin\tfrac{\theta}{2}u]\\
&= [\cos\tfrac{\theta}{2} + \sin\tfrac{\theta}{2}u][\sin\tfrac{\theta}{2}(\alpha\cdot u) + \cos\tfrac{\theta}{2}\alpha - \sin\tfrac{\theta}{2}\alpha\times u]\\
&= \cos^2\tfrac{\theta}{2}\alpha + \sin^2\tfrac{\theta}{2}(u\cdot\alpha)u - 2\sin\tfrac{\theta}{2}\cos\tfrac{\theta}{2}(\alpha\times u) - \sin^2\tfrac{\theta}{2}\,u\times(\alpha\times u)\\
&= \cos^2\tfrac{\theta}{2}\alpha + \sin^2\tfrac{\theta}{2}(u\cdot\alpha)u - 2\sin\tfrac{\theta}{2}\cos\tfrac{\theta}{2}(\alpha\times u) + \sin^2\tfrac{\theta}{2}[u_\times]^2\alpha.
\end{align*}
But remember from chapter 13 that $[u_\times]^2 = uu^T - |u|^2\mathrm{Id}$, so
\[
\beta = [\cos^2\tfrac{\theta}{2} - \sin^2\tfrac{\theta}{2}]\alpha - 2\sin\tfrac{\theta}{2}\cos\tfrac{\theta}{2}(\alpha\times u) + 2\sin^2\tfrac{\theta}{2}(u\cdot\alpha)u = \cos\theta\,\alpha + \sin\theta\,(u\times\alpha) + (1 - \cos\theta)(u\cdot\alpha)u,
\]
which is indeed the Rodrigues formula for a rotation of angle $\theta$ about the unit vector $u$.
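The conjugation $q\alpha\bar{q}$ can be checked numerically against the Rodrigues formula. The sketch below (NumPy; the helper names are ours, not the book's) encodes a quaternion as a 4-vector with the scalar part first:

```python
import numpy as np

def qmult(p, q):
    """Quaternion product: (a + alpha)(b + beta) =
    (ab - alpha.beta) + (a beta + b alpha + alpha x beta)."""
    a, al = p[0], p[1:]
    b, be = q[0], q[1:]
    return np.concatenate(([a * b - al @ be],
                           a * be + b * al + np.cross(al, be)))

def rotate(q, x):
    """Rotate x by the unit quaternion q through q x q-bar."""
    qbar = np.concatenate(([q[0]], -q[1:]))
    return qmult(qmult(q, np.concatenate(([0.0], x))), qbar)[1:]

# Rotation of angle theta about the unit axis u, computed both ways.
theta = 0.8
u = np.array([1.0, 2.0, 2.0]) / 3.0
q = np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * u))
x = np.array([0.3, -1.0, 0.5])
rodrigues = (np.cos(theta) * x + np.sin(theta) * np.cross(u, x)
             + (1 - np.cos(theta)) * (u @ x) * u)
```

Both expressions agree to machine precision, and the rotation preserves the norm of $x$, as it should.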

21.6. Show that the rotation matrix $R$ associated with a given unit quaternion $q = a + \alpha$ with $\alpha = (b, c, d)^T$ is given by Eq. (21.6).

Solution First, note that any unit quaternion can be written as $q = a + \alpha$ where $a = \cos\frac{\theta}{2}$ and $\alpha = \sin\frac{\theta}{2}u$ for some angle $\theta$ and unit vector $u$ (this is because $a^2 + |\alpha|^2 = 1$). Now, to derive Eq. (21.6), i.e.,
\[
R = \begin{pmatrix}
a^2 + b^2 - c^2 - d^2 & 2(bc - ad) & 2(bd + ac)\\
2(bc + ad) & a^2 - b^2 + c^2 - d^2 & 2(cd - ab)\\
2(bd - ac) & 2(cd + ab) & a^2 - b^2 - c^2 + d^2
\end{pmatrix},
\]
all we need to do is combine the result established in the previous exercise with that obtained in Ex. 3.7, which states that if $u = (u, v, w)^T$, then
\[
R = \begin{pmatrix}
u^2(1-\cos\theta)+\cos\theta & uv(1-\cos\theta)-w\sin\theta & uw(1-\cos\theta)+v\sin\theta\\
uv(1-\cos\theta)+w\sin\theta & v^2(1-\cos\theta)+\cos\theta & vw(1-\cos\theta)-u\sin\theta\\
uw(1-\cos\theta)-v\sin\theta & vw(1-\cos\theta)+u\sin\theta & w^2(1-\cos\theta)+\cos\theta
\end{pmatrix}.
\]
Showing that the entries of both matrices are the same is a (slightly tedious) exercise in algebra and trigonometry. Let us first show that the two top left entries are the same. Note that $|\alpha|^2 = b^2 + c^2 + d^2 = \sin^2\frac{\theta}{2}|u|^2 = \sin^2\frac{\theta}{2}$; it follows that
\[
a^2 + b^2 - c^2 - d^2 = a^2 + 2b^2 - \sin^2\tfrac{\theta}{2} = \cos^2\tfrac{\theta}{2} - \sin^2\tfrac{\theta}{2} + 2\sin^2\tfrac{\theta}{2}u^2 = \cos\theta + u^2(1 - \cos\theta).
\]
The exact same type of reasoning applies to the other diagonal entries. Let us now consider the entries corresponding to the first row and the second column of the two matrices. We have
\[
2(bc - ad) = 2uv\sin^2\tfrac{\theta}{2} - 2\cos\tfrac{\theta}{2}\sin\tfrac{\theta}{2}w = uv(1 - \cos\theta) - w\sin\theta,
\]
so the two entries do indeed coincide. The exact same type of reasoning applies to the other non-diagonal entries.
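The equality of the two matrices can also be verified numerically. The sketch below (NumPy; function names are ours) builds both forms for a sample axis and angle:

```python
import numpy as np

def quat_to_R(a, b, c, d):
    """The matrix of Eq. (21.6) for a unit quaternion q = a + (b, c, d)^T."""
    return np.array([
        [a*a + b*b - c*c - d*d, 2*(b*c - a*d), 2*(b*d + a*c)],
        [2*(b*c + a*d), a*a - b*b + c*c - d*d, 2*(c*d - a*b)],
        [2*(b*d - a*c), 2*(c*d + a*b), a*a - b*b - c*c + d*d]])

def rodrigues_R(u, theta):
    """Rotation of angle theta about the unit vector u (the Ex. 3.7 form)."""
    ux = np.array([[0.0, -u[2], u[1]],
                   [u[2], 0.0, -u[0]],
                   [-u[1], u[0], 0.0]])
    return (np.cos(theta) * np.eye(3) + np.sin(theta) * ux
            + (1 - np.cos(theta)) * np.outer(u, u))

theta = 1.1
u = np.array([2.0, -1.0, 2.0]) / 3.0          # unit vector
R = quat_to_R(np.cos(theta / 2), *(np.sin(theta / 2) * u))
```

The resulting matrix matches the Rodrigues form and is a proper rotation (orthogonal with determinant one).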

21.7. Show that the matrix $A_i$ constructed in Section 21.3.2 is equal to
\[
A_i = \begin{pmatrix} 0 & y_i^T - y_i'^T\\ y_i' - y_i & [y_i + y_i']_\times \end{pmatrix}.
\]

Solution When $q = a + \alpha$ is a unit quaternion, we can write
\begin{align*}
y_i'q - qy_i &= (-y_i'\cdot\alpha + ay_i' + y_i'\times\alpha) - (-y_i\cdot\alpha + ay_i + \alpha\times y_i)\\
&= (y_i - y_i')\cdot\alpha + a(y_i' - y_i) + (y_i + y_i')\times\alpha\\
&= \begin{pmatrix} 0 & y_i^T - y_i'^T\\ y_i' - y_i & [y_i + y_i']_\times \end{pmatrix}\begin{pmatrix} a\\ \alpha \end{pmatrix} = A_iq,
\end{align*}
where $q$ has been identified with the 4-vector whose first coordinate is $a$ and whose remaining coordinates are those of $\alpha$. In particular, we have
\[
E = \sum_{i=1}^n |y_i'q - qy_i|^2 = \sum_{i=1}^n |A_iq|^2 = q^TBq, \quad\text{where}\quad B = \sum_{i=1}^n A_i^TA_i.
\]
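The unit quaternion minimizing $q^TBq$ is the eigenvector of $B$ associated with its smallest eigenvalue, which is how ICP estimates the rotation. A minimal numerical sketch of this construction (NumPy; helper names are ours), using synthetic matches $y_i' = Ry_i$:

```python
import numpy as np

def cross_mat(v):
    """[v]x such that cross_mat(v) @ w = v x w."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def A_matrix(y, yp):
    """The 4x4 matrix A_i of this exercise for a match (y_i, y'_i)."""
    A = np.zeros((4, 4))
    A[0, 1:] = y - yp
    A[1:, 0] = yp - y
    A[1:, 1:] = cross_mat(y + yp)
    return A

def quat_to_R(q):
    """Rotation matrix of Eq. (21.6) for q = (a, b, c, d)."""
    a, b, c, d = q
    return np.array([
        [a*a + b*b - c*c - d*d, 2*(b*c - a*d), 2*(b*d + a*c)],
        [2*(b*c + a*d), a*a - b*b + c*c - d*d, 2*(c*d - a*b)],
        [2*(b*d - a*c), 2*(c*d + a*b), a*a - b*b - c*c + d*d]])

rng = np.random.default_rng(5)
theta, u = 0.9, np.array([1.0, 2.0, 2.0]) / 3.0
q_true = np.concatenate(([np.cos(theta / 2)], np.sin(theta / 2) * u))
R_true = quat_to_R(q_true)

Y = rng.normal(size=(10, 3))
B = sum(A_matrix(y, R_true @ y).T @ A_matrix(y, R_true @ y) for y in Y)

# Minimizer of q^T B q over unit quaternions: eigenvector of the
# smallest eigenvalue of B (here zero, since the data are exact).
w, V = np.linalg.eigh(B)
R_est = quat_to_R(V[:, 0])
```

Since $q$ and $-q$ represent the same rotation, the recovered matrix, rather than the quaternion itself, is what should be compared with the ground truth.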

21.8. As mentioned earlier, the ICP method can be extended to various types of geometric models. We consider here the case of polyhedral models and piecewise parametric patches.
(a) Sketch a method for computing the point Q in a polygon that is closest to some point P.
(b) Sketch a method for computing the point Q in the parametric patch $x : I\times J \to \mathbb{R}^3$ that is closest to some point P. Hint: Use Newton iterations.

Solution

(a) Let A denote the polygon. Construct the orthogonal projection Q of P onto the plane that contains A. Test whether Q lies inside A. When A is convex, this can be done by testing whether Q is on the "right" side of all the edges of A. When A is not convex, one can accumulate the signed angles subtended at Q by the successive vertices of A. The point will be inside the polygon when these angles add to 2π, on the boundary when they add to π, and outside when they add to 0. Both methods take linear time. If Q is inside A, it is the closest point to P. If it is outside, the closest point must either lie in the interior of one of the edges (this is checked by projecting P onto the line supporting each edge) or be one of the vertices, and it can be found in linear time as well.
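The convex case of (a) can be sketched as follows (NumPy; a simple illustration under the assumption of counterclockwise vertex order, not the book's implementation):

```python
import numpy as np

def closest_point_on_polygon(P, verts):
    """Closest point to P on a planar convex polygon with vertices
    verts (counterclockwise, shape (m, 3))."""
    # Plane of the polygon.
    n = np.cross(verts[1] - verts[0], verts[2] - verts[0])
    n /= np.linalg.norm(n)
    # Orthogonal projection of P onto that plane.
    Q = P - np.dot(P - verts[0], n) * n
    # Convex case: inside iff Q is on the inner side of every edge.
    m = len(verts)
    inside = all(
        np.dot(np.cross(verts[(j + 1) % m] - verts[j], Q - verts[j]), n) >= 0
        for j in range(m))
    if inside:
        return Q
    # Otherwise the closest point lies on an edge or at a vertex;
    # clamping the line projection to [0, 1] handles both cases.
    best, best_d = None, np.inf
    for j in range(m):
        a, b = verts[j], verts[(j + 1) % m]
        t = np.clip(np.dot(P - a, b - a) / np.dot(b - a, b - a), 0.0, 1.0)
        C = a + t * (b - a)
        d = np.linalg.norm(P - C)
        if d < best_d:
            best, best_d = C, d
    return best

# Unit square in the z = 0 plane.
sq = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
```

For a point above the square the answer is its planar projection; for a point beyond an edge it is the foot of the perpendicular on that edge.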

(b) Just as in the polygonal case, the shortest distance is reached either in the interior of the patch or on its boundary. We only detail the case where the closest point lies inside the patch, since the case where it lies in the interior of a boundary curve is similar, and the case where it is a vertex is straightforward. As suggested, starting from some point x(u, v) inside the patch, we can use Newton iterations to find the closest point, which is the orthogonal projection of P onto the patch. Thus we seek a zero of the (vector) function f(u, v) = N(u, v) × (x(u, v) − P), where P denotes the coordinate vector of the point P. The Jacobian J of f is easily computed as a function of first- and second-order derivatives of x. The increment (δu, δv) is then computed as usual by solving
\[
J\begin{pmatrix}\delta u\\ \delta v\end{pmatrix} = -f.
\]
Note that this is an overconstrained system of equations (f has three components). It can be solved by discarding one of the redundant equations or by using the pseudoinverse of J to solve for δu and δv.

21.9. Develop a linear least-squares method for fitting a quadric surface to a set of points under the constraint that the quadratic form has unit Frobenius norm.

Solution The equation of a quadric surface can be written as
\[
a_{200}x^2 + 2a_{110}xy + a_{020}y^2 + 2a_{011}yz + a_{002}z^2 + 2a_{101}xz + 2a_{100}x + 2a_{010}y + 2a_{001}z + a_{000} = 0,
\]
or
\[
P^TQP = 0, \quad\text{where}\quad Q = \begin{pmatrix}
a_{200} & a_{110} & a_{101} & a_{100}\\
a_{110} & a_{020} & a_{011} & a_{010}\\
a_{101} & a_{011} & a_{002} & a_{001}\\
a_{100} & a_{010} & a_{001} & a_{000}
\end{pmatrix}.
\]
(Note that this is slightly different from the parameterization of quadric surfaces introduced in chapter 2. The parameterization used here facilitates the use of a constraint on the Frobenius norm of Q.) Suppose we observe n points with homogeneous coordinate vectors $P_1, \ldots, P_n$. To fit a quadric surface to these points, we minimize $\sum_{i=1}^n (P_i^TQP_i)^2$ with respect to the 10 coefficients defining the symmetric matrix Q, under the constraint that Q has unit Frobenius norm. This is equivalent to minimizing $|\mathcal{A}q|^2$ under the constraint $|q|^2 = 1$, where
\[
\mathcal{A} \stackrel{\text{def}}{=} \begin{pmatrix}
x_1^2 & 2x_1y_1 & y_1^2 & 2y_1z_1 & z_1^2 & 2x_1z_1 & 2x_1 & 2y_1 & 2z_1 & 1\\
x_2^2 & 2x_2y_2 & y_2^2 & 2y_2z_2 & z_2^2 & 2x_2z_2 & 2x_2 & 2y_2 & 2z_2 & 1\\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots & \ldots\\
x_n^2 & 2x_ny_n & y_n^2 & 2y_nz_n & z_n^2 & 2x_nz_n & 2x_n & 2y_n & 2z_n & 1
\end{pmatrix}
\]
and $q \stackrel{\text{def}}{=} (a_{200}, a_{110}, a_{020}, a_{011}, a_{002}, a_{101}, a_{100}, a_{010}, a_{001}, a_{000})^T$. This is a homogeneous linear least-squares problem that can be solved using the eigenvalue/eigenvector methods described in chapter 3.
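The homogeneous linear least-squares step can be sketched with an SVD, whose right singular vector of smallest singular value minimizes $|\mathcal{A}q|$ subject to $|q| = 1$ (NumPy; the sphere test data are our illustration, not the book's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Noiseless points on the unit sphere x^2 + y^2 + z^2 - 1 = 0.
pts = rng.normal(size=(200, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
x, y, z = pts.T

# Design matrix with the column order of the solution's vector q.
A = np.column_stack([x**2, 2*x*y, y**2, 2*y*z, z**2, 2*x*z,
                     2*x, 2*y, 2*z, np.ones_like(x)])

# Homogeneous least squares via SVD.
_, _, Vt = np.linalg.svd(A)
q = Vt[-1]

# For the sphere, q should be proportional to (1,0,1,0,1,0,0,0,0,-1).
q_true = np.array([1, 0, 1, 0, 1, 0, 0, 0, 0, -1], dtype=float)
q_true /= np.linalg.norm(q_true)
```

The recovered vector agrees with the expected coefficients up to an overall sign, which is unavoidable for a homogeneous solution.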

21.10. Show that a surface triangle maps onto a patch with hyperbolic edges in $\alpha, \beta$ space.

Solution Consider a triangle edge with extremities $Q_1$ and $Q_2$. Any point along this edge can be written as $Q = (1-t)Q_1 + tQ_2$. If the corresponding spin coordinates are $\alpha$ and $\beta$, we have
\[
\beta = \overrightarrow{PQ}\cdot n = [(1-t)\overrightarrow{PQ_1} + t\overrightarrow{PQ_2}]\cdot n = (1-t)\beta_1 + t\beta_2,
\]
where $\beta_1$ and $\beta_2$ are the $\beta$ spin coordinates of $Q_1$ and $Q_2$. Now we have
\[
\alpha = |\overrightarrow{PQ}\times n| = |[(1-t)\overrightarrow{PQ_1} + t\overrightarrow{PQ_2}]\times n| = |(1-t)a_1 + ta_2|,
\]
where $a_1 = \overrightarrow{PQ_1}\times n$ and $a_2 = \overrightarrow{PQ_2}\times n$. Computing $t$ as a function of $\beta$ and substituting in this expression for $\alpha$ yields
\[
\alpha = \left|\frac{\beta_2 - \beta}{\beta_2 - \beta_1}a_1 + \frac{\beta - \beta_1}{\beta_2 - \beta_1}a_2\right|.
\]
Squaring this equation and clearing the denominators yields
\[
(\beta_2 - \beta_1)^2\alpha^2 - |a_1 - a_2|^2\beta^2 + \lambda\beta + \mu = 0,
\]
where $\lambda$ and $\mu$ are constants. This is indeed the equation of a hyperbola.
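The claim that $(\beta_2-\beta_1)^2\alpha^2 - |a_1-a_2|^2\beta^2$ is affine in $\beta$ can be checked numerically for a random edge and oriented point (a quick NumPy sanity check; the setup is our illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=3)
n = rng.normal(size=3)
n /= np.linalg.norm(n)
Q1, Q2 = rng.normal(size=3), rng.normal(size=3)

def spin(Q):
    """Spin coordinates (alpha, beta) of Q with respect to (P, n)."""
    v = Q - P
    return np.linalg.norm(np.cross(v, n)), np.dot(v, n)

b1, b2 = spin(Q1)[1], spin(Q2)[1]
a1, a2 = np.cross(Q1 - P, n), np.cross(Q2 - P, n)

# Sample the edge at equally spaced t (hence equally spaced beta) and
# evaluate F(beta) = (b2-b1)^2 alpha^2 - |a1-a2|^2 beta^2, which the
# solution claims is affine (degree <= 1) in beta.
F = []
for t in np.linspace(0.1, 0.9, 5):
    alpha, beta = spin((1 - t) * Q1 + t * Q2)
    F.append((b2 - b1)**2 * alpha**2 - np.dot(a1 - a2, a1 - a2) * beta**2)
second_diffs = np.diff(np.array(F), 2)   # ~0 iff F is affine in beta
```

Vanishing second differences over equally spaced samples confirm that only a linear term $\lambda\beta + \mu$ remains.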

Programming Assignments

21.11. Implement molecule-based smoothing and the computation of principal directions and curvatures.

21.12. Implement the region-growing approach to plane segmentation described in this chapter.

21.13. Implement an algorithm for computing the lines of curvature of a surface from its range image. Hint: Use a curve-growing algorithm analogous to the region-growing algorithm for plane segmentation.

21.14. Implement the Besl–McKay ICP registration algorithm.

21.15. Marching squares in the plane: Develop and implement an algorithm for finding the zero set of a planar density function. Hint: Work out the possible ways a curve may intersect the edges of a pixel, and use linear interpolation along these edges to identify the zero set.
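One possible minimal sketch of assignment 21.15, following the hint (Python/NumPy; this is our illustration and makes no claim to be the intended solution, e.g. the saddle case is paired arbitrarily):

```python
import numpy as np

def marching_squares(f, xs, ys, iso=0.0):
    """Return line segments approximating the zero set of f on the grid
    xs x ys, using linear interpolation along pixel edges."""
    Z = np.array([[f(x, y) for y in ys] for x in xs]) - iso
    segs = []

    def interp(p, q, fp, fq):
        t = fp / (fp - fq)          # linear zero crossing on the edge
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    for i in range(len(xs) - 1):
        for j in range(len(ys) - 1):
            corners = [(xs[i], ys[j]), (xs[i+1], ys[j]),
                       (xs[i+1], ys[j+1]), (xs[i], ys[j+1])]
            vals = [Z[i, j], Z[i+1, j], Z[i+1, j+1], Z[i, j+1]]
            pts = []
            for k in range(4):
                fp, fq = vals[k], vals[(k + 1) % 4]
                if (fp < 0) != (fq < 0):     # sign change on this edge
                    pts.append(interp(corners[k], corners[(k + 1) % 4], fp, fq))
            # 2 crossings: one segment; 4 (saddle): pair them arbitrarily.
            for a, b in zip(pts[0::2], pts[1::2]):
                segs.append((a, b))
    return segs

# Zero set of x^2 + y^2 - 1: the computed points should hug the unit circle.
segs = marching_squares(lambda x, y: x * x + y * y - 1.0,
                        np.linspace(-1.5, 1.5, 61), np.linspace(-1.5, 1.5, 61))
```

On a grid of spacing 0.05, every recovered endpoint lies within a small tolerance of the true circle.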

21.16. Implement the registration part of the Faugeras–Hebert algorithm.


CHAPTER 22

Finding Templates using Classifiers

PROBLEMS

22.1. Assume that we are dealing with measurements x in some feature space S. There is an open set D where any element is classified as class one, and any element in the interior of S − D is classified as class two.
(a) Show that
\[
R(s) = \Pr\{1\to 2\,|\,\text{using } s\}L(1\to 2) + \Pr\{2\to 1\,|\,\text{using } s\}L(2\to 1) = \int_D p(2|x)\,dx\,L(1\to 2) + \int_{S-D} p(1|x)\,dx\,L(2\to 1).
\]
(b) Why are we ignoring the boundary of D (which is the same as the boundary of S − D) in computing the total risk?

Solution

(a) Straightforward: this is just the definition of the risk, with each misclassification probability written as the integral of the corresponding posterior over the region where the mistake is made.
(b) The boundary has measure zero, so it does not contribute to the integrals.

22.2. In Section 22.2, we said that if each class-conditional density had the same covariance, the classifier of Algorithm 22.2 boiled down to comparing two expressions that are linear in x.
(a) Show that this is true.
(b) Show that if there are only two classes, we need only test the sign of a linear expression in x.

22.3. In Section 22.3.1, we set up a feature u, where the value of u on the ith data point is given by $u_i = v\cdot(x_i - \mu)$. Show that u has zero mean.

Solution The mean of $u_i$ is the mean of $v\cdot(x_i - \mu)$, which is the mean of $v\cdot x_i$ minus $v\cdot\mu$; but the mean of $v\cdot x_i$ is v dotted with the mean of the $x_i$, which is µ, so the two terms cancel.

22.4. In Section 22.3.1, we set up a series of features u, where the value of u on the ith data point is given by $u_i = v\cdot(x_i - \mu)$. We then said that the v would be eigenvectors of Σ, the covariance matrix of the data items. Show that the different features are independent using the fact that the eigenvectors of a symmetric matrix are orthogonal.
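Exercises 22.3 and 22.4 can both be checked empirically: projecting centered data onto the eigenvectors of the sample covariance yields features with zero mean and a diagonal covariance (a NumPy sketch on synthetic data of our choosing):

```python
import numpy as np

rng = np.random.default_rng(2)

# Correlated 3D data.
X = rng.normal(size=(500, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])
mu = X.mean(axis=0)
Sigma = np.cov(X.T, bias=True)

# Features u_i = v . (x_i - mu), one column per eigenvector v of Sigma.
_, V = np.linalg.eigh(Sigma)
U = (X - mu) @ V

means = U.mean(axis=0)          # ~0 (Ex. 22.3)
C = np.cov(U.T, bias=True)      # ~diagonal: decorrelated features (Ex. 22.4)
```

The covariance of the new features is exactly $V^T\Sigma V$, a diagonal matrix, which is the decorrelation property the exercise asks for.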

22.5. In Section 22.2.1, we said that the ROC was invariant to choice of prior. Prove this.

Programming Assignments

22.6. Build a program that marks likely skin pixels on an image; you should compare at least two different kinds of classifier for this purpose. It is worth doing this carefully because many people have found skin filters useful.

22.7. Build one of the many face finders described in the text.

CHAPTER 23

Recognition by Relations Between Templates

CHAPTER 24

Geometric Templates from Spatial Relations

PROBLEMS

24.1. Defining a Brooks transform: Consider a 2D shape whose boundary is the curve Γ defined by $x : I \to \mathbb{R}^2$ and parameterized by arc length. The line segment joining any two points $x_1 \stackrel{\text{def}}{=} x(s_1)$ and $x_2 \stackrel{\text{def}}{=} x(s_2)$ on Γ defines a cross-section of the shape, with length $l(s_1, s_2) = |x_1 - x_2|$. We can thus reduce the problem of studying the set of cross-sections of the shape to the study of the topography of the surface S associated with the height function $h : I^2 \to \mathbb{R}^+$ defined by $h(s_1, s_2) = \frac{1}{2}l(s_1, s_2)^2$. In this context, the ribbon associated with Γ can be defined (Ponce et al., 1999) as the set of cross-sections whose endpoints correspond to valleys of S (i.e., according to Haralick, 1983 or Haralick, Watson, and Laffey, 1983, the set of pairs $(s_1, s_2)$ where the gradient ∇h of h is an eigenvector of the Hessian H, and where the eigenvalue associated with the other eigenvector of the Hessian is positive). Let u denote the unit vector such that $x_1 - x_2 = lu$. We denote by $t_i$ the unit tangent in $x_i$ (i = 1, 2), and by $\theta_i$ and $\kappa_i$, respectively, the angle between the vectors u and $t_i$, and the curvature in $x_i$. Show that the ribbon associated with Γ is the set of cross-sections of this shape whose endpoints satisfy
\[
(\cos^2\theta_1 - \cos^2\theta_2)\cos(\theta_1 - \theta_2) + l\cos\theta_1\cos\theta_2(\kappa_1\sin\theta_1 + \kappa_2\sin\theta_2) = 0.
\]

Solution If v is a unit vector such that u and v form a right-handed orthonormal coordinate system for the plane, we can write $t_i = \cos\theta_i u + \sin\theta_i v$ and $n_i = -\sin\theta_i u + \cos\theta_i v$ for i = 1, 2. The gradient of h is
\[
\nabla h = \begin{pmatrix} t_1\cdot(x_1 - x_2)\\ -t_2\cdot(x_1 - x_2)\end{pmatrix} = l\begin{pmatrix}\cos\theta_1\\ -\cos\theta_2\end{pmatrix},
\]
and its Hessian is
\[
H = \begin{pmatrix} 1 + \kappa_1 n_1\cdot(x_1 - x_2) & -t_1\cdot t_2\\ -t_1\cdot t_2 & 1 - \kappa_2 n_2\cdot(x_1 - x_2)\end{pmatrix} = \begin{pmatrix} 1 - l\kappa_1\sin\theta_1 & -\cos(\theta_1 - \theta_2)\\ -\cos(\theta_1 - \theta_2) & 1 + l\kappa_2\sin\theta_2\end{pmatrix}.
\]
We now write that the gradient is an eigenvector of the Hessian, or, if "×" denotes the operator associating with two vectors in $\mathbb{R}^2$ the determinant of their coordinates,
\[
0 = (H\nabla h)\times\nabla h = l^2\begin{pmatrix}(1 - l\kappa_1\sin\theta_1)\cos\theta_1 + \cos\theta_2\cos(\theta_1 - \theta_2)\\ -\cos\theta_1\cos(\theta_1 - \theta_2) - (1 + l\kappa_2\sin\theta_2)\cos\theta_2\end{pmatrix}\times\begin{pmatrix}\cos\theta_1\\ -\cos\theta_2\end{pmatrix},
\]
which simplifies immediately into
\[
(\cos^2\theta_1 - \cos^2\theta_2)\cos(\theta_1 - \theta_2) + l\cos\theta_1\cos\theta_2(\kappa_1\sin\theta_1 + \kappa_2\sin\theta_2) = 0.
\]

24.2. Generalized cylinders: The definition of a valley given in the previous exercise is valid for height surfaces defined over n-dimensional domains, and valleys form curves in any dimension. Briefly explain how to extend the definition of ribbons given in that exercise to a new definition for generalized cylinders. Are difficulties not encountered in the two-dimensional case to be expected?

Solution Following the ideas from the previous exercise, it is possible to define the generalized cylinder associated with a volume V by the valleys of a height function defined over a three-dimensional domain: for example, we can pick some parameterization Π of the three-dimensional set of all planes by three parameters $(s_1, s_2, s_3)$, and define $h(s_1, s_2, s_3)$ as the area of the region where V and the plane $\Pi(s_1, s_2, s_3)$ intersect. The valleys (and ridges) of this height function are characterized as before by $(H\nabla h)\times\nabla h = 0$, where "×" denotes this time the operator associating with two vectors their cross product. They form a one-dimensional set of cross-sections of V that can be taken as the generalized cylinder description of this volume. There are some difficulties with this definition that are not encountered in the two-dimensional case: in particular, there is no natural parameterization of the cross-sections of a volume by the points on its boundary, and the valleys found using a plane parameterization depend on the choice of this parameterization. Moreover, the cross-section of a volume by a plane may consist of several connected components. See Ponce et al. (1999) for a discussion.

24.3. Skewed symmetries: A skewed symmetry is a Brooks ribbon with a straight axis and generators at a fixed angle θ from the axis. Skewed symmetries play an important role in line-drawing analysis because it can be shown that a bilaterally symmetric planar figure projects onto a skewed symmetry under orthographic projection (Kanade, 1981). Show that two contour points $P_1$ and $P_2$ forming a skewed symmetry verify the equation
\[
\frac{\kappa_2}{\kappa_1} = -\left[\frac{\sin\alpha_2}{\sin\alpha_1}\right]^3,
\]
where $\kappa_i$ denotes the curvature of the skewed symmetry's boundary in $P_i$ (i = 1, 2), and $\alpha_i$ denotes the angle between the line joining the two points and the normal to this boundary (Ponce, 1990). Hint: Construct a parametric representation of the skewed symmetry.

Solution Given two unit vectors u and v separated by an angle of θ, we parameterize a skewed symmetry by
\[
\begin{cases} x_1 = sv - r(s)u,\\ x_2 = sv + r(s)u,\end{cases}
\]
where u is the generator direction, v is the skew axis direction, and $x_1$ and $x_2$ denote the two endpoints of the ribbon generators. Differentiating $x_1$ and $x_2$ with respect to s yields
\[
\begin{cases} x_1' = v - r'u,\\ x_2' = v + r'u,\end{cases}
\quad\text{and}\quad
\begin{cases} x_1'' = -r''u,\\ x_2'' = r''u.\end{cases}
\]
Let us define $\alpha_i$ as the (unsigned) angle between the normal in $x_i$ (i = 1, 2) and the line joining the two points $x_1$ and $x_2$. We have
\[
\sin\alpha_1 = \frac{1}{|x_1'|}|u\times x_1'| = \frac{\sin\theta}{\sqrt{1 - 2r'\cos\theta + r'^2}},\qquad
\sin\alpha_2 = \frac{1}{|x_2'|}|u\times x_2'| = \frac{\sin\theta}{\sqrt{1 + 2r'\cos\theta + r'^2}},
\]
where "×" denotes the operator associating with two vectors in $\mathbb{R}^2$ the determinant of their coordinates. Now remember from Ex. 19.4 that the curvature of a parametric curve is $\kappa = |x'\times x''|/|x'|^3$. Using the convention that the curvature is positive when the ribbon boundary is convex, we obtain
\[
\kappa_1 = \frac{-r''\sin\theta}{(1 - 2r'\cos\theta + r'^2)^{3/2}},\qquad
\kappa_2 = \frac{r''\sin\theta}{(1 + 2r'\cos\theta + r'^2)^{3/2}},
\]
and the result follows immediately.

Programming Assignments

24.4. Write an erosion-based skeletonization program. The program should iteratively process a binary image until it does not change anymore. Each iteration is divided into eight steps. In the first one, pixels from the input image whose 3 × 3 neighborhood matches the left pattern below (where "*" means that the corresponding pixel value does not matter) are assigned a value of zero in an auxiliary image; all other pixels in that picture are assigned their original value from the input image.

0 0 0     * 0 0
* 1 *     1 1 0
1 1 1     1 1 *

The auxiliary picture is then copied into the input image, and the process is repeated with the right pattern. The remaining steps of each iteration are similar and use the six patterns obtained by consecutive 90-degree rotations of the original ones. The output of the program is the 4-connected skeleton of the original region (Serra, 1982).
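The iteration above can be sketched as follows (Python/NumPy; note that the exact layout of the second mask is reconstructed here from the garbled original and should be checked against Serra, 1982 before relying on it):

```python
import numpy as np

# The two base 3x3 patterns of assignment 24.4; 2 means "don't care".
L1 = np.array([[0, 0, 0],
               [2, 1, 2],
               [1, 1, 1]])
L2 = np.array([[2, 0, 0],     # assumed layout, see lead-in
               [1, 1, 0],
               [1, 1, 2]])

def thin_once(img, pat):
    """One erosion step: zero out pixels whose neighborhood matches pat."""
    out = img.copy()
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            if img[i, j] == 1:
                nb = img[i-1:i+2, j-1:j+2]
                if np.all((pat == 2) | (nb == pat)):
                    out[i, j] = 0
    return out

def skeletonize(img):
    """Apply the eight masks (both patterns and their 90-degree
    rotations) repeatedly until the image stops changing."""
    masks = [np.rot90(p, k) for p in (L1, L2) for k in range(4)]
    while True:
        prev = img
        for m in masks:
            img = thin_once(img, m)
        if np.array_equal(img, prev):
            return img

# Thin a filled rectangle (with an empty 1-pixel border).
img = np.zeros((10, 12), dtype=int)
img[2:8, 2:10] = 1
sk = skeletonize(img)
```

The result is a strict, nonempty subset of the input region, as expected of a thinning operator.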

24.5. Implement the FORMS approach to skeleton detection.

24.6. Implement the Brooks transform.

24.7. Write a program for finding skewed symmetries. You can implement either (a) a naive O(n²) algorithm comparing all pairs of contour points, or (b) the O(kn) projection algorithm proposed by Nevatia and Binford (1977). The latter method can be summarized as follows: Discretize the possible orientations of local ribbon axes; for each of these k directions, project all contour points into buckets and verify the local skewed symmetry condition for points within the same bucket only; finally, group the resulting ribbon pairs into ribbons.

CHAPTER 25

Application: Finding in Digital Libraries

CHAPTER 26

Application: Image-Based Rendering

PROBLEMS

26.1. Given n + 1 points $P_0, \ldots, P_n$, we recursively define the parametric curve $P_i^k(t)$ by $P_i^0(t) = P_i$ and
\[
P_i^k(t) = (1-t)P_i^{k-1}(t) + tP_{i+1}^{k-1}(t) \quad\text{for } k = 1, \ldots, n \text{ and } i = 0, \ldots, n-k.
\]
We show in this exercise that $P_0^n(t)$ is the Bezier curve of degree n associated with the n + 1 points $P_0, \ldots, P_n$. This construction of a Bezier curve is called the de Casteljau algorithm.
(a) Show that the Bernstein polynomials satisfy the recursion
\[
b_i^{(n)}(t) = (1-t)b_i^{(n-1)}(t) + tb_{i-1}^{(n-1)}(t)
\]
with $b_0^{(0)}(t) = 1$ and, by convention, $b_j^{(n)}(t) = 0$ when j < 0 or j > n.
(b) Use induction to show that
\[
P_i^k(t) = \sum_{j=0}^k b_j^{(k)}(t)P_{i+j} \quad\text{for } k = 0, \ldots, n \text{ and } i = 0, \ldots, n-k.
\]

Solution Let us recall that the Bernstein polynomials of degree n are defined by
\[
b_i^{(n)}(t) \stackrel{\text{def}}{=} \binom{n}{i}t^i(1-t)^{n-i} \quad (i = 0, \ldots, n).
\]
(a) Writing
\[
(1-t)b_i^{(n-1)}(t) + tb_{i-1}^{(n-1)}(t) = \binom{n-1}{i}t^i(1-t)^{n-i} + \binom{n-1}{i-1}t^i(1-t)^{n-i} = \frac{(n-1)!}{(n-i)!\,i!}[(n-i) + i]\,t^i(1-t)^{n-i} = b_i^{(n)}(t)
\]
shows that the recursion is satisfied when i > 0 and i < n. It also holds when i = 0 since, by definition, $b_0^{(n)}(t) = (1-t)^n = (1-t)b_0^{(n-1)}(t)$ and, by convention, $b_{-1}^{(n-1)}(t) = 0$. Likewise, the recursion is satisfied when i = n since, by definition, $b_n^{(n)}(t) = t^n = tb_{n-1}^{(n-1)}(t)$ and, by convention, $b_n^{(n-1)}(t) = 0$.

(b) The induction hypothesis is obviously true for k = 0 since, by definition, $P_i^0(t) = P_i$ for i = 0, \ldots, n. Suppose it is true for k = l − 1. We have, by definition,
\[
P_i^l(t) = (1-t)P_i^{l-1}(t) + tP_{i+1}^{l-1}(t).
\]
Thus, according to the induction hypothesis,
\[
P_i^l(t) = (1-t)\sum_{j=0}^{l-1}b_j^{(l-1)}(t)P_{i+j} + t\sum_{j=0}^{l-1}b_j^{(l-1)}(t)P_{i+1+j} = (1-t)\sum_{j=0}^{l-1}b_j^{(l-1)}(t)P_{i+j} + t\sum_{m=1}^{l}b_{m-1}^{(l-1)}(t)P_{i+m},
\]
where we have made the change of variables m = j + 1 in the second summation. Using (a) and the convention that $b_j^{(n)}(t) = 0$ when j < 0 or j > n, we obtain
\[
P_i^l(t) = \sum_{j=0}^l[(1-t)b_j^{(l-1)}(t) + tb_{j-1}^{(l-1)}(t)]P_{i+j} = \sum_{j=0}^l b_j^{(l)}(t)P_{i+j},
\]
which concludes the inductive proof. In particular, picking k = n and i = 0 shows that $P_0^n(t)$ is the Bezier curve of degree n associated with the n + 1 points $P_0, \ldots, P_n$.
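The de Casteljau recursion can be sketched in a few lines and compared against direct Bernstein evaluation (NumPy; the function names are ours):

```python
import numpy as np
from math import comb

def de_casteljau(ctrl, t):
    """Evaluate the Bezier curve of the control points ctrl (shape
    (n+1, d)) at parameter t by repeated linear interpolation."""
    P = np.asarray(ctrl, dtype=float)
    while len(P) > 1:
        P = (1 - t) * P[:-1] + t * P[1:]   # one level of the recursion
    return P[0]

def bernstein_eval(ctrl, t):
    """Direct evaluation via Bernstein polynomials, for comparison."""
    P = np.asarray(ctrl, dtype=float)
    n = len(P) - 1
    b = np.array([comb(n, i) * t**i * (1 - t)**(n - i) for i in range(n + 1)])
    return b @ P

ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0)]
```

Both evaluations coincide for every t, and the curve interpolates the first and last control points.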

26.2. Consider a Bezier curve of degree n defined by n + 1 control points $P_0, \ldots, P_n$. We address here the problem of constructing the n + 2 control points $Q_0, \ldots, Q_{n+1}$ of a Bezier curve of degree n + 1 with the same shape. This process is called degree elevation. Show that $Q_0 = P_0$ and
\[
Q_j = \frac{j}{n+1}P_{j-1} + \left(1 - \frac{j}{n+1}\right)P_j \quad\text{for } j = 1, \ldots, n+1.
\]
Hint: Write that the same point is defined by the barycentric combinations associated with the two curves, and equate the polynomial coefficients on both sides of the equation.

Solution We write
\[
P(t) = \sum_{j=0}^n\binom{n}{j}t^j(1-t)^{n-j}P_j = \sum_{j=0}^{n+1}\binom{n+1}{j}t^j(1-t)^{n+1-j}Q_j.
\]
To equate the polynomial coefficients of both expressions for P(t), we multiply each Bernstein polynomial in the first expression by t + 1 − t = 1. With the usual change of variables in the second line below, this yields
\begin{align*}
P(t) &= \sum_{j=0}^n\binom{n}{j}t^{j+1}(1-t)^{n-j}P_j + \sum_{j=0}^n\binom{n}{j}t^j(1-t)^{n+1-j}P_j\\
&= \sum_{k=1}^{n+1}\binom{n}{k-1}t^k(1-t)^{n+1-k}P_{k-1} + \sum_{j=0}^n\binom{n}{j}t^j(1-t)^{n+1-j}P_j\\
&= (1-t)^{n+1}P_0 + t^{n+1}P_n + \sum_{j=1}^n t^j(1-t)^{n+1-j}\left[\binom{n}{j-1}P_{j-1} + \binom{n}{j}P_j\right].
\end{align*}
Note that P(0) is the first control point of both arcs, so $P_0 = Q_0$. Likewise, P(1) is the last control point, so $P_n = Q_{n+1}$. This is confirmed by examining the polynomial coefficients corresponding to j = 0 and j = n + 1 in the two expressions of P(t). Equating the remaining coefficients yields
\[
\binom{n+1}{j}Q_j = \binom{n}{j-1}P_{j-1} + \binom{n}{j}P_j \quad\text{for } j = 1, \ldots, n,
\]
or
\[
\frac{(n+1)!}{(n+1-j)!\,j!}Q_j = \frac{n!}{(n+1-j)!\,(j-1)!}P_{j-1} + \frac{n!}{(n-j)!\,j!}P_j.
\]
This can finally be rewritten as
\[
Q_j = \frac{j}{n+1}P_{j-1} + \frac{n+1-j}{n+1}P_j = \frac{j}{n+1}P_{j-1} + \left(1 - \frac{j}{n+1}\right)P_j.
\]
Note that this is indeed a barycentric combination, which justifies our calculations, and that this expression is valid for j > 0 and j < n + 1. It is in fact also valid for j = n + 1 since, as noted before, we have $P_n = Q_{n+1}$.
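The formula can be verified by evaluating the original and elevated curves at the same parameters (NumPy; function names ours):

```python
import numpy as np
from math import comb

def elevate(ctrl):
    """Degree elevation: n+1 control points -> n+2 control points
    describing the same Bezier curve."""
    P = np.asarray(ctrl, dtype=float)
    n = len(P) - 1
    Q = [P[0]]
    for j in range(1, n + 1):
        Q.append(j / (n + 1) * P[j - 1] + (1 - j / (n + 1)) * P[j])
    Q.append(P[n])
    return np.array(Q)

def bezier(ctrl, t):
    """Bernstein evaluation of the Bezier curve of ctrl at t."""
    P = np.asarray(ctrl, dtype=float)
    n = len(P) - 1
    return sum(comb(n, i) * t**i * (1 - t)**(n - i) * P[i]
               for i in range(n + 1))

ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0)]
ctrl_up = elevate(ctrl)
```

The elevated control polygon has one more point, yet traces exactly the same curve.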


26.3. Show that the tangent to the Bezier curve P(t) defined by the n + 1 control points $P_0, \ldots, P_n$ is
\[
P'(t) = n\sum_{j=0}^{n-1}b_j^{(n-1)}(t)(P_{j+1} - P_j).
\]
Conclude that the tangents at the endpoints of a Bezier arc are along the first and last line segments of its control polygon.

Solution Writing the tangent as
\begin{align*}
P'(t) &= \sum_{j=0}^n b_j^{(n)}{}'(t)P_j\\
&= \sum_{j=1}^n\binom{n}{j}jt^{j-1}(1-t)^{n-j}P_j - \sum_{j=0}^{n-1}\binom{n}{j}(n-j)t^j(1-t)^{n-j-1}P_j\\
&= n\sum_{j=1}^n\binom{n-1}{j-1}t^{j-1}(1-t)^{n-j}P_j - n\sum_{j=0}^{n-1}\binom{n-1}{j}t^j(1-t)^{n-j-1}P_j\\
&= n\sum_{k=0}^{n-1}\binom{n-1}{k}t^k(1-t)^{n-k-1}P_{k+1} - n\sum_{j=0}^{n-1}\binom{n-1}{j}t^j(1-t)^{n-j-1}P_j\\
&= n\sum_{j=0}^{n-1}\binom{n-1}{j}t^j(1-t)^{n-1-j}(P_{j+1} - P_j) = n\sum_{j=0}^{n-1}b_j^{(n-1)}(t)(P_{j+1} - P_j)
\end{align*}
proves the result (note the change of variable k = j − 1 in the fourth line of the equation). The tangents at the endpoints of the arc correspond to t = 0 and t = 1. Since $b_j^{(n-1)}(0) = 0$ for j > 0, $b_j^{(n-1)}(1) = 0$ for j < n − 1, $b_0^{(n-1)}(0) = 1$, and $b_{n-1}^{(n-1)}(1) = 1$, we conclude that
\[
P'(0) = n(P_1 - P_0) \quad\text{and}\quad P'(1) = n(P_n - P_{n-1}),
\]
which shows that the tangents at the endpoints of the Bezier arc are along the first and last line segments of its control polygon.

26.4. Show that the construction of the points $Q_i$ in Section 26.1.1 places these points in a plane that passes through the centroid O of the points $C_i$.

Solution First it is easy to show that the points $Q_i$ are indeed barycentric combinations of the points $C_j$: this follows immediately from the fact that, due to the regular and symmetric sampling of angles in the linear combination defining $Q_i$, the sum of the cosine terms is zero. Now let us show that the points $Q_i$ are coplanar, and more precisely, that any of these points can be written as a barycentric combination of the points
\begin{align*}
Q_1 &= \sum_{j=1}^p \frac{1}{p}\left\{1 + \cos\frac{\pi}{p}\cos\left(\frac{[2(j-1)-1]\pi}{p}\right)\right\}C_j,\\
Q_{p-1} &= \sum_{j=1}^p \frac{1}{p}\left\{1 + \cos\frac{\pi}{p}\cos\left(\frac{[2(j+1)-1]\pi}{p}\right)\right\}C_j,\\
Q_p &= \sum_{j=1}^p \frac{1}{p}\left\{1 + \cos\frac{\pi}{p}\cos\left(\frac{[2j-1]\pi}{p}\right)\right\}C_j.
\end{align*}
We write
\[
Q_i = \sum_{j=1}^p \frac{1}{p}\left\{1 + \cos\frac{\pi}{p}\cos\left(\frac{[2(j-i)-1]\pi}{p}\right)\right\}C_j = aQ_1 + bQ_{p-1} + cQ_p,
\]
with a + b + c = 1. If $\theta_{ij} = [2(j-i)-1]\pi/p$, this equation can be rewritten as
\begin{align*}
\cos\theta_{ij} &= a\cos\left([2(j-1)-1]\frac{\pi}{p}\right) + b\cos\left([2(j+1)-1]\frac{\pi}{p}\right) + c\cos\left([2j-1]\frac{\pi}{p}\right)\\
&= a\cos\left(\theta_{ij} + 2(i-1)\frac{\pi}{p}\right) + b\cos\left(\theta_{ij} + 2(i+1)\frac{\pi}{p}\right) + c\cos\left(\theta_{ij} + 2i\frac{\pi}{p}\right)\\
&= \cos\theta_{ij}\left\{a\cos\left(2(i-1)\frac{\pi}{p}\right) + b\cos\left(2(i+1)\frac{\pi}{p}\right) + c\cos\left(2i\frac{\pi}{p}\right)\right\}\\
&\quad - \sin\theta_{ij}\left\{a\sin\left(2(i-1)\frac{\pi}{p}\right) + b\sin\left(2(i+1)\frac{\pi}{p}\right) + c\sin\left(2i\frac{\pi}{p}\right)\right\},
\end{align*}
or equivalently, adding the constraint that a, b, and c must add to 1:
\[
\begin{pmatrix}
\cos(2(i-1)\frac{\pi}{p}) & \cos(2(i+1)\frac{\pi}{p}) & \cos(2i\frac{\pi}{p})\\
\sin(2(i-1)\frac{\pi}{p}) & \sin(2(i+1)\frac{\pi}{p}) & \sin(2i\frac{\pi}{p})\\
1 & 1 & 1
\end{pmatrix}\begin{pmatrix}a\\ b\\ c\end{pmatrix} = \begin{pmatrix}1\\ 0\\ 1\end{pmatrix}.
\]
This system of three equations in three unknowns admits (in general) a unique solution, which shows that any $Q_i$ can be written as a barycentric combination of the points $Q_1$, $Q_{p-1}$, and $Q_p$, and that the points $Q_1, Q_2, \ldots, Q_p$ are indeed coplanar. Now, it is easy to see that the points $Q_i$ can be written as
\begin{align*}
Q_1 &= \lambda_1C_1 + \lambda_2C_2 + \ldots + \lambda_pC_p,\\
Q_2 &= \lambda_pC_1 + \lambda_1C_2 + \ldots + \lambda_{p-1}C_p,\\
&\ldots\\
Q_p &= \lambda_2C_1 + \lambda_3C_2 + \ldots + \lambda_1C_p,
\end{align*}
with $\lambda_1 + \ldots + \lambda_p = 1$, and it follows immediately that the centroid of the points $Q_i$ (which obviously belongs to the plane spanned by these points) is also the centroid of the points $C_i$.

26.5. Facade’s photogrammetric module. We saw in the exercises of chapter 3 that themapping between a line δ with Plucker coordinate vector ∆ and its image δ withhomogeneous coordinates δ can be represented by ρδ = M∆. Here,∆ is a functionof the model parameters, and M depends on the corresponding camera positionand orientation.(a) Assuming that the line δ has been matched with an image edge e of length

l, a convenient measure of the discrepancy between predicted and observeddata is obtained by multiplying by l the mean squared distance separating thepoints of e from δ. Defining d(t) as the signed distance between the edge pointp = (1− t)p0 + tp1 and the line δ, show that

E =

∫ 1

0

d2(t)dt =1

3(d(0)2 + d(0)d(1) + d(1)2).

where d0 and d1 denote the (signed) distances between the endpoints of e andδ.

(b) If p0 and p1 denote the homogeneous coordinate vectors of these points, showthat

d0 =1

|[M∆]2|pT0 M∆ and d1 =

1

|[M∆]2|pT1 M∆,

Page 108: 2461059 Computer Vision Solution Manual

108 Chapter 26 Application: Image-Based Rendering

where [a]2 denotes the vector formed by the first two coordinates of the vectora in R3

(c) Formulate the recovery of the camera and model parameters as a non-linearleast-squares problem.

Solution

(a) Let us write the equation of δ as n · p = D, where n is a unit vector and Dis the distance between the origin and δ. The (signed) distance between thepoint p = (1− t)p0 + tp1 and δ is

d(t) = n · p−D = n · [(1− t)p0 + tp1]− [(1− t) + t]D = (1− t)d(0) + td(1).

We have therefore

E =∫ 1

0d2(t)dt =

∫ 1

0[(1− t)2d(0)2 + 2(1− t)td(0)d(1) + t2d(1)2]dt

=[

− 13 (1− t)3

]1

0d(0)2 +

[

t2 − 23 t

3]1

0d0d1 +

[

13 t

3]1

0d(1)2

= 13 (d(0)

2 + d(0)d(1) + d(1)2).

(b) With the same notation as before, we can write δT = (nT , D)T . Since ρδ =M∆ and n is a unit vector, it follows immediately that

d(0) = n·p0−D =1

|[M∆]2|pT0 M∆ and d(1) = n·p1−D =

1

|[M∆]2|pT1 M∆.

(c) Given n matches established across m images between the lines ∆j (j =1, . . . , n) and the corresponding image edges ej with endpoints pj0 and pj1and lengths lj , we can formulate the recovery of the camera parametersMi

and the model parameters associated with the line coordinate vectors ∆j as

the mimimization of the mean-squared error 1mn

∑mi=1

∑njlj3 f

2ij , where

fijdef=

1

|[Mi∆j ]2|

(pTj0Mi∆j)2 + (pTj0Mi∆j)(pTj1Mi∆j) + (pTj1Mi∆j)2,

with respect to the unknown parameters (note that the term under the radicalis positive since it is equal—up to a positive constant—to the integral of d2).It follows that the recovery of these parameters can be expressed as a (non-linear) least-squares problem.
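The closed form of part (a) is easy to check by numerical quadrature, using the fact that d(t) is affine in t (a small NumPy check of our own):

```python
import numpy as np

# d(t) = (1 - t) d0 + t d1 for arbitrary endpoint distances d0, d1.
d0, d1 = 0.7, -1.3
N = 100000
ts = (np.arange(N) + 0.5) / N           # midpoint rule on [0, 1]
d = (1 - ts) * d0 + ts * d1

E_num = (d**2).mean()                   # approximates the integral of d^2
E_closed = (d0**2 + d0 * d1 + d1**2) / 3
```

The midpoint-rule estimate agrees with the closed form to high accuracy, since the integrand is only quadratic.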

26.6. Show that a basis for the eight-dimensional vector space V formed by all affine images of a fixed set of points $P_0, \ldots, P_{n-1}$ can be constructed from at least two images of these points when n ≥ 4. Hint: Use the matrix
\[
\begin{pmatrix}
u_0^{(1)} & v_0^{(1)} & \ldots & u_0^{(m)} & v_0^{(m)}\\
\ldots & \ldots & & \ldots & \ldots\\
u_{n-1}^{(1)} & v_{n-1}^{(1)} & \ldots & u_{n-1}^{(m)} & v_{n-1}^{(m)}
\end{pmatrix},
\]
where $(u_i^{(j)}, v_i^{(j)})$ are the coordinates of the projection of the point $P_i$ into image number j.

Solution The matrix introduced here is simply the transpose of the data matrix used in the Tomasi–Kanade factorization approach of chapter 12. With at least two views of n ≥ 4 points, the singular value decomposition of this matrix can be used to estimate the points $P_i$ (i = 0, \ldots, n-1) and construct the matrix
\[
\begin{pmatrix}
P_0^T & 0^T & 1 & 0\\
0^T & P_0^T & 0 & 1\\
\ldots & \ldots & \ldots & \ldots\\
P_{n-1}^T & 0^T & 1 & 0\\
0^T & P_{n-1}^T & 0 & 1
\end{pmatrix}
\]
whose columns span the eight-dimensional vector space V formed by all images of these n points.

26.7. Show that the set of all projective images of a fixed scene is an eleven-dimensional variety.

Solution Writing
\[
\begin{pmatrix} p_0\\ \ldots\\ p_{n-1}\end{pmatrix} = \begin{pmatrix} \dfrac{m_1\cdot P_0}{m_3\cdot P_0}\\[4pt] \dfrac{m_2\cdot P_0}{m_3\cdot P_0}\\ \ldots\\ \dfrac{m_1\cdot P_{n-1}}{m_3\cdot P_{n-1}}\\[4pt] \dfrac{m_2\cdot P_{n-1}}{m_3\cdot P_{n-1}}\end{pmatrix}
\]
shows that the set of all images of a fixed scene forms a surface embedded in $\mathbb{R}^{2n}$ and defined by rational equations in the row vectors $m_1$, $m_2$ and $m_3$ of the projection matrix. Rational parametric surfaces are varieties whose dimension is given by the number of independent parameters. Since projection matrices are only defined up to scale, the dimension of the variety formed by all projective images of a fixed scene is 11.

26.8. Show that the set of all perspective images of a fixed scene (for a camera with constant intrinsic parameters) is a six-dimensional variety.

Solution A perspective projection matrix can always be written as $\rho M = K(R\ \ t)$, where K is the matrix of intrinsic parameters, R is a rotation matrix, and ρ is a scalar accounting for the fact that M is only defined up to a scale factor. The matrix A formed by the three leftmost columns of M must therefore satisfy the five polynomial constraints associated with the fact that the columns of the matrix $K^{-1}A$ are orthogonal to each other and have the same length. Since the set of all projective images of a fixed scene is a variety of dimension 11, it follows that the set of all perspective images is a sub-variety of dimension 11 − 5 = 6.

26.9. In this exercise, we show that Eq. (26.7) only admits two solutions.
(a) Show that Eq. (26.6) can be rewritten as
\[
\begin{cases} X^2 - Y^2 + e_1 - e_2 = 0,\\ 2XY + e = 0,\end{cases}
\tag{26.1}
\]
where
\[
\begin{cases} X = u + \alpha u_1 + \beta u_2,\\ Y = v + \alpha v_1 + \beta v_2,\end{cases}
\]
and e, $e_1$, and $e_2$ are coefficients depending on $u_1$, $v_1$, $u_2$, $v_2$ and the structure parameters.
(b) Show that the solutions of Eq. (26.8) are given by
\[
\begin{cases} X' = \sqrt[4]{(e_1 - e_2)^2 + e^2}\,\cos\left(\tfrac{1}{2}\arctan(e, e_1 - e_2)\right),\\ Y' = \sqrt[4]{(e_1 - e_2)^2 + e^2}\,\sin\left(\tfrac{1}{2}\arctan(e, e_1 - e_2)\right),\end{cases}
\]
and $(X'', Y'') = (-X', -Y')$. (Hint: Use a change of variables to rewrite Eq. (26.8) as a system of trigonometric equations.)

Solution Recall that Eq. (26.6) has the form
\[
\begin{cases} u^TRu - v^TRv = 0,\\ u^TRv = 0,\end{cases}
\quad\text{where}\quad R = \begin{pmatrix}
(1+\lambda^2)z^2 + \alpha^2 & \lambda\mu z^2 + \alpha\beta & \alpha\\
\lambda\mu z^2 + \alpha\beta & \mu^2z^2 + \beta^2 & \beta\\
\alpha & \beta & 1
\end{pmatrix}.
\]
(a) Let us define the vector $\boldsymbol{\alpha} = (\alpha, \beta, 1)^T$ and note that $X = \boldsymbol{\alpha}\cdot u$ and $Y = \boldsymbol{\alpha}\cdot v$. This allows us to write
\[
R = \boldsymbol{\alpha}\boldsymbol{\alpha}^T + z^2\begin{pmatrix} L & 0\\ 0^T & 0\end{pmatrix} \quad\text{where}\quad L = \begin{pmatrix} 1+\lambda^2 & \lambda\mu\\ \lambda\mu & \mu^2\end{pmatrix}.
\]
If $u_2 = (u_1, u_2)^T$ and $v_2 = (v_1, v_2)^T$, it follows that we have indeed
\[
0 = u^TRu - v^TRv = X^2 - Y^2 + e_1 - e_2,
\qquad
0 = u^TRv = \frac{1}{2}[2XY + e],
\]
where $e_1 = z^2u_2^TLu_2$, $e_2 = z^2v_2^TLv_2$, and $e = 2z^2u_2^TLv_2$.
(b) We can always write $X = a\cos\theta$ and $Y = a\sin\theta$ for some a > 0 and $\theta \in [0, 2\pi]$. This allows us to rewrite Eq. (26.8) as
\[
\begin{cases} a^2\cos 2\theta = e_2 - e_1,\\ a^2\sin 2\theta = -e.\end{cases}
\]
It follows that $a^4 = (e_1 - e_2)^2 + e^2$ and, if φ denotes the angle defined by $a^2\cos\varphi = e_2 - e_1$ and $a^2\sin\varphi = -e$ (written $\arctan(e, e_1 - e_2)$ in the statement), the two solutions for θ are $\theta' = \varphi/2$ and $\theta'' = \theta' + \pi$. Thus the solutions of (26.8) are $(X', Y') = (a\cos\theta', a\sin\theta')$ and $(X'', Y'') = (a\cos\theta'', a\sin\theta'') = -(X', Y')$.
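The two solutions can be computed and verified numerically. The sketch below (NumPy; our own check) takes the two-argument arctangent as atan2(y, x), with the signs chosen so that $a^2\cos 2\theta = e_2 - e_1$ and $a^2\sin 2\theta = -e$:

```python
import numpy as np

# Random coefficients for the system X^2 - Y^2 + e1 - e2 = 0, 2XY + e = 0.
rng = np.random.default_rng(4)
e1, e2, e = rng.normal(size=3)

a = ((e1 - e2)**2 + e**2) ** 0.25       # fourth root, a^4 = (e1-e2)^2 + e^2
theta = 0.5 * np.arctan2(-e, e2 - e1)   # sign convention: see lead-in
sols = [(a * np.cos(theta), a * np.sin(theta)),
        (-a * np.cos(theta), -a * np.sin(theta))]
```

Both computed pairs satisfy the two equations of the system, and they are indeed opposites of each other.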