
Computer Vision: Models, Learning and Inference

Tracking

Oren Freifeld and Ron Shapira-Weber

Computer Science, Ben-Gurion University

June 3, 2019

www.cs.bgu.ac.il/~cv192/ Tracking (ver. 1.00) June 3, 2019 1 / 50


1 MMSE and Conditional Expectations (The Gaussian Case)

2 Non-visual Tracking

3 Visual Tracking


MMSE and Conditional Expectations

Reminder

Let A be an n × n matrix.

A is called Positive Definite (PD) if

x^T A x > 0 for all non-zero x ∈ R^n

A is called Positive Semidefinite (PSD) if

x^T A x ≥ 0 for all x ∈ R^n

A is called symmetric if A = A^T.

A is called SPD if it is both symmetric and PD.

A is called SPSD if it is both symmetric and PSD.


A ≺ B

We say that a matrix A is "smaller" than B, and write

A ≺ B

if B − A is positive definite.

Similarly, we write A ⪯ B if B − A is positive semidefinite.

Particularly, if Σ1 and Σ2 are two covariance matrices such that Σ1 ≺ Σ2, then the RV associated with Σ2 has "more variance" than the RV associated with Σ1.


Example

Σ1 = [1 0; 0 1] and Σ2 = [4 0; 0 4]   (1)

⇒ Σ2 − Σ1 = [3 0; 0 3]   (2)

is SPD, so Σ1 ≺ Σ2.


Example

Σ1 = [4 0; 0 1] and Σ2 = [1 0; 0 4]   (3)

⇒ Σ2 − Σ1 = [−3 0; 0 3]   (4)

is not SPD (it's symmetric, but not PD). Similarly, Σ1 − Σ2 is also not SPD. Thus, we can't "order" the two matrices.
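The ordering above is easy to check numerically: B − A is PD iff all eigenvalues of the symmetric difference are positive. A minimal NumPy sketch (the helper name loewner_less is ours, not from the slides):

```python
import numpy as np

def loewner_less(A, B, tol=1e-12):
    """Return True iff A < B in the Loewner order, i.e., B - A is PD.

    Decided via the eigenvalues of the (symmetric) difference B - A.
    """
    D = B - A
    assert np.allclose(D, D.T), "difference must be symmetric"
    return bool(np.all(np.linalg.eigvalsh(D) > tol))

# Examples (1)-(2): Sigma1 < Sigma2, since Sigma2 - Sigma1 = 3*I is SPD.
print(loewner_less(np.eye(2), 4 * np.eye(2)))                      # True

# Examples (3)-(4): neither difference is PD, so no ordering exists.
S1, S2 = np.diag([4.0, 1.0]), np.diag([1.0, 4.0])
print(loewner_less(S1, S2), loewner_less(S2, S1))                  # False False
```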


Let X and Y be two random variables.

Let g : R → R be some function.

Definition (MSE)

Let x̂ = g(y) be an estimate of x. Then

E((X − g(Y))²) = ∫∫ (x − g(y))² p(x, y) dx dy

is called the Mean Square Error of the estimator g.


Definition (MMSE)

The Minimal Mean Square Error (MMSE) is

min_{g(·)} E((X − g(Y))²)

(the optimization is over the space of all R → R functions) and the estimator that achieves it is the MMSE estimator:

X̂_MMSE ≜ argmin_{g(·)} E((X − g(Y))²).


Note that

∫ E((X − g(y))²|Y = y) p(y) dy
= ∫ (∫ (x − g(y))² p(x|Y = y) dx) p(y) dy
= ∫∫ (x − g(y))² p(x, y) dx dy = E((X − g(Y))²)


Suppose we estimate x by some constant number m. Let

h(m) ≜ E((X − m)²) = ∫ (x − m)² p(x) dx = E(X²) − 2m E(X) + m².

h′(m) = −2E(X) + 2m.

h′(m) = 0 ⇒ m = E(X).

argmin_m E((X − m)²) = E(X) = ∫ x p(x) dx

min_m E((X − m)²) = var(X) = ∫ (x − E(X))² p(x) dx
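The claim that m = E(X) minimizes the MSE can be sanity-checked by Monte Carlo. A small sketch (all numbers are illustrative) scans a grid of constants m:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=100_000)   # samples of X

# Empirical MSE of the constant estimator m, E((X - m)^2), over a grid of m.
ms = np.linspace(0.0, 4.0, 401)
mse = np.array([np.mean((x - m) ** 2) for m in ms])

m_best = ms[np.argmin(mse)]
print(m_best)       # close to E(X) = 2
print(mse.min())    # close to var(X) = 1.5**2 = 2.25
```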


Let y be a realization of Y. Repeating the analysis, where now we allow m to depend on y, and replacing means with conditional means:

h(m; y) ≜ E((X − m)²|Y = y) = ∫ (x − m)² p(x|Y = y) dx = E(X²|Y = y) − 2m E(X|Y = y) + m².

h′(m; y) = −2E(X|Y = y) + 2m.

h′(m; y) = 0 ⇒ m = E(X|Y = y).

argmin_m E((X − m)²|Y = y) = E(X|Y = y) = ∫ x p(x|Y = y) dx

min_m E((X − m)²|Y = y) = var(X|Y = y) = ∫ (x − E(X|Y = y))² p(x|Y = y) dx


Analogy: if you get the lowest grade in every exam, then you also have the lowest average grade among all the students.


Similarly, since, for every y,

argmin_{m(y)} E((X − m(y))²|Y = y) = E(X|Y = y)

it follows that

argmin_{g(·)} E((X − g(Y))²) = E(X|Y).

That is, the conditional mean achieves the MMSE. In other words, the conditional mean is the optimal estimator in the sense of MSE.


To Summarize:

Suppose X and Y are two jointly-distributed scalar RVs.

Having observed Y = y, we want to estimate x:

x̂_MMSE ≜ argmin_{g(y)∈R} E((X − g(y))²|Y = y) = E(X|Y = y)

min_{g(y)∈R} E((X − g(y))²|Y = y) = var(X|Y = y)

var(X|Y = y) ≜ E((X − E(X|Y = y))²|Y = y)


Fact

This generalizes to the vector case, X ∈ R^n (regardless of whether y is a scalar or a vector): g(y) = E(X|Y = y) minimizes

E((X − g(y))^T (X − g(y))|Y = y),

and the associated error covariance,

cov(X|Y = y) = E([X − E(X|Y = y)][X − E(X|Y = y)]^T |Y = y),

is the smallest achievable, where the notion of the "smallest matrix" is defined in terms of ≺.


Fact

E(X̂_MMSE) = E(E(X|Y)) = E(X)   (5)

(the law of iterated expectation)

In other words, the estimation error, X − X̂_MMSE, has zero mean:

E(X − X̂_MMSE) = E(X) − E(X̂_MMSE) = 0.


Let ε = X − X̂_MMSE. Then E(ε|Y) = 0.

Proof.

E(ε|Y) = E(X − X̂_MMSE|Y) = E(X|Y) − E(X̂_MMSE|Y) = E(X|Y) − E(E(X|Y)|Y) = E(X|Y) − E(X|Y) = 0


Let ε = X − X̂_MMSE. For any function g(Y), we have

E(εg(Y)) = 0.

Proof.

E(εg(Y)|Y) = g(Y)E(ε|Y) = g(Y) · 0 = 0.

Then, by the law of iterated expectation:

E(εg(Y)) = E(E(εg(Y)|Y)) = 0.


The estimation error, ε = X − X̂_MMSE, and the estimator, X̂_MMSE, are uncorrelated.

Proof.

cov(ε, X̂_MMSE) = E(εX̂_MMSE) − E(ε)E(X̂_MMSE) = E(εX̂_MMSE) − 0 · E(X̂_MMSE) = E(εX̂_MMSE) = E(εg(Y)) = 0

where we used the fact that X̂_MMSE = g(Y); i.e., it is a function of Y.


Since cov(ε, X̂_MMSE) = 0, it follows that

var(X) = var(X̂_MMSE) + var(ε)
= var(E(X|Y)) + E((X − E(X|Y))²)
= var(E(X|Y)) + E_Y(E_{X|Y}[(X − E(X|Y))²|Y = y])
= var(E(X|Y)) + E(var(X|Y))

(AKA the law of total variance)

Also, starting from the first line:

E(X²) − (E(X))² = E(X̂²_MMSE) − (E(X̂_MMSE))² + E(ε²) − (E(ε))²
E(X²) − (E(X))² = E(X̂²_MMSE) − (E(X))² + E(ε²)
E(X²) = E(X̂²_MMSE) + E(ε²)

This is just Pythagoras' theorem: note that (X, Y) ↦ E(XY) is an inner product, so X ↦ √E(X²) is the corresponding induced norm.
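The law of total variance holds exactly, so it can be verified on a small discrete joint distribution. A sketch (the probability table is made up for illustration):

```python
import numpy as np

# A small discrete joint distribution p(x, y); rows index x, cols index y.
p = np.array([[0.10, 0.15],
              [0.20, 0.05],
              [0.25, 0.25]])
xs = np.array([0.0, 1.0, 2.0])

p_y = p.sum(axis=0)                 # marginal p(y)
p_x_given_y = p / p_y               # columns are p(x | y)

E_x_given_y = xs @ p_x_given_y                              # E(X | Y=y), per y
var_x_given_y = (xs ** 2) @ p_x_given_y - E_x_given_y ** 2  # var(X | Y=y), per y

# var(E(X|Y)) and E(var(X|Y)), averaging over p(y).
var_of_cond_mean = p_y @ E_x_given_y ** 2 - (p_y @ E_x_given_y) ** 2
mean_of_cond_var = p_y @ var_x_given_y

p_x = p.sum(axis=1)
var_x = (xs ** 2) @ p_x - (xs @ p_x) ** 2
print(np.isclose(var_x, var_of_cond_mean + mean_of_cond_var))   # True
```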


MMSE and Conditional Expectations: The Gaussian Case

Conditional Expectations For Gaussians

Earlier we mentioned that Gaussians are closed under conditioning.

Fact (expressions for the conditional mean and conditional covariance)

If X and Y are jointly Gaussian, then E(X|Y = y) is a "linear" (affine, really) function of the measurement, y. Particularly:

[X; Y] ∼ N([µ_X; µ_Y], [Σ_X, Σ_XY; Σ_XY^T, Σ_Y]) ⇒ X|Y = y ∼ N(µ_{X|y}, Σ_{X|y})

where

µ_{X|y} = µ_X + Σ_XY Σ_Y⁻¹ (y − µ_Y)

and

Σ_{X|y} = Σ_X − Σ_XY Σ_Y⁻¹ Σ_XY^T
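The two formulas can be typed almost verbatim in NumPy. A sketch with made-up parameters (note that the computed Σ_{X|y} ignores y entirely):

```python
import numpy as np

# Joint Gaussian over (X, Y): X is a 2-vector, Y is a scalar (illustrative values).
mu_x = np.array([1.0, 2.0])
mu_y = np.array([0.5])
S_x  = np.array([[2.0, 0.3],
                 [0.3, 1.0]])
S_xy = np.array([[0.4],
                 [0.2]])
S_y  = np.array([[0.5]])

def gaussian_condition(y):
    """mu_{X|y} and Sigma_{X|y} from the formulas on the slide."""
    K = S_xy @ np.linalg.inv(S_y)        # "gain" matrix Sigma_XY Sigma_Y^{-1}
    mu_cond = mu_x + K @ (y - mu_y)
    S_cond = S_x - K @ S_xy.T
    return mu_cond, S_cond

mu_c, S_c = gaussian_condition(np.array([1.0]))
print(mu_c)   # affine in y
print(S_c)    # does not depend on the observed y
```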


⇒ In the Gaussian case, x̂_MMSE is a "linear" function of y:

µ_{X|y} = µ_X + Σ_XY Σ_Y⁻¹ (y − µ_Y) = Σ_XY Σ_Y⁻¹ y + (µ_X − Σ_XY Σ_Y⁻¹ µ_Y)

Σ_{X|y} does not depend on y:

Σ_{X|y} = Σ_X − Σ_XY Σ_Y⁻¹ Σ_XY^T

Both these properties hold for Gaussians, but not in general.


Fact

The expression for Σ_{X|y} is equivalent to taking the inverse of the joint covariance, [Σ_X, Σ_XY; Σ_XY^T, Σ_Y]⁻¹, dropping the rows and columns that correspond to y, and inverting back the remaining block.

In NumPy, an easy way to drop rows/cols is using np.delete.
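The equivalence between the Schur-complement formula and the invert / np.delete / invert-back route can be checked directly. A sketch with an arbitrary 3 × 3 joint covariance, taking X as the first two coordinates:

```python
import numpy as np

# Joint covariance over (X1, X2, Y): X is the first two coordinates.
J = np.array([[2.0, 0.3, 0.4],
              [0.3, 1.0, 0.2],
              [0.4, 0.2, 0.5]])
x_idx, y_idx = [0, 1], [2]

# Route 1: the Schur-complement formula from the previous slide.
S_x = J[np.ix_(x_idx, x_idx)]
S_xy = J[np.ix_(x_idx, y_idx)]
S_y = J[np.ix_(y_idx, y_idx)]
S_cond = S_x - S_xy @ np.linalg.inv(S_y) @ S_xy.T

# Route 2: invert the joint covariance, drop the y rows/cols with np.delete
# (as suggested on the slide), and invert back the remaining block.
P = np.linalg.inv(J)                                            # joint precision
P_xx = np.delete(np.delete(P, y_idx, axis=0), y_idx, axis=1)
S_cond2 = np.linalg.inv(P_xx)

print(np.allclose(S_cond, S_cond2))   # True
```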


Example

Let x = [X1 X2 X3]^T be a Gaussian RV with precision matrix Q. Then the covariance of [X1 X3]^T conditioned on X2 = x2 is

[Q11 Q13; Q31 Q33]⁻¹.


Σ_{X|y} ⪯ Σ_X (conditioning does not increase uncertainty); this holds in general, not just for Gaussians.


Non-visual Tracking

Classical (Non-visual) Tracking

A continuous-state discrete-time setting.

Time: 1, 2, . . . , t

Hidden state at time t: x_t

Hidden states till time t: x_{1:t} = [x_1, x_2, . . . , x_t]

Measurement at time t: y_t

Measurements till time t: y_{1:t} = [y_1, y_2, . . . , y_t]


The simplest case in a continuous-state discrete-time setting.

Linear dynamics with iid additive Gaussian noise:

x_t = A x_{t−1} + η_x,  η_x ∼ N(0, Σ_{η_x})

(AKA a first-order Auto-Regressive (AR) model) where the matrix A is known, deterministic, and doesn't depend on t.

Linear observation model with iid additive Gaussian noise:

y_t = B x_t + η_y,  η_y ∼ N(0, Σ_{η_y})

where the matrix B is known, deterministic, and doesn't depend on t.
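The model above can be simulated in a few lines. A sketch using an illustrative constant-velocity state (position, velocity) with observed position; A, B, and both noise covariances below are made-up values, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

# x_t = A x_{t-1} + eta_x,  y_t = B x_t + eta_y  (all noise iid Gaussian).
dt = 1.0
A = np.array([[1.0, dt],
              [0.0, 1.0]])        # constant-velocity dynamics
B = np.array([[1.0, 0.0]])        # we observe only the position
Sig_x = 0.01 * np.eye(2)          # dynamics-noise covariance (assumed)
Sig_y = 0.25 * np.eye(1)          # observation-noise covariance (assumed)

T = 100
x = np.zeros(2)
xs, ys = [], []
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Sig_x)
    y = B @ x + rng.multivariate_normal(np.zeros(1), Sig_y)
    xs.append(x)
    ys.append(y)
xs, ys = np.array(xs), np.array(ys)
print(xs.shape, ys.shape)   # (100, 2) (100, 1)
```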


Gaussian RVs are closed under affine transformations, marginalization, and conditioning ⇒ everything here is Gaussian, e.g.:

p(x_{1:t}, y_{1:t})
p(x_{1:t})
p(y_{1:t})
p(x_{1:t}|y_{1:t})
p(x_t|y_{1:t})
p(x_t|x_{1:(t−1)})
p(x_t|x_{1:(t−1)}, y_{1:t})
p(x_t|x_{1:(t−1)}, y_{1:(t−1)})
p(x_t|y_{1:(t−1)})

Moreover, all the associated means and covariances have closed form.


p(x_{1:t}, y_{1:t}): an MRF with an "HMM-like" graph, though the term "HMM" is usually reserved for the case where the hidden states are discrete.

p(x_{1:t}): a Markov-chain structure. E.g.:

p(x_t|x_{1:(t−1)}) = p(x_t|x_{t−1})

p(y_{1:t}): the graph is fully connected


p(x_{1:t}|y_{1:t}): a Markov-chain structure. In fact:

p(x_t|x_{1:(t−1)}, y_{1:t}) = p(x_t|x_{t−1}, y_{1:t})   [Markov chain]
                            = p(x_t|x_{t−1}, y_t)        [x_t ⊥ y_{1:(t−1)} | x_{t−1}]

p(x_{1:t}|y_{1:(t−1)}): a Markov-chain structure. In fact:

p(x_t|x_{1:(t−1)}, y_{1:(t−1)}) = p(x_t|x_{t−1}, y_{1:(t−1)})   [Markov chain]
                                = p(x_t|x_{t−1})                 [x_t ⊥ y_{1:(t−1)} | x_{t−1}]


Because everything is Gaussian here, the MMSE estimators of x_t|y_{1:t} and x_t|y_{1:(t−1)} are given in terms of the corresponding conditional expectations.

Turns out:

(µ_{x_t|y_{1:(t−1)}}, Σ_{x_t|y_{1:(t−1)}}) = func(µ_{x_{t−1}|y_{1:(t−1)}}, Σ_{x_{t−1}|y_{1:(t−1)}})

(µ_{x_t|y_{1:t}}, Σ_{x_t|y_{1:t}}) = func(µ_{x_{t−1}|y_{1:(t−1)}}, Σ_{x_{t−1}|y_{1:(t−1)}}, y_t)

and these recursive computations have closed forms (omitted here).

These computations are known as the Kalman Filter. Note it is a linear filter.
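The closed forms omitted on the slide are the standard Kalman predict/update recursions. A minimal sketch in the slide's notation (A, B, Σ_ηx, Σ_ηy), followed by a toy 1D run with made-up numbers:

```python
import numpy as np

def kalman_step(mu, Sig, y, A, B, Sig_x, Sig_y):
    """One recursion: (mu_{t-1|t-1}, Sig_{t-1|t-1}) and y_t -> (mu_{t|t}, Sig_{t|t})."""
    # Predict: moments of p(x_t | y_{1:(t-1)})
    mu_pred = A @ mu
    Sig_pred = A @ Sig @ A.T + Sig_x
    # Update: condition the joint Gaussian of (x_t, y_t) on y_t
    S = B @ Sig_pred @ B.T + Sig_y            # innovation covariance
    K = Sig_pred @ B.T @ np.linalg.inv(S)     # Kalman gain
    mu_post = mu_pred + K @ (y - B @ mu_pred)
    Sig_post = Sig_pred - K @ B @ Sig_pred
    return mu_post, Sig_post

# Toy 1D run: x_t = x_{t-1} + noise, y_t = x_t + noise (illustrative values).
A = np.array([[1.0]]); B = np.array([[1.0]])
Sig_x = np.array([[0.01]]); Sig_y = np.array([[1.0]])
mu, Sig = np.array([0.0]), np.array([[1.0]])
for y in [0.9, 1.1, 1.0, 0.95]:
    mu, Sig = kalman_step(mu, Sig, np.array([y]), A, B, Sig_x, Sig_y)
print(mu, Sig)   # mean pulled toward the measurements; variance shrinks
```

Note that the update step is exactly the Gaussian-conditioning formula from the previous section, applied to the joint of (x_t, y_t).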


The Kalman Filter gives us more than mere point estimates of x_t|y_{1:t} and x_t|y_{1:(t−1)}; rather, it gives us an entire posterior distribution that is being propagated in time.

Note this distribution, being Gaussian, is unimodal. This is a limitation.


Convolution of Probability Density Functions

Fact

If X and Y are two independent RVs, and Z = X + Y , then

pZ = pX ∗ pY

where ∗ denotes convolution.

Particularly, if x ∼ p(x) and η ∼ N(0, σ²) are independent, then the density of y = x + η is a "blurred" version of p(x). This is sometimes called diffusion.
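This blurring can be seen numerically by convolving discretized densities. A sketch with X uniform and Gaussian noise (the grid and σ are illustrative):

```python
import numpy as np

# Density of Z = X + eta for independent X, eta, via discretized convolution.
dx = 0.01
grid = np.arange(-6, 6, dx)

p_x = np.where(np.abs(grid) <= 1, 0.5, 0.0)    # X ~ Uniform(-1, 1)
sigma = 0.3
p_eta = np.exp(-grid**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# pZ = pX * pEta: a "blurred" box (the dx factor makes the sum a Riemann integral).
p_z = np.convolve(p_x, p_eta, mode="same") * dx

print(p_z.sum() * dx)         # ~1: still (approximately) a density
print(p_z.max() < p_x.max())  # True: diffusion flattens the peak
```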


Kalman Filter as Probability Density Propagation


Kalman Filter: Pros and Cons

Pros: optimal for linear dynamics and Gaussian models; simple, widely known, efficient.

Cons: can't handle multi-modal distributions; supports only a single hypothesis; restricted to linear models.


Nonlinear Extensions of the Kalman Filter

The Extended Kalman Filter (EKF) is a nonlinear filter that is designed to handle nonlinear differentiable dynamics and observation models; essentially, the system is linearized around the current estimate.

The Unscented Kalman Filter (UKF), which uses deterministic sampling, better handles highly-nonlinear dynamics and observation models (and differentiability is not assumed).

In both cases, however, the underlying distribution is still assumed to be unimodal.


More General Probability Density Propagation

Figure from Michael Isard and Andrew Blake, IJCV ’98


Particle Filter

The Particle Filter (AKA Sequential Monte Carlo) provides an alternative to the Kalman filter that is also easy to implement, but can handle multiple modes and does not assume linearity/differentiability.

It is based on a discrete approximation of p(x_t|x_{t−1}, y_t) via a set of "particles" which are propagated across time.

The main downside of the Particle Filter is that it doesn't scale well with the dimensionality of x.

It is usually more effective than the Kalman filter in visual tracking.


Factored Sampling

Consider first the static case (i.e., there is no t).

N points (s_i), i = 1, . . . , N, called "particles", are sampled iid from a prior, p(x).

Each s_i is assigned a weight, π_i (depicted in the paper's figure by the blob's size), in proportion to the likelihood, p(y|x = s_i).

The weighted point set then serves as an approximate representation of the posterior, p(x|y).

Figure from Michael Isard and Andrew Blake, IJCV ’98
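Factored sampling for the static case is only a few lines. A sketch with a conjugate Gaussian prior/likelihood (parameters made up) so the weighted-particle answer can be compared with the exact posterior mean:

```python
import numpy as np

rng = np.random.default_rng(2)

# Static factored sampling: prior N(0, 2^2), likelihood p(y | x) = N(y; x, 0.5^2).
N = 50_000
y = 1.5

s = rng.normal(0.0, 2.0, size=N)                 # particles from the prior
w = np.exp(-(y - s) ** 2 / (2 * 0.5**2))         # unnormalized likelihood weights
pi = w / w.sum()                                 # normalized weights

post_mean = np.sum(pi * s)
print(post_mean)   # approximates E(X | Y = y)

# For this conjugate Gaussian pair the exact posterior mean is available:
exact = (y / 0.5**2) / (1 / 2.0**2 + 1 / 0.5**2)
print(exact)
```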


Propagating between Consecutive Times

Given a weighted set of particles, we would like to evolve it in time.

Resampling: sample N new particles, (s_i^t), i = 1, . . . , N, with replacement, from (s_i^{t−1}) according to the discrete distribution (π_i^{t−1}).

Deterministic drift: e.g., apply a linear transformation to each particle: s_i^t ← A s_i^t.

Diffuse, i.e., add noise: e.g., s_i^t += n_i^t, where n_i^t is Gaussian IID noise.

Weight by the new likelihood: π_i^t ∝ p_t(y_t|x = s_i^t).

Remark

Usually we will have expressions defined in terms of the log of the unnormalized π_i^t's, so don't forget the log-sum-exp trick.


CONDENSATION

Figure from Michael Isard and Andrew Blake, IJCV ’98


Visual Tracking

We will restrict discussion to:

2D-based tracking in a single camera;

a probabilistic formulation where the state of interest is defined via a small number of parameters.

Example: track a bounding box, or another shape defined via a small number of parameters.


In Computer Vision, it is not always clear what yt is

This leads to complicated expressions for p(yt|xt), often with no closed form.

In turn, this complicates p(xt|xt−1,yt)

Motivates the need for more flexible methods

The CONDENSATION¹ algorithm (Isard and Blake, IJCV ’98), related to the Particle Filter

Can handle multiple modes and recover from failures (not always. . .); it "only" needs a way to sample from p(xt|xt−1) and to evaluate a possibly-unnormalized p(yt|xt)

There are also many other visual-tracking methods

¹Conditional Density Propagation for Visual Tracking


CONDENSATION Example

Figure from Isard and Blake, IJCV ’98


How to get Observations?

Some possible approaches:

Background modeling

Tracking lines/contours/features

Tracking-by-detection (e.g. using template matching)
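As one concrete instance of tracking-by-detection, template matching by exhaustive normalized cross-correlation can be sketched directly in NumPy (a slow reference implementation for illustration, not a production detector):

```python
import numpy as np

def ncc_match(image, template):
    """Exhaustive normalized cross-correlation; returns the best top-left corner."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best, best_pos = -np.inf, (0, 0)
    for i in range(image.shape[0] - th + 1):
        for j in range(image.shape[1] - tw + 1):
            w = image[i:i + th, j:j + tw]
            w = w - w.mean()
            denom = np.sqrt((w ** 2).sum()) * t_norm
            if denom == 0:
                continue
            score = (w * t).sum() / denom      # NCC in [-1, 1]
            if score > best:
                best, best_pos = score, (i, j)
    return best_pos, best

# Plant a template in a noisy image and recover its location.
rng = np.random.default_rng(4)
img = rng.normal(0, 0.1, size=(40, 40))
tmpl = rng.normal(0, 1.0, size=(8, 8))
img[12:20, 25:33] += tmpl
pos, score = ncc_match(img, tmpl)
print(pos)   # the planted top-left corner, (12, 25)
```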


Process used in Isard and Blake for Contour Tracking

Figure from Isard and Blake, IJCV ’98


Isard and Blake

See demos


Another Example for Parameterization

Articulated parts

Figure from Sidenbladh, Black and Fleet, ECCV 2000


Bigger Problem: Data Association

What if we want to explicitly model multiple objects being tracked? Which measurement goes with which track (or with clutter)?

Often heuristics are used

Can be done in a principled way, but the details are not trivial


Version Log

3/6/2019, ver 1.00.
