Invariant Large Margin Nearest Neighbour Classifier
M. Pawan Kumar
Philip Torr
Andrew Zisserman
Aim
• To learn a distance metric for invariant nearest neighbour classification
[Figure: training data, with target pairs and impostor pairs marked]
Problem: Euclidean distance may not provide correct nearest neighbours
Solution: Learn a mapping to a new space
• Bring target pairs closer
• Move impostor pairs away
Aim
• To learn a distance metric for invariant nearest neighbour classification
[Figure: Euclidean distance vs. learnt distance]
Aim
• To learn a distance metric for invariant nearest neighbour classification
Transformation trajectories: learn a mapping to a new space
• Bring target trajectory pairs closer
• Move impostor trajectory pairs away
[Figure: Euclidean distance vs. learnt distance for trajectories]
Motivation: Face Recognition in TV Video
[Figure: face images I1, I2, ..., In mapped to feature vectors]
Euclidean distance may not give correct nearest neighbours
Learn a distance metric
Motivation: Face Recognition in TV Video
Invariance to changes in position of features
Outline
• Large Margin Nearest Neighbour (LMNN)
• Preventing Overfitting
• Polynomial Transformations
• Invariant LMNN (ILMNN)
• Experiments
LMNN Classifier (Weinberger, Blitzer and Saul, NIPS 2005)
Learns a distance metric for nearest neighbour classification.
• Learns a mapping L: x → Lx
• Bring target pairs closer
• Move impostor pairs away
[Figure: point x_i with target neighbour x_j and impostor x_k]
LMNN Classifier (Weinberger, Blitzer and Saul, NIPS 2005)
Distance between x_i and x_j:
D(i,j) = (x_i - x_j)^T L^T L (x_i - x_j) = (x_i - x_j)^T M (x_i - x_j), with M = L^T L
Bring target pairs closer:
min Σ_ij D(i,j), subject to M ⪰ 0
A convex semidefinite program (SDP): global minimum.
Move impostor pairs away, with slack variables e_ijk:
D(i,k) - D(i,j) ≥ 1 - e_ijk, e_ijk ≥ 0
min Σ_ijk e_ijk, subject to M ⪰ 0 (convex SDP)
Combined problem:
min Σ_ij D(i,j) + Λ_H Σ_ijk e_ijk
subject to M ⪰ 0
D(i,k) - D(i,j) ≥ 1 - e_ijk
e_ijk ≥ 0
Solve to obtain the optimum M. Complexity: polynomial in the number of points.
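The combined objective can be evaluated numerically for a candidate map L. Below is a minimal numpy sketch; the function name `lmnn_objective` and the pair/triple list formats are our own, and the SDP solve itself is omitted.

```python
import numpy as np

def lmnn_objective(L, X, targets, impostors, lam_h=1.0):
    """Evaluate the LMNN objective for a linear map L (sketch).

    D(i, j) = (x_i - x_j)^T M (x_i - x_j) with M = L^T L,
    i.e. the squared Euclidean distance after mapping x -> Lx.
    """
    def dist(i, j):
        d = L @ (X[i] - X[j])
        return float(d @ d)

    # Pull term: sum of distances over target pairs (i, j).
    pull = sum(dist(i, j) for i, j in targets)

    # Push term: hinge slack e_ijk = max(0, 1 - (D(i,k) - D(i,j)))
    # over impostor triples ((i, j), k).
    push = sum(max(0.0, 1.0 - (dist(i, k) - dist(i, j)))
               for (i, j), k in impostors)
    return pull + lam_h * push
```

With the identity map this reduces to the Euclidean-distance objective, which is a useful sanity check before any optimization.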
LMNN Classifier (Weinberger, Blitzer and Saul, NIPS 2005)
Advantages
• Trivial extension to multiple classes
• Efficient polynomial-time solution
Disadvantages
• Large number of degrees of freedom (risk of overfitting)
• Does not model invariance of the data
L2-Regularized LMNN Classifier: regularize the Frobenius norm of L
• ||L||² = Σ_i M_ii
min Σ_ij D(i,j) + Λ_H Σ_ijk e_ijk + Λ_R Σ_i M_ii
subject to M ⪰ 0
D(i,k) - D(i,j) ≥ 1 - e_ijk
e_ijk ≥ 0
(L2-LMNN)
Diagonal LMNN: learn a diagonal L matrix, hence a diagonal M matrix
min Σ_ij D(i,j) + Λ_H Σ_ijk e_ijk
subject to M ⪰ 0, M_ij = 0 for i ≠ j
D(i,k) - D(i,j) ≥ 1 - e_ijk
e_ijk ≥ 0
A linear program. (D-LMNN)
Diagonally Dominant LMNN: minimize the 1-norm of the off-diagonal elements of M
min Σ_ij D(i,j) + Λ_H Σ_ijk e_ijk + Λ_R Σ_ij t_ij
subject to M ⪰ 0
D(i,k) - D(i,j) ≥ 1 - e_ijk
e_ijk ≥ 0
t_ij ≥ M_ij, t_ij ≥ -M_ij, for i ≠ j
(DD-LMNN)
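The three regularization strategies differ only in how they treat the entries of M. A small numpy sketch evaluating just the penalty terms (variable names are illustrative; the full SDP is not solved here):

```python
import numpy as np

# Example linear map L and its induced metric M = L^T L.
L = np.array([[2.0, 0.5],
              [0.0, 1.0]])
M = L.T @ L

# L2-LMNN: penalize the squared Frobenius norm,
# ||L||^2 = trace(M) = sum_i M_ii.
frob_penalty = np.trace(M)

# D-LMNN: restrict M to be diagonal (off-diagonal entries fixed to zero),
# which turns the semidefinite program into a linear program.
M_diag = np.diag(np.diag(M))

# DD-LMNN: penalize the 1-norm of the off-diagonal entries,
# sum_{i != j} |M_ij|, pushing M towards diagonal dominance.
offdiag_penalty = np.sum(np.abs(M - np.diag(np.diag(M))))
```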
LMNN Classifier
What about invariance to known transformations?
Naive fix: append the input data with transformed versions. Inefficient and inaccurate.
Can we add invariance to LMNN?
• No, not for a general transformation
• Yes, for some types of transformations
Polynomial Transformations
Rotate x = (a, b)^T by an angle θ:
[cos θ  -sin θ] [a]
[sin θ   cos θ] [b]
Taylor series: cos θ ≈ 1 - θ²/2, sin θ ≈ θ - θ³/6, giving
[1 - θ²/2    -(θ - θ³/6)] [a]
[θ - θ³/6     1 - θ²/2  ] [b]
= [a  -b  -a/2   b/6] [1, θ, θ², θ³]^T = X θ
  [b   a  -b/2  -a/6]
T(θ, x) = X θ: the transformed point is linear in the monomials of θ.
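The degree-3 Taylor expansion above can be checked numerically against the exact rotation. A short numpy sketch (the function name `rotation_trajectory_matrix` is ours):

```python
import numpy as np

def rotation_trajectory_matrix(a, b):
    """Matrix X such that T(theta, x) ≈ X @ [1, theta, theta^2, theta^3]
    for a 2D point x = (a, b) rotated by a small angle theta
    (third-order Taylor expansion of cos and sin, as in the slides)."""
    return np.array([
        [a, -b, -a / 2.0,  b / 6.0],
        [b,  a, -b / 2.0, -a / 6.0],
    ])

theta = 0.1          # radians; the approximation holds for small angles
a, b = 1.0, 2.0
X = rotation_trajectory_matrix(a, b)
approx = X @ np.array([1.0, theta, theta**2, theta**3])

# Exact rotation for comparison.
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
exact = R @ np.array([a, b])
```

For θ = 0.1 the approximation error is on the order of 1e-5, which is why a polynomial trajectory is an adequate stand-in for the true transformation over a small parameter range.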
Why are Polynomials Special?
The distance, viewed as a polynomial in the transformation parameters (θ1, θ2), being a sum of squares of polynomials is equivalent to a semidefinite constraint P ⪰ 0.
SD-representability of polynomials (Lasserre, 2001)
ILMNN Classifier
Learns a distance metric for invariant nearest neighbour classification.
• Learns a mapping L: x → Lx
• Bring target trajectories closer
• Move impostor trajectories away
[Figure: polynomial trajectories through x_i, x_j, x_k]
ILMNN Classifier
Over the polynomial trajectories, subject to M ⪰ 0:
• Minimize the maximum distance between target trajectory pairs
• Maximize the minimum distance between impostor trajectory pairs
ILMNN Classifier
• Use SD-representability: one semidefinite constraint
• Solve for M in polynomial time
• Add regularizers to prevent overfitting
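The min-max criterion over trajectories can be approximated by sampling the transformation parameter, which gives intuition for what the single semidefinite constraint computes exactly. A rough numpy sketch (function names are ours; the paper's SD-representability formulation avoids this sampling entirely):

```python
import numpy as np

def traj_matrix(a, b):
    # Trajectory of point (a, b) under small rotations, as in the slides:
    # T(theta, x) = X @ [1, theta, theta^2, theta^3].
    return np.array([[a, -b, -a / 2.0,  b / 6.0],
                     [b,  a, -b / 2.0, -a / 6.0]])

def trajectory_distances(L, Xi, Xj, thetas):
    """All pairwise distances ||L (T(ti, xi) - T(tj, xj))||^2 over a grid
    of sampled transformation parameters.
    Target pairs: minimize max(ds). Impostor pairs: maximize min(ds)."""
    ds = []
    for ti in thetas:
        pi = Xi @ np.array([1.0, ti, ti**2, ti**3])
        for tj in thetas:
            pj = Xj @ np.array([1.0, tj, tj**2, tj**3])
            d = L @ (pi - pj)
            ds.append(float(d @ d))
    return ds
```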
Dataset
Faces from an episode of "Buffy the Vampire Slayer"
11 characters
24,244 faces (with ground-truth labelling*)
* Thanks to Josef Sivic and Mark Everingham
Dataset Splits
Experiment 1 (suitable for nearest neighbour-type classification):
• Random permutation of the dataset
• 30% training
• 30% validation (to estimate Λ_H and Λ_R)
• 40% testing
Experiment 2 (not so suitable for nearest neighbour-type classification):
• First 30% training
• Next 30% validation
• Last 40% testing
Incorporating Invariance
Invariance of feature position to Euclidean transformation:
-5° ≤ θ ≤ 5°
-3 ≤ t_x ≤ 3 pixels
-3 ≤ t_y ≤ 3 pixels
Approximated by a degree-2 polynomial using a Taylor series.
Derivatives approximated as image differences.
[Figure: image vs. rotated image; derivative computed as the difference of smoothed images]
Training the Classifiers
Problem: with within-shot faces, Euclidean distance already gives 0 error.
Solution: Cluster; train using the cluster centres.
Efficiently solve the SDP using alternating projections (Bauschke and Borwein, 1996).
Testing the Classifiers
• Map all training points using L
• Map the test point using L
• Find nearest neighbours; classify
• Measure Accuracy = (no. of true positives) / (no. of test faces)
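The test procedure amounts to k-nearest-neighbour classification after mapping every point with L. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def knn_accuracy(L, X_train, y_train, X_test, y_test, k=1):
    """Nearest-neighbour classification in the learnt space (sketch):
    map every point x -> Lx, then classify each test point by the
    majority label among its k nearest mapped training points."""
    Z_train = X_train @ L.T
    Z_test = X_test @ L.T
    correct = 0
    for z, y in zip(Z_test, y_test):
        d = np.sum((Z_train - z) ** 2, axis=1)   # squared distances
        nn = np.argsort(d)[:k]                   # k nearest neighbours
        labels, counts = np.unique(y_train[nn], return_counts=True)
        pred = labels[np.argmax(counts)]
        correct += int(pred == y)
    # Accuracy = number of true positives / number of test faces.
    return correct / len(y_test)
```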
Timings
Method     Training  Testing
kNN-E      -         62.2 s
L2-LMNN    4 h       62.2 s
D-LMNN     1 h       53.2 s
DD-LMNN    2 h       50.5 s
L2-ILMNN   24 h      62.2 s
D-ILMNN    8 h       48.2 s
DD-ILMNN   24 h      51.9 s
M-SVM      300 s     446.6 s
SVM-KNN    -         2114.2 s
Accuracy
Method     Experiment 1  Experiment 2
kNN-E      83.6          26.7
L2-LMNN    61.2          22.6
D-LMNN     85.6          24.3
DD-LMNN    84.4          24.5
L2-ILMNN   65.9          24.0
D-ILMNN    87.2          32.0
DD-ILMNN   86.6          29.8
M-SVM      62.3          30.0
SVM-KNN    75.5          28.1
True Positives [figure]
Conclusions
• Regularizers for LMNN
• Adding invariance to LMNN
• More accurate than Nearest Neighbour
• More accurate than LMNN
Future Research
• D-LMNN and D-ILMNN for Chi-squared distance
• D-LMNN and D-ILMNN for dot product distance
• Handling missing data (Shivaswamy, Bhattacharyya, Smola, JMLR 2006)
• Learning local mappings (adaptive kNN)
Questions?
False Positives [figure]
Precision-Recall Curves: Experiment 1 [figure]
Precision-Recall Curves: Experiment 2 [figure]