Invariant Large Margin Nearest Neighbour Classifier
M. Pawan Kumar
Philip Torr
Andrew Zisserman
Aim
• To learn a distance metric for invariant nearest neighbour classification
[Figure: training data, with target pairs and impostor pairs marked]
Problem: Euclidean distance may not provide correct nearest neighbours
Solution: Learn a mapping to a new space
• Bring target pairs closer
• Move impostor pairs away
Aim
• To learn a distance metric for invariant nearest neighbour classification
[Figure: Euclidean distance vs. learnt distance]
Aim
• To learn a distance metric for invariant nearest neighbour classification
Transformation trajectories: learn a mapping to a new space
• Bring target trajectory pairs closer
• Move impostor trajectory pairs away
[Figure: Euclidean distance vs. learnt distance for trajectories]
Motivation: Face Recognition in TV Video
[Figure: face images I1, I2, ..., In mapped to feature vectors]
Euclidean distance may not give correct nearest neighbours
Learn a distance metric
Motivation: Face Recognition in TV Video
Invariance to changes in position of features
Outline
• Large Margin Nearest Neighbour (LMNN)
• Preventing Overfitting
• Polynomial Transformations
• Invariant LMNN (ILMNN)
• Experiments
LMNN Classifier (Weinberger, Blitzer and Saul, NIPS 2005)
Learns a distance metric for nearest neighbour classification.
• Learns a mapping L: x → Lx
• Bring target pairs closer
• Move impostor pairs away
[Figure: point x_i with target neighbour x_j and impostor x_k]
LMNN Classifier (Weinberger, Blitzer and Saul, NIPS 2005)
Distance between x_i and x_j:
D(i,j) = (x_i - x_j)^T L^T L (x_i - x_j) = (x_i - x_j)^T M (x_i - x_j), with M = L^T L
Bring target pairs closer:
min Σ_ij D(i,j), subject to M ⪰ 0
A convex semidefinite program (SDP): global minimum.
Move impostor pairs away, with slack variables e_ijk:
D(i,k) - D(i,j) ≥ 1 - e_ijk, e_ijk ≥ 0
min Σ_ijk e_ijk, subject to M ⪰ 0 (convex SDP)
Combined problem:
min Σ_ij D(i,j) + Λ_H Σ_ijk e_ijk
subject to M ⪰ 0
D(i,k) - D(i,j) ≥ 1 - e_ijk
e_ijk ≥ 0
Solve to obtain the optimum M. Complexity: polynomial in the number of points.
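The combined objective can be evaluated numerically for a candidate map L. Below is a minimal numpy sketch; the function name `lmnn_objective` and the pair/triple list formats are our own, and the SDP solve itself is omitted.

```python
import numpy as np

def lmnn_objective(L, X, targets, impostors, lam_h=1.0):
    """Evaluate the LMNN objective for a linear map L (sketch).

    D(i, j) = (x_i - x_j)^T M (x_i - x_j) with M = L^T L,
    i.e. the squared Euclidean distance after mapping x -> Lx.
    """
    def dist(i, j):
        d = L @ (X[i] - X[j])
        return float(d @ d)

    # Pull term: sum of distances over target pairs (i, j).
    pull = sum(dist(i, j) for i, j in targets)

    # Push term: hinge slack e_ijk = max(0, 1 - (D(i,k) - D(i,j)))
    # over impostor triples ((i, j), k).
    push = sum(max(0.0, 1.0 - (dist(i, k) - dist(i, j)))
               for (i, j), k in impostors)
    return pull + lam_h * push
```

With the identity map this reduces to the Euclidean-distance objective, which is a useful sanity check before any optimization.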
LMNN Classifier (Weinberger, Blitzer and Saul, NIPS 2005)
Advantages
• Trivial extension to multiple classes
• Efficient polynomial-time solution
Disadvantages
• Large number of degrees of freedom (risk of overfitting)
• Does not model invariance of the data
L2-Regularized LMNN Classifier: regularize the Frobenius norm of L
• ||L||² = Σ_i M_ii
min Σ_ij D(i,j) + Λ_H Σ_ijk e_ijk + Λ_R Σ_i M_ii
subject to M ⪰ 0
D(i,k) - D(i,j) ≥ 1 - e_ijk
e_ijk ≥ 0
(L2-LMNN)
Diagonal LMNN: learn a diagonal L matrix, hence a diagonal M matrix
min Σ_ij D(i,j) + Λ_H Σ_ijk e_ijk
subject to M ⪰ 0, M_ij = 0 for i ≠ j
D(i,k) - D(i,j) ≥ 1 - e_ijk
e_ijk ≥ 0
A linear program. (D-LMNN)
Diagonally Dominant LMNN: minimize the 1-norm of the off-diagonal elements of M
min Σ_ij D(i,j) + Λ_H Σ_ijk e_ijk + Λ_R Σ_ij t_ij
subject to M ⪰ 0
D(i,k) - D(i,j) ≥ 1 - e_ijk
e_ijk ≥ 0
t_ij ≥ M_ij, t_ij ≥ -M_ij, for i ≠ j
(DD-LMNN)
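The three regularization strategies differ only in how they treat the entries of M. A small numpy sketch evaluating just the penalty terms (variable names are illustrative; the full SDP is not solved here):

```python
import numpy as np

# Example linear map L and its induced metric M = L^T L.
L = np.array([[2.0, 0.5],
              [0.0, 1.0]])
M = L.T @ L

# L2-LMNN: penalize the squared Frobenius norm,
# ||L||^2 = trace(M) = sum_i M_ii.
frob_penalty = np.trace(M)

# D-LMNN: restrict M to be diagonal (off-diagonal entries fixed to zero),
# which turns the semidefinite program into a linear program.
M_diag = np.diag(np.diag(M))

# DD-LMNN: penalize the 1-norm of the off-diagonal entries,
# sum_{i != j} |M_ij|, pushing M towards diagonal dominance.
offdiag_penalty = np.sum(np.abs(M - np.diag(np.diag(M))))
```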
LMNN Classifier
What about invariance to known transformations?
Naive fix: append the input data with transformed versions. Inefficient and inaccurate.
Can we add invariance to LMNN?
• No, not for a general transformation
• Yes, for some types of transformations
Polynomial Transformations
Rotate x = (a, b)^T by an angle θ:
[cos θ  -sin θ] [a]
[sin θ   cos θ] [b]
Taylor series: cos θ ≈ 1 - θ²/2, sin θ ≈ θ - θ³/6, giving
[1 - θ²/2    -(θ - θ³/6)] [a]
[θ - θ³/6     1 - θ²/2  ] [b]
= [a  -b  -a/2   b/6] [1, θ, θ², θ³]^T = X θ
  [b   a  -b/2  -a/6]
T(θ, x) = X θ: the transformed point is linear in the monomials of θ.
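The degree-3 Taylor expansion above can be checked numerically against the exact rotation. A short numpy sketch (the function name `rotation_trajectory_matrix` is ours):

```python
import numpy as np

def rotation_trajectory_matrix(a, b):
    """Matrix X such that T(theta, x) ≈ X @ [1, theta, theta^2, theta^3]
    for a 2D point x = (a, b) rotated by a small angle theta
    (third-order Taylor expansion of cos and sin, as in the slides)."""
    return np.array([
        [a, -b, -a / 2.0,  b / 6.0],
        [b,  a, -b / 2.0, -a / 6.0],
    ])

theta = 0.1          # radians; the approximation holds for small angles
a, b = 1.0, 2.0
X = rotation_trajectory_matrix(a, b)
approx = X @ np.array([1.0, theta, theta**2, theta**3])

# Exact rotation for comparison.
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
exact = R @ np.array([a, b])
```

For θ = 0.1 the approximation error is on the order of 1e-5, which is why a polynomial trajectory is an adequate stand-in for the true transformation over a small parameter range.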
Why are Polynomials Special?
The distance, viewed as a polynomial in the transformation parameters (θ1, θ2), being a sum of squares of polynomials is equivalent to a semidefinite constraint P ⪰ 0.
SD-representability of polynomials (Lasserre, 2001)
ILMNN Classifier
Learns a distance metric for invariant nearest neighbour classification.
• Learns a mapping L: x → Lx
• Bring target trajectories closer
• Move impostor trajectories away
[Figure: polynomial trajectories through x_i, x_j, x_k]
ILMNN Classifier
Over the polynomial trajectories, subject to M ⪰ 0:
• Minimize the maximum distance between target trajectory pairs
• Maximize the minimum distance between impostor trajectory pairs
ILMNN Classifier
• Use SD-representability: one semidefinite constraint
• Solve for M in polynomial time
• Add regularizers to prevent overfitting
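The min-max criterion over trajectories can be approximated by sampling the transformation parameter, which gives intuition for what the single semidefinite constraint computes exactly. A rough numpy sketch (function names are ours; the paper's SD-representability formulation avoids this sampling entirely):

```python
import numpy as np

def traj_matrix(a, b):
    # Trajectory of point (a, b) under small rotations, as in the slides:
    # T(theta, x) = X @ [1, theta, theta^2, theta^3].
    return np.array([[a, -b, -a / 2.0,  b / 6.0],
                     [b,  a, -b / 2.0, -a / 6.0]])

def trajectory_distances(L, Xi, Xj, thetas):
    """All pairwise distances ||L (T(ti, xi) - T(tj, xj))||^2 over a grid
    of sampled transformation parameters.
    Target pairs: minimize max(ds). Impostor pairs: maximize min(ds)."""
    ds = []
    for ti in thetas:
        pi = Xi @ np.array([1.0, ti, ti**2, ti**3])
        for tj in thetas:
            pj = Xj @ np.array([1.0, tj, tj**2, tj**3])
            d = L @ (pi - pj)
            ds.append(float(d @ d))
    return ds
```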
Dataset
Faces from an episode of "Buffy the Vampire Slayer"
11 characters
24,244 faces (with ground-truth labelling*)
* Thanks to Josef Sivic and Mark Everingham
Dataset Splits
Experiment 1 (suitable for nearest neighbour-type classification):
• Random permutation of the dataset
• 30% training
• 30% validation (to estimate Λ_H and Λ_R)
• 40% testing
Experiment 2 (not so suitable for nearest neighbour-type classification):
• First 30% training
• Next 30% validation
• Last 40% testing
Incorporating Invariance
Invariance of feature position to Euclidean transformation:
-5° ≤ θ ≤ 5°
-3 ≤ t_x ≤ 3 pixels
-3 ≤ t_y ≤ 3 pixels
Approximated by a degree-2 polynomial using a Taylor series.
Derivatives approximated as image differences.
[Figure: image vs. rotated image; derivative computed as the difference of smoothed images]
Training the Classifiers
Problem: with within-shot faces, Euclidean distance already gives 0 error.
Solution: Cluster; train using the cluster centres.
Efficiently solve the SDP using alternating projections (Bauschke and Borwein, 1996).
Testing the Classifiers
• Map all training points using L
• Map the test point using L
• Find nearest neighbours; classify
• Measure Accuracy = (no. of true positives) / (no. of test faces)
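The test procedure amounts to k-nearest-neighbour classification after mapping every point with L. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def knn_accuracy(L, X_train, y_train, X_test, y_test, k=1):
    """Nearest-neighbour classification in the learnt space (sketch):
    map every point x -> Lx, then classify each test point by the
    majority label among its k nearest mapped training points."""
    Z_train = X_train @ L.T
    Z_test = X_test @ L.T
    correct = 0
    for z, y in zip(Z_test, y_test):
        d = np.sum((Z_train - z) ** 2, axis=1)   # squared distances
        nn = np.argsort(d)[:k]                   # k nearest neighbours
        labels, counts = np.unique(y_train[nn], return_counts=True)
        pred = labels[np.argmax(counts)]
        correct += int(pred == y)
    # Accuracy = number of true positives / number of test faces.
    return correct / len(y_test)
```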
Timings
Method     Training  Testing
kNN-E      -         62.2 s
L2-LMNN    4 h       62.2 s
D-LMNN     1 h       53.2 s
DD-LMNN    2 h       50.5 s
L2-ILMNN   24 h      62.2 s
D-ILMNN    8 h       48.2 s
DD-ILMNN   24 h      51.9 s
M-SVM      300 s     446.6 s
SVM-KNN    -         2114.2 s
Accuracy
Method     Experiment 1  Experiment 2
kNN-E      83.6          26.7
L2-LMNN    61.2          22.6
D-LMNN     85.6          24.3
DD-LMNN    84.4          24.5
L2-ILMNN   65.9          24.0
D-ILMNN    87.2          32.0
DD-ILMNN   86.6          29.8
M-SVM      62.3          30.0
SVM-KNN    75.5          28.1
True Positives [figure]
Conclusions
• Regularizers for LMNN
• Adding invariance to LMNN
• More accurate than Nearest Neighbour
• More accurate than LMNN
Future Research
• D-LMNN and D-ILMNN for Chi-squared distance
• D-LMNN and D-ILMNN for dot product distance
• Handling missing data (Shivaswamy, Bhattacharyya, Smola, JMLR 2006)
• Learning local mappings (adaptive kNN)
Questions?
False Positives [figure]
Precision-Recall Curves: Experiment 1 [figure]
Precision-Recall Curves: Experiment 2 [figure]