Multiclass SVM and Applications in Object
Classification
Yuval Kaminka, Einat Granot
Advanced Topics in Computer Vision Seminar
Faculty of Mathematics and Computer Science, Weizmann Institute
May 2007
Outline
Motivation and Introduction
Classification Algorithms: K-Nearest Neighbors (KNN), SVM
Multiclass SVM
DAGSVM
SVM-KNN
Results - A taste of the distance: shape distance (shape context, tangent distance), texture (texton histograms)
Object Classification
Motivation – Human Visual System
Large Number of Categories (~30,000)
Discriminative Process
Small Set of Examples
Invariance to transformation
Similarity to Prototype instead of Features
Similarity to Prototypes vs. Features
No need for Feature Space
Easy to enlarge number of categories
Includes spatial relation between features
Similarity is defined by Distance Function
Easy to adjust to different types (Shape, Texture)
Can include invariance to intra-class transformations
Distance Function
D( image A , image B )
Distance Function – simple example
D( image A , image B ) = || (2.1, 27, 31, 15, 8, ...) − (13, 45, 22.5, 78, 91, ...) ||
But for general images: D( image A , image B ) = ?
Outline
Motivation and Introduction
Classification Algorithms: K-Nearest Neighbors (KNN), SVM
Multiclass SVM
DAGSVM
SVM-KNN
Results - A taste of the distance: shape distance (shape context, tangent distance), texture (texton histograms)
A Classic Classification Problem
[Figure: training points X1..X7 and a query point q in feature space]
Training set S: (X1..Xn), with class labels (Y1..Yn)
Given a query image q, determine its label
Nearest Neighbor (NN)
K-Nearest Neighbor (KNN)
K = 3
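To make the K-NN rule concrete, here is a minimal sketch in Python; the toy data, the default L2 distance and k = 3 are illustrative assumptions, not part of the slides.

```python
import numpy as np

def knn_classify(query, train_X, train_y, k=3, dist=None):
    """Label `query` by a majority vote among its k nearest training samples."""
    if dist is None:
        # default: Euclidean (L2) distance between feature vectors (an assumption of this sketch)
        dist = lambda a, b: np.linalg.norm(a - b)
    # distance from the query to every training sample: O(D * n)
    d = np.array([dist(query, x) for x in train_X])
    nearest = np.argsort(d)[:k]                        # indices of the k nearest neighbors
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # majority vote

# toy usage: two 2-D classes and one query point
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.2, 0.1]), train_X, train_y, k=3))   # -> 0
```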
K-NN Pros
Simple, yet outperforms other methods
Low complexity: O(D·n), where D is the cost of one distance function calculation
No need for Feature Space definition
No computational cost for adding new categories
n → ∞  ⟹  error rate → Bayes optimal
K-NN Cons
P. Vincent et al., K-local hyperplane and convex distance nearest neighbor algorithms, NIPS 2001
[Figure (from Vincent et al.): decision boundaries learned from the complete training set vs. a set with missing samples, NN vs. SVM]
With few or missing samples the NN decision boundary degrades, while SVM still generalizes well.
Outline
Motivation and Introduction
Classification Algorithms: K-Nearest Neighbors (KNN), SVM
Multiclass SVM
DAGSVM
SVM-KNN
Results - A taste of the distance: shape distance (shape context, tangent distance), texture (texton histograms)
SVM
Two class classification algorithm
We’re looking for a hyperplane that best separates the classes
Class 1
Class 2
Some of the slides on SVM are adapted with permission from Martin Law’s presentation on SVM
SVM - Motivation
Class 1
Class 2
Class 1
Class 2
As far away as possible from the data of both classes
SVM – A learning algorithm
KNN – simple classification, no training
Class 1
Class 2
SVM – a learning algorithm
Two phases:
1. Training – find the hyperplane
2. Classification – label a new query
SVM – Training Phase
Class 1
Class 2
Separating hyperplane: wTx + b = 0
We're looking for (w, b) that will:
1. Classify the classes correctly
2. Give maximum margin
1. Correct classification
Class 1
Class 2
wTx+b=0
Correct classification: wTxi + b > 0 for class 1 (green), and wTxi + b < 0 for class 2 (red)
Assume the labels {y1, ..., yn} are from the set {-1, 1}, and {x1, ..., xn} is our training set.
Then correct classification of all training points means: yi(wTxi + b) > 0 for every i
2. Margin maximization
Class 1
Class 2
[Figure: the margin m between the two classes]
m = ?
2. Margin maximization
The distance of a point z from the hyperplane is |wTz + b| / ||w||.
We can scale (w, b) → (λw, λb), λ > 0.
This won't change the classification: wTx + b > 0 ⟺ λwTx + λb > 0.
So we can fix a desired distance: if |wTz + b| = a for the closest point z, take λ = 1/a so that |wTz + b| = 1.
With this normalization the margin is m = 2 / ||w||.
SVM as an Optimization Problem
Maximize the margin m = 2 / ||w||, subject to correct classification.
Equivalently, find (w, b) that minimize (1/2)||w||² subject to yi(wTxi + b) ≥ 1 for all i.
Solve optimization problem with constraints
Lagrange multipliers
C.J.C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition, 1998.
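For reference, a sketch of the optimization problem and its Lagrangian dual in standard form (the slides only outline it; this is the textbook formulation, as in Burges' tutorial):

```latex
% Primal (hard-margin) SVM
\min_{w,b}\ \tfrac{1}{2}\|w\|^{2}
\quad\text{s.t.}\quad y_i\,(w^{T}x_i + b) \ge 1,\qquad i = 1,\dots,n

% Dual, obtained with Lagrange multipliers \alpha_i \ge 0
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i \;-\; \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j\,y_i y_j\,x_i^{T}x_j
\quad\text{s.t.}\quad \alpha_i \ge 0,\quad \sum_{i=1}^{n}\alpha_i y_i = 0
```

The dual depends on the training data only through the inner products xiTxj, which is what the later slides exploit.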
Support Vectors
Class 1
Class 2
xi with αi > 0 are called support vectors (SV); for all other training points αi = 0
w is determined only by the SV:
w = Σ_{i=1..n} αi yi xi = Σ_{i ∈ SV} αi yi xi
SVM – Classification phase
Class 1
Compute wTq+b
wTq + b = Σ_{i ∈ SV} αi yi (xiT q) + b      (since w = Σ_{i ∈ SV} αi yi xi)
Classify as class 1 if positive, and class 2 otherwise
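A minimal sketch of this classification rule in Python (illustrative; sv_x, sv_y, alphas and b are assumed to come from the training phase):

```python
import numpy as np

def svm_decision(q, sv_x, sv_y, alphas, b):
    """f(q) = sum over support vectors of alpha_i * y_i * <x_i, q>, plus b."""
    return sum(a * y * np.dot(x, q) for a, y, x in zip(alphas, sv_y, sv_x)) + b

def svm_classify(q, sv_x, sv_y, alphas, b):
    # class 1 if the decision value is positive, class 2 otherwise
    return 1 if svm_decision(q, sv_x, sv_y, alphas, b) > 0 else 2
```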
Upgrade SVM
1. In order to find α1, ..., αn we need to calculate xiTxj for all i, j
2. In order to classify a query q we need to calculate:
wTq + b = Σ_{i ∈ SV} αi yi (xiT q) + b
⟹ We only need to calculate inner products
Feature Expansion
Map the input points with a non-linear mapping φ(·) into a higher-dimensional ("extended") feature space.
[Figure: points in the input space mapped by φ(·) into the extended space]
Example: φ(x, y) = (1, x, y, xy, x², y²)
Problem: too expensive!
Solution: The Kernel Trick
Find a kernel function K such that:
K(xi, xj) = φ(xi)T φ(xj)
We only need to calculate inner products
Example: K(xi, xj) = (xiT xj + 1)²
The Kernel Trick
1. In order to find α1, ..., αn we need to calculate φ(xi)T φ(xj) for all i, j
Build a kernel matrix M (n×n): M[i, j] = φ(xi)T φ(xj) = K(xi, xj)
2. In order to classify a query q we need to calculate wTq+b:
wTq + b = Σ_{i ∈ SV} αi yi φ(xi)T φ(q) + b = Σ_{i ∈ SV} αi yi K(xi, q) + b
We only need to calculate inner products
Inner Product ↔ Distance Function
⟨x, y⟩ = ½ ( ⟨x, x⟩ + ⟨y, y⟩ − ⟨x − y, x − y⟩ )
        = ½ ( d(x, 0)² + d(y, 0)² − d(x, y)² )
d(x, 0), d(y, 0): distances from an "origin";  d(x, y): the pairwise distance
In our case: this lets us compute the inner products we need directly from a distance function.
We only need to calculate inner products
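A small sketch of this conversion in Python (illustrative; the dist function and the choice of the "origin" image are assumptions):

```python
import numpy as np

def kernel_from_distance(images, dist, origin):
    """Build an n x n 'inner product' (kernel) matrix from a distance function:
       <x, y> = 1/2 * ( d(x, origin)^2 + d(y, origin)^2 - d(x, y)^2 )."""
    n = len(images)
    d0 = np.array([dist(x, origin) for x in images])   # distances to the "origin" image
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = 0.5 * (d0[i] ** 2 + d0[j] ** 2 - dist(images[i], images[j]) ** 2)
    return K
```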
SVM Pros and Cons
Pros: Easy to integrate different distance functions
Fast classification of new objects (depends on SV)
Good performance even with small set of examples
Cons: Slow training (O(n²), n = # of vectors in the training set)
Separates only 2 classes
Outline
Motivation and Introduction
Classification Algorithms: K-Nearest Neighbors (KNN), SVM
Multiclass SVM
DAGSVM
SVM-KNN
Results - A taste of the distance: shape distance (shape context, tangent distance), texture (texton histograms)
Multiclass SVM
Class 1 Class 2
Class 3
Class 5
Class 4
Extend SVM to separate multiple classes. Nc = number of classes
Two approaches
Class 1 Class 2
Class 3
Class 4
1. Combine multiple binary classifiers: 1-vs-rest, 1-vs-1, DAGSVM
2. Generate one decision function from a single optimization problem
1-vs-rest
Class 1 Class 2
Class 3 Class 4
1-vs-rest
Class 1 Class 2
Class 3 Class 4
w1
w3
w4
w2
Nc classifiers
1-vs-rest
Class 1 Class 2
Class 3 Class 4
q
w1
w3
w4
w2
w1Tq + b1 ~ Similarity(q, SV1)
Similarly, w2, w3, w4 give ~ Similarity(q, SV2), ~ Similarity(q, SV3), ~ Similarity(q, SV4)
1-vs-rest
Class 1 Class 2
Class 3 Class 4
q
w1
w3
w4
w2
Label(q) = argmax_{1 ≤ i ≤ Nc} { Sim(q, SVi) }
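A minimal 1-vs-rest sketch in Python (illustrative; it assumes the Nc per-class decision functions fi(q) = wiTq + bi have already been trained):

```python
import numpy as np

def one_vs_rest_label(q, decision_fns):
    """decision_fns[i](q) ~ similarity of q to class i (w_i^T q + b_i).
       The label is the class whose classifier is most confident."""
    scores = [f(q) for f in decision_fns]
    return int(np.argmax(scores))    # argmax over the Nc classes (0-based here)
```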
1-vs-1
Class 1 Class 2
Class 3 Class 4
1-vs-1
Class 1 Class 2
Class 3
Class 4
W1,2
W1,3
W1,4
W2,3
W3,4
W2,4
Nc(Nc-1)/2 classifiers
1-vs-1 with Max Wins
Class 1 Class 2
Class 3
Class 4
W1,2
W1,3
W1,4
W2,3
W3,4
W2,4
q
Sign(w1,2Tq + b1,2) ~ is q class 1 or class 2?
Similarly: ~ 1 or 3?   ~ 1 or 4?   ~ 2 or 3?   ~ 2 or 4?   ~ 3 or 4?
Each pairwise decision casts a vote (a "win") for one of its two classes.
1-vs-1 with Max Wins
Class 1 Class 2
Class 3
Class 4
W1,2
W1,3
W1,4
W2,3
W3,4
W2,4
q
The class that collects the most wins is chosen: Label(q) = argmax_i #wins(i)
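A sketch of Max Wins voting in Python (illustrative; pairwise maps a class pair (i, j), i < j, to a trained binary decision function that is positive when class i wins):

```python
from collections import Counter
from itertools import combinations

def max_wins_label(q, classes, pairwise):
    """Evaluate all Nc(Nc-1)/2 binary classifiers and vote; the class with most wins is chosen."""
    votes = Counter()
    for i, j in combinations(classes, 2):
        winner = i if pairwise[(i, j)](q) > 0 else j   # the sign decides between i and j
        votes[winner] += 1
    return votes.most_common(1)[0][0]
```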
What did we have so far?
                                      1-vs-rest          1-vs-1
# of classifiers (each needs to be
trained and tested)                   Nc                 Nc(Nc-1)/2
# of vectors for training
(per classifier)                      n (all vectors)    ~2n/Nc
Generalization error                  no bound           no bound
[Figure: the class partitions used by 1-vs-rest and 1-vs-1 for 4 classes]
DAGSVM
Class 1 Class 2
Class 3
Class 4
W1,2
W1,3
W1,4
W2,3
W3,4
W2,4
1-vs-1 Decision DAG (DDAG)
[Figure: DDAG for classes {1, 2, 3, 4}. The root node evaluates "1 vs 4" on the full candidate list 1 2 3 4 and removes the loser ("not 1" or "not 4"). The next level evaluates "1 vs 3" (list 1 2 3) or "2 vs 4" (list 2 3 4); the last level evaluates "1 vs 2", "2 vs 3" or "3 vs 4"; each leaf holds the single remaining class 1, 2, 3 or 4.]
J. C. Platt et al., Large margin DAGs for multiclass classification. NIPS, 1999.
DDAG on Nc Classes
A DAG with:
Nc leaves, one per class
A single root node
Nc(Nc-1)/2 internal nodes, each holding a binary decision function
[Figure: the same DDAG over classes {1, 2, 3, 4} as on the previous slide]
Classification using DDAG
Class 1 Class 2
Class 3
Class 4
W1,2
W1,3
W1,4
W2,3
W3,4
W2,4
[Figure: classifying q with the DDAG]
Start at the root and evaluate "1 vs 4" (~ 1 or 4?). The loser is eliminated; continue with "1 vs 3" (~ 1 or 3?), then "1 vs 2" (~ 1 or 2?), and output the single class that remains.
Only Nc - 1 classifiers are evaluated along any such path.
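A sketch of DDAG classification by successive elimination in Python (illustrative; it reuses the pairwise decision functions assumed in the Max Wins sketch above, keyed as (i, j) with i < j):

```python
def ddag_label(q, classes, pairwise):
    """Keep a candidate list; each '<first> vs <last>' decision removes the losing class."""
    candidates = list(classes)                      # e.g. [1, 2, 3, 4]
    while len(candidates) > 1:
        i, j = candidates[0], candidates[-1]        # e.g. "1 vs 4" at the root
        if pairwise[(i, j)](q) > 0:                 # positive -> "not j"
            candidates.pop()                        # remove the last class
        else:                                       # otherwise -> "not i"
            candidates.pop(0)                       # remove the first class
    return candidates[0]
```

The loop runs Nc - 1 times, which is why only Nc - 1 classifiers are evaluated per query.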
DAGSVM
Pros: Only Nc-1 classifiers to be tested
Every classifier uses a small set of vectors for training
Bound on the generalization error (related to the margin sizes)
Cons: fewer vectors for training per classifier, so possibly a worse classifier?
Nc(Nc-1)/2 classifiers to be trained
DAGSVM Complexity
For training: assume every class contains ~n/Nc instances. There are Nc(Nc-1)/2 classifiers, each trained on ~2n/Nc vectors:
(Nc(Nc-1)/2) · (2n/Nc)² = 2n²(Nc-1)/Nc ≈ 2n²   ⟹   O(D·n²)
For classifying new objects: Nc-1 classifiers, each tested once. With M = max number of support vectors per classifier:
O(D·M·Nc)
Multiclass SVM - Summary
Training complexity:        DAGSVM / 1-vs-1: O(D·n²)         1-vs-rest: O(D·Nc·n²)
Classification complexity:  DAGSVM / 1-vs-rest: O(D·M·Nc)    1-vs-1: O(D·M·Nc²)
Error rates: a bound on the generalization error exists only for DAGSVM; in practice, 1-vs-1 and DAGSVM are used.
The "one big optimization" methods: similar error rates, but very slow training – limited to small data sets.
So what do we have?
Nearest Neighbor (KNN): fast; suitable for multi-class; easy to integrate different distance functions; problematic with few samples
SVM: good performance even with a small set of examples; easy to integrate different distance functions; no natural extension to multi-class; slow to train
Class 1
Class 2
SVM KNN - From coarse to fine
Suggestion: a hybrid system, KNN → SVM
Zhang et al, SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition, 2006
Outline
Motivation and Introduction
Classification Algorithms: K-Nearest Neighbors (KNN), SVM
Multiclass SVM
DAGSVM
SVM-KNN
Results - A taste of the distance: shape distance (shape context, tangent), texture (texton histograms)
SVM KNN – General Algorithm
1. Calculate distance from query to training images
Training images and query
Query image
KNN
Class 1 Class 2
Class 3
SVM KNN – General Algorithm
1. Calculate distance from query to training images
2. Pick K nearest neighbors
Training images and query
Query image
KNN
Class 1 Class 2
Class 3
SVM KNN – General Algorithm
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
Training images and query
Query image
SVM
Class 1 Class 2
Class 3
SVM works well with few samples
SVM KNN – General Algorithm
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
Training images and query
Query image
Query image Class 2
SVM
Class 1 Class 2
Class 3
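Putting the four steps together, a rough query-time sketch in Python (illustrative; dist, train_X, train_y and train_multiclass_svm are assumed placeholders, not the authors' code):

```python
import numpy as np

def svm_knn_label(query, train_X, train_y, dist, k, train_multiclass_svm):
    # 1. distance from the query to all training images
    d = np.array([dist(query, x) for x in train_X])
    # 2. pick the K nearest neighbors
    nearest = np.argsort(d)[:k]
    labels = [train_y[i] for i in nearest]
    if len(set(labels)) == 1:          # all K neighbors agree -> done, no SVM needed
        return labels[0]
    # 3. train a (multiclass) SVM on the K neighbors only
    svm = train_multiclass_svm([train_X[i] for i in nearest], labels)
    # 4. label the query with this local SVM
    return svm(query)
```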
Training + Classification
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
KNN
SVM
Classic process: Training → Classification
SVM-KNN: Coarse classification → Training → Final classification
Details Details Details
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
KNN
SVM
Kpotential
Calculating the accurate distance is a heavy task → first compute a crude distance (e.g. L2), which is faster.
Find the Kpotential nearest images under the crude distance; ignore all other images.
Compute the accurate distance only relative to the Kpotential images.
Details Details Details
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
Complexity:
Crude distance: O(D_crude · n)
Accurate distance: O(D_accurate · Kpotential)
Details Details Details
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
If the K neighbors are all from the same class → done
KNN
SVM
Details Details Details
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
Construct pairwise inner product matrix
Improvement – cache distance calculation
⟨x, y⟩ = ½ ( d(x, 0)² + d(y, 0)² − d(x, y)² )
KNN
SVM
Details Details Details
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
Selected SVM: DAGSVM (faster)
Complexity:
O(D_accurate · K²)
KNN
SVM
1 vs 4
3 vs 4
2 vs 4 1 vs 3
2 vs 3 1 vs 2
Complexity
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
Total complexity: O(D_crude·n + D_accurate·Kpotential + D_accurate·K²)
Compare with the training complexity of a DAGSVM on the full set: O(D_accurate·n²)
KNN
SVM
SVM KNN – continuum
Defining an SVM-KNN continuum:
K = 1  →  NN        K = n (# images)  →  SVM        in between  →  SVM-KNN
Biological motivation: the human visual system
More than a majority (MAJ) vote over the K neighbors
SVM KNN Summary
Similarity to prototypes
Combining advantages from both methods: NN – fast, suitable for multiclass; SVM – performs well with few samples and classes
Compatible with many types of distance functions
Biological motivation: the human visual system, a discriminative process
Outline
Motivation and Introduction
Classification Algorithms: K-Nearest Neighbors (KNN), SVM
Multiclass SVM
DAGSVM
SVM-KNN
Results - A taste of the distance: shape distance (shape context, tangent distance), texture (texton histograms)
Distance functions
Shape    Texture
D( image A , image B ) = ??
Training images and query
Query image   Class 1   Class 2
Class 3
Understanding the need - Shape
Well, which is it??
Capturing the shape: Distance 1 – shape context; Distance 2 – tangent distance
query
Distance 1: Shape context
1. Find point correspondences
2. Estimate the transformation
3. Distance: combines correspondence quality and transformation quality
prototype query
Belongie et al., Shape matching and object recognition using shape contexts, IEEE Trans. (2002)
Find correspondences
Detector – use edge points. Descriptor – create a "landscape":
the relationship to the other edge points, as a histogram of orientations and distances
Count = 5
Count = 6
prototype query
Find correspondence
Detector – use edge points. Descriptor – create a "landscape":
the relationship to the other edge points, as a histogram of orientations and distances
Matching: compare the histograms using the χ² distance
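A minimal sketch of the descriptor and the χ² comparison in Python, assuming simple bin choices (5 log-spaced radial bins, 12 orientation bins); this is illustrative, not the authors' exact implementation:

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12):
    """For each edge point, a log-polar histogram of where the other points lie
       (distances normalized by the mean pairwise distance, for scale invariance)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]             # diff[i, j] = p_j - p_i
    r = np.linalg.norm(diff, axis=2)
    r_norm = r / r[r > 0].mean()
    theta = np.arctan2(diff[..., 1], diff[..., 0])        # orientation in [-pi, pi]
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)   # log-spaced radius bins
    hists = np.zeros((n, n_r, n_theta))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rb = np.searchsorted(r_edges, r_norm[i, j]) - 1
            if 0 <= rb < n_r:
                tb = int((theta[i, j] + np.pi) / (2 * np.pi) * n_theta) % n_theta
                hists[i, rb, tb] += 1
    return hists.reshape(n, -1)

def chi2(h1, h2, eps=1e-10):
    """Chi-squared distance between two histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```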
Distance 1: Shape context
1. Find point correspondences
2. Estimate the transformation
3. Distance: combines correspondence quality and transformation (quality, magnitude)
prototype query
MNIST – Digit DB
70,000 handwritten digits; each image is 28x28
MNIST results
Human error rate – 0.2%. Better methods exist (error rate < 1%).
[Figure: error rate (%) of the compared methods on MNIST]
Distance 2: Tangent distance
Distance includes invariance to small changes: small rotations, translations, thickening
Simard et al., Transformation invariance in pattern recognition - tangent distance and tangent propagation. Neural Networks (1998)
Prototype query
[Figure: the space induced by rotation. Rotating the image by an angle α (α = -2, -1, 0, 1, ...) traces a one-dimensional curve in pixel space, parameterized by the rotation function.]
Tangent distance – Visual intuition
[Figure: in pixel space, the prototype image P and the query image Q each induce a surface (SP, SQ) of their transformed versions; the desired distance is the distance between these surfaces.]
But – calculating the distance between non-linear curves can be difficult
Solution: use a linear approximation – the tangent
Euclidean distance (L2)
Tangent Distance - General
For every image, create a surface allowing transformations: rotations, translations, thickness, etc.
Find a linear approximation - the tangent plane
Distance: calculate the distance between the linear (tangent) planes – this has efficient solutions
In practice the tangent plane has 7 dimensions (one per modeled transformation)
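A sketch of a one-sided tangent distance in Python (distance from the query to the prototype's tangent plane), using a single finite-difference rotation tangent for brevity; the full method uses the 7 transformations and a two-sided formulation. The scipy rotation helper is an assumption of this sketch.

```python
import numpy as np

def tangent_vector_rotation(img, eps=1.0):
    """Finite-difference approximation of the tangent vector for a small rotation
       (one column of the tangent matrix). `img` is a 2-D array, eps is in degrees."""
    from scipy.ndimage import rotate
    rotated = rotate(img, eps, reshape=False, order=1)
    return ((rotated - img) / eps).ravel()

def one_sided_tangent_distance(p_img, q_img, tangent_vectors):
    """min over a of || p + T a - q ||_2 : distance from q to the tangent plane at p."""
    p, q = p_img.ravel().astype(float), q_img.ravel().astype(float)
    T = np.stack(tangent_vectors, axis=1)              # columns span the tangent plane
    a, *_ = np.linalg.lstsq(T, q - p, rcond=None)      # least-squares projection
    return np.linalg.norm(p + T @ a - q)
```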
USPS – digit DB
9,298 handwritten digits taken from mail envelopes; each image is 16x16
USPS results
Human error rate – 2.5%.
For L2 – not optimal; DAGSVM has similar results.
For the tangent distance – NN gives similar results.
Understanding Texture
How to represent Texture??
Texture samples
Texture representation
Represent each pixel by its responses to a filter bank (48 filters).
[Figure: for each pixel P1, P2, P3, ... of a texture patch, the 48 filter responses form a vector, e.g. (0.6, -0.2, ..., 0.4); these vectors are points in a 48-dimensional space]
Introducing Textons
Filter responses are points in a 48-dimensional space. A texture patch is spatially repeating,
so the representation is redundant → select representative responses (K-means)
Textons !
Texture patch
T. Leung, J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons (2001)
[Figure: the cluster centers are 48-dimensional vectors, e.g. (3.0, ..., 1.0) and (4.0, ..., 6.0), corresponding to pixels of one image]
Universal textons
[Figure: prototype textures → filter bank → filter responses in 48-dim space → universal textons T1, T2, T3, T4]
"Building blocks" for all textures
Distance 3: χ² of texton histograms
For a query texture: 1. Create filter responses. 2. Build a texton histogram (assign each pixel's responses to the nearest universal texton and count).
[Figure: query texture → filter responses in 48-dim space → query texton histogram over T1, T2, T3, T4]
Distance 3: χ² of texton histograms
For a query texture: 1. Create filter responses. 2. Build a texton histogram (using the universal textons). 3. Distance: compare histograms using χ².
[Figure: the query texton histogram (bins T1..T4) is compared by χ² against each prototype texture's texton histogram]
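A minimal sketch of the texton histogram and its χ² comparison in Python (illustrative; responses and textons are assumed to be precomputed filter-response and universal-texton matrices):

```python
import numpy as np

def texton_histogram(responses, textons):
    """responses: (num_pixels, 48) filter responses; textons: (num_textons, 48) universal textons.
       Assign each pixel to its nearest texton and return a normalized histogram."""
    d = np.linalg.norm(responses[:, None, :] - textons[None, :, :], axis=2)
    nearest = np.argmin(d, axis=1)                               # texton index per pixel
    hist = np.bincount(nearest, minlength=len(textons)).astype(float)
    return hist / hist.sum()

def texton_chi2_distance(query_responses, proto_responses, textons, eps=1e-10):
    """Chi-squared distance between the query and prototype texton histograms."""
    hq = texton_histogram(query_responses, textons)
    hp = texton_histogram(proto_responses, textons)
    return 0.5 * np.sum((hq - hp) ** 2 / (hq + hp + eps))
```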
CUReT – texture DB
61 textures; different viewpoints; different illuminations
CUReT Results
(comparing texton histograms)
[Figure: classification results on CUReT]
Caltech-101 DB
102 categories; variations in color, pose, illumination
Distance function: a combination of texture and shape. Two algorithms: Algo. A, Algo. B
Samples from the Caltech-101 DB
Caltech-101 Results
(15 training images per category)
Still a long way to go…
Algo. B: using only DAGSVM (no KNN)
[Figure: correct rate (%) of the compared methods; about 66% correct]
Motivation – Human Visual System
Large Number of Categories (~30,000)
Discriminative Process
Small Set of Examples
Invariance to transformation
Similarity to Prototype instead of Features
Summary
Popular methods: NN, SVM, DAGSVM (an extension of SVM to multiple classes)
The hybrid method – SVM-KNN: motivated by human perception (??), improved complexity; do better methods exist?
A taste of the distance: shape, texture
Results depend on both the classification method and the distance function
References
H. Zhang, A. C. Berg, M. Maire and J. Malik. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition. IEEE, Vol. 2, pages 2126-2136, 2006.
P. Vincent and Y. Bengio. K-local hyperplane and convex distance nearest neighbor algorithms. NIPS, pages 985-992, 2001.
J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. NIPS, pages 547-553, 1999.
C. Hsu and C. Lin. A comparison of methods for multiclass support vector machines. IEEE, Vol. 13, pages 415-425, 2002.
T. Leung and J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons. Int. J. Computer Vision, 43(1):29-44, 2001.
P. Simard, Y. LeCun, J. S. Denker, and B. Victorri. Transformation invariance in pattern recognition-tangent distance and tangent propagation. Neural Networks: Tricks of the Trade, pages 239-274, 1998.
S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE, Vol. 24, pages 509-522, 2002.
Thank You!