Multiclass SVM and Applications in Object
Classification
Yuval Kaminka, Einat Granot
Advanced Topics in Computer Vision Seminar
Faculty of Mathematics and Computer Science, Weizmann Institute
May 2007
Outline
Motivation and Introduction
Classification Algorithms: K-Nearest Neighbors (KNN), SVM
Multiclass SVM
DAGSVM
SVM-KNN
Results - A taste of the distance: shape distance (shape context, tangent distance), texture (texton histograms)
Object Classification
Motivation – Human Visual System
Large Number of Categories (~30,000)
Discriminative Process
Small Set of Examples
Invariance to transformation
Similarity to Prototype instead of Features
Similarity to Prototypes vs. Features
No need for Feature Space
Easy to enlarge number of categories
Includes spatial relation between features
Similarity is defined by Distance Function
Easy to adjust to different types (Shape, Texture)
Can include invariance to intra-class transformations
Distance Function
D( image A , image B )
Distance Function – simple example
D( image A , image B ) = || (2.1, 27, 31, 15, 8, ...) − (13, 45, 22.5, 78, 91, ...) ||
But for general images: D( image A , image B ) = ?
Outline
Motivation and Introduction
Classification Algorithms: K-Nearest Neighbors (KNN), SVM
Multiclass SVM
DAGSVM
SVM-KNN
Results - A taste of the distance: shape distance (shape context, tangent distance), texture (texton histograms)
A Classic Classification Problem
[Figure: training points X1..X7 and a query point q in feature space]
Training set S: (X1..Xn), with class labels (Y1..Yn)
Given a query image q, determine its label
Nearest Neighbor (NN)
K-Nearest Neighbor (KNN)
K = 3
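To make the K-NN rule concrete, here is a minimal sketch in Python; the toy data, the default L2 distance and k = 3 are illustrative assumptions, not part of the slides.

```python
import numpy as np

def knn_classify(query, train_X, train_y, k=3, dist=None):
    """Label `query` by a majority vote among its k nearest training samples."""
    if dist is None:
        # default: Euclidean (L2) distance between feature vectors (an assumption of this sketch)
        dist = lambda a, b: np.linalg.norm(a - b)
    # distance from the query to every training sample: O(D * n)
    d = np.array([dist(query, x) for x in train_X])
    nearest = np.argsort(d)[:k]                        # indices of the k nearest neighbors
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # majority vote

# toy usage: two 2-D classes and one query point
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.2, 0.1]), train_X, train_y, k=3))   # -> 0
```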
K-NN Pros
Simple, yet outperforms other methods
Low complexity: O(D·n), where D is the cost of one distance function calculation
No need for Feature Space definition
No computational cost for adding new categories
n → ∞  ⟹  error rate → Bayes optimal
K-NN Cons
P. Vincent et al., K-local hyperplane and convex distance nearest neighbor algorithms, NIPS 2001
[Figure (from Vincent et al.): decision boundaries learned from the complete training set vs. a set with missing samples, NN vs. SVM]
With few or missing samples the NN decision boundary degrades, while SVM still generalizes well.
Outline
Motivation and Introduction
Classification Algorithms: K-Nearest Neighbors (KNN), SVM
Multiclass SVM
DAGSVM
SVM-KNN
Results - A taste of the distance: shape distance (shape context, tangent distance), texture (texton histograms)
SVM
Two class classification algorithm
We’re looking for a hyperplane that best separates the classes
Class 1
Class 2
Some of the slides on SVM are adapted with permission from Martin Law’s presentation on SVM
SVM - Motivation
Class 1
Class 2
Class 1
Class 2
As far away as possible from the data of both classes
SVM – A learning algorithm
KNN – simple classification, no training
Class 1
Class 2
SVM – a learning algorithm
Two phases:
1. Training – find the hyperplane
2. Classification – label a new query
SVM – Training Phase
Class 1
Class 2
Separating hyperplane: wTx + b = 0
We're looking for (w, b) that will:
1. Classify the classes correctly
2. Give maximum margin
1. Correct classification
Class 1
Class 2
wTx+b=0
Correct classification: wTxi + b > 0 for class 1 (green), and wTxi + b < 0 for class 2 (red)
Assume the labels {y1, ..., yn} are from the set {-1, 1}, and {x1, ..., xn} is our training set.
Then correct classification of all training points means: yi(wTxi + b) > 0 for every i
2. Margin maximization
Class 1
Class 2
[Figure: the margin m between the two classes]
m = ?
2. Margin maximization
The distance of a point z from the hyperplane is |wTz + b| / ||w||.
We can scale (w, b) → (λw, λb), λ > 0.
This won't change the classification: wTx + b > 0 ⟺ λwTx + λb > 0.
So we can fix a desired distance: if |wTz + b| = a for the closest point z, take λ = 1/a so that |wTz + b| = 1.
With this normalization the margin is m = 2 / ||w||.
SVM as an Optimization Problem
Maximize the margin m = 2 / ||w||, subject to correct classification.
Equivalently, find (w, b) that minimize (1/2)||w||² subject to yi(wTxi + b) ≥ 1 for all i.
Solve optimization problem with constraints
Lagrange multipliers
C.J.C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition, 1998.
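For reference, a sketch of the optimization problem and its Lagrangian dual in standard form (the slides only outline it; this is the textbook formulation, as in Burges' tutorial):

```latex
% Primal (hard-margin) SVM
\min_{w,b}\ \tfrac{1}{2}\|w\|^{2}
\quad\text{s.t.}\quad y_i\,(w^{T}x_i + b) \ge 1,\qquad i = 1,\dots,n

% Dual, obtained with Lagrange multipliers \alpha_i \ge 0
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i \;-\; \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j\,y_i y_j\,x_i^{T}x_j
\quad\text{s.t.}\quad \alpha_i \ge 0,\quad \sum_{i=1}^{n}\alpha_i y_i = 0
```

The dual depends on the training data only through the inner products xiTxj, which is what the later slides exploit.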
Support Vectors
Class 1
Class 2
xi with αi > 0 are called support vectors (SV); for all other training points αi = 0
w is determined only by the SV:
w = Σ_{i=1..n} αi yi xi = Σ_{i ∈ SV} αi yi xi
SVM – Classification phase
Class 1
Compute wTq+b
wTq + b = Σ_{i ∈ SV} αi yi (xiT q) + b      (since w = Σ_{i ∈ SV} αi yi xi)
Classify as class 1 if positive, and class 2 otherwise
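A minimal sketch of this classification rule in Python (illustrative; sv_x, sv_y, alphas and b are assumed to come from the training phase):

```python
import numpy as np

def svm_decision(q, sv_x, sv_y, alphas, b):
    """f(q) = sum over support vectors of alpha_i * y_i * <x_i, q>, plus b."""
    return sum(a * y * np.dot(x, q) for a, y, x in zip(alphas, sv_y, sv_x)) + b

def svm_classify(q, sv_x, sv_y, alphas, b):
    # class 1 if the decision value is positive, class 2 otherwise
    return 1 if svm_decision(q, sv_x, sv_y, alphas, b) > 0 else 2
```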
Upgrade SVM
1. In order to find α1, ..., αn we need to calculate xiTxj for all i, j
2. In order to classify a query q we need to calculate:
wTq + b = Σ_{i ∈ SV} αi yi (xiT q) + b
⟹ We only need to calculate inner products
Feature Expansion
Map the input points with a non-linear mapping φ(·) into a higher-dimensional ("extended") feature space.
[Figure: points in the input space mapped by φ(·) into the extended space]
Example: φ(x, y) = (1, x, y, xy, x², y²)
Problem: too expensive!
Solution: The Kernel Trick
Find a kernel function K such that:
K(xi, xj) = φ(xi)T φ(xj)
We only need to calculate inner products
Example: K(xi, xj) = (xiT xj + 1)²
The Kernel Trick
1. In order to find α1, ..., αn we need to calculate φ(xi)T φ(xj) for all i, j
Build a kernel matrix M (n×n): M[i, j] = φ(xi)T φ(xj) = K(xi, xj)
2. In order to classify a query q we need to calculate wTq+b:
wTq + b = Σ_{i ∈ SV} αi yi φ(xi)T φ(q) + b = Σ_{i ∈ SV} αi yi K(xi, q) + b
We only need to calculate inner products
Inner Product ↔ Distance Function
⟨x, y⟩ = ½ ( ⟨x, x⟩ + ⟨y, y⟩ − ⟨x − y, x − y⟩ )
        = ½ ( d(x, 0)² + d(y, 0)² − d(x, y)² )
d(x, 0), d(y, 0): distances from an "origin";  d(x, y): the pairwise distance
In our case: this lets us compute the inner products we need directly from a distance function.
We only need to calculate inner products
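A small sketch of this conversion in Python (illustrative; the dist function and the choice of the "origin" image are assumptions):

```python
import numpy as np

def kernel_from_distance(images, dist, origin):
    """Build an n x n 'inner product' (kernel) matrix from a distance function:
       <x, y> = 1/2 * ( d(x, origin)^2 + d(y, origin)^2 - d(x, y)^2 )."""
    n = len(images)
    d0 = np.array([dist(x, origin) for x in images])   # distances to the "origin" image
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = 0.5 * (d0[i] ** 2 + d0[j] ** 2 - dist(images[i], images[j]) ** 2)
    return K
```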
SVM Pros and Cons
Pros: Easy to integrate different distance functions
Fast classification of new objects (depends on SV)
Good performance even with small set of examples
Cons: Slow training (O(n²), n = # of vectors in the training set)
Separates only 2 classes
Outline
Motivation and Introduction
Classification Algorithms: K-Nearest Neighbors (KNN), SVM
Multiclass SVM
DAGSVM
SVM-KNN
Results - A taste of the distance: shape distance (shape context, tangent distance), texture (texton histograms)
Multiclass SVM
Class 1 Class 2
Class 3
Class 5
Class 4
Extend SVM to separate multiple classes. Nc = number of classes
Two approaches
Class 1 Class 2
Class 3
Class 4
1. Combine multiple binary classifiers: 1-vs-rest, 1-vs-1, DAGSVM
2. Generate one decision function from a single optimization problem
1-vs-rest
Class 1 Class 2
Class 3 Class 4
1-vs-rest
Class 1 Class 2
Class 3 Class 4
w1
w3
w4
w2
Nc classifiers
1-vs-rest
Class 1 Class 2
Class 3 Class 4
q
w1
w3
w4
w2
w1Tq + b1 ~ Similarity(q, SV1)
Similarly, w2, w3, w4 give ~ Similarity(q, SV2), ~ Similarity(q, SV3), ~ Similarity(q, SV4)
1-vs-rest
Class 1 Class 2
Class 3 Class 4
q
w1
w3
w4
w2
Label(q) = argmax_{1 ≤ i ≤ Nc} { Sim(q, SVi) }
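A minimal 1-vs-rest sketch in Python (illustrative; it assumes the Nc per-class decision functions fi(q) = wiTq + bi have already been trained):

```python
import numpy as np

def one_vs_rest_label(q, decision_fns):
    """decision_fns[i](q) ~ similarity of q to class i (w_i^T q + b_i).
       The label is the class whose classifier is most confident."""
    scores = [f(q) for f in decision_fns]
    return int(np.argmax(scores))    # argmax over the Nc classes (0-based here)
```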
1-vs-1
Class 1 Class 2
Class 3 Class 4
1-vs-1
Class 1 Class 2
Class 3
Class 4
W1,2
W1,3
W1,4
W2,3
W3,4
W2,4
Nc(Nc-1)/2 classifiers
1-vs-1 with Max Wins
Class 1 Class 2
Class 3
Class 4
W1,2
W1,3
W1,4
W2,3
W3,4
W2,4
q
Sign(w1,2Tq + b1,2) ~ is q class 1 or class 2?
Similarly: ~ 1 or 3?   ~ 1 or 4?   ~ 2 or 3?   ~ 2 or 4?   ~ 3 or 4?
Each pairwise decision casts a vote (a "win") for one of its two classes.
1-vs-1 with Max Wins
Class 1 Class 2
Class 3
Class 4
W1,2
W1,3
W1,4
W2,3
W3,4
W2,4
q
The class that collects the most wins is chosen: Label(q) = argmax_i #wins(i)
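A sketch of Max Wins voting in Python (illustrative; pairwise maps a class pair (i, j), i < j, to a trained binary decision function that is positive when class i wins):

```python
from collections import Counter
from itertools import combinations

def max_wins_label(q, classes, pairwise):
    """Evaluate all Nc(Nc-1)/2 binary classifiers and vote; the class with most wins is chosen."""
    votes = Counter()
    for i, j in combinations(classes, 2):
        winner = i if pairwise[(i, j)](q) > 0 else j   # the sign decides between i and j
        votes[winner] += 1
    return votes.most_common(1)[0][0]
```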
What did we have so far?
                                      1-vs-rest          1-vs-1
# of classifiers (each needs to be
trained and tested)                   Nc                 Nc(Nc-1)/2
# of vectors for training
(per classifier)                      n (all vectors)    ~2n/Nc
Generalization error                  no bound           no bound
[Figure: the class partitions used by 1-vs-rest and 1-vs-1 for 4 classes]
DAGSVM
Class 1 Class 2
Class 3
Class 4
W1,2
W1,3
W1,4
W2,3
W3,4
W2,4
1-vs-1 Decision DAG (DDAG)
[Figure: DDAG for classes {1, 2, 3, 4}. The root node evaluates "1 vs 4" on the full candidate list 1 2 3 4 and removes the loser ("not 1" or "not 4"). The next level evaluates "1 vs 3" (list 1 2 3) or "2 vs 4" (list 2 3 4); the last level evaluates "1 vs 2", "2 vs 3" or "3 vs 4"; each leaf holds the single remaining class 1, 2, 3 or 4.]
J. C. Platt et al., Large margin DAGs for multiclass classification. NIPS, 1999.
DDAG on Nc Classes
A DAG with:
Nc leaves, one per class
A single root node
Nc(Nc-1)/2 internal nodes, each holding a binary decision function
[Figure: the same DDAG over classes {1, 2, 3, 4} as on the previous slide]
Classification using DDAG
Class 1 Class 2
Class 3
Class 4
W1,2
W1,3
W1,4
W2,3
W3,4
W2,4
[Figure: classifying q with the DDAG]
Start at the root and evaluate "1 vs 4" (~ 1 or 4?). The loser is eliminated; continue with "1 vs 3" (~ 1 or 3?), then "1 vs 2" (~ 1 or 2?), and output the single class that remains.
Only Nc - 1 classifiers are evaluated along any such path.
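A sketch of DDAG classification by successive elimination in Python (illustrative; it reuses the pairwise decision functions assumed in the Max Wins sketch above, keyed as (i, j) with i < j):

```python
def ddag_label(q, classes, pairwise):
    """Keep a candidate list; each '<first> vs <last>' decision removes the losing class."""
    candidates = list(classes)                      # e.g. [1, 2, 3, 4]
    while len(candidates) > 1:
        i, j = candidates[0], candidates[-1]        # e.g. "1 vs 4" at the root
        if pairwise[(i, j)](q) > 0:                 # positive -> "not j"
            candidates.pop()                        # remove the last class
        else:                                       # otherwise -> "not i"
            candidates.pop(0)                       # remove the first class
    return candidates[0]
```

The loop runs Nc - 1 times, which is why only Nc - 1 classifiers are evaluated per query.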
DAGSVM
Pros: Only Nc-1 classifiers to be tested
Every classifier uses a small set of vectors for training
Bound on the generalization error (related to the margin sizes)
Cons: fewer vectors for training per classifier, so possibly a worse classifier?
Nc(Nc-1)/2 classifiers to be trained
DAGSVM Complexity
For training: assume every class contains ~n/Nc instances. There are Nc(Nc-1)/2 classifiers, each trained on ~2n/Nc vectors:
(Nc(Nc-1)/2) · (2n/Nc)² = 2n²(Nc-1)/Nc ≈ 2n²   ⟹   O(D·n²)
For classifying new objects: Nc-1 classifiers, each tested once. With M = max number of support vectors per classifier:
O(D·M·Nc)
Multiclass SVM - Summary
Training complexity:        DAGSVM / 1-vs-1: O(D·n²)         1-vs-rest: O(D·Nc·n²)
Classification complexity:  DAGSVM / 1-vs-rest: O(D·M·Nc)    1-vs-1: O(D·M·Nc²)
Error rates: a bound on the generalization error exists only for DAGSVM; in practice, 1-vs-1 and DAGSVM are used.
The "one big optimization" methods: similar error rates, but very slow training – limited to small data sets.
So what do we have?
Nearest Neighbor (KNN): fast; suitable for multi-class; easy to integrate different distance functions; problematic with few samples
SVM: good performance even with a small set of examples; easy to integrate different distance functions; no natural extension to multi-class; slow to train
Class 1
Class 2
SVM KNN - From coarse to fine
Suggestion: a hybrid system, KNN → SVM
Zhang et al, SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition, 2006
Outline
Motivation and Introduction
Classification Algorithms: K-Nearest Neighbors (KNN), SVM
Multiclass SVM
DAGSVM
SVM-KNN
Results - A taste of the distance: shape distance (shape context, tangent), texture (texton histograms)
SVM KNN – General Algorithm
1. Calculate distance from query to training images
Training images and query
Query image
KNN
Class 1 Class 2
Class 3
SVM KNN – General Algorithm
1. Calculate distance from query to training images
2. Pick K nearest neighbors
Training images and query
Query image
KNN
Class 1 Class 2
Class 3
SVM KNN – General Algorithm
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
Training images and query
Query image
SVM
Class 1 Class 2
Class 3
SVM works well with few samples
SVM KNN – General Algorithm
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
Training images and query
Query image
Query image Class 2
SVM
Class 1 Class 2
Class 3
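Putting the four steps together, a rough query-time sketch in Python (illustrative; dist, train_X, train_y and train_multiclass_svm are assumed placeholders, not the authors' code):

```python
import numpy as np

def svm_knn_label(query, train_X, train_y, dist, k, train_multiclass_svm):
    # 1. distance from the query to all training images
    d = np.array([dist(query, x) for x in train_X])
    # 2. pick the K nearest neighbors
    nearest = np.argsort(d)[:k]
    labels = [train_y[i] for i in nearest]
    if len(set(labels)) == 1:          # all K neighbors agree -> done, no SVM needed
        return labels[0]
    # 3. train a (multiclass) SVM on the K neighbors only
    svm = train_multiclass_svm([train_X[i] for i in nearest], labels)
    # 4. label the query with this local SVM
    return svm(query)
```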
Training + Classification
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
KNN
SVM
Classic process: Training → Classification
SVM-KNN: Coarse classification → Training → Final classification
Details Details Details
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
KNN
SVM
Kpotential
Calculating the accurate distance is a heavy task → first compute a crude distance (e.g. L2), which is faster.
Find the Kpotential nearest images under the crude distance; ignore all other images.
Compute the accurate distance only relative to the Kpotential images.
Details Details Details
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
Complexity:
Crude distance: O(D_crude · n)
Accurate distance: O(D_accurate · Kpotential)
Details Details Details
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
If the K neighbors are all from the same class → done
KNN
SVM
Details Details Details
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
Construct pairwise inner product matrix
Improvement – cache distance calculation
⟨x, y⟩ = ½ ( d(x, 0)² + d(y, 0)² − d(x, y)² )
KNN
SVM
Details Details Details
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
Selected SVM: DAGSVM (faster)
Complexity:
O(D_accurate · K²)
KNN
SVM
1 vs 4
3 vs 4
2 vs 4 1 vs 3
2 vs 3 1 vs 2
Complexity
1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label!
Total complexity: O(D_crude·n + D_accurate·Kpotential + D_accurate·K²)
Compare with the training complexity of a DAGSVM on the full set: O(D_accurate·n²)
KNN
SVM
SVM KNN – continuum
Defining an SVM-KNN continuum:
K = 1  →  NN        K = n (# images)  →  SVM        in between  →  SVM-KNN
Biological motivation: the human visual system
More than a majority (MAJ) vote over the K neighbors
SVM KNN Summary
Similarity to prototypes
Combining advantages from both methods: NN – fast, suitable for multiclass; SVM – performs well with few samples and classes
Compatible with many types of distance functions
Biological motivation: the human visual system, a discriminative process
Outline
Motivation and Introduction
Classification Algorithms: K-Nearest Neighbors (KNN), SVM
Multiclass SVM
DAGSVM
SVM-KNN
Results - A taste of the distance: shape distance (shape context, tangent distance), texture (texton histograms)
Distance functions
Shape    Texture
D( image A , image B ) = ??
Training images and query
Query image   Class 1   Class 2
Class 3
Understanding the need - Shape
Well, which is it??
Capturing the shape: Distance 1 – shape context; Distance 2 – tangent distance
query
Distance 1: Shape context
1. Find point correspondences
2. Estimate the transformation
3. Distance: combines correspondence quality and transformation quality
prototype query
Belongie et al., Shape matching and object recognition using shape contexts, IEEE Trans. (2002)
Find correspondences
Detector – use edge points. Descriptor – create a "landscape":
the relationship to the other edge points, as a histogram of orientations and distances
Count = 5
Count = 6
prototype query
Find correspondence
Detector – use edge points. Descriptor – create a "landscape":
the relationship to the other edge points, as a histogram of orientations and distances
Matching: compare the histograms using the χ² distance
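A minimal sketch of the descriptor and the χ² comparison in Python, assuming simple bin choices (5 log-spaced radial bins, 12 orientation bins); this is illustrative, not the authors' exact implementation:

```python
import numpy as np

def shape_context(points, n_r=5, n_theta=12):
    """For each edge point, a log-polar histogram of where the other points lie
       (distances normalized by the mean pairwise distance, for scale invariance)."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]             # diff[i, j] = p_j - p_i
    r = np.linalg.norm(diff, axis=2)
    r_norm = r / r[r > 0].mean()
    theta = np.arctan2(diff[..., 1], diff[..., 0])        # orientation in [-pi, pi]
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)   # log-spaced radius bins
    hists = np.zeros((n, n_r, n_theta))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rb = np.searchsorted(r_edges, r_norm[i, j]) - 1
            if 0 <= rb < n_r:
                tb = int((theta[i, j] + np.pi) / (2 * np.pi) * n_theta) % n_theta
                hists[i, rb, tb] += 1
    return hists.reshape(n, -1)

def chi2(h1, h2, eps=1e-10):
    """Chi-squared distance between two histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```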
Distance 1: Shape context
1. Find point correspondences
2. Estimate the transformation
3. Distance: combines correspondence quality and transformation (quality, magnitude)
prototype query
MNIST – Digit DB
70,000 handwritten digits; each image is 28x28
MNIST results
Human error rate – 0.2%. Better methods exist (error rate < 1%).
[Figure: error rate (%) of the compared methods on MNIST]
Distance 2: Tangent distance
Distance includes invariance to small changes: small rotations, translations, thickening
Simard et al., Transformation invariance in pattern recognition - tangent distance and tangent propagation. Neural Networks (1998)
Prototype query
[Figure: the space induced by rotation. Rotating the image by an angle α (α = -2, -1, 0, 1, ...) traces a one-dimensional curve in pixel space, parameterized by the rotation function.]
Tangent distance – Visual intuition
[Figure: in pixel space, the prototype image P and the query image Q each induce a surface (SP, SQ) of their transformed versions; the desired distance is the distance between these surfaces.]
But – calculating the distance between non-linear curves can be difficult
Solution: use a linear approximation – the tangent
Euclidean distance (L2)
Tangent Distance - General
For every image, create a surface allowing transformations: rotations, translations, thickness, etc.
Find a linear approximation - the tangent plane
Distance: calculate the distance between the linear (tangent) planes – this has efficient solutions
In practice the tangent plane has 7 dimensions (one per modeled transformation)
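A sketch of a one-sided tangent distance in Python (distance from the query to the prototype's tangent plane), using a single finite-difference rotation tangent for brevity; the full method uses the 7 transformations and a two-sided formulation. The scipy rotation helper is an assumption of this sketch.

```python
import numpy as np

def tangent_vector_rotation(img, eps=1.0):
    """Finite-difference approximation of the tangent vector for a small rotation
       (one column of the tangent matrix). `img` is a 2-D array, eps is in degrees."""
    from scipy.ndimage import rotate
    rotated = rotate(img, eps, reshape=False, order=1)
    return ((rotated - img) / eps).ravel()

def one_sided_tangent_distance(p_img, q_img, tangent_vectors):
    """min over a of || p + T a - q ||_2 : distance from q to the tangent plane at p."""
    p, q = p_img.ravel().astype(float), q_img.ravel().astype(float)
    T = np.stack(tangent_vectors, axis=1)              # columns span the tangent plane
    a, *_ = np.linalg.lstsq(T, q - p, rcond=None)      # least-squares projection
    return np.linalg.norm(p + T @ a - q)
```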
USPS – digit DB
9,298 handwritten digits taken from mail envelopes; each image is 16x16
USPS results
Human error rate – 2.5%.
For L2 – not optimal; DAGSVM has similar results.
For the tangent distance – NN gives similar results.
Understanding Texture
How to represent Texture??
Texture samples
Texture representation
Represent each pixel by its responses to a filter bank (48 filters).
[Figure: for each pixel P1, P2, P3, ... of a texture patch, the 48 filter responses form a vector, e.g. (0.6, -0.2, ..., 0.4); these vectors are points in a 48-dimensional space]
Introducing Textons
Filter responses are points in a 48-dimensional space. A texture patch is spatially repeating,
so the representation is redundant → select representative responses (K-means)
Textons !
Texture patch
T. Leung, J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons (2001)
[Figure: the cluster centers are 48-dimensional vectors, e.g. (3.0, ..., 1.0) and (4.0, ..., 6.0), corresponding to pixels of one image]
Universal textons
[Figure: prototype textures → filter bank → filter responses in 48-dim space → universal textons T1, T2, T3, T4]
"Building blocks" for all textures
Distance 3: χ² of texton histograms
For a query texture: 1. Create filter responses. 2. Build a texton histogram (assign each pixel's responses to the nearest universal texton and count).
[Figure: query texture → filter responses in 48-dim space → query texton histogram over T1, T2, T3, T4]
Distance 3: χ² of texton histograms
For a query texture: 1. Create filter responses. 2. Build a texton histogram (using the universal textons). 3. Distance: compare histograms using χ².
[Figure: the query texton histogram (bins T1..T4) is compared by χ² against each prototype texture's texton histogram]
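A minimal sketch of the texton histogram and its χ² comparison in Python (illustrative; responses and textons are assumed to be precomputed filter-response and universal-texton matrices):

```python
import numpy as np

def texton_histogram(responses, textons):
    """responses: (num_pixels, 48) filter responses; textons: (num_textons, 48) universal textons.
       Assign each pixel to its nearest texton and return a normalized histogram."""
    d = np.linalg.norm(responses[:, None, :] - textons[None, :, :], axis=2)
    nearest = np.argmin(d, axis=1)                               # texton index per pixel
    hist = np.bincount(nearest, minlength=len(textons)).astype(float)
    return hist / hist.sum()

def texton_chi2_distance(query_responses, proto_responses, textons, eps=1e-10):
    """Chi-squared distance between the query and prototype texton histograms."""
    hq = texton_histogram(query_responses, textons)
    hp = texton_histogram(proto_responses, textons)
    return 0.5 * np.sum((hq - hp) ** 2 / (hq + hp + eps))
```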
CUReT – texture DB
61 textures; different viewpoints; different illuminations
CUReT Results
(comparing texton histograms)
[Figure: classification results on CUReT]
Caltech-101 DB
102 categories; variations in color, pose, illumination
Distance function: a combination of texture and shape. Two algorithms: Algo. A, Algo. B
Samples from the Caltech-101 DB
Caltech-101 Results
(15 training images per category)
Still a long way to go…
Algo. B: using only DAGSVM (no KNN)
[Figure: correct rate (%) of the compared methods; about 66% correct]
Motivation – Human Visual System
Large Number of Categories (~30,000)
Discriminative Process
Small Set of Examples
Invariance to transformation
Similarity to Prototype instead of Features
Summary
Popular methods: NN, SVM, DAGSVM (an extension of SVM to multiple classes)
The hybrid method – SVM-KNN: motivated by human perception (??), improved complexity; do better methods exist?
A taste of the distance: shape, texture
Results depend on both the classification method and the distance function
References
H. Zhang, A. C. Berg, M. Maire and J. Malik. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition. IEEE, Vol. 2, pages 2126-2136, 2006.
P. Vincent and Y. Bengio. K-local hyperplane and convex distance nearest neighbor algorithms. NIPS, pages 985-992, 2001.
J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. NIPS, pages 547-553, 1999.
C. Hsu and C. Lin. A comparison of methods for multiclass support vector machines. IEEE, Vol. 13, pages 415-425, 2002.
T. Leung and J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons. Int. J. Computer Vision, 43(1):29-44, 2001.
P. Simard, Y. LeCun, J. S. Denker, and B. Victorri. Transformation invariance in pattern recognition-tangent distance and tangent propagation. Neural Networks: Tricks of the Trade, pages 239-274, 1998.
S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE, Vol. 24, pages 509-522, 2002.
Thank You!