learning a structured model for visual category recognition


Learning A Structured Model For Visual Category Recognition

Ashish Gupta

University of Surrey

[email protected]

July 5, 2013



Introduction

Introduction : What is Category Recognition?

Feature vector Embedding : Information in Sub-Manifold.

Feature vector distribution: Fuzzy Visual Model.

Estimating semantic structure: Co-clustering.

Sparse Models: Semantically structured.

Summary & Future Work




Motivation

Visual Category?

Robot interacts with physical objects.

Object taxonomy based on physical properties.

Robot recognizes object using visual appearance.




Motivation

Visual Category Model

Appearance variation → scatter of semantically related descriptors in feature space.

Can this scatter distribution be estimated?

Can this structure be used to improve the learnt visual model?

Visual category model ≈ Visual object model + Estimated structure of visual category variation




Approach

Visual Classification Pipeline

Structure in sub-spaces → groups of sub-spaces → dictionary

Structure in dictionary → groups of prototypes → encoding




Approach

Feature Descriptor Matrix

Figure: Scene-15 D-SIFT. Matrix of 500 D-SIFT feature descriptors, each of 128 dimensions (axes: feature vectors, dimensions).




Approach

Encoded Feature Matrix

Conceptual illustration of encoded feature matrix, occurrence histogram of visual words in images.
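The encoded feature matrix described above can be sketched as follows. This is a minimal illustration assuming hard nearest-neighbour assignment and L1 normalisation of each row; the function name `bow_histograms`, the toy dictionary and the toy descriptors are hypothetical and not taken from the slides.

```python
import numpy as np

def bow_histograms(descriptors, image_ids, dictionary, n_images):
    """Encode images as occurrence histograms of visual words (hard assignment)."""
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                      # nearest visual word per descriptor
    hist = np.zeros((n_images, dictionary.shape[0]))
    np.add.at(hist, (image_ids, words), 1.0)       # count word occurrences per image
    # L1-normalise each row (assumes every image has at least one descriptor).
    return hist / hist.sum(axis=1, keepdims=True)

# Toy data: 2 visual words, 4 descriptors spread over 2 images.
dictionary = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[0.1, -0.2], [9.5, 10.2], [0.3, 0.1], [10.1, 9.9]])
image_ids = np.array([0, 0, 0, 1])
H = bow_histograms(descriptors, image_ids, dictionary, n_images=2)
```

Each row of `H` is one image and each column one visual word, which is the rows-by-columns layout that the later co-clustering slides operate on.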




Approach

Conceptual Interpretation

Structure estimation can be interpreted as estimation of semantically related rows or columns of data matrix. These are projected to a lower dimensional space such that mutual separation between equivalent feature vectors is reduced.




Sub-space Embedding

Feature descriptor space is high dimensional.

Relevant information is embedded in a lower dimensional sub-manifold.

What is the appropriate lower dimensionality?

Measure efficacy of sub-space embedding method?

Measure information in embedded feature vectors.




Intrinsic Dimensionality

Intrinsic dimensionality p estimation

Correlation Dimension

Number of feature vectors in a hypersphere of radius r is proportional to r^p.

Maximum Likelihood Estimate

Expectation of number of feature vectors covered by a hypersphere of growing radius r.

Eigenvalue Estimate

Number of eigenvalues greater than a small threshold value ε.

Geodesic Minimum Spanning Tree

Based on length of GMST of k descriptors in a neighbourhood graph.
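Of the four estimators listed, the eigenvalue estimate is the simplest to sketch. The snippet below is one possible reading, assuming the threshold ε is applied to covariance eigenvalues normalised by their sum; the slides do not state the normalisation, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def eigenvalue_intrinsic_dim(Z, eps=1e-6):
    """Count covariance eigenvalues whose share of total variance exceeds eps."""
    Zc = Z - Z.mean(axis=0)                    # centre the feature vectors
    cov = Zc.T @ Zc / (Z.shape[0] - 1)         # sample covariance matrix
    eigvals = np.linalg.eigvalsh(cov)          # symmetric input, real eigenvalues
    eigvals = np.clip(eigvals, 0.0, None)      # remove tiny negative round-off
    ratios = eigvals / eigvals.sum()           # assumption: normalise by total variance
    return int((ratios > eps).sum())

# Synthetic check: 3 latent degrees of freedom linearly embedded in 20 dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 20))
Z_demo = latent @ mixing
p_hat = eigenvalue_intrinsic_dim(Z_demo)
```

This estimate only sees linear structure; the correlation-dimension and GMST estimators are intended for data lying on curved sub-manifolds.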





Estimated Intrinsic Dimensionality





Subspace Embedding Methods

Global Methods

Principal Components

Multi-Dimensional Scaling

Stochastic Proximity Embedding

Isomap

Diffusion Maps

Local Methods

Locally Linear Embedding

Locality Preserving Projection

Neighbourhood Preserving Projection

Landmark Isomap

t-Stochastic Neighbourhood Embedding




Entropic Measure

Entropy Measure Intuition

Figure: Three data sets plotted on axes X, Y, Z (’swiss’ synthetic data, ’intersect’ synthetic data, ’VOC2006,car’ data) and the distribution of pair-wise distances in each (normalized frequency vs. bin index). Entropy: swiss, H = −25.3355; intersect, H = −19.3150; VOC2006,car, H = −33.0302.




Empirical Results

Comparison of Embedded Entropy




Empirical Results

Computational Time Complexity




Empirical Results

Classification Performance




Empirical Results

Conclusion

Estimated intrinsic dimensionality was in the neighbourhood of 14 of the 128-dimensional descriptor.

The performance of LPP in comparison to other embedding methods accentuates the importance of modelling structure in local distributions.




Fuzzy Visual Model

Structure in distribution of descriptors in feature space?

Issues with K-means clustering in the Bag-of-Words model.

Visual model incorporating Fuzzy logic framework.




Visual Ambiguity

Descriptor assignment has issues of uncertainty and plausibility.

Kernel Codebook uses soft-assignment to resolve the ambiguity.
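Soft-assignment can be illustrated with a Gaussian kernel over the distances to the visual words. This is a sketch under assumptions: the kernel width `sigma`, the function name `soft_assign` and the toy dictionary are hypothetical, and the slides do not specify which kernel was used.

```python
import numpy as np

def soft_assign(z, dictionary, sigma=1.0):
    """Distribute one descriptor over all visual words with Gaussian-kernel weights."""
    d2 = ((dictionary - z) ** 2).sum(axis=1)            # squared distance to each word
    # Subtracting the minimum leaves the normalised weights unchanged
    # but avoids underflow when all distances are large.
    w = np.exp(-(d2 - d2.min()) / (2.0 * sigma ** 2))
    return w / w.sum()                                   # weights sum to 1

# A descriptor equidistant from two words is shared between them,
# which hard assignment cannot express.
dictionary = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
u = soft_assign(np.array([1.0, 0.0]), dictionary)
```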




Fuzzy Models

Visual Dictionary

Figure: Partitions of the Motorcycle data (acceleration vs. time, both on a normalized scale). Panels: K-means Hard Partition, Fuzzy K-means Partition, Gustafson-Kessel Fuzzy Partition.

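K-means:

\[ L(Z; \mu_C) = \sum_{j=1}^{r} \sum_{i \in C_j} \| z_i - \mu_{C_j} \|^2 \]

Fuzzy K-means:

\[ L(Z; D, A) = \sum_{i=1}^{r} \sum_{j=1}^{n} (\alpha_{ij})^m \, \| z_j - \mu_{C_i} \|_{\Sigma}^{2} \]

Gustafson-Kessel:

\[ L(Z; D, A, \Sigma_i) = \sum_{i=1}^{r} \sum_{j=1}^{n} (\alpha_{ij})^m \, \| z_j - d_i \|_{\Sigma_i}^{2} \]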
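The fuzzy objective with memberships α_ij and fuzzifier m can be minimised by alternating updates. The sketch below uses the standard fuzzy c-means update rules with a plain Euclidean norm in place of the Σ-weighted norm; that simplification, the function names and the synthetic two-blob data are assumptions for illustration, not the author's implementation.

```python
import numpy as np

def fkm_objective(Z, D, A, m=2.0):
    """Sum over prototypes i and descriptors j of (alpha_ij)^m * ||z_j - d_i||^2."""
    d2 = ((Z[None, :, :] - D[:, None, :]) ** 2).sum(axis=2)   # shape (r, n)
    return float(((A ** m) * d2).sum())

def fkm_step(Z, D, m=2.0, eps=1e-12):
    """One alternating update: memberships A given D, then prototypes D given A."""
    d2 = ((Z[None, :, :] - D[:, None, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    A = inv / inv.sum(axis=0, keepdims=True)    # memberships of each descriptor sum to 1
    W = A ** m
    D_new = (W @ Z) / W.sum(axis=1, keepdims=True)   # membership-weighted means
    return D_new, A

# Two well-separated synthetic blobs, one initial prototype drawn from each.
rng = np.random.default_rng(0)
Z_demo = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
                    rng.normal(5.0, 0.3, size=(50, 2))])
D_demo = Z_demo[[0, 99]].copy()
history = []
for _ in range(20):
    D_demo, A_demo = fkm_step(Z_demo, D_demo)
    history.append(fkm_objective(Z_demo, D_demo, A_demo))
```

Setting every membership to 0 or 1 reduces the objective to the K-means loss, which is why the hard partition can be seen as a limiting case.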




Fuzzy Models

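\[ d_{\Sigma}^{2}(z, \mu_C) = (z - \mu_C)^{T} \, \Sigma \, (z - \mu_C) \]

\[ \Sigma = \begin{pmatrix}
\left(\frac{1}{\sigma_1}\right)^2 & 0 & \cdots & 0 \\
0 & \left(\frac{1}{\sigma_2}\right)^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \left(\frac{1}{\sigma_n}\right)^2
\end{pmatrix} \]

\[ d_{\Sigma_i}^{2}(z_j, \mu_{C_i}) = (z_j - \mu_{C_i})^{T} \, \Sigma_i \, (z_j - \mu_{C_i}) \]

\[ F_i = \frac{\sum_{j=1}^{n} (\alpha_{ij})^m (z_j - d_i)(z_j - d_i)^{T}}{\sum_{j=1}^{n} (\alpha_{ij})^m} \]

\[ \Sigma_i = \left( \rho_i \det(F_i) \right)^{\frac{1}{p}} F_i^{-1} \]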
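The cluster-specific distance can be sketched numerically. The snippet assumes the standard Gustafson-Kessel form, in which the norm-inducing matrix is the inverse of the fuzzy covariance F_i scaled so that its determinant equals ρ_i; the function names, the choice ρ_i = 1 and the synthetic data are hypothetical.

```python
import numpy as np

def gk_norm_matrix(Z, d_i, alpha_i, m=2.0, rho_i=1.0):
    """Fuzzy covariance F_i and norm-inducing matrix Sigma_i for one cluster."""
    w = alpha_i ** m                                   # fuzzified memberships
    diff = Z - d_i
    F = (w[:, None] * diff).T @ diff / w.sum()         # weighted scatter around d_i
    p = Z.shape[1]                                     # descriptor dimensionality
    # Standard GK form (assumption): scaled inverse of F, so det(Sigma) = rho_i.
    Sigma = (rho_i * np.linalg.det(F)) ** (1.0 / p) * np.linalg.inv(F)
    return F, Sigma

def gk_sq_distance(z, d_i, Sigma):
    """Squared distance (z - d_i)^T Sigma (z - d_i)."""
    diff = z - d_i
    return float(diff @ Sigma @ diff)

# An elongated cluster: wide along the first axis, narrow along the second.
rng = np.random.default_rng(0)
Z_demo = rng.normal(size=(200, 2)) * np.array([3.0, 0.5])
d_demo = Z_demo.mean(axis=0)
alpha_demo = np.ones(200)
F_demo, Sigma_demo = gk_norm_matrix(Z_demo, d_demo, alpha_demo)
along = gk_sq_distance(d_demo + np.array([1.0, 0.0]), d_demo, Sigma_demo)
across = gk_sq_distance(d_demo + np.array([0.0, 1.0]), d_demo, Sigma_demo)
```

A unit step along the elongated direction costs less than a unit step across it, which is how the partition adapts to the local shape of the distribution.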




Empirical Results

FKM Classification Performance

Figure: Per-category classification accuracy (Acc) of Bag-of-Words and Fuzzy K-means on Scene15 and VOC2006.





Empirical Results

GK Classification Performance

Figure: Per-category classification accuracy (Acc) of Bag-of-Words and Gustafson-Kessel on Scene15 and VOC2006.





Empirical Results

Dictionary Size

Figure: Comparison of BoW with FKM and GK for different sizes of dictionary (32, 64, 128, 256, 512) on Caltech101; accuracy (Acc) vs. dictionary size.




Empirical Results

Aggregate Performance

Figure: Aggregate classification accuracy (Acc) of Bag-of-Words, Fuzzy K-means and Gustafson-Kessel on (a) VOC datasets (VOC2006, VOC2010) and (b) Caltech datasets (Caltech101, Caltech256).

Visual Model   VOC-2006   VOC-2010   Caltech-101   Caltech-256
BoW            0.50825    0.52446    0.60111       0.67606
FKM            0.52635    0.53736    0.61928       0.68357
G-K            0.52885    0.54224    0.62413       0.68623




Empirical Results

Conclusion

Visual model learnt within the framework of fuzzy logic adapts to the local distribution of feature vectors.

Learning a better fuzzy membership function is an effective alternative to learning increasingly large dictionaries to adapt to increasing complexity of visual categories.




Co-clustering for Structure Estimation

What is co-clustering?

Co-clustering for structure in descriptor data matrix.

Co-clustering for structure in encoded feature matrix.




Co-clustering Methods

Co-clustering

Co-clustering is the simultaneous, alternating clustering of the rows and columns of a data matrix.

At each step of the optimization routine, the groups of rows guide column clustering and vice versa.

\[ C_X : \{x_1, \ldots, x_m\} \mapsto \{\hat{x}_1, \ldots, \hat{x}_k\}, \qquad C_Y : \{y_1, \ldots, y_n\} \mapsto \{\hat{y}_1, \ldots, \hat{y}_l\} \]





Co-clustering methods

Information-Theoretic Co-Clustering

Data matrix is considered a joint probability distribution. Minimizes KL-divergence between original data and co-clustered matrices.

Sum-Squared Residue Co-Clustering

Alternating k-means clustering of rows and columns. Minimizes squared Euclidean distance between rows and columns from row and column means respectively.
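A sum-squared residue can be computed for any given row and column labelling. The sketch below uses one common definition, the squared deviation of every entry from the mean of its co-cluster block; whether the slides use this residue or a variant is an assumption, and the function name and toy matrix are hypothetical.

```python
import numpy as np

def block_mean_residue(A, row_labels, col_labels, k, l):
    """Sum of squared deviations of each entry from the mean of its co-cluster block."""
    R = np.eye(k)[row_labels]                  # (m, k) row-cluster indicator matrix
    C = np.eye(l)[col_labels]                  # (n, l) column-cluster indicator matrix
    sums = R.T @ A @ C                         # per-block sums
    counts = np.outer(R.sum(axis=0), C.sum(axis=0))
    means = sums / np.maximum(counts, 1)       # per-block means (empty blocks stay 0)
    approx = R @ means @ C.T                   # block-constant approximation of A
    return float(((A - approx) ** 2).sum()), approx

# A matrix that is exactly block-constant for the first labelling.
A_demo = np.kron(np.array([[1.0, 5.0], [9.0, 2.0]]), np.ones((2, 3)))
cols = np.array([0, 0, 0, 1, 1, 1])
res_good, _ = block_mean_residue(A_demo, np.array([0, 0, 1, 1]), cols, 2, 2)
res_bad, _ = block_mean_residue(A_demo, np.array([0, 1, 0, 1]), cols, 2, 2)
```

An alternating scheme would reassign rows with the column labels fixed, then columns with the row labels fixed, keeping each move that lowers this residue.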





Information-Theoretic Co-clustering

\[ I(X; Y) - I(\hat{X}; \hat{Y}) = d_{KL}\big( p(X, Y), \, q(X, Y) \big) \]
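The identity between the loss in mutual information and the KL-divergence can be checked numerically. The sketch assumes the usual construction of the approximation, q(x, y) = p(x̂, ŷ) p(x | x̂) p(y | ŷ), from information-theoretic co-clustering; the function names, the random joint distribution and the labels are hypothetical.

```python
import numpy as np

def mutual_information(P):
    """I(X;Y) in nats for a joint distribution given as a 2-D array."""
    px = P.sum(axis=1, keepdims=True)
    py = P.sum(axis=0, keepdims=True)
    mask = P > 0
    return float((P[mask] * np.log(P[mask] / (px @ py)[mask])).sum())

def itcc_approximation(P, row_labels, col_labels, k, l):
    """Co-clustered joint p(x_hat, y_hat) and the approximation q(x, y)."""
    R = np.eye(k)[row_labels]                  # row-cluster indicators
    C = np.eye(l)[col_labels]                  # column-cluster indicators
    P_hat = R.T @ P @ C                        # p(x_hat, y_hat)
    px, py = P.sum(axis=1), P.sum(axis=0)
    px_given = px / (R.T @ px)[row_labels]     # p(x | x_hat)
    py_given = py / (C.T @ py)[col_labels]     # p(y | y_hat)
    Q = P_hat[np.ix_(row_labels, col_labels)] * np.outer(px_given, py_given)
    return P_hat, Q

def kl_divergence(P, Q):
    mask = P > 0
    return float((P[mask] * np.log(P[mask] / Q[mask])).sum())

rng = np.random.default_rng(0)
P_demo = rng.random((6, 8))
P_demo /= P_demo.sum()                         # normalise into a joint distribution
rows = np.array([0, 0, 1, 1, 2, 2])
cols = np.array([0, 0, 0, 1, 1, 1, 1, 0])
P_hat_demo, Q_demo = itcc_approximation(P_demo, rows, cols, 3, 2)
loss = mutual_information(P_demo) - mutual_information(P_hat_demo)
kl = kl_divergence(P_demo, Q_demo)
```

The two quantities agree for any labelling, so minimising the KL-divergence is the same as preserving as much mutual information as possible after co-clustering.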




Multiple Sub-spaces

Multiple Sub-spaces Intuition

\[ \sum_{i,j} d_E\big( z_{\bullet i}|_{S_l}, \, z_{\bullet j}|_{S_q} \big) > \sum_{i,j} d_E\big( z_{\bullet i}, \, z_{\bullet j} \big), \qquad l \neq q \]




Multiple Sub-spaces

Co-clustering descriptor data matrix

Figure: Scene-15 D-SIFT descriptor matrix (500 feature vectors of 128 dimensions) and its Information-Theoretic Co-Clustering into 10 row and 10 column clusters (axes: feature vectors, dimensions).




Multiple Sub-spaces

Dictionary on single and multiple sub-spaces

Figure: Universal PCA Dictionary (VOC-2006, D-SIFT, 10 x 500, PCA + K-means) and Universal CC Dictionary (VOC-2006, D-SIFT, 10 x 500, SSRCC + K-means). Axes: dictionary [500], dimensions [10].




Multiple Sub-spaces

Classification performance

Figure: Comparison of classification performance (F1) of single and multiple sub-space dictionaries on VOC2006 and VOC2007. First panel: Dict 10x1000, MSSD:(i) 5x1000, MSSD:(r) 5x1000. Second panel: Dict 10x1000, MSSD:(i) 10x1000, MSSD:(r) 10x1000.




Multiple Sub-spaces

Dictionary projected to multiple sub-spaces

Figure: Universal Dictionary (VOC-2006, D-SIFT, 128 x 500, K-means) and Universal Submanifold Dictionary (VOC-2006, D-SIFT, 128 (10) x 500, SSRCC + K-means). Axes: dictionary [500], dimensions [128], submanifolds [10].




Multiple Sub-spaces

Classification performance


Figure: Comparison of classification performance of dictionary projected to multiple sub-spaces. Panels: F1 (5) and F1 (50). Legend in both panels: Dict 128x1000, SSSD:(i) 128x1000, SSSD:(r) 128x1000.




Topic Dictionary

Structure in Dictionary Intuition

Estimating groups of non-contiguous partitions of feature space that are semantically related.




Topic Dictionary

Topic Dictionary Concept




Topic Dictionary

Classification Performance

Comparison of classification performance of dictionaries using BoW and ITCC, for VOC2006 and Scene15 datasets.




Topic Dictionary

Dictionary sizes

Figure: Comparative classification performance (F1) for different dictionary sizes on VOC2006, VOC2007, VOC2010, Scene15 and Caltech101. Panels: BoW vs. CC:i at dictionary sizes 100, 500 and 1000.




Topic Dictionary

Conclusion

Groups of sub-spaces computed using co-clustering yielded dictionaries with better classification performance.

Groups of feature space partitions (dictionary elements) yielded improved classification results.

These estimated groups can be used in learning a semantically structured visual model.




Sparse Decomposition




Sparse Visual Model

Sparse model approximates a feature vector as a combination of a sub-set of an over-complete basis set.

Sparsity is induced by adding a regularization constraint on the coefficients to the loss function.

Degree of sparsity is determined empirically.

Each basis element is considered individually.

Possible structure amongst basis elements is disregarded.




Structured Sparse Model

SSPCA (structure in sub-spaces)

Co-clustered groups of sub-spaces are used to augment Sparse-PCA to compute a Structured Sparse-PCA dictionary.

Group Lasso (structure in dictionary)

Co-clustered groups of dictionary elements are used to augment Lasso to compute a group Lasso feature encoding.




Sparse Regularization

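Sparse regularization:

\[ \min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} L(z_i, D\alpha_i) + \lambda \, \Omega(\alpha) \]

Lasso:

\[ \min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \left( \| z_i - D\alpha_i \|^2 + \lambda \| \alpha_i \|_1 \right) \]

Group Sparsity:

\[ \min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \left( \| z_i - D\alpha_i \|^2 + \lambda \sum_{j=1}^{k} \| \alpha_i \|_{G_j} \right) \]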
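The difference between the two penalties shows up in their proximal operators, the shrinkage step used by many sparse solvers. The sketch assumes the group norm is the Euclidean norm of the coefficients in group G_j, which is the usual group Lasso choice but is not spelled out in the slides; the function names and the toy coefficient vector are hypothetical.

```python
import numpy as np

def soft_threshold(alpha, t):
    """Proximal operator of t * ||alpha||_1: shrinks every coefficient on its own."""
    return np.sign(alpha) * np.maximum(np.abs(alpha) - t, 0.0)

def group_soft_threshold(alpha, groups, t):
    """Proximal operator of t * sum_j ||alpha_{G_j}||_2: shrinks whole groups at once."""
    out = np.zeros_like(alpha, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(alpha[idx])
        if norm > t:                               # keep the group, scaled towards zero
            out[idx] = (1.0 - t / norm) * alpha[idx]
    return out                                     # groups with norm <= t stay at zero

alpha_demo = np.array([3.0, -4.0, 0.5, -0.2])
groups_demo = [[0, 1], [2, 3]]                     # e.g. co-clustered dictionary elements
lasso_out = soft_threshold(alpha_demo, 1.0)
group_out = group_soft_threshold(alpha_demo, groups_demo, 1.0)
```

The Lasso step zeroes coefficients one at a time, while the group step keeps or discards all elements of a group together, which is how estimated groups of dictionary elements enter the encoding.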




Structured Sub-space

Structured Sub-space Dictionary using ITCC

Figure: Per-category mAP of Sparse Subspace and Structured Subspace (ITCC) on VOC2006 and VOC2007.






Structured Sub-space Dictionary using SSRCC

Figure: Per-category mAP of Sparse Subspace and Structured Subspace (SSRCC) on VOC2006 and VOC2007.






Data Set   Sparse Subspace   Structured Sparse Subspace (ITCC)   Structured Sparse Subspace (SSRCC)
VOC2006    67.5941           70.8295                             68.5808
VOC2007    67.9971           68.0783                             68.3718

Sparse selection of semantically related set of sub-spaces performs better than sparse individual selection of sub-spaces.




Structured Sparse Dictionary

Structured Sparse Encoding using ITCC

Figure: Per-category mAP of Sparse Encoding and Structured Encoding (ITCC) on Scene15 and VOC2006.






Structured Sparse Encoding using SSRCC

Figure: Per-category mAP of Sparse Encoding and Structured Encoding (SSRCC) on Scene15 and VOC2006.






Data Set   Sparse Encoding   Structured Sparse Encoding (ITCC)   Structured Sparse Encoding (SSRCC)
VOC-2006   72.8386           73.3977                             72.7738
Scene-15   68.5737           79.8794                             72.1155

Sparse selection of semantically related set of dictionary elements performs better than sparse individual selection of dictionary elements.




Summary

Learning semantically relevant structure in feature space was used to compute better visual models.

Analysis of sub-space embedding emphasized modelling local distributions.

Incorporation of fuzzy logic framework to learn dictionary kernels that adapt to local distributions yielded better visual models.

Co-clustering was successful in grouping semantically related sub-spaces and feature space partitions.

Estimated groups of sub-spaces and dictionary elements were used to compute structured sparse visual models, improving upon regular sparse models.




Future Work

Future Work

Visual models using Fisher Kernel coding, which uses a Gaussian kernel, have been very successful. Combining the approach in Fisher Kernels with the learnt fuzzy membership functions could potentially improve the visual model.

Fuzzy logic based learning algorithms that are more advanced than Gustafson-Kessel could be explored to learn better membership functions.

Co-clustering creates a block factorization of the data matrix. Partial membership of rows and columns to the co-clusters would be the natural next step.

Explore ways of using semantic structure to improve feature generation techniques like hierarchical models that aim to learn category specific descriptors.




Future Work

End

Questions...




Appendices

BoW Partitioning

Figure: Bag-of-Words model and image ‘000017’ in VOC-2006 dataset (axes x, y). The dictionary of size 25 is computed using K-means clustering. The feature vectors are projected to 2 dimensions using PCA.




Appendices

FKM Partitioning

Figure: Fuzzy K-means model and image ‘000017’ in VOC-2006 dataset (axes x, y).




Appendices

GK Partitioning

Figure: Gustafson-Kessel model and image ‘000017’ in VOC-2006 dataset (axes x, y).


