Learning a Structured Model for Visual Category Recognition
DESCRIPTION
Learning a Structured Model for Visual Category Recognition Abstract: This thesis deals with the problem of estimating structure in data due to the semantic relations between data elements and leveraging this information to learn a visual model for category recognition. A visual model consists of dictionary learning, which computes a succinct representation of training data by partitioning feature space, and feature encoding, which learns a representation of each image as a combination of dictionary elements. Besides variations in lighting and pose, a key challenge of classifying a category is intra-category appearance variation. The key idea in this thesis is that feature data describing a category has latent structure due to visual content idiomatic to a category. However, popular algorithms in literature disregard this structure when computing a visual model. Towards incorporating this structure in the learning algorithms, this thesis analyses two facets of feature data to discover relevant structure. The first is structure amongst the sub-spaces of the feature descriptor. Several sub-space embedding techniques that use global or local information to compute a projection function are analysed. A novel entropy based measure of structure in the embedded descriptors suggests that relevant structure has local extent. The second is structure amongst the partitions of feature space. Hard partitioning of feature space leads to ambiguity in feature encoding. To address this issue, novel fuzzy logic based dictionary learning and feature encoding algorithms are employed that are able to model the local feature vectors distributions and provide performance benefits. To estimate structure amongst sub-spaces, co-clustering is used with a training descriptor data matrix to compute groups of sub-spaces. 
A dictionary learnt on feature vectors embedded in these multiple sub-manifolds is demonstrated to model data better than a dictionary learnt on feature vectors embedded in a single sub-manifold computed using principal components. In a similar manner, co-clustering is used with the encoded feature data matrix to compute groups of dictionary elements, referred to as 'topics'. A topic dictionary is demonstrated to perform better than a regular dictionary of comparable size. Both these results suggest that the groups of sub-spaces and dictionary elements have semantic relevance. All the methods developed here are viewed from the unifying perspective of matrix factorization, where a data matrix is decomposed into two factor matrices, which are interpreted as a dictionary matrix and a coefficient matrix. Sparse coding methods, which are currently enjoying much success, can be viewed as matrix factorization with a regularization constraint on the vectors of the dictionary or coefficient matrices. ....
TRANSCRIPT
Introduction Sub-space Embedding Fuzzy Visual Model Structure Estimation Structured Sparse Model Summary
Learning A Structured Model For Visual Category Recognition
Ashish Gupta
University of Surrey
July 5, 2013
Introduction
Introduction : What is Category Recognition?
Feature vector Embedding : Information in Sub-Manifold.
Feature vector distribution: Fuzzy Visual Model.
Estimating semantic structure: Co-clustering.
Sparse Models: Semantically structured.
Summary & Future Work
Motivation
Visual Category?
Robot interacts with physical objects.
Object taxonomy based on physical properties.
Robot recognizes objects using visual appearance.
Motivation
Visual Category Model
Appearance variation → scatter of semantically related descriptors in feature space.
Can this scatter distribution be estimated?
Can this structure be used to improve the learnt visual model?
Visual category model ≈ Visual object model + Estimated structure of visual category variation
Approach
Visual Classification Pipeline
Structure in sub-spaces → groups of sub-spaces → dictionary
Structure in dictionary → groups of prototypes → encoding
Approach
Feature Descriptor Matrix
Figure: Matrix of 500 D-SIFT feature descriptors from Scene-15, each of 128 dimensions (axes: feature vectors, dimensions).
Approach
Encoded Feature Matrix
Conceptual illustration of encoded feature matrix, occurrence histogram of visual words in images.
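A minimal sketch of how one column of such an encoded feature matrix could be computed, assuming hard assignment of each descriptor to its nearest visual word under Euclidean distance. The function names and the toy data are hypothetical, chosen only for illustration.

```python
import math

def nearest_word(descriptor, dictionary):
    """Index of the dictionary element closest to the descriptor (Euclidean)."""
    distances = [math.dist(descriptor, word) for word in dictionary]
    return distances.index(min(distances))

def bow_histogram(descriptors, dictionary):
    """Normalized occurrence histogram of visual words for one image."""
    counts = [0] * len(dictionary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, dictionary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Toy data (hypothetical): 3 visual words in 2-D, 4 descriptors from one image.
dictionary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
histogram = bow_histogram(descriptors, dictionary)
```

Stacking one such histogram per image gives a matrix with one axis indexed by visual words and the other by images.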
Approach
Conceptual Interpretation
Structure estimation can be interpreted as estimation of semantically related rows or columns of data matrix. These are projected to a lower dimensional space such that mutual separation between equivalent feature vectors is reduced.
Sub-space Embedding
Feature descriptor space is high dimensional.
Relevant information is embedded in a lower dimensional sub-manifold.
What is the appropriate lower dimensionality?
How can the efficacy of a sub-space embedding method be measured?
Measure information in embedded feature vectors.
Intrinsic Dimensionality
Estimating intrinsic dimensionality p
Correlation Dimension
Number of feature vectors in a hypersphere of radius r is proportional to $r^p$.
Maximum Likelihood Estimate
Expectation of the number of feature vectors covered by a hypersphere of growing radius r.
Eigenvalue Estimate
Number of eigenvalues greater than a small threshold value ε.
Geodesic Minimum Spanning Tree
Based on the length of the GMST of k descriptors in a neighbourhood graph.
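A rough sketch of the correlation dimension idea: if the number of neighbours within radius r grows like r to the power p, then p can be read off as the slope of log C(r) against log r, where C(r) is the fraction of point pairs closer than r. The two-radius slope and the toy data below are simplifying assumptions for illustration, not the estimator used in the thesis.

```python
import math

def correlation_sum(points, radius):
    """Fraction of distinct point pairs closer than `radius`."""
    n = len(points)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) < radius
    )
    return close / (n * (n - 1) / 2)

def correlation_dimension(points, r1, r2):
    """Slope of log C(r) against log r between two radii."""
    c1 = correlation_sum(points, r1)
    c2 = correlation_sum(points, r2)
    return math.log(c2 / c1) / math.log(r2 / r1)

# Toy data (hypothetical): 200 points on a straight line embedded in 3-D,
# so the intrinsic dimensionality is 1 although the ambient space is 3-D.
points = [(t, 2 * t, -t) for t in (i / 199 for i in range(200))]
p_hat = correlation_dimension(points, 0.1, 0.2)  # close to 1
```

In practice the slope would be fitted over a range of radii rather than two, and the result depends on the chosen range.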
Intrinsic Dimensionality
Estimated Intrinsic Dimensionality
Intrinsic Dimensionality
Subspace Embedding Methods
Global Methods
Principal Components
Multi-Dimensional Scaling
Stochastic Proximity Embedding
Isomap
Diffusion Maps
Local Methods
Locally Linear Embedding
Locality Preserving Projection
Neighbourhood Preserving Projection
Landmark Isomap
t-Stochastic Neighbourhood Embedding
Entropic Measure
Entropy Measure Intuition
Figure: Three data sets plotted in 3-D (axes X, Y, Z): 'swiss' synthetic data, 'intersect' synthetic data, and 'VOC2006, car' data.
Figure: Distribution of pair-wise distances in data (x-axis: bin index; y-axis: normalized frequency). Legend: swiss, H = -25.3355; intersect, H = -19.3150; VOC2006, car, H = -33.0302.
Empirical Results
Comparison of Embedded Entropy
Empirical Results
Computational Time Complexity
Empirical Results
Classification Performance
Empirical Results
Conclusion
Estimated intrinsic dimensionality of the 128-dimensional descriptor was in the neighbourhood of 14.
The performance of LPP in comparison to other embedding methods accentuates the importance of modelling structure in local distributions.
Fuzzy Visual Model
Structure in distribution of descriptors in feature space?
Issues with K-means clustering in the Bag-of-Words model.
Visual model incorporating Fuzzy logic framework.
Visual Ambiguity
Descriptor assignment has issues of uncertainty and plausibility.
Kernel Codebook uses soft-assignment to resolve the ambiguity.
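A minimal sketch of soft-assignment, assuming a Gaussian kernel on Euclidean distance with a hand-picked bandwidth `sigma`. The function name, the bandwidth and the toy dictionary are hypothetical. The example descriptor lies exactly midway between two visual words, which is the ambiguous case that hard assignment would resolve arbitrarily.

```python
import math

def soft_assign(descriptor, dictionary, sigma):
    """Gaussian-kernel weights of one descriptor over all visual words."""
    kernel = [
        math.exp(-math.dist(descriptor, word) ** 2 / (2 * sigma ** 2))
        for word in dictionary
    ]
    total = sum(kernel)
    return [k / total for k in kernel]

# Toy data (hypothetical): the descriptor is equidistant from words 0 and 1.
dictionary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
weights = soft_assign((0.5, 0.1), dictionary, sigma=0.5)
```

The two equidistant words receive equal weight and the distant word a smaller one, so the ambiguity is recorded in the encoding instead of being discarded.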
Fuzzy Models
Visual Dictionary
Figure: Partitions of the Motorcycle data under three models (x-axis: times, normalized scale; y-axis: acceleration, normalized scale). Panels: K-means Hard Partition; Fuzzy K-Means Partition; Gustafson-Kessel Fuzzy Partition.
K-means:
$$L(Z; \mu_C) = \sum_{j=1}^{r} \sum_{i \in C_j} \| z_i - \mu_{C_j} \|^2$$
Fuzzy K-means:
$$L(Z; D, A) = \sum_{i=1}^{r} \sum_{j=1}^{n} (\alpha_{ij})^m \, \| z_j - \mu_{C_i} \|^2_{\Sigma}$$
Gustafson-Kessel:
$$L(Z; D, A, \Sigma_i) = \sum_{i=1}^{r} \sum_{j=1}^{n} (\alpha_{ij})^m \, \| z_j - d_i \|^2_{\Sigma_i}$$
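The fuzzy objectives above are usually minimized by alternating between a membership update and a centre update. The sketch below shows the textbook fuzzy K-means (fuzzy c-means) updates, assuming plain Euclidean distance in place of the Σ-weighted norm and fuzzifier m = 2. The function names and toy values are hypothetical.

```python
import math

def fuzzy_memberships(point, centres, m=2.0):
    """Fuzzy K-means membership of one point in every cluster (sums to 1)."""
    distances = [math.dist(point, c) for c in centres]
    # A point sitting exactly on a centre belongs fully to that centre.
    if 0.0 in distances:
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]

def update_centre(points, memberships, m=2.0):
    """Membership-weighted mean of the points for one cluster."""
    weights = [a ** m for a in memberships]
    total = sum(weights)
    dim = len(points[0])
    return tuple(
        sum(w * p[k] for w, p in zip(weights, points)) / total
        for k in range(dim)
    )

# Toy data (hypothetical): two centres, one point at distances 1 and 3.
centres = [(0.0, 0.0), (4.0, 0.0)]
alpha = fuzzy_memberships((1.0, 0.0), centres)
centre = update_centre([(0.0, 0.0), (2.0, 0.0)], [1.0, 1.0])
```

With m = 2 the memberships are inversely proportional to squared distance, so the point at distances 1 and 3 is shared 0.9 to 0.1 between the two clusters.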
Fuzzy Models
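$$d^2_{\Sigma}(z, \mu_C) = (z - \mu_C)^T \, \Sigma \, (z - \mu_C)$$
$$\Sigma = \begin{pmatrix} \left(\frac{1}{\sigma_1}\right)^2 & 0 & \cdots & 0 \\ 0 & \left(\frac{1}{\sigma_2}\right)^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \left(\frac{1}{\sigma_p}\right)^2 \end{pmatrix}$$
$$d^2_{\Sigma_i}(z_j, \mu_{C_i}) = (z_j - \mu_{C_i})^T \, \Sigma_i \, (z_j - \mu_{C_i})$$
$$F_i = \frac{\sum_{j=1}^{n} (\alpha_{ij})^m (z_j - d_i)(z_j - d_i)^T}{\sum_{j=1}^{n} (\alpha_{ij})^m}$$
$$\Sigma_i = \left(\rho_i \det(F_i)\right)^{1/p} F_i^{-1}$$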
Empirical Results
FKM Classification Performance
Figure: Per-category classification accuracy (Acc), Bag-of-Words vs. Fuzzy K-means.
Scene15 panel, visual categories: MITcoast, MITmountain, industrial, livingroom, MITopencountry, PARoffice, MITtallbuilding, CALsuburb, store, bedroom, MITforest, MIThighway, MITstreet, MITinsidecity, kitchen.
VOC2006 panel, visual categories: sheep, horse, bicycle, motorbike, cow, bus, dog, cat, person, car.
Empirical Results
GK Classification Performance
Figure: Per-category classification accuracy (Acc), Bag-of-Words vs. Gustafson-Kessel.
Scene15 panel, visual categories: MITcoast, MITmountain, industrial, livingroom, MITopencountry, PARoffice, MITtallbuilding, CALsuburb, store, bedroom, MITforest, MIThighway, MITstreet, MITinsidecity, kitchen.
VOC2006 panel, visual categories: sheep, horse, bicycle, motorbike, cow, bus, dog, cat, person, car.
Empirical Results
Dictionary Size
Figure: Comparison of BoW with FKM and GK for different sizes of dictionary on Caltech101 (x-axis: dictionary size, 32 to 512; y-axis: Acc). Panels: Bag-of-Words vs. Fuzzy K-means; Bag-of-Words vs. Gustafson-Kessel.
Empirical Results
Aggregate Performance
Figure: Aggregate classification accuracy (Acc) of Bag-of-Words, Fuzzy K-means and Gustafson-Kessel. Panels: (a) VOC datasets (VOC2006, VOC2010); (b) Caltech datasets (Caltech101, Caltech256).

Visual Model   VOC-2006   VOC-2010   Caltech-101   Caltech-256
BoW            0.50825    0.52446    0.60111       0.67606
FKM            0.52635    0.53736    0.61928       0.68357
G-K            0.52885    0.54224    0.62413       0.68623
Empirical Results
Conclusion
Visual model learnt within the framework of fuzzy logic adapts to the local distribution of feature vectors.
Learning a better fuzzy membership function is an effective alternative to learning increasingly large dictionaries to adapt to the increasing complexity of visual categories.
Co-clustering for Structure Estimation
What is co-clustering?
Co-clustering for structure in descriptor data matrix.
Co-clustering for structure in encoded feature matrix.
Co-clustering Methods
Co-clustering
Co-clustering is simultaneous, alternating row and column clustering of a data matrix.
At each step of the optimization routine, the groups of rows guide column clustering and vice versa.
$$C_X : \{x_1, \ldots, x_m\} \mapsto \{\hat{x}_1, \ldots, \hat{x}_k\}$$
$$C_Y : \{y_1, \ldots, y_n\} \mapsto \{\hat{y}_1, \ldots, \hat{y}_l\}$$
Co-clustering Methods
Co-clustering methods
Information-Theoretic Co-Clustering
Data matrix is considered a joint probability distribution. Minimizes KL-divergence between original data and co-clustered matrices.
Sum-Squared Residue Co-Clustering
Alternating k-means clustering of rows and columns. Minimizes the squared Euclidean distance of rows and columns from the row and column means respectively.
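As a rough illustration of what a sum-squared residue measures, the sketch below scores a given row and column labelling by the squared deviation of every entry from the mean of its co-cluster. This is one simple residue definition chosen for illustration, and it only evaluates a fixed labelling; the alternating optimization itself is not shown. The function name and toy matrix are hypothetical.

```python
def cocluster_residue(matrix, row_labels, col_labels):
    """Sum of squared deviations of entries from their co-cluster mean."""
    sums, counts = {}, {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            key = (row_labels[i], col_labels[j])
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    means = {key: sums[key] / counts[key] for key in sums}
    return sum(
        (value - means[(row_labels[i], col_labels[j])]) ** 2
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    )

# Toy matrix (hypothetical) with an exact 2 x 2 block structure.
matrix = [
    [1.0, 1.0, 5.0],
    [1.0, 1.0, 5.0],
    [3.0, 3.0, 7.0],
]
good = cocluster_residue(matrix, row_labels=[0, 0, 1], col_labels=[0, 0, 1])
poor = cocluster_residue(matrix, row_labels=[0, 1, 1], col_labels=[0, 1, 1])
```

A labelling that matches the block structure leaves no residue, while a mismatched labelling is penalized, which is what the alternating row and column updates try to reduce.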
Co-clustering Methods
Information-Theoretic Co-clustering
$$I(X;Y) - I(\hat{X};\hat{Y}) = d_{KL}\big(p(X,Y),\, q(X,Y)\big)$$
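The left-hand side of this identity, the loss in mutual information caused by a co-clustering, can be computed directly for a small joint distribution. The sketch below is an illustration under assumed toy data; the function names are hypothetical, and only the evaluation of a fixed co-clustering is shown, not the ITCC optimization.

```python
import math

def mutual_information(joint):
    """I(X;Y) in nats for a joint distribution given as a nested list."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        p * math.log(p / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

def coclustered_joint(joint, row_labels, col_labels, k, l):
    """Aggregate p(x, y) into the k x l joint distribution of the co-clusters."""
    reduced = [[0.0] * l for _ in range(k)]
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            reduced[row_labels[i]][col_labels[j]] += p
    return reduced

def information_loss(joint, row_labels, col_labels, k, l):
    """Mutual information lost by replacing rows/columns with their clusters."""
    reduced = coclustered_joint(joint, row_labels, col_labels, k, l)
    return mutual_information(joint) - mutual_information(reduced)

# Toy joint distribution (hypothetical): rows 0, 1 and columns 0, 1 are identical.
joint = [
    [0.10, 0.10, 0.05],
    [0.10, 0.10, 0.05],
    [0.05, 0.05, 0.40],
]
loss_good = information_loss(joint, [0, 0, 1], [0, 0, 1], 2, 2)
loss_poor = information_loss(joint, [0, 1, 1], [0, 1, 1], 2, 2)
```

Merging rows and columns that carry the same conditional distribution loses essentially no information, whereas the mismatched grouping loses a substantial share of it.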
Multiple Sub-spaces
Multiple Sub-spaces Intuition
$$\sum_{i,j} d_E\left(z_{\bullet i}|S_l,\; z_{\bullet j}|S_q\right) > \sum_{i,j} d_E\left(z_{\bullet i},\; z_{\bullet j}\right), \quad l \neq q$$
Multiple Sub-spaces
Co-clustering descriptor data matrix
Figure: Two panels (axes: feature vectors, dimensions). First panel: Scene-15 D-SIFT, 500 feature vectors of 128 dimensions. Second panel: Information-Theoretic Co-Clustering of Scene-15 D-SIFT 500x128 into 10 row and 10 column clusters.
Multiple Sub-spaces
Dictionary on single and multiple sub-spaces
Figure: Two dictionaries learnt on VOC-2006 D-SIFT descriptors, each 10 x 500 (axes: dictionary [500], dimensions [10]). Panels: Universal PCA Dictionary (PCA + K-means); Universal CC Dictionary (SSRCC + K-means).
Multiple Sub-spaces
Classification performance
Figure: Comparison of classification performance (F1) of single and multiple sub-space dictionaries on VOC2006 and VOC2007. First panel legend: Dict: 10x1000; MSSD:(i): 5x1000; MSSD:(r): 5x1000. Second panel legend: Dict: 10x1000; MSSD:(i): 10x1000; MSSD:(r): 10x1000.
Multiple Sub-spaces
Dictionary projected to multiple sub-spaces
Figure: Two dictionaries learnt on VOC-2006 D-SIFT descriptors (x-axis: dictionary [500]). First panel: Universal Dictionary, 128x500, K-means (y-axis: dimensions [128]). Second panel: Universal Submanifold Dictionary, 128 (10) x 500, SSRCC + K-means (y-axis: dimensions [128], submanifolds [10]).
Multiple Sub-spaces
Classification performance
Figure: Comparison of classification performance of dictionary projected to multiple sub-spaces, on VOC2006 and VOC2007. Panel y-axes: F1 (5) and F1 (50). Legend in both panels: Dict: 128x1000; SSSD:(i): 128x1000; SSSD:(r): 128x1000.
Topic Dictionary
Structure in Dictionary Intuition
Estimating groups of non-contiguous partitions of feature space that are semantically related.
Topic Dictionary
Topic Dictionary Concept
Topic Dictionary
Classification Performance
Comparison of classification performance of dictionaries using BoW and ITCC, for VOC2006 and Scene15 datasets.
Topic Dictionary
Dictionary sizes
Figure: Comparative classification performance (F1) for different dictionary sizes on VOC2006, VOC2007, VOC2010, Scene15 and Caltech101. Panel legends: BoW: 100 vs. CC:i: 100; BoW: 500 vs. CC:i: 500; BoW: 1000 vs. CC:i: 1000.
Topic Dictionary
Conclusion
Groups of sub-spaces computed using co-clustering yielded dictionaries with better classification performance.
Groups of feature space partitions (dictionary elements) yielded improved classification results.
These estimated groups can be used in learning a semantically structured visual model.
Sparse Decomposition
Sparse Visual Model
Sparse model approximates a feature vector as a combination of a sub-set of an over-complete basis set.
Sparsity is induced by adding a regularization constraint on the coefficients to the loss function.
Degree of sparsity is determined empirically.
Each basis element is considered individually.
Possible structure amongst basis elements is disregarded.
Structured Sparse Model
SSPCA (structure in sub-spaces)
Co-clustered groups of sub-spaces are used to augment Sparse-PCA to compute a Structured Sparse-PCA dictionary.
Group Lasso (structure in dictionary)
Co-clustered groups of dictionary elements are used to augment Lasso to compute a group Lasso feature encoding.
Sparse Regularization
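Sparse regularization:
$$\min_{\alpha} \; \frac{1}{n} \sum_{i=1}^{n} L(z_i, D\alpha_i) + \lambda \, \Omega(\alpha)$$
Lasso:
$$\min_{\alpha} \; \frac{1}{n} \sum_{i=1}^{n} \Big( \| z_i - D\alpha_i \|^2 + \lambda \, \| \alpha_i \|_1 \Big)$$
Group Sparsity:
$$\min_{\alpha} \; \frac{1}{n} \sum_{i=1}^{n} \Big( \| z_i - D\alpha_i \|^2 + \lambda \sum_{j=1}^{k} \| \alpha_i \|_{G_j} \Big)$$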
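The difference between the Lasso and group penalties shows up clearly in their shrinkage (proximal) operators, which are the building blocks of common iterative solvers. The sketch below assumes that the group norm is the Euclidean norm of the coefficients in each group and that the groups are disjoint. The function names and toy values are hypothetical.

```python
import math

def soft_threshold(alpha, lam):
    """Proximal operator of lam * ||alpha||_1: shrink each coefficient."""
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in alpha]

def group_soft_threshold(alpha, groups, lam):
    """Proximal operator of lam * sum_j ||alpha_{G_j}||_2 for disjoint groups."""
    result = list(alpha)
    for group in groups:
        norm = math.sqrt(sum(alpha[i] ** 2 for i in group))
        # A group is either scaled down as a whole or removed as a whole.
        scale = max(1.0 - lam / norm, 0.0) if norm > 0 else 0.0
        for i in group:
            result[i] = scale * alpha[i]
    return result

# Toy coefficients (hypothetical): one strong group and one weak group.
alpha = [3.0, 4.0, 0.3, -0.4]
lasso_step = soft_threshold(alpha, lam=1.0)
group_step = group_soft_threshold(alpha, groups=[[0, 1], [2, 3]], lam=1.0)
```

The Lasso operator decides on each coefficient individually, while the group operator keeps or discards all coefficients of a group together, which is how co-clustered groups of dictionary elements can be selected jointly.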
Structured Sub-space
Structured Sub-space Dictionary using ITCC
Figure: Per-category mAP, Sparse Subspace vs. Structured Subspace.
VOC2006 panel, visual categories: sheep, horse, bicycle, motorbike, cow, bus, dog, cat, person, car.
VOC2007 panel, visual categories: sheep, horse, bicycle, aeroplane, cow, sofa, bus, dog, cat, person, train, diningtable, bottle, car, pottedplant, tvmonitor, chair, bird, boat, motorbike.
Structured Sub-space
Structured Sub-space Dictionary using SSRCC
Figure: Per-category mAP, Sparse Subspace vs. Structured Subspace.
VOC2006 panel, visual categories: sheep, horse, bicycle, motorbike, cow, bus, dog, cat, person, car.
VOC2007 panel, visual categories: sheep, horse, bicycle, aeroplane, cow, sofa, bus, dog, cat, person, train, diningtable, bottle, car, pottedplant, tvmonitor, chair, bird, boat, motorbike.
Structured Sub-space
Data Set   Sparse Subspace   Structured Sparse Subspace (ITCC)   Structured Sparse Subspace (SSRCC)
VOC2006    67.5941           70.8295                             68.5808
VOC2007    67.9971           68.0783                             68.3718

Sparse selection of semantically related set of sub-spaces performs better than sparse individual selection of sub-spaces.
Structured Sparse Dictionary
Structured Sparse Encoding using ITCC
Figure: Per-category mAP, Sparse Encoding vs. Structured Encoding.
Scene15 ITCC panel, visual categories: MITcoast, MITmountain, industrial, livingroom, MITopencountry, PARoffice, MITtallbuilding, CALsuburb, store, bedroom, MITforest, MIThighway, MITstreet, MITinsidecity, kitchen.
VOC2006 ITCC panel, visual categories: sheep, horse, bicycle, motorbike, cow, bus, dog, cat, person, car.
Structured Sparse Dictionary
Structured Sparse Encoding using SSRCC
Figure: Per-category mAP, Sparse Encoding vs. Structured Encoding.
Scene15 SSRCC panel, visual categories: MITcoast, MITmountain, industrial, livingroom, MITopencountry, PARoffice, MITtallbuilding, CALsuburb, store, bedroom, MITforest, MIThighway, MITstreet, MITinsidecity, kitchen.
VOC2006 SSRCC panel, visual categories: sheep, horse, bicycle, motorbike, cow, bus, dog, cat, person, car.
Structured Sparse Dictionary
Data Set   Sparse Encoding   Structured Sparse Encoding (ITCC)   Structured Sparse Encoding (SSRCC)
VOC-2006   72.8386           73.3977                             72.7738
Scene-15   68.5737           79.8794                             72.1155

Sparse selection of semantically related set of dictionary elements performs better than sparse individual selection of dictionary elements.
Summary
Semantically relevant structure learnt in feature space was used to compute better visual models.
Analysis of sub-space embedding emphasized modelling local distributions.
Incorporation of fuzzy logic framework to learn dictionary kernels that adapt to local distributions yielded better visual models.
Co-clustering was successful in grouping semantically related sub-spaces and feature space partitions.
Estimated groups of sub-spaces and dictionary elements were used to compute structured sparse visual models, improving upon regular sparse models.
Future Work
Future Work
Visual models using Fisher Kernel coding, which uses a Gaussian kernel, have been very successful. Combining the approach in Fisher Kernels with the learnt fuzzy membership functions could potentially improve the visual model.
Fuzzy logic based learning algorithms that are more advanced than Gustafson-Kessel could be explored to learn better membership functions.
Co-clustering creates a block factorization of the data matrix. Partial membership of rows and columns to the co-clusters would be the natural next step.
Explore ways of using semantic structure to improve feature generation techniques like hierarchical models that aim to learn category specific descriptors.
Future Work
End
Questions...
Appendices
BoW Partitioning
Figure: Bag-of-Words model and image ‘000017’ in the VOC-2006 dataset (panel title: Bag-of-Words Partition | VOC-2006 | #000017; axes x, y). The dictionary of size 25 is computed using K-means clustering. The feature vectors are projected to 2 dimensions using PCA.
Appendices
FKM Partitioning
Figure: Fuzzy K-means model and image ‘000017’ in the VOC-2006 dataset (panel title: Fuzzy K-means Fuzzy Partition | VOC-2006 | #000017; axes x, y).
Appendices
GK Partitioning
Figure: Gustafson-Kessel model and image ‘000017’ in the VOC-2006 dataset (panel title: Gustafson-Kessel Fuzzy Partition | VOC-2006 | #000017; axes x, y).