Robust Classification of Objects, Faces, and Flowers
Using Natural Image Statistics
Presenter: 王崇秀
Outline
- Authors
- Abstract
- Background
- Framework and Implementation
- Experiments and Results
- Conclusions
23/4/20 2
Authors
Christopher Kanan: Ph.D. student at the University of California, San Diego (UCSD); intends to graduate in 2012. Research interests: fusing findings and methods from computer vision, machine learning, psychology, and computational neuroscience. Homepage: http://cseweb.ucsd.edu/~ckanan/index.html Email: [email protected]
Authors
Garrison Cottrell: Professor in the Computer Science & Engineering Department at UCSD. Research: strongly interdisciplinary; it concerns using neural networks as a computational model applied to problems in cognitive science and artificial intelligence, engineering, and biology. He has had success using them for such disparate tasks as modeling how children acquire words, studying how lobsters chew, and nonlinear data compression.
Abstract
Classification of images in many category datasets has rapidly improved in recent years. However, systems that perform well on particular datasets typically have one or more limitations, such as a failure to generalize across visual tasks (e.g., requiring a face detector or extensive retuning of parameters), insufficient translation invariance, inability to cope with partial views and occlusion, or significant performance degradation as the number of classes is increased.
Here we attempt to overcome these challenges using a model that combines sequential visual attention using fixations with sparse coding. The model's biologically inspired filters are acquired using unsupervised learning applied to natural image patches. Using only a single feature type, our approach achieves 78.5% accuracy on Caltech-101 and 75.2% on the 102 Flowers dataset when trained on 30 instances per class, and it achieves 92.7% accuracy on the AR Face database with 1 training instance per person. The same features and parameters are used across these datasets to illustrate its robust performance.
Abstract (translated from Chinese)
Image classification performance on many multi-category datasets has improved rapidly in recent years. However, systems that perform well on a particular dataset often have one or more limitations, such as failing to generalize across visual tasks (e.g., requiring a face detector or extensive parameter retuning), insufficient translation invariance, inability to handle partial occlusion, or significant performance degradation as the number of classes grows.
Here we attempt to overcome these challenges with a model that combines sequential visual attention using fixations with sparse coding. The model's biologically inspired filters are learned by unsupervised learning on natural image patches. Using only a single feature type and 30 training instances per class, the method reaches 78.5% accuracy on Caltech-101 and 75.2% on the 102-class flower dataset; with 1 training instance per person, it reaches 92.7% accuracy on the AR face database. The same features and parameters are used across these datasets, demonstrating the method's robustness.
Background — Using Natural Image Statistics
- Hand-designed features: Haar, DoG, Gabor, HOG, SIFT, and so on.
- Self-taught learning: applied to unlabeled natural images to learn basis vectors/filters that are good for representing natural images. The training data is generally distinct from the datasets the system will be evaluated on. Self-taught learning works well because it represents natural scenes efficiently while not overfitting to a particular dataset.
- Sparse coding
Background — Visual Attention
- A saliency map is a topographically organized map that indicates interesting regions in an image, based on the spatial organization of the features and an agent's current goal.
- Computational models: there are many; they typically produce maps that assign high saliency to regions with rare features.
Background — Sequential Object Recognition
- Although many saliency-map algorithms have been used to predict the locations of human eye movements, little work has been done on using them to recognize individual objects.
- There are a few notable exceptions [1, 23, 27, 15], and these approaches share several similarities.
- Framework: extract features -> compute saliency maps from the features -> extract a small window representing a fixation, classify it, and use the result to guide subsequent fixations -> combine information across fixations.
- NIMBLE framework
Framework and Implementation
High-level description of the model:
- Pre-process the image to cope with luminance variation.
- Extract sparse ICA features from the image.
- Use the sparse ICA features to compute a saliency map, which is treated as a probability distribution; locations are randomly sampled from the map.
- Extract fixations from the feature maps at the sampled locations, followed by probabilistic classification.
Framework and Implementation
Image pre-processing:
- Resize so that the smallest dimension is 128 pixels, with the other dimension scaled to maintain the aspect ratio.
- Grayscale images are converted to color.
- RGB → LMS: a color space representing the responses of the three types of cones in the human eye, named for their responsivity (sensitivity) at long, medium, and short wavelengths.
- Normalization to [0,1]: r_linear(z) ∈ [0,1] is a pixel of the image in LMS color space at location z; note that r_nonlinear(z) ∈ [0,1] as well.
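These pre-processing steps can be sketched as follows. The slide does not give the exact RGB → LMS conversion, so a common sRGB → XYZ → LMS (Hunt-Pointer-Estevez) path is assumed here, and the resizing step is omitted for brevity:

```python
import numpy as np

# Linear sRGB -> XYZ -> LMS (Hunt-Pointer-Estevez). This is one common
# conversion path, assumed here; the paper may use a different matrix.
RGB2XYZ = np.array([[0.4124, 0.3576, 0.1805],
                    [0.2126, 0.7152, 0.0722],
                    [0.0193, 0.1192, 0.9505]])
XYZ2LMS = np.array([[ 0.3897, 0.6890, -0.0787],
                    [-0.2298, 1.1834,  0.0464],
                    [ 0.0,    0.0,     1.0   ]])
RGB2LMS = XYZ2LMS @ RGB2XYZ

def preprocess(img):
    """img: H x W x 3 (or H x W grayscale) float array in [0, 1].
    Returns an LMS image normalized to [0, 1]."""
    if img.ndim == 2:                  # grayscale -> replicate to 3 channels
        img = np.stack([img] * 3, axis=-1)
    lms = img @ RGB2LMS.T              # per-pixel 3x3 color transform
    lms = lms - lms.min()
    return lms / lms.max()             # normalize to [0, 1]
```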
Framework and Implementation
Image pre-processing:
Framework and Implementation
Feature learning: To learn the ICA filters, we preprocess 584 images from the McGill color image dataset. From each image, 100 b×b×3 patches are extracted at random locations. The channel means (L, M, and S), computed across images, are subtracted from each patch. Each patch is then treated as a 3b²-dimensional vector.
PCA is applied to the patch collection to reduce the dimensionality (the first principal component is discarded; the next d principal components are retained).
FastICA is then applied, yielding d ICA filters. An m×n×3 image is mapped to m×n×d filter responses, a sparse representation.
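A minimal sketch of this filter-learning pipeline, using scikit-learn's PCA and FastICA. The patch size b, patches per image, and d below are placeholder values, not the paper's settings:

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

def learn_ica_filters(images, b=8, patches_per_image=100, d=20, seed=0):
    """Sample b x b x 3 patches, subtract per-channel means, reduce with
    PCA (dropping the 1st component), then run FastICA.
    `images` is a list of H x W x 3 float arrays."""
    rng = np.random.default_rng(seed)
    patches = []
    for img in images:
        H, W, _ = img.shape
        for _ in range(patches_per_image):
            y = rng.integers(0, H - b + 1)
            x = rng.integers(0, W - b + 1)
            patches.append(img[y:y + b, x:x + b, :].reshape(-1))
    X = np.asarray(patches)                      # N x 3b^2 patch vectors
    # subtract the mean of each color channel, computed across all patches
    X3 = X.reshape(len(X), -1, 3)
    X3 = X3 - X3.mean(axis=(0, 1), keepdims=True)
    X = X3.reshape(len(X), -1)
    # PCA: discard the first principal component, keep the next d
    pca = PCA(n_components=d + 1).fit(X)
    Xp = pca.transform(X)[:, 1:]
    ica = FastICA(n_components=d, whiten="unit-variance", random_state=seed)
    ica.fit(Xp)
    return ica                                   # d ICA filters (reduced space)
```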
Framework and Implementation
Feature learning:
The learned ICA filters (figure).
Framework and Implementation
Saliency Maps: Use the SUN model to generate the saliency map.
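The SUN model scores a location by the rarity of its features, roughly saliency(z) = -log p(F = f(z)). The sketch below is a simplification: it estimates each filter's response density with a histogram (SUN itself fits generalized Gaussian distributions), and includes the later step of sampling fixation locations from the map treated as a probability distribution:

```python
import numpy as np

def sun_saliency(responses, bins=32):
    """responses: m x n x d ICA filter responses. Returns an m x n map,
    normalized to sum to 1, where rare responses get high saliency."""
    m, n, d = responses.shape
    sal = np.zeros((m, n))
    for j in range(d):
        f = responses[:, :, j]
        # histogram density estimate of this filter's responses (assumed
        # simplification of SUN's generalized-Gaussian fit)
        hist, edges = np.histogram(f, bins=bins, density=True)
        idx = np.clip(np.digitize(f, edges) - 1, 0, bins - 1)
        sal += -np.log(hist[idx] + 1e-12)   # rarity -> high saliency
    sal -= sal.min()
    return sal / sal.sum()

def sample_fixations(saliency, T, rng):
    """Draw T fixation locations with probability given by the map."""
    flat = saliency.ravel()
    idx = rng.choice(flat.size, size=T, p=flat)
    return np.column_stack(np.unravel_index(idx, saliency.shape))
```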
Framework and Implementation
Spatial Pooling:
- The saliency map is normalized to sum to one and treated as a probability distribution.
- It is sampled T times; during each fixation t, a location ℓ_t is chosen according to the saliency map, giving a w×w×d (w = 51) stack of filter responses.
- The dimensionality of the stack is reduced by spatially subsampling it with a spatial pyramid, which divides each w×w set of filter responses into 1×1, 2×2, and 4×4 grids; the mean filter response in each grid cell is computed, the cell means are concatenated to form a vector, and the vector is normalized to unit length.
- This reduces the dimensionality of the fixation from w×w×d (51²d) to 21d. The location ℓ_t is normalized by the height and width of the image and stored along with the corresponding features.
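The pyramid pooling step can be sketched as follows; how the grid edges are rounded when w = 51 is not evenly divisible is an implementation assumption:

```python
import numpy as np

def pyramid_pool(window):
    """Pool a w x w x d stack of filter responses with a 1x1, 2x2, 4x4
    spatial pyramid: mean response per grid cell, concatenated and
    normalized to unit length -> a (1 + 4 + 16) * d = 21d vector."""
    w, _, d = window.shape
    feats = []
    for g in (1, 2, 4):
        # integer cell boundaries; handles w not divisible by g (e.g. w = 51)
        edges = np.linspace(0, w, g + 1).astype(int)
        for i in range(g):
            for j in range(g):
                cell = window[edges[i]:edges[i + 1], edges[j]:edges[j + 1], :]
                feats.append(cell.mean(axis=(0, 1)))   # d values per cell
    v = np.concatenate(feats)
    return v / np.linalg.norm(v)
```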
Framework and Implementation
Spatial Pooling: After acquiring T fixations from every training image, PCA is applied to the collected feature vectors. The first 500 principal components are retained and then whitened. Finally, the post-PCA fixation features, denoted w_{k,i}, are each made unit length.
Framework and Implementation
Training and Classification
- Naïve Bayes' assumption: P(g_1, …, g_T | C = k) = ∏_{t=1}^{T} P(g_t | C = k), where g_t is the vector of fixation features at fixation t.
- Bayes' rule: P(C = k | g_1, …, g_T) ∝ P(C = k) ∏_{t=1}^{T} P(g_t | C = k).
- P(C = k) is uniform, and we fix T = 100, which would be about 30 s of viewing time for a person, assuming 3 fixations/second.
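Combining fixations under the naive Bayes assumption might look like the following sketch. The Gaussian kernel density estimate and its bandwidth h are stand-ins for the paper's nonparametric density model, not its exact form:

```python
import numpy as np

def classify(fixations, exemplars, h=0.1):
    """Pick argmax_k sum_t log P(g_t | C = k) (uniform prior drops out).
    P(g_t | C = k) is estimated with a Gaussian kernel density over the
    stored training fixations of class k -- an assumed simplification.
    `fixations`: T x D array; `exemplars`: dict class -> N_k x D array."""
    scores = {}
    for k, E in exemplars.items():
        logp = 0.0
        for g in fixations:                      # g: D-dim fixation feature
            sq = ((E - g) ** 2).sum(axis=1)      # squared distances to exemplars
            dens = np.exp(-sq / (2 * h * h)).mean() + 1e-300
            logp += np.log(dens)                 # naive Bayes: sum over fixations
        scores[k] = logp
    return max(scores, key=scores.get)
```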
Experiments and Results
Caltech101 results
Experiments and Results
Caltech256 results
Experiments and Results
AR face database
Experiments and Results
102 Flower database
Conclusions
One of the reasons we think our approach works well is because it employs a nonparametric exemplar-based classifier.
The Naïve Bayes' assumption is obviously false, and learning a more flexible model could lead to performance improvements.