IEEE 2010 2nd International Conference on Image Processing Theory, Tools and Applications (IPTA)



Visual Object Recognition using Local Binary Patterns and Segment-based Feature

Chao ZHU, Huanzhang FU, Charles-Edmond BICHOT, Emmanuel DELLANDREA and Liming CHEN

Université de Lyon, CNRS Ecole Centrale de Lyon

LIRIS, UMR5205, F-69134, France

e-mail: {chao.zhu, huanzhang.fu, charles-edmond.bichot, emmanuel.dellandrea, liming.chen}@ec-lyon.fr

Abstract—Visual object recognition is one of the most challenging problems in computer vision, due to both inter-class and intra-class variations. Local appearance-based features, especially SIFT, have achieved great success in this task because of their strong discriminative power. In this paper, we propose to adopt two additional kinds of feature to characterize different aspects of objects: the Local Binary Pattern (LBP) operator, which captures texture structure, and a segment-based feature, which captures geometric information. Experimental results on PASCAL VOC benchmarks show that the LBP operator provides information complementary to SIFT, and that the segment-based feature is mainly effective for rigid objects, i.e. its usefulness is class-specific. We evaluated our features and approach by participating in the PASCAL VOC Challenge 2009 for the first time, and achieved decent results.

Keywords—Object recognition, Feature extraction, Local binary patterns, Segment-based feature, PASCAL VOC Challenge.

I. INTRODUCTION

The content-based recognition of visual object categories is one of the most challenging problems in computer vision, and continues to attract increasing attention. The PASCAL Visual Object Classes (VOC) Challenge [1] is a well-known benchmark in the image classification domain for evaluating the performance of different techniques and systems. The goal of this challenge is to recognize objects from a number of visual object classes in realistic scenes (not pre-segmented objects). Not only inter-class variations, but also intra-class variations, such as illumination change, pose change, viewpoint change, scaling, clutter and occlusion, make this task even more complicated and challenging. Therefore, designing powerful feature descriptors and efficient modeling methods that characterize object information well is the key to good performance on this challenge.

Usually, an object recognition system can be divided into two main parts: feature extraction and classification. Many feature descriptors have been proposed in the literature. They can be divided into several categories: color, such as the color histogram [2], color moments [3], color coherence vectors [4] and the color correlogram [5]; texture, such as the Gabor filter [6], the grey level co-occurrence matrix [7] and auto correlation [7]; and local appearance, such as SIFT [8], the Gradient Location and Orientation Histogram (GLOH) [9] and the Histogram of Oriented Gradients (HOG) [10].

Recently, local appearance-based features (especially SIFT) together with the Bag-of-Features (BoF) representation have achieved great success in object recognition. The main idea of BoF is to represent an image as an orderless collection of local features. A visual vocabulary is constructed by applying a clustering algorithm such as k-means, and each cluster center becomes a “visual word” in the vocabulary. All feature vectors are then quantized to their closest “visual word” in Euclidean space, and the number of feature vectors assigned to each “visual word” is counted to build the final BoF representation.
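The quantization-and-counting step of the BoF representation can be sketched as follows (a minimal NumPy illustration with toy 2-D “descriptors”; the function name and toy data are ours, not from the paper):

```python
import numpy as np

def bof_histogram(descriptors, vocabulary):
    """Quantize local descriptors to their nearest visual word (Euclidean
    distance) and count occurrences to form the BoF histogram."""
    # squared Euclidean distances between every descriptor and every word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)                       # index of closest visual word
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)              # L1-normalized histogram

# toy example: a vocabulary of 3 visual words and 4 descriptors
vocab = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
desc = np.array([[0.1, 0.0], [0.9, 1.1], [1.9, 0.1], [2.1, -0.1]])
h = bof_histogram(desc, vocab)
```

In practice the vocabulary would itself come from k-means over a large random sample of training descriptors, as described in section IV.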

We use this method as the baseline in our experiments and the PASCAL VOC Challenge because of its good performance.

While color feature descriptors have been studied comprehensively [11] and local appearance-based features have obtained state-of-the-art performance as described above, we hold that texture is also a very important cue for distinguishing between objects. Figure 1 gives an example: the horse and the zebra in the figure have very similar color and appearance, but very distinct texture.

Fig. 1. Horse vs. zebra

The Gabor filter [6], grey level co-occurrence matrix [7] and auto correlation [7] are widely used texture features, each with its own advantages and disadvantages. The Gabor filter has proven its ability to describe texture well, but it has very high computational complexity because of the substantial convolutions involved, which makes it more suitable for small images such as face images; applying it to large images, such as natural scene images, is very time and memory consuming. On the contrary, the grey level co-occurrence matrix and auto correlation are relatively fast to compute, but their performance is weaker because their overall ability to describe texture is deficient.

Therefore, we adopt the Local Binary Pattern (LBP) operator in our experiments and the PASCAL VOC Challenge because of both its powerful ability to describe local texture structures and its computational simplicity. Details are given in section II.

Besides texture, we hold that geometric information is also very important for distinguishing objects, especially rigid ones. Figure 2 gives an example. Bikes and buildings are both rigid objects, from which many line segments can be extracted, and the lengths and orientations of these line segments clearly differ between the two. This inspires us to make use of a segment-based feature to help with the recognition task. Details are given in section III.

The performances of the LBP operator and our segment-based feature are analyzed experimentally based on the PASCAL VOC 2007 database. Our approach for visual object recognition is then evaluated by participating in the PASCAL VOC Challenge 2009.

Image Processing Theory, Tools and Applications

978-1-4244-7249-9/10/$26.00 ©2010 IEEE

The remaining sections are organized as follows. Section II introduces the original LBP operator and multi-scale LBP operator in detail, and compares them with other popular texture descriptors. Section III describes our segment-based feature in detail, including region segmentation scheme and segment-based feature extraction. Section IV presents the framework of our approach for object recognition and the experimental results based on the PASCAL VOC 2007 database. Section V presents our evaluation results of the PASCAL VOC Challenge 2009. Finally, section VI concludes this paper and gives some future directions.

Fig. 2. Bike vs. building

II. LOCAL BINARY PATTERNS

A. Original LBP operator

The Local Binary Pattern (LBP) operator was first introduced as a complementary measure for local image contrast [13]. It can be seen as a unified approach to statistical and structural texture analysis. Figure 3 gives an example. For each pixel in a grey image, its eight neighboring pixels are considered: each neighbor's value is thresholded by the value of the central pixel. The LBP code is computed by multiplying the thresholded results by weights given by powers of two and summing the results. The same process is applied to every pixel in the image, and the final LBP descriptor is obtained as the histogram of these codes. By definition, the LBP operator is invariant to any monotonic grey-level lighting change, and very fast to calculate.
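The thresholding-and-weighting computation described above can be sketched as follows (a minimal pure-NumPy illustration; the function names and the clockwise neighbor ordering are our own conventions, not from the paper):

```python
import numpy as np

def lbp_code(patch):
    """LBP code of the central pixel of a 3x3 patch: threshold the 8
    neighbours against the centre, then weight by powers of two
    (here clockwise from the top-left neighbour; the order is a convention)."""
    c = patch[1, 1]
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    return sum((1 if g >= c else 0) << p for p, g in enumerate(neighbours))

def lbp_histogram(image):
    """Normalized histogram of LBP codes over all interior pixels."""
    h = np.zeros(256)
    for i in range(1, image.shape[0] - 1):
        for j in range(1, image.shape[1] - 1):
            h[lbp_code(image[i - 1:i + 2, j - 1:j + 2])] += 1
    return h / max(h.sum(), 1.0)
```

Because only the sign of each difference matters, adding a constant to the whole patch leaves every code unchanged, which is exactly the monotonic grey-level invariance mentioned above.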

Fig. 3. Calculation of the original LBP operator

Because of its descriptive power for analyzing both micro and macro texture structures, and computational simplicity, the LBP operator has been successfully applied for many applications, such as texture classification [12,14], texture segmentation [15], face recognition [16], and facial expression recognition [17].

B. Multi-scale LBP operator

One big limitation of the original LBP operator is that it only covers a small neighborhood area (8 neighboring pixels), and can therefore capture only very limited local information.

In order to obtain more local information by covering a larger neighborhood area, and thus to increase the discriminative power of the original LBP, T. Ojala et al. proposed the multi-scale LBP operator [12], which combines LBP operators computed on circular neighborhoods with different radii and different numbers of neighboring pixels. Figure 4 gives an example.

Fig. 4. Multi-scale LBP operator

Formally, the LBP code of the pixel at $(x_c, y_c)$ is calculated according to the following equation:

$$LBP_P(x_c, y_c) = \sum_{p=0}^{P-1} S(g_p - g_c) \times 2^p, \qquad S(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \qquad (1)$$

where $g_p$ is the value of the $p$-th neighboring pixel, $g_c$ is the value of the central pixel, and $P$ is the total number of neighboring pixels.

Combining multiple scales also makes the LBP operator invariant to image scaling to a certain extent.

C. Comparison of the original LBP, multi-scale LBP and other texture descriptors

We make a brief comparison of the original LBP operator, multi-scale LBP operator and other popular texture descriptors in this section.

Three popular texture descriptors introduced in section I are chosen for comparison with the LBP operator: the Gabor filter [6], the grey level co-occurrence matrix (GLCM) [7], and texture auto correlation (TAC) [7]. For the Gabor filter, 5 scales and 8 orientations are used. For GLCM, 4 directions (horizontal, vertical and the two diagonals) with an offset of 1 between pixels are considered. For TAC, position differences from 0 to 8 with a step of 2 are applied in both the x and y directions. For the multi-scale LBP operator, 3 different scales are used (as in Figure 4): 8 neighboring pixels with radius 1, 12 neighboring pixels with radius 1.5, and 16 neighboring pixels with radius 2.
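A sketch of the multi-scale operator with these three scales, using circular neighborhoods and bilinear interpolation for off-grid sampling points (our own illustrative implementation; it builds full 2^P-bin histograms for simplicity, whereas practical implementations usually compress them, e.g. with the “uniform” patterns of [12]):

```python
import numpy as np

def circular_lbp_hist(img, P, R):
    """LBP histogram with P neighbours sampled on a circle of radius R;
    off-grid sampling points are bilinearly interpolated."""
    H, W = img.shape
    hist = np.zeros(2 ** P)
    angles = 2.0 * np.pi * np.arange(P) / P
    m = int(np.ceil(R))                        # margin so the circle stays inside
    for y in range(m, H - m):
        for x in range(m, W - m):
            code = 0
            for p, a in enumerate(angles):
                sx = x + R * np.cos(a)
                sy = y - R * np.sin(a)
                x0 = min(int(np.floor(sx)), W - 2)
                y0 = min(int(np.floor(sy)), H - 2)
                fx, fy = sx - x0, sy - y0
                # bilinear interpolation of the sampled grey value
                g = (img[y0, x0] * (1 - fx) * (1 - fy)
                     + img[y0, x0 + 1] * fx * (1 - fy)
                     + img[y0 + 1, x0] * (1 - fx) * fy
                     + img[y0 + 1, x0 + 1] * fx * fy)
                if g >= img[y, x]:
                    code |= 1 << p
            hist[code] += 1
    return hist

def multiscale_lbp(img):
    """Concatenate the three scales used here: (P, R) = (8, 1), (12, 1.5), (16, 2)."""
    return np.concatenate([circular_lbp_hist(img, P, R)
                           for (P, R) in [(8, 1.0), (12, 1.5), (16, 2.0)]])
```

Note that with interpolation the comparison `g >= img[y, x]` can be sensitive to floating-point ties; a small tolerance may be added in practice.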

The comparison is based on the PASCAL VOC Challenge 2007 database, and the experimental setup is the same as that which will be introduced in section IV.

Fig. 5. Comparison of the original LBP, multi-scale LBP and other texture descriptors

From the results shown in Figure 5, the original LBP operator outperforms the other popular texture descriptors by a large margin, showing its stronger ability to analyze texture. The multi-scale LBP operator further outperforms the original LBP operator by 14.1%, confirming the value of capturing more local information and of invariance to scaling. Therefore, we use the multi-scale LBP operator as the texture feature in our experiments and the PASCAL VOC Challenge.

III. SEGMENT-BASED FEATURE

In this section, we first introduce our Gestalt-inspired region segmentation scheme [18], and then the segment-based feature extracted from the region map given by this segmentation scheme.

A. Region segmentation scheme

The principle of our region segmentation algorithm is to segment an image into partial gestalts for further visual object recognition. We specifically designed a robust region segmentation method that aims at automatically producing coarse regions from which we can consistently extract feature vectors [18], using the following basic Gestalt grouping laws in our construction process: the color constancy law, similarity law, vicinity law and good continuation law. Because these laws are defined between regions and their context, at each step we assess the possibility of merging regions according to global information.

The algorithm is based on color clustering but also includes an extra post-processing step to ensure spatial consistency of the regions. In order to apply previously mentioned Gestalt laws, we defined a 3-step process: first we filter the image and reduce color depth, then we perform adaptive determination of the number of clusters and cluster color data and finally we perform spatial processing to split unconnected clusters and merge smaller regions.

Images are first filtered for robustness to noise; colors are then quantized following a first, fast color reduction scheme that uses an accumulator array in CIELab color space to agglomerate perceptually similar colors. In the second step, we use an iterative algorithm to determine a color count that limits the quantization error. The quantization error, measured by the Mean Square Error (MSE) between original and quantized colors, evolves with the number of clusters as shown in Figure 6.

Fig. 6. Evolution of MSE between original and quantized colors

This clearly shows a threshold cluster number below which the quantization MSE begins to rise sharply. By performing several fast coarse clustering operations using the Neural Gas algorithm [19], which is fast and less sensitive to initialization than counterparts such as k-means, we compute the corresponding MSE values and derive a target cluster count. We then use hierarchical ascendant clustering, which is more accurate but much slower and is therefore executed only once in our case, to achieve the segmentation.

The third step consists in splitting spatially unconnected regions, merging similar regions and constraining the segmentation coarseness. Merging of similar regions is achieved through the squared Fisher's distance of equation (2), where $n_i$, $\mu_i$ and $\sigma_i^2$ are respectively the number of pixels, the average color and the variance of colors within region $R_i$. This distance remains independent of the image dynamics, since it relates the inter-cluster distance to the intra-cluster distances. Finally, regions which are too small to provide significant features are discarded.

$$D(R_1, R_2) = \frac{(n_1 + n_2)(\mu_1 - \mu_2)^2}{n_1 \sigma_1^2 + n_2 \sigma_2^2} \qquad (2)$$
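Under our reading of equation (2), the merge test can be sketched as follows (scalar colors for simplicity, and an illustrative threshold; `fisher_distance` and `should_merge` are our names, not from the paper):

```python
def fisher_distance(n1, mu1, var1, n2, mu2, var2):
    """Squared Fisher distance between two regions: the squared difference
    of mean colours, weighted by region sizes against the pooled
    intra-region variances (our reading of equation (2))."""
    return (n1 + n2) * (mu1 - mu2) ** 2 / (n1 * var1 + n2 * var2)

def should_merge(r1, r2, threshold=2.0):
    """Merge two regions (n, mean, variance) when their Fisher distance
    falls below a threshold; the threshold value here is illustrative."""
    return fisher_distance(*r1, *r2) < threshold
```

Because both numerator and denominator scale with the square of the image dynamics, rescaling all grey values leaves the distance unchanged, which matches the independence property claimed above.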

With this algorithm we obtain consistent coarse regions that can be used for feature extraction. Sample segmentation results on PASCAL VOC Challenge database images are shown in Figure 7. As can be seen, our Gestalt-inspired segmentation algorithm automatically adapts its segmentation process to the color depth of the images, producing meaningful partial gestalts.

Fig. 7. Examples of segmented images

B. Segment-based feature extraction

Aiming at capturing the geometric information of partial gestalts, we developed a segment-based feature relying on a Fast Connective Hough Transform (FCHT) [20], which can quickly detect segments within a region. Once all segments are identified by the FCHT, they are assigned to regions. Our segment-based feature is a histogram combining segment length and orientation. To obtain invariance to scaling, translation and rotation, we first divide all lengths by that of the longest segment, and then compute an average orientation so that all angles can be expressed relative to it. The size of the histogram was determined experimentally and set to 6 bins for orientation and 4 bins for length.
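Such a length/orientation histogram can be sketched as follows (our own illustrative implementation, using the 6×4 binning from the text; it takes a simple rather than circular mean for the reference orientation, so it is scale- and translation-invariant but only approximately rotation-invariant):

```python
import numpy as np

def segment_histogram(segments, n_orient=6, n_len=4):
    """Length/orientation histogram for the line segments of one region.
    segments: array-like of (x1, y1, x2, y2) endpoints."""
    seg = np.asarray(segments, dtype=float)
    dx, dy = seg[:, 2] - seg[:, 0], seg[:, 3] - seg[:, 1]
    lengths = np.hypot(dx, dy)
    # scale invariance: lengths relative to the longest segment
    rel_len = lengths / lengths.max()
    # segments are undirected, so orientations live in [0, pi);
    # express them relative to the (simple) mean orientation
    theta = np.arctan2(dy, dx) % np.pi
    theta = (theta - theta.mean()) % np.pi
    li = np.minimum((rel_len * n_len).astype(int), n_len - 1)
    oi = np.minimum((theta / np.pi * n_orient).astype(int), n_orient - 1)
    hist = np.zeros((n_orient, n_len))
    for o, l in zip(oi, li):
        hist[o, l] += 1
    return (hist / len(seg)).ravel()
```

The resulting 24-bin vector is what would then be quantized with the BoF post-processing described in section IV.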

Finally, to include neighborhood information, our segment-based feature is expressed at four different levels: the original region, the region plus its neighbors, the region plus neighbors plus the neighbors' neighbors, and so on. These levels are concatenated into the final feature vector. This is a simple way to integrate spatial relationships and include global information in each feature vector. On most images, the fourth level covers features extracted over the whole image.

IV. EXPERIMENTAL EVALUATION

The PASCAL Visual Object Classes 2007 image database [1] is used to evaluate the performance of the LBP operator and of our segment-based feature. This database contains nearly 10,000 images of 20 different object categories, such as bike, car, cat, table, person, sofa and train. All images are taken from real-world scenes under varying lighting conditions, which makes the database very complicated and challenging. It is divided into a predefined training set (2501 images), validation set (2510 images) and test set (4952 images). The goal is to recognize the objects in images and to classify them into the correct categories. The mean average precision (MAP) is used as the evaluation criterion.

A. Baseline

As described in section I, we use the SIFT feature [8] together with the Bag-of-Features (BoF) representation as the baseline of our approach. We follow [11] to extract the SIFT feature and build the BoF model: both Harris-Laplace point sampling and dense sampling every 6 pixels are applied to find keypoints in images, and the SIFT feature is then extracted around these keypoints. A visual vocabulary with 4000 “visual words” is constructed by applying the k-means clustering algorithm to 200,000 randomly selected descriptors from the training set. Finally, each image is expressed as a fixed-length (4000-bin) histogram.

B. Post-processing of the segment-based feature

Because the extracted segment-based feature vectors are not of fixed length for every image (their number depends on the number of segmented regions), the same post-processing as in the baseline is needed; the only difference is that the size of the visual vocabulary becomes 1000. Since the extracted LBP feature vectors are already of fixed length, no post-processing is needed for them.

C. Classifier

The Support Vector Machine (SVM) is applied for classification; the LibSVM implementation [21] is used. Once all feature vectors are extracted from the database, the χ2 distance is computed to measure the similarity between each pair of feature vectors F and F′ (n is the size of the feature vector):

$$dist_{\chi^2}(F, F') = \sum_{i=1}^{n} \frac{(F_i - F_i')^2}{F_i + F_i'} \qquad (3)$$

Then, the kernel function based on this distance is used for SVM to train the classifier:

$$K(F, F') = e^{-\frac{1}{D} dist_{\chi^2}(F, F')} \qquad (4)$$

where D is the parameter for normalizing the distances. Here D is set to the average value of the training set.
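Equations (3) and (4) can be sketched as follows (a small epsilon is added to avoid division by zero on empty histogram bins; function names are ours, not from the paper):

```python
import numpy as np

def chi2_distance(F, G, eps=1e-10):
    """Chi-square distance of equation (3) between two histograms."""
    return np.sum((F - G) ** 2 / (F + G + eps))

def chi2_kernel_matrix(X):
    """Kernel matrix of equation (4); D is set, as described in the text,
    to the average chi-square distance over the (training) set X."""
    n = len(X)
    dist = np.array([[chi2_distance(X[i], X[j]) for j in range(n)]
                     for i in range(n)])
    D = dist[np.triu_indices(n, k=1)].mean()   # average pairwise distance
    return np.exp(-dist / D)
```

The resulting matrix can be passed to LibSVM as a precomputed kernel.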

We train the classifier based on the training set, then tune the parameters based on the validation set, and finally get classification results on the test set.

D. Fusion strategy

In order to combine different features, late fusion is adopted. More precisely, the final output decision values are the weighted sum of the decision values of the different feature channels, with weights based on their Equal Error Rate (EER), calculated as follows:

$$\omega_m = \frac{1/r_m}{\sum_{m'=1}^{M} 1/r_{m'}} \qquad (5)$$

where ωm is the weight for the m-th channel, rm is the EER of the m-th channel, and M is the total number of channels.
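Equation (5) and the weighted-sum fusion can be sketched as follows (the EER values and score lists in the test are illustrative; names are ours):

```python
import numpy as np

def eer_fusion_weights(eers):
    """Weights of equation (5): each channel is weighted by the inverse
    of its Equal Error Rate, normalized so the weights sum to one."""
    inv = 1.0 / np.asarray(eers, dtype=float)
    return inv / inv.sum()

def fuse_scores(channel_scores, eers):
    """Late fusion: weighted sum of per-channel SVM decision values.
    channel_scores: one list of decision values per feature channel."""
    return np.tensordot(eer_fusion_weights(eers),
                        np.asarray(channel_scores), axes=1)
```

Channels with a lower EER (better standalone performance) thus contribute proportionally more to the final decision values.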

Finally, the precision-recall curve is plotted from the output decision values of the SVM classifier, and the MAP is computed as the area under this curve.

E. Experimental results

First, the multi-scale LBP operator is evaluated together with the baseline (SIFT).

Fig. 8. Evaluation of the multi-scale LBP operator with the baseline (SIFT)

From the results shown in Figure 8, the LBP operator alone is not comparable with SIFT as a single feature. However, combining LBP with SIFT gains a 7% improvement over the baseline, indicating that the LBP operator provides information complementary to SIFT and thereby helps improve the overall performance.

The segment-based feature is also evaluated together with the baseline (SIFT).

Fig. 9. Evaluation of the segment-based feature with the baseline (SIFT)

From the results shown in Figure 9, the overall performance improvement from combining the segment-based feature with SIFT is small. After analyzing the results class by class, we found that for rigid object classes, such as airplane, bike, boat, bus, car and motorbike, an obvious performance improvement can be observed, as Table 1 shows. But for non-rigid object classes, such as cat, dog, horse and person, the performance decreased, indicating that the usefulness of geometric information is class-specific.

Table 1. Performances of the baseline, segment-based feature and the combination class by class

Class      Baseline  Segment  Baseline + Segment
airplane   0.620     0.393    0.637
bike       0.472     0.210    0.496
boat       0.523     0.357    0.550
bus        0.435     0.248    0.447
car        0.626     0.445    0.649
motorbike  0.450     0.214    0.486

F. Time complexity

Our approach is implemented in Matlab. During the feature extraction step, the extraction of the SIFT descriptors and of the segment-based features is much more time consuming than that of the LBP operators, as Table 2 shows. Most of the time in the training step is consumed by two parts. One is the construction of the visual vocabulary, which involves a number of clustering iterations and takes several hours to complete; it cannot be parallelized, but only needs to be computed once. The other is the construction of the kernel matrix, which involves pairwise distance calculations; it also takes several hours, but this can easily be reduced by parallelizing the computation. Compared with these, the time needed to learn the SVM classifiers is negligible.

Table 2. Comparison of time complexity in feature extraction step

Feature                                   Time (per image)
SIFT (Harris-Laplace + dense sampling)    5.5s
LBP (multi-scale)                         1.1s
Segment-based feature                     4.5s

V. PASCAL VOC CHALLENGE 2009

We participated in the PASCAL VOC Challenge for the very first time in 2009, both to evaluate our features and approach for object recognition and to compare with the state-of-the-art in the field. There are no big changes between the PASCAL VOC Challenge 2007 and 2009, except for the increased number of images in the database (3473 for training, 3581 for validation, and 6650 for testing), so we follow the same approach described in section IV. Besides the multi-scale LBP operator for texture information, the segment-based feature for geometric information and SIFT for local appearance information, we add three color descriptors to capture the color information of images, namely the color histogram [2], color moments [3] and color coherence vectors [4]. Figure 10 shows some of our evaluation results provided by the organizer, and the comparison with the state-of-the-art [22].

Our results are somewhat higher than the average, which is a respectable position for our very first participation in this challenge. However, compared with the best results [23,24], there is still room for improvement.

Fig. 10. Selected results of PASCAL VOC Challenge 2009

VI. CONCLUSION AND DISCUSSION

In this paper, we apply the multi-scale LBP operator and a segment-based feature, together with the popular SIFT feature, to the visual object recognition task. The experimental results show that the multi-scale LBP operator is very powerful for analyzing texture structures and provides information complementary to SIFT that helps improve performance. As for the geometric information extracted by the segment-based feature, its usefulness is class-specific: it is very effective for rigid objects. We also participated in the PASCAL VOC Challenge 2009 to evaluate our approach, and achieved an upper-middle rank in our very first attempt.

To further improve the performance of our approach, three directions will be considered in our future work. (1) The SIFT and LBP operators ignore all color information, while color plays an important role in distinguishing objects, especially in natural scenes; the color SIFT descriptors [11] and the color LBP operators [25] will therefore be adopted to incorporate color information and increase photometric invariance. (2) Spatial information is discarded during feature extraction, while geometric correspondences are also important for object recognition; the “spatial pyramid” [26] is a good way to take spatial information into account when extracting features. (3) Since many kinds of features and kernels are available, it is important to decide how to combine them and how much weight to give each individual feature in the combination; Multiple Kernel Learning (MKL) [27] is a good framework for this.

ACKNOWLEDGMENT

This work was partly supported by the French ANR Omnia project under the grant ANR-07-MDCO-009-02.

REFERENCES

[1] The PASCAL Visual Object Classes Challenge Homepage, http://pascallin.ecs.soton.ac.uk/challenges/VOC/
[2] M.J. Swain, D.H. Ballard. Color indexing. International Journal of Computer Vision, 11–32, 1991.
[3] M.A. Stricker, M. Orengo. Similarity of color images. Storage and Retrieval for Image and Video Databases, 381–392, 1995.
[4] G. Pass, R. Zabih, J. Miller. Comparing images using color coherence vectors. In Proc. of the fourth ACM international conference on Multimedia, 65–73, 1997.
[5] J. Huang, S.R. Kumar, M. Mitra, W.J. Zhu, R. Zabih. Image Indexing Using Color Correlograms. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 762–769, 1997.
[6] D. Zhang, A. Wong, M. Indrawan, G. Lu. Content-based Image Retrieval Using Gabor Texture Features. In Proc. Pacific-Rim Conference on Multimedia, 392–395, 2000.
[7] M. Tuceryan, A.K. Jain. Texture analysis. Handbook of Pattern Recognition and Computer Vision, 235–276, 1993.
[8] D.G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
[9] K. Mikolajczyk, C. Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615–1630, 2005.
[10] N. Dalal, B. Triggs. Histograms of Oriented Gradients for Human Detection. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 886–893, 2005.
[11] K. van de Sande, T. Gevers, C. Snoek. Evaluating Color Descriptors for Object and Scene Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press, 2010.
[12] T. Ojala, M. Pietikäinen, T. Mäenpää. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971–987, 2002.
[13] T. Ojala, M. Pietikäinen, D. Harwood. A comparative study of texture measures with classification based on feature distributions. Pattern Recognition, 29:51–59, 1996.
[14] M. Topi, O. Timo, P. Matti, S. Maricor. Robust texture classification by subsets of local binary patterns. In Proc. of International Conference on Pattern Recognition, 3:935–938, 2000.
[15] T. Ojala, M. Pietikäinen. Unsupervised Texture Segmentation Using Feature Distributions. Pattern Recognition, 32(3):477–486, 1999.
[16] T. Ahonen, A. Hadid, M. Pietikäinen. Face Description with Local Binary Patterns: Application to Face Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(12):2037–2041, 2006.
[17] C. Shan, S. Gong, P.W. McOwan. Robust facial expression recognition using local binary patterns. In Proc. of International Conference on Image Processing, 2:370–373, 2005.
[18] A. Pujol, L. Chen. Coarse adaptive color image segmentation for visual object classification. In Proc. of International Conference on Systems, Signals & Image Processing, 157–160, 2008.
[19] T. Martinetz, K. Schulten. A "neural-gas" network learns topologies. Artificial Neural Networks, 397–402, 1991.
[20] M. Ardabilian, L. Chen. A new line extraction algorithm: Fast connective Hough transforms. In Proc. of Pattern Recognition and Image Processing, 127–134, 2001.
[21] C.-C. Chang, C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[22] The PASCAL VOC Challenge 2009 results, http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2009/results/index.html.
[23] J. Yang, K. Yu, Y. Gong, T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 1794–1801, 2009.
[24] X. Zhou, N. Cui, Z. Li, F. Liang, T. Huang. Hierarchical Gaussianization for Image Classification. In Proc. of IEEE International Conference on Computer Vision, 2009.
[25] C. Zhu, C.-E. Bichot, L. Chen. Multi-scale Color Local Binary Patterns for Visual Object Classes Recognition. In Proc. of the 20th International Conference on Pattern Recognition, 2010.
[26] S. Lazebnik, C. Schmid, J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 2169–2178, 2006.
[27] J. Yang, Y. Li, Y. Tian, L. Duan, W. Gao. Group-Sensitive Multiple Kernel Learning For Object Categorization. In Proc. of IEEE International Conference on Computer Vision, 2009.
of International Conference on Image Processing, 2:370–373, 2005. [18] A. Pujol, L. Chen. Coarse adaptive color image segmentation for visual object classification. In Proc. of International Conference on Systems, Signals & Image Processing, 157–160, 2008. [19] T. Martinetz, K. Schulten. A "neural-gas" network learns topologies. Artificial Neural Networks, 397–402, 1991. [20] M. Ardabilian, L. Chen. A new line extraction algorithm: Fast connective Hough transforms. In Proc. of Pattern Recognition and Image Processing, 127–134, 2001. [21] Chih-Chung Chang, Chih-Jen Lin. LIBSVM: a library for support vector machines, 2001. Software available at http :// www.csie.ntu.edu.tw/~cjlin/libsvm. [22] The PASCAL VOC Challenge 2009 results, http :// pascallin.ecs.soton.ac.uk/challenges/VOC/voc2009/results/index.html. [23] Jianchao Yang, Kai Yu, Yihong Gong, T. Huang. Linear spatial pyramid matching using sparse coding for image classification. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 1794-1801, 2009. [24] Xi Zhou, Na Cui, Zhen Li, Feng Liang, T. Huang. Hierarchical Gaussianization for Image Classification. In Proc. of IEEE International Conference on Computer Vision, 2009. [25] Chao Zhu, Charles-Edmond Bichot, Liming Chen. Multi-scale Color Local Binary Patterns for Visual Object Classes Recognition. In Proc. of the 20th International Conference on Pattern Recognition, 2010. [26] S. Lazebnik, C. Schmid, J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proc. of IEEE Conference on Computer Vision and Pattern Recognition, 2169-2178, 2006. [27] Jingjing Yang, Yuanning Li, Yonghong Tian, Lingyu Duan, Wen Gao. Group-Sensitive Multiple Kernel Learning For Object Categorization. In Proc. of IEEE International Conference on Computer Vision, 2009.