

Page 1: [IEEE 2014 International Conference on Communication Systems and Network Technologies (CSNT) - Bhopal, India (2014.04.7-2014.04.9)] 2014 Fourth International Conference on Communication

Comparison of Color and Color with Edge Feature Extraction Using Contribution-based Clustering Algorithm

Snehal Mahajan Department of Computer Engineering

RCPIT, Shirpur Dhule, India

[email protected]

Dharmaraj Patil Department of Computer Engineering

RCPIT, Shirpur Dhule, India

[email protected]

Abstract- Search and retrieval of images based on content has attracted considerable attention from the research community in recent years. Classification and clustering algorithms are used to improve the results of content-based image retrieval. This paper relies on a combination of color and edge features of an image for accurate image retrieval. Color features are extracted using an RGB color histogram, and edge features are extracted using the Canny edge detection algorithm. A contribution-based clustering algorithm is applied to these features to form clusters of images. Experiments were carried out on a test dataset of about 771 images from the University of Washington database. The combination of color and edge features gives better results than standalone color extraction with the contribution-based clustering algorithm. Our experiment improves the recall and F-measure values of image retrieval.

Keywords- Content Based Image Retrieval, Feature Extraction, Clustering, Contribution Based, Edge Detection

Introduction

The advent of large storage devices and the proliferation of digital image capturing devices mean that an enormous number of images is created continuously. As the volume of stored image data grows, efficient and accurate image search becomes increasingly important. Past approaches have relied on textual descriptions of images: the query has to be supplied as text, through human intervention, and is matched against text descriptions stored in the image database. Such text description is not feasible for a large database, and language differences, perception differences and inefficient query formats add to the difficulties. To overcome these problems, features of the image content can be used to search for images [1].

Content Based Image Retrieval (CBIR) uses low-level feature extraction, i.e. the visual content of an image such as color, texture and shape, for image retrieval. High-level feature extraction utilises textual description in addition to low-level feature extraction.

Feature extraction and similarity measurement are the two key steps in CBIR. Numerous feature extraction techniques are possible for CBIR [2]. Further, there are two approaches to feature extraction of an image, viz. local and global descriptors. With a global descriptor, the visual features of the whole image are calculated. With a local descriptor, a region of the image is used for feature extraction: the image is divided into blocks and the feature extraction technique is applied only to the blocks. SIMPLIcity [3] and IBM QBIC [4] are region-based systems in which local properties of the image are used. Low-level features like color, shape and texture are extracted and stored in a database. The query image features are then matched with the database features using similarity measurement techniques such as Minkowski-form distance, quadratic distance, vector space model [5], sum of absolute differences [6], chi-square distance [7], Mahalanobis distance, etc. [1].

I. CLUSTERING

Clustering [8][9] is an unsupervised classification technique that groups data and thereby improves the retrieval rate. A cluster is a group of similar data points; dissimilar data points are placed in different clusters. CBIR uses various classification and clustering techniques to improve image retrieval. Clustering algorithms are classified into five main types [8]:

1. Partition-based clustering algorithm: It constructs a partition of a database D having n objects so that the database is divided into a set of k clusters. The similarity and dissimilarity of the n objects are determined, and each object is moved into one of the k clusters accordingly. k-means, k-medoids and contribution-based clustering are some of the partition-based clustering algorithms.

2. Hierarchical clustering algorithm: A hierarchical decomposition of the data is created for further analysis. Both top-down and bottom-up hierarchical clustering are used. In the top-down approach, all objects are placed in the same cluster and that cluster is then divided into subgroups. In the bottom-up approach, each object is initially placed in a separate group and the groups are then merged to form the clusters.

3. Density-based clustering algorithm: It is used to form clusters of arbitrary shape. Cluster formation continues until a threshold value is reached.

4. Grid-based clustering algorithm: It relies on a grid data structure. STING uses statistical information stored in the grid cells, and WaveCluster obtains the clusters using a wavelet transform method.

2014 Fourth International Conference on Communication Systems and Network Technologies

978-1-4799-3070-8/14 $31.00 © 2014 IEEE DOI 10.1109/CSNT.2014.254



5. Model-based clustering algorithm: It uses the given data and a mathematical model for clustering. The Expectation-Maximization algorithm, conceptual clustering and the neural network approach come under model-based clustering.

The performance of a content-based image retrieval system is improved by using clustering techniques [10][16].

The rest of the paper is structured as follows. Section II discusses related work. Section III covers the design of the system in detail and describes the algorithm and system architecture. Section IV evaluates the experimental image retrieval results. Finally, we conclude the paper with future work.

II. RELATED WORK

Several techniques have been developed for image retrieval from large databases.

V.S.V.S. Murthy et al. used the RGB color space for feature extraction of the query image and the database images. On the extracted features, a hierarchical clustering algorithm forms the initial clusters, and the k-means clustering algorithm is then applied to those clusters. Hierarchical clustering speeds up image retrieval and improves the search by giving it a proper direction. The k-means clustering algorithm gives accurate results from a large database. Using two clustering techniques increases the complexity [11].

Thawari P. B. and N. J. Janwe selected the RGB color histogram to determine the color features, and used mean, standard deviation, normalized relative smoothness, third moment, fourth moment, uniformity and entropy to calculate the texture features. Combining the texture and color features gives better results in CBIR [12].

Fazal-e-Malik and Baharum Baharudin converted the RGB image into a grayscale image and then calculated the texture primitives of that image by dividing it into blocks. This gives better results than combined texture and color feature extraction [12][13].

S.R. Surya and G. Sasikala divided the whole image into non-overlapping blocks and calculated the texture primitive on each block. The Canny edge detection algorithm is used for feature extraction, and the contribution-based clustering algorithm is applied to the resulting feature database [6].

Sanjay N. Talbar and Satishkumar L. Varma calculated color features for CBIR. The discrete cosine transform is used to compact the image energy. K-means is used to create clusters, and the chi-square distance measure is used to match and retrieve images [7].

Zhao Hui et al. used median filtering and an edge detection algorithm for feature extraction. The Canny edge detection algorithm is applied to the image to calculate the edge values, and median filtering is then used to remove noise from them. This improves the results of CBIR [14].

A.K. Jain and A. Vailaya integrated color and shape features. This gives better results than using a single feature extraction technique; combining these readily perceived features makes retrieval more efficient and effective [15].

Raman Maini and Dr. Himanshu Aggarwal compared different edge detection techniques. Canny's edge detection algorithm performs better than the Sobel, Prewitt and Roberts operators, especially under noisy conditions [17].

Harikrishna Narasimhan and Purushothaman Ramraj used the RGB color histogram for feature extraction. After the features are extracted, the contribution-based clustering algorithm is used for clustering, and the Euclidean distance is used to measure the similarity between the query image and the database images. The contribution-based clustering algorithm gives better recall and F-measure than k-means clustering [18].

III. SYSTEM ARCHITECTURE AND ALGORITHM

The major problem of the RGB color histogram technique is that the number of bins is fixed and the histogram does not capture how color is distributed within the image, which makes the color histogram alone inefficient for image retrieval. To improve the efficiency, we use Canny edge detection along with color and then construct the feature vector database. Consider, for example, the images in Figure 1.

Figure 1: Different images with the same color quantity [5]

Despite the obvious visual differences, these two images have the same histogram. To remove this drawback of the color histogram, we add edge features to the color features to create the feature vector.

The RGB color histogram and the Canny edge detection algorithm are used for feature extraction. We retrieve the image(s) by matching the query image features with the database features using the contribution-based clustering algorithm.

Figure 2 shows the system architecture. The input is an RGB color image database. On that database we use the RGB color histogram technique to extract the color features of the images. We use 8 bins each for red, green and blue, which gives 8 × 8 × 8 = 512 features per image. Further, we extract the edge features using the Canny edge detection algorithm. For this purpose, we convert the RGB color image into a grayscale image. The image is then smoothed with a Gaussian filter of standard deviation σ = 1.4, followed by the computation of the gradient magnitude and angle. Non-maximal suppression and edge thresholding are then applied to obtain the edge features. By combining the color features and the edge features we construct the feature vector database.
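The color feature step can be sketched as follows. This is a minimal illustration, assuming the 512 features are the joint 8 × 8 × 8 bin counts of an 8-bit RGB image; the function name `rgb_histogram_512` and the normalization to unit sum are assumptions made here, not details given in the paper.

```python
import numpy as np

def rgb_histogram_512(image, bins_per_channel=8):
    """Return a joint RGB histogram flattened to bins_per_channel**3 values.

    `image` is assumed to be an (H, W, 3) uint8 array. The histogram is
    normalized to sum to 1 (an assumption, so image size does not matter).
    """
    image = np.asarray(image, dtype=np.uint8)
    # Map each 0..255 channel value to a bin index (bin width 32 for 8 bins).
    width = 256 // bins_per_channel
    idx = (image // width).astype(np.int64)
    # Combine the three per-channel bin indices into one index in 0..511.
    flat = (idx[..., 0] * bins_per_channel + idx[..., 1]) * bins_per_channel + idx[..., 2]
    hist = np.bincount(flat.ravel(), minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()
```

For a pure red image every pixel falls into the single bin with red index 7 and green and blue index 0, so exactly one of the 512 entries is non-zero.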

Figure 2: System architecture

Canny Edge Detection Algorithm [6][15][17]:

The Canny edge detection algorithm is used to obtain the edge information of an image. Shape features are extracted using Canny edge detection.

These are the main steps in Canny edge detection:

1. Convert into grayscale: The RGB color image is converted into a grayscale image.

2. Smoothing or noise reduction: Smoothing filters out noise from the image prior to edge detection. It is done with a two-dimensional Gaussian filter, which is computed using a simple mask. The Gaussian mask with a standard deviation of σ = 1.4 is given below.

$$B = \frac{1}{159}\begin{bmatrix} 2 & 4 & 5 & 4 & 2 \\ 4 & 9 & 12 & 9 & 4 \\ 5 & 12 & 15 & 12 & 5 \\ 4 & 9 & 12 & 9 & 4 \\ 2 & 4 & 5 & 4 & 2 \end{bmatrix} \tag{1}$$

3. Compute gradient magnitude and angle: The Canny algorithm detects edges where the grayscale intensity of the image changes. These changes are identified by determining the gradients of the image. The gradient at each pixel of the smoothed image is determined using the Sobel operator. First, we approximate the gradient in the x- and y-direction respectively, using the following kernels.

$$K_{GX} = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \qquad K_{GY} = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \tag{2}$$

The gradient magnitude can then be calculated with the Euclidean distance measure, and is simplified by the Manhattan distance measure as shown in equation (3).

$$|G| = |G_x| + |G_y| \tag{3}$$

The gradient magnitude shows the edges quite clearly. Sometimes edges are broad, which makes edge determination difficult, so the direction of the edge should be calculated using equation (4).

$$\theta = \arctan\left(\frac{|G_y|}{|G_x|}\right) \tag{4}$$

Edges occur at the points where the gradient value is maximum. The magnitude and direction of the gradient are calculated at each pixel.

4. Non-maximal suppression: It converts the "blurred" edges in the image of the gradient magnitudes to "sharp" edges. This is done by preserving all local maxima in the gradient image and deleting everything else. The following is done for each pixel in the gradient image:

i) Round the gradient direction θ to the nearest 45°, corresponding to the use of an 8-connected neighbourhood.

ii) Compare the edge strength of the current pixel with the edge strength of the pixels in the positive and negative gradient direction.

iii) If the edge strength of the current pixel is the largest, preserve the value of the edge strength. If not, remove that value.

5. Edge threshold: Edges are traced through the image using hysteresis thresholding, which uses both a high and a low threshold. This is done to prevent an edge from being broken up.
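Steps 1 to 3 above, together with the direction rounding used in step 4, can be sketched as follows. This is a simplified illustration and not the authors' implementation: the plain channel average used for grayscale conversion, the border replication, and the helper names `filter2d` and `gradient_features` are assumptions. The direction is computed with the signed `arctan2` and folded into [0°, 180°) so that the four rounded directions can be told apart, which differs from the absolute-value form of equation (4). Non-maximal suppression and hysteresis thresholding are omitted.

```python
import numpy as np

# 5x5 Gaussian mask for sigma = 1.4 (equation 1); the entries sum to 159.
GAUSS = np.array([[2, 4, 5, 4, 2],
                  [4, 9, 12, 9, 4],
                  [5, 12, 15, 12, 5],
                  [4, 9, 12, 9, 4],
                  [2, 4, 5, 4, 2]], dtype=float) / 159.0
# Sobel kernels for the x- and y-direction (equation 2).
KGX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KGY = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)

def filter2d(img, kernel):
    """Correlate img with kernel, replicating border pixels (same-size output)."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for r in range(kh):
        for c in range(kw):
            out += kernel[r, c] * padded[r:r + img.shape[0], c:c + img.shape[1]]
    return out

def gradient_features(rgb):
    """Return (magnitude, direction in degrees rounded to 0/45/90/135)."""
    gray = np.asarray(rgb, dtype=float).mean(axis=2)  # assumed grayscale conversion
    smooth = filter2d(gray, GAUSS)                    # step 2: noise reduction
    gx = filter2d(smooth, KGX)                        # step 3: x-gradient
    gy = filter2d(smooth, KGY)                        # step 3: y-gradient
    magnitude = np.abs(gx) + np.abs(gy)               # Manhattan form, equation 3
    theta = np.degrees(np.arctan2(gy, gx)) % 180.0    # signed angle folded to [0, 180)
    direction = (np.round(theta / 45.0) * 45.0) % 180.0
    return magnitude, direction
```

On an image with a single vertical step edge, the magnitude is zero in the flat regions and positive around the step, and the rounded gradient direction at the step is 0° (the gradient points along x).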

The color and edge features are combined to form the feature vector. Cluster formation is done using the contribution-based clustering algorithm [18]. This is a partition-based clustering algorithm, described below, that relies on both the intra-cluster and the inter-cluster dispersion of objects. In the implementation, we calculate the contribution and dispersion of every object using the following formulae.

Given a cluster $C_i$ having n points and centroid $m_i$, the dispersion is

$$\text{dispersion}(C_i) = \frac{1}{n} \sum_{x \in C_i} (x - m_i)^2 \tag{5}$$

The contribution of a point x which belongs to cluster $C_i$ is calculated by

$$\text{Contribution}(x, C_i) = \text{dispersion}(C_i - \{x\}) - \text{dispersion}(C_i) \tag{6}$$
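Equations (5) and (6) can be illustrated with a short sketch. It assumes that $(x - m_i)^2$ denotes the squared Euclidean distance between feature vectors and that the centroid is recomputed after a point is removed; neither detail is spelled out in the text, and the function names are chosen here for illustration.

```python
import numpy as np

def dispersion(points):
    """Equation 5: mean squared Euclidean distance of the points from their centroid."""
    points = np.asarray(points, dtype=float)
    if len(points) == 0:
        return 0.0
    centroid = points.mean(axis=0)
    return float(((points - centroid) ** 2).sum(axis=1).mean())

def contribution(index, points):
    """Equation 6: dispersion of the cluster without point `index` minus dispersion with it."""
    points = np.asarray(points, dtype=float)
    rest = np.delete(points, index, axis=0)  # the cluster C_i - {x}
    return dispersion(rest) - dispersion(points)
```

Under these assumptions an outlier has a negative contribution, because removing it lowers the dispersion of the cluster, while a point close to the centroid has a positive one. For the one-dimensional cluster {0, 1, 2}, removing the middle point raises the dispersion from 2/3 to 1, so its contribution is 1/3.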

Algorithm for clustering (Contribution-based clustering)[18]

Step 1: Initialize the clusters (C1, C2, ..., Ck).

Step 2: Find the centroids of all clusters (m1, m2, ..., mk).

Step 3: Find l, 1 ≤ l ≤ k, such that distance(x, ml) is minimum.

Step 4: Add x to cluster Cl and update the centroid ml.

Step 5: If contribution(x, Cl) < 0, move x to a cluster Cp such that contribution(x, Cp) is maximum.

Step 6: If contribution(x, Cl) ≥ 0, move x to a cluster Cp such that

$$\frac{\alpha - \alpha_{new}}{\alpha} - \frac{\beta - \beta_{new}}{\beta_{new}}$$

is maximum. Here $\alpha_{new}$ and $\beta_{new}$ are the values of α and β after the point x is moved to cluster Cp, α is the intra-cluster dispersion and β is the inter-cluster dispersion:

$$\alpha = \frac{1}{n} \sum_{x \in C_i} (x - m_i)^2 \tag{7}$$

$$\beta = \frac{1}{k} \sum_{i=1}^{k} (m_i - \bar{m})^2 \tag{8}$$

where $\bar{m}$ is the mean of all centroids.
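Two building blocks of the algorithm, the nearest-centroid search of Step 3 and the inter-cluster dispersion β of equation (8), can be sketched as below. The function names are hypothetical and the squared term is again read as a squared Euclidean distance; the full reassignment loop of Steps 5 and 6 is not reproduced here.

```python
import numpy as np

def nearest_cluster(x, centroids):
    """Step 3: index l of the centroid closest to x (Euclidean distance)."""
    centroids = np.asarray(centroids, dtype=float)
    distances = np.linalg.norm(centroids - np.asarray(x, dtype=float), axis=1)
    return int(np.argmin(distances))

def inter_cluster_dispersion(centroids):
    """Equation 8: mean squared distance of the k centroids from their mean."""
    centroids = np.asarray(centroids, dtype=float)
    mean_centroid = centroids.mean(axis=0)
    return float(((centroids - mean_centroid) ** 2).sum(axis=1).mean())
```

With centroids (0, 0) and (4, 0), the point (3, 1) is assigned to the second cluster, and β equals 4 because both centroids lie at distance 2 from their mean (2, 0).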

We measure the similarity between the query image and the database images using the Euclidean distance. If p = (p1, p2, ..., pn) and q = (q1, q2, ..., qn) are two points, then the distance from p to q, or from q to p, is given by

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \tag{9}$$

Images are retrieved from the database in order of increasing distance from the query image, i.e. decreasing similarity.
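The distance-based matching of equation (9) can be sketched as a simple ranking. The function name `rank_by_distance` and the `top_k` cut-off are assumptions introduced for illustration; the paper does not state how many images are returned.

```python
import numpy as np

def rank_by_distance(query, database, top_k=5):
    """Return (indices, distances) of the top_k database vectors closest to query."""
    database = np.asarray(database, dtype=float)
    query = np.asarray(query, dtype=float)
    # Equation 9: Euclidean distance between the query and every database vector.
    distances = np.sqrt(((database - query) ** 2).sum(axis=1))
    order = np.argsort(distances, kind="stable")[:top_k]
    return order.tolist(), distances[order].tolist()
```

In the system described here the search would typically be restricted to the feature vectors of the cluster closest to the query, which is where clustering saves comparisons.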

IV. EXPERIMENTAL RESULT

We used test data of 771 images belonging to 18 categories, available from the University of Washington's Object and Concept Recognition for CBIR research project [19]. On those images, we applied the color and edge feature extraction techniques and constructed the visual descriptor database. Then, by applying the contribution-based clustering algorithm, we formed the clusters of images. The retrieved images are compared with the images retrieved using standalone color feature extraction and the contribution-based clustering algorithm. Recall, precision and F-measure are the performance measures we used. The F-measure is the harmonic mean of precision and recall, so an improvement in F-measure reflects the overall retrieval quality.

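$$\text{Recall} = \frac{\text{Total number of retrieved relevant images}}{\text{Total no. of relevant images}}$$

$$\text{Precision} = \frac{\text{Total number of retrieved relevant images}}{\text{Total no. of retrieved images}}$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$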
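The three measures can be computed from a set of retrieved image identifiers and a set of relevant ones. The sketch below is illustrative only; the function name and the convention of returning 0 when a denominator is zero are assumptions.

```python
def retrieval_scores(retrieved, relevant):
    """Return (precision, recall, f_measure) for two collections of image ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved images that are relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

For example, if 4 images are retrieved, 2 of them are relevant, and the category holds 5 relevant images in total, the precision is 0.5, the recall is 0.4 and the F-measure is 4/9 ≈ 0.44.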

The experiment was repeated for different images belonging to the Indonesia, Arborgreens and Cherries sets. Using different query images, we calculated the average precision, recall and F-measure values for different numbers of clusters.

Figure 1. Query Image.

Figure 2. Retrieved Images by Using Color.


Figure 3. Retrieved Images by using Color and Edge.

TABLE I. PRECISION FOR COLOR AND COLOR WITH EDGE

| Number of Clusters | Precision (Color) | Precision (Color with Edge) |
|---|---|---|
| 10 | 0.78 | 0.73 |
| 11 | 0.66 | 0.66 |
| 12 | 0.73 | 0.66 |
| 13 | 0.72 | 0.75 |
| 14 | 0.68 | 0.67 |
| 15 | 0.69 | 0.75 |
| 16 | 0.68 | 0.73 |
| 17 | 0.71 | 0.66 |
| 18 | 0.67 | 0.64 |

Figure 4. Precision for Color and Color and Edge.

TABLE II. RECALL FOR COLOR AND COLOR WITH EDGE

| Number of Clusters | Recall (Color) | Recall (Color with Edge) |
|---|---|---|
| 10 | 0.15 | 0.38 |
| 11 | 0.12 | 0.50 |
| 12 | 0.12 | 0.30 |
| 13 | 0.15 | 0.38 |
| 14 | 0.12 | 0.21 |
| 15 | 0.12 | 0.38 |
| 16 | 0.15 | 0.25 |
| 17 | 0.12 | 0.25 |
| 18 | 0.09 | 0.30 |

Figure 5. Recall for Color and Color and Edge.

TABLE III. F-MEASURE FOR COLOR AND COLOR WITH EDGE

| Number of Clusters | F-measure (Color) | F-measure (Color with Edge) |
|---|---|---|
| 10 | 0.25 | 0.49 |
| 11 | 0.20 | 0.57 |
| 12 | 0.21 | 0.41 |
| 13 | 0.25 | 0.50 |
| 14 | 0.20 | 0.32 |
| 15 | 0.20 | 0.50 |
| 16 | 0.25 | 0.37 |
| 17 | 0.21 | 0.36 |
| 18 | 0.15 | 0.41 |
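As a worked check of the harmonic-mean relation, the snippet below recomputes the F-measure for three color-with-edge rows from the precision and recall values reported in Tables I and II. Because the tabulated inputs are already rounded to two decimals, agreement with Table III can only be expected up to rounding; the choice of rows is arbitrary.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

# (clusters, precision, recall, reported F-measure) for color with edge,
# copied from Tables I, II and III.
rows = [(11, 0.66, 0.50, 0.57), (13, 0.75, 0.38, 0.50), (18, 0.64, 0.30, 0.41)]

recomputed = [round(f_measure(p, r), 2) for _, p, r, _ in rows]
reported = [f for _, _, _, f in rows]
```

For these rows the recomputed values match the reported ones, e.g. 2 × 0.66 × 0.50 / 1.16 ≈ 0.57 for 11 clusters.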


Figure 6. F-measure by Using Color and Color and Edge.

REFERENCES

[1] F.H. Long, H.J. Zhang, and D.D. Feng, "Fundamentals of content-based image retrieval," in Multimedia Information Retrieval and Management: Technological Fundamentals and Applications, Springer-Verlag, New York, 2003, pp. 1-26.

[2] Wan Siti Halimatul Munirah Wan Ahmad and Mohammad Faizal Ahmad Fauzi, "Comparison of Different Feature Extraction Techniques in Content-Based Image Retrieval for CT Brain Images," in Multimedia Signal Processing, 2008 IEEE 10th Workshop, Cairns, Qld, pp. 503-508.

[3] J.Z. Wang, J. Li and G. Wiederhold, "SIMPLIcity: semantic sensitive integrated matching for picture libraries," IEEE Trans. on Pattern Analysis and Machine Intelligence, 2001, vol. 23, no. 9, pp. 947-963.

[4] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom et al., "Query by Image and Video Content: The QBIC System," IEEE Computer Society, vol. 28, no. 9, 1995.

[5] G. Liu and B. Lee, "A color-based clustering approach for web image search results," in ACM Proc. 2009 International Conference on Hybrid Information Technology (ICHIT '09), Daejeon, Korea, New York, Aug. 2009, vol. 321, pp. 481-484.

[6] S.R. Surya and G. Sasikala, "An Enhanced Image Retrieval using Contribution-based Clustering Algorithm with Spatial Feature of Texture Primitive and Edge Detection," International Journal of Computer Applications, Volume 33, No. 2, November 2011.

[7] Sanjay N. Talbar and Satishkumar L. Varma, "iMATCH: Image Matching and Retrieval for Digital Image Libraries," Second International Conference on Emerging Trends in Engineering and Technology, ICETET-2009.

[8] J. Han and M. Kamber, "Data Mining: Concepts and Techniques," Morgan Kaufmann, 2006.

[9] R. Xu and D. Wunsch, "Survey of clustering algorithms," IEEE Transactions on Neural Networks, Vol. 16, Issue 3, pp. 645-678, May 2005.

[10] P.J. Dutta, D.K. Bhattacharyya, J.K. Kalita and M. Dutta, "Clustering approach to content based image retrieval," in Proc. Conference on Geometric Modeling and Imaging: New Trends (GMAI), 2006, IEEE Computer Society, Washington, DC, pp. 183-188.

[11] V.S.V.S. Murthy et al., "Content based image retrieval using Hierarchical and K-means clustering techniques," IJEST, Vol. 2(3), 2010, pp. 209-212.

[12] P.B. Thawari and N.J. Janwe, "CBIR Based on Color and Texture," International Journal of Information Technology and Knowledge Management, January-June 2011, Volume 4, No. 1, pp. 129-132.

[13] Fazal-e-Malik and Baharum Baharudin, "Efficient Image Retrieval Based on Texture Features," National Postgraduate Conference (NPC), IEEE, Kuala Lumpur, pp. 1-6, 2011.

[14] Zhao Hui, Pankoo Kim, and Jongan Park, "Feature analysis based on Edge Extraction and Median Filtering for CBIR," Computer Modelling and Simulation, 2009. UKSIM '09. 11th International Conference on, pp. 245-249, 2009.

[15] A.K. Jain and A. Vailaya, "Image retrieval using color and shape," Pattern Recognition, 1996, vol. 29, no. 8, pp. 1233-1244.

[16] Y. Chen, J.Z. Wang, and R. Krovetz, "Content-based image retrieval by clustering," in Proc. 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, Berkeley, California, MIR '03, ACM, New York, NY, pp. 193-200, Nov. 2003.

[17] Raman Maini and Dr. Himanshu Aggarwal, "Study and Comparison of Various Image Edge Detection Techniques," International Journal of Image Processing (IJIP), Volume (3): Issue (1).

[18] Harikrishna Narasimhan and Purushothaman Ramraj, "Contribution-Based Clustering Algorithm for Content-Based Image Retrieval," 2010 5th International Conference on Industrial and Information Systems, ICIIS 2010, Mangalore, Jul 29 - Aug 01, 2010, pp. 442-447.

[19] "Object and concept recognition for content-based image retrieval," [Online]. Available: http://www.cs.washington.edu/research/imagedatabase/groundtruth/.
