
Literature Review on Content Based Image Retrieval (CBIR)

2015 July MCS3108 Image Processing and Vision

Name : U. V Vandebona

Index No : 13440722

Registration No: 2013/MCS/072



ABSTRACT

The purpose of this study is to explore the field of content based image retrieval, which falls within the image processing and vision domain. The paper further discusses the research carried out in this area, focusing specifically on how this image retrieval technique may affect computer vision and its future.

Keywords: Content Based Image Retrieval (CBIR), Computer Vision, Reverse Image Search

INTRODUCTION

The term “Content Based Image Retrieval” is generally traced to Toshikazu Kato, who used it in 1992 in his research article “Database architecture for content-based image retrieval”. His experiments concerned the automatic retrieval of desired images from a large image database based on image features such as color and shape (Kato 1992).

Content based image retrieval, or CBIR for short, is the application of computer vision techniques to image retrieval. In earlier days, image retrieval was purely concept based: it relied on metadata such as keywords, tags, or descriptions associated with an image to give the image a descriptive meaning (Khutwad and Vaidya 2013). However, there is no guarantee that every image has associated text annotations, or that its annotations are complete; images captured by surveillance cameras are one example. Searching on the content of the image itself fills this gap. In this context, the term "content" refers to colors, shapes, textures, or any other information that can be derived from the image itself. The main drawback of the traditional concept based approach is that creating such a text-descriptive database is time consuming, because it must be done manually, and the chosen keywords may still fail to describe the image as desired. Because CBIR is automated, it is much less affected by human error.

ARCHITECTURE

A typical CBIR system has four stages in its process, as depicted in Figure 1.

1. Create the image data collection.
2. Build a feature database by automatically extracting features from the images in the collection.
3. Search for a required image using the feature database.
4. Arrange the retrieved results in order.
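The four stages listed above can be sketched in plain Python. This is a toy illustration rather than a description of any particular system: the three-pixel "images", the `collection`, `extract_features` and `search` names, and the use of the mean RGB color as the only feature are all assumptions made to keep the example short.

```python
import math

# Stage 1: a hypothetical image collection; each "image" is a short list of (R, G, B) pixels
collection = {
    "sunset": [(250, 120, 30), (240, 100, 20), (200, 80, 40)],
    "forest": [(30, 140, 40), (20, 120, 30), (60, 160, 50)],
    "ocean": [(20, 60, 200), (30, 90, 220), (10, 70, 180)],
}


def extract_features(pixels):
    """Stand-in feature extractor: the mean R, G and B values of an image."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


# Stage 2: build the feature database once, automatically, from the collection
feature_db = {name: extract_features(px) for name, px in collection.items()}


def search(query_pixels, top_k=2):
    # Stage 3: compare the query's features with every stored feature vector
    query = extract_features(query_pixels)
    scored = [(math.dist(query, feats), name) for name, feats in feature_db.items()]
    # Stage 4: arrange the results by increasing distance and keep the best top_k
    return [name for _, name in sorted(scored)[:top_k]]


print(search([(245, 110, 25)]))  # ['sunset', 'forest']
```

Stage 2 runs once, offline, so each query only pays for the comparisons and sorting in stages 3 and 4.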

Search

Most CBIR systems offer two ways to search for an image; which one to use depends on the application domain.



• Query by Example

Query by example (QBE) is a query technique in which the CBIR system is given an example image, or part of an image, on which it then bases its search. The search can also use multiple sample images or a sketched image. The result images should all share common elements with the provided sample. This technique is also called reverse image search, and Google image search is a popular example of it. Commonly used reverse image search algorithms include:

• Scale-invariant feature transform (SIFT), to extract local features of an image
• Maximally stable extremal regions (MSER)
• Vocabulary tree
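A vocabulary tree maps each local descriptor (for example a SIFT vector) to a "visual word" by descending a tree of cluster centres, always following the nearest child; an image can then be summarised by how often each visual word occurs. The sketch below shows only that lookup step on a tiny hand-built tree. The 2-D descriptors, the centroids and the `Node`, `quantize` and `bag_of_words` names are hypothetical; a real tree is built by hierarchical k-means over 128-dimensional SIFT descriptors.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    centroid: tuple
    children: list = field(default_factory=list)
    word_id: int = -1  # only leaves carry a visual-word id


def quantize(descriptor, root):
    # Walk down the tree, always following the child with the nearest centroid
    node = root
    while node.children:
        node = min(node.children, key=lambda child: math.dist(descriptor, child.centroid))
    return node.word_id


def bag_of_words(descriptors, root, vocab_size):
    # Count how often each visual word occurs; the counts act as an image signature
    hist = [0] * vocab_size
    for d in descriptors:
        hist[quantize(d, root)] += 1
    return hist


# Hypothetical two-level tree with branching factor 2 (four visual words)
tree = Node((5, 5), [
    Node((0, 0), [Node((-1, -1), word_id=0), Node((-1, 1), word_id=1)]),
    Node((10, 10), [Node((9, 9), word_id=2), Node((11, 11), word_id=3)]),
])
descriptors = [(-1, -0.8), (10.9, 11.2), (-0.9, 1.1), (-1.2, -1)]
print(bag_of_words(descriptors, tree, 4))  # [2, 1, 0, 1]
```

Because each level compares a descriptor with only a few children, lookup cost grows with the depth of the tree rather than with the total number of visual words.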

• Text Semantics

Apart from QBE, images can be retrieved by supplying a text description of their semantics. For example, if the query is the text “elephant with a flower”, the retrieved images should contain both an elephant and a flower. This kind of open-ended task is very difficult for computers, because the system must first be trained to associate visual features with semantic concepts. The method therefore usually needs some form of human feedback to improve the results: the user progressively refines the search by marking returned images as "relevant", "not relevant", or "neutral" to the query, and the search is then repeated with this new information.
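The relevance feedback loop described above is often implemented with Rocchio's formula, a classic technique from text retrieval that the cited papers do not necessarily use. It moves the query's feature vector towards the images marked "relevant" and away from those marked "not relevant", while "neutral" images are ignored. The weights `alpha = 1.0`, `beta = 0.75` and `gamma = 0.15` are common textbook defaults, assumed here, and the two-dimensional feature vectors are hypothetical.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move a query feature vector using the user's relevance judgements."""

    def centroid(vectors):
        # Mean vector of a group of feedback images (zeros if the group is empty)
        if not vectors:
            return [0.0] * len(query)
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(query))]

    rel, non = centroid(relevant), centroid(non_relevant)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]


query = [0.5, 0.5]
relevant = [[0.9, 0.1], [0.7, 0.3]]  # marked "relevant" by the user
non_relevant = [[0.1, 0.9]]          # marked "not relevant"; "neutral" images are left out
query = rocchio(query, relevant, non_relevant)  # the refined query for the next search round
```

Each feedback round produces a new query vector, and the search is simply repeated with it.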

Figure 1 - General Architecture of a CBIR System



THE “CONTENT” COMPARISON

The most common way to compare two images in CBIR is to use an image distance measure, which compares the two images along dimensions such as color, texture and shape (the visual features described in the next section). A distance of 0 signifies an exact match with the query with respect to the dimensions considered, and search results can then be sorted by their distance to the query image. Many techniques for measuring image distance, known as similarity models, have been developed for this purpose. Distance formulas used by many researchers for image retrieval include the Histogram Euclidean Distance, Histogram Intersection Distance, Histogram Manhattan Distance and Histogram Quadratic Distance (Singha and Hemachandran 2012). Retrieval results can be evaluated in terms of precision and recall.
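The four histogram distances named above, together with ranking by distance and a precision/recall check, can be sketched in plain Python. The exact forms are assumptions: the intersection distance here is normalised by the query histogram's total, the quadratic distance uses a hypothetical bin-similarity matrix with entries `1 - |i - j| / (n - 1)`, and the cited papers may define either differently. Precision is the share of retrieved images that are relevant; recall is the share of relevant images that were retrieved.

```python
import math


def euclidean(h, q):
    # Histogram Euclidean distance: root of the summed squared bin differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h, q)))


def manhattan(h, q):
    # Histogram Manhattan (L1) distance: summed absolute bin differences
    return sum(abs(a - b) for a, b in zip(h, q))


def intersection_distance(h, q):
    # One common form: 1 minus the overlapping mass, normalised by the query's total
    return 1.0 - sum(min(a, b) for a, b in zip(h, q)) / sum(q)


def quadratic(h, q, sim):
    # Quadratic-form distance sqrt(d^T A d); sim[i][j] is the similarity of bins i and j
    d = [a - b for a, b in zip(h, q)]
    value = sum(d[i] * sim[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.sqrt(max(0.0, value))  # clamp tiny negative rounding errors


def bin_similarity(n):
    # Hypothetical similarity matrix: a_ij = 1 - |i - j| / (n - 1)
    return [[1.0 - abs(i - j) / (n - 1) for j in range(n)] for i in range(n)]


def precision_recall(retrieved, relevant):
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)


query = [0.5, 0.3, 0.2]
database = {"a": [0.5, 0.3, 0.2], "b": [0.2, 0.3, 0.5], "c": [0.4, 0.4, 0.2]}
ranked = sorted(database, key=lambda name: euclidean(database[name], query))
print(ranked)  # ['a', 'c', 'b']
print(precision_recall(ranked[:2], {"a", "b"}))  # (0.5, 0.5)
```

Unlike the other three, the quadratic distance lets similar (for example, neighbouring) bins partly compensate for each other; with an identity matrix it reduces to the Euclidean distance.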

THE “CONTENT” - VISUAL FEATURES

Comparing two identical images is an easy task. If the same image appears at different scales, rotations or under other transformations, however, the comparison becomes challenging, and it becomes harder still when the object within the image is itself transformed differently from one image to the other. To address this problem, CBIR describes the required content in terms of the visual features of the image. These features can be either general purpose or domain specific. General features include low-level features such as color and texture and mid-level features such as shape, whereas domain-specific features are those used in specialized applications such as biometrics.

Color

Color-based retrieval has the advantages of fast retrieval, low memory requirements and insensitivity to changes in image size and rotation, which is why this type of CBIR system is widely used. Color-based general purpose image retrieval systems roughly fall into three categories, depending on the feature extraction approach used:

1. Histogram Based
2. Color Layout Based
3. Region Based

Histogram-based search methods have been investigated in two different color spaces, RGB and HSV. The first-order (mean), second-order (variance) and third-order (skewness) color moments have been shown to be efficient and effective in representing the color distributions of images (Khutwad and Vaidya 2013). Color-similarity distances are computed from a color histogram of each image, which records the proportion of pixels in the image that hold specific values.
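Both color representations mentioned above can be sketched with the standard library alone. `color_histogram` quantises each channel into a few levels and records the proportion of pixels in each joint bin, optionally after converting RGB to HSV with Python's `colorsys` module; `color_moments` returns the per-channel mean, variance and third-order moment. The function names, the bin count and the signed cube root used to report the third moment on the pixel scale are assumptions rather than details from the cited paper.

```python
import colorsys
import math


def color_histogram(pixels, bins=4, space="rgb"):
    """Normalised joint color histogram with `bins` levels per channel."""
    hist = [0] * bins ** 3
    for pixel in pixels:
        c1, c2, c3 = pixel
        if space == "hsv":
            # colorsys expects floats in [0, 1]; rescale H, S, V to 0..255 for binning
            h, s, v = colorsys.rgb_to_hsv(c1 / 255, c2 / 255, c3 / 255)
            c1, c2, c3 = int(h * 255), int(s * 255), int(v * 255)
        i, j, k = (c * bins // 256 for c in (c1, c2, c3))
        hist[(i * bins + j) * bins + k] += 1
    # Each bin ends up holding the proportion of pixels that fall into it
    return [count / len(pixels) for count in hist]


def color_moments(pixels):
    """Per-channel (mean, variance, third-order moment) triples."""
    moments = []
    for c in range(3):
        values = [p[c] for p in pixels]
        n = len(values)
        mean = sum(values) / n
        variance = sum((x - mean) ** 2 for x in values) / n
        third = sum((x - mean) ** 3 for x in values) / n
        # Signed cube root keeps the third moment on the same scale as the pixel values
        skewness = math.copysign(abs(third) ** (1 / 3), third)
        moments.append((mean, variance, skewness))
    return moments


pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (0, 255, 0)]
print(color_histogram(pixels, bins=2))
print(color_moments(pixels))
```

Three moments per channel give a nine-number signature, far more compact than a histogram, which is one reason moments are attractive for large collections.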

Many research results suggest that extending the global color feature to a local one yields better retrieval results. A common approach is therefore to divide the whole image into sub-blocks and extract color features from each sub-block. A variation of this approach is the quadtree-based color layout approach, in which the entire image is split into a quadtree structure and each tree branch has its own histogram describing its color content.
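A sketch of the quadtree color layout idea: every node of the tree, from the whole image down to its quadrants at the chosen depth, contributes its own coarse color histogram, and the histograms are concatenated into one feature vector. The two-levels-per-channel quantisation and the `coarse_histogram` and `quadtree_layout` names are assumptions. The usage lines show why local features help: two images with the same colors in mirrored positions share a global histogram but not a layout feature.

```python
def coarse_histogram(pixels, bins=2):
    # Joint RGB histogram with `bins` levels per channel, normalised to proportions
    hist = [0.0] * bins ** 3
    for pixel in pixels:
        i, j, k = (c * bins // 256 for c in pixel)
        hist[(i * bins + j) * bins + k] += 1
    return [count / len(pixels) for count in hist]


def quadtree_layout(image, depth, bins=2):
    """Concatenated histograms of a quadtree; `image` is a list of rows of (R, G, B)."""
    feature = coarse_histogram([p for row in image for p in row], bins)
    rows, cols = len(image), len(image[0])
    if depth == 0 or rows < 2 or cols < 2:
        return feature
    mr, mc = rows // 2, cols // 2
    quadrants = (
        [row[:mc] for row in image[:mr]],  # top-left
        [row[mc:] for row in image[:mr]],  # top-right
        [row[:mc] for row in image[mr:]],  # bottom-left
        [row[mc:] for row in image[mr:]],  # bottom-right
    )
    for quad in quadrants:
        feature += quadtree_layout(quad, depth - 1, bins)
    return feature


RED, BLUE = (255, 0, 0), (0, 0, 255)
left_red = [[RED, BLUE], [RED, BLUE]]
left_blue = [[BLUE, RED], [BLUE, RED]]
print(quadtree_layout(left_red, 0) == quadtree_layout(left_blue, 0))  # True
print(quadtree_layout(left_red, 1) == quadtree_layout(left_blue, 1))  # False
```

The feature length grows by a factor of about four per extra level, which illustrates the computation and storage cost noted in the next paragraph.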



Even though the color layout approach is conceptually simple, its computation and storage are expensive. A more sophisticated approach is to segment the image into regions with salient color features using color-set back-projection, and then store the position and color-set feature of each region (Kaur and Banga 2011).
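A heavily simplified sketch of region extraction by color-set back-projection: pixels whose quantised color belongs to a chosen color set are marked, connected groups of marked pixels become regions, and each region's position (bounding box) and color set are stored. The two-level quantisation, 4-connectivity, the `min_size` filter and the function names are assumptions, and steps that published systems add, such as filtering the back-projected image, are omitted.

```python
from collections import deque


def quantize(pixel, levels=2):
    # Coarse quantisation: each RGB channel is mapped to one of `levels` bins
    return tuple(c * levels // 256 for c in pixel)


def extract_regions(image, color_set, levels=2, min_size=1):
    """Regions whose quantised colors belong to `color_set`, with their positions."""
    # Back-projection: a binary map marking pixels whose color is in the set
    mask = [[quantize(p, levels) in color_set for p in row] for row in image]
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one 4-connected region of marked pixels
            seen[r][c] = True
            queue, cells = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(cells) >= min_size:
                ys, xs = [y for y, _ in cells], [x for _, x in cells]
                regions.append({
                    "color_set": frozenset(color_set),
                    "bbox": (min(ys), min(xs), max(ys), max(xs)),  # (top, left, bottom, right)
                    "size": len(cells),
                })
    return regions


R, B = (255, 0, 0), (0, 0, 255)
image = [[R, R, B],
         [B, B, B],
         [B, B, R]]
for region in extract_regions(image, {(1, 0, 0)}):
    print(region["bbox"], region["size"])
```

Storing only a handful of regions per image, instead of a histogram per block, is what makes this representation cheaper than a full color layout.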

Texture

This method looks for visual patterns in images and how they are spatially arranged. Textures are represented by textons, which are then placed into a number of sets depending on how many textures are detected in the image. These sets define not only the texture but also where in the image the texture is located. Texture-based general purpose image retrieval systems usually adopt texture statistic features and structure features, some of which are obtained by transforming the spatial domain into the frequency domain (Kodituwakku and Selvarajah 2011). The following methods are used to classify textures:

1. Co-occurrence matrix
2. Laws' texture energy
3. Wavelet transform and Gabor transform (Singh and Minu 2013)
4. Orthogonal transforms
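The co-occurrence matrix (method 1) counts how often pairs of grey levels occur at a fixed offset, and statistics of that matrix then serve as texture features. This sketch assumes the image is already quantised to integer grey levels `0 .. levels - 1` and computes three Haralick-style statistics. The function names and formulas (energy as the sum of squared entries, homogeneity as the inverse difference moment) follow common usage and are assumptions rather than the exact definitions in the cited work.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix of grey levels at offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                # Count the pair (reference grey level, neighbour grey level)
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Contrast, energy and homogeneity of a normalised co-occurrence matrix."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(p[i][j] * (i - j) ** 2 for i, j in cells),
        "energy": sum(p[i][j] ** 2 for i, j in cells),
        "homogeneity": sum(p[i][j] / (1 + (i - j) ** 2) for i, j in cells),
    }


smooth = [[1, 1], [1, 1]]
checker = [[0, 1], [1, 0]]
print(texture_features(glcm(smooth, levels=2)))   # low contrast, high energy
print(texture_features(glcm(checker, levels=2)))  # higher contrast, lower energy
```

Computing the matrix for several offsets (for example horizontal and vertical) captures texture in more than one direction.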

Shape

Shapes are often determined by first applying segmentation or edge detection to an image. Edges convey essential visual information about an image. The edge descriptor captures the spatial distribution of edges in five categories: vertical, horizontal, 45-degree diagonal, 135-degree diagonal and isotropic (non-directional). This model expects a query-by-example input, and any combination of features can be selected for retrieval.
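The five edge categories match those of the MPEG-7 edge histogram descriptor, which classifies small image blocks by applying five 2 x 2 filters to each block and keeping the strongest response. The sketch below is a simplified version under stated assumptions: it uses raw 2 x 2 pixel blocks of a grey-level image instead of averaged sub-blocks, computes one global histogram instead of one per sub-image, and the threshold of 50 and the category names are assumed values.

```python
import math

R2 = math.sqrt(2)
# 2 x 2 filter coefficients for (top-left, top-right, bottom-left, bottom-right)
FILTERS = {
    "vertical": (1, -1, 1, -1),
    "horizontal": (1, 1, -1, -1),
    "diagonal_45": (R2, 0, 0, -R2),
    "diagonal_135": (0, R2, -R2, 0),
    "isotropic": (2, -2, -2, 2),
}


def edge_histogram(image, threshold=50):
    """Fraction of 2 x 2 blocks assigned to each edge category."""
    counts = dict.fromkeys(FILTERS, 0)
    blocks = 0
    for y in range(0, len(image) - 1, 2):
        for x in range(0, len(image[0]) - 1, 2):
            block = (image[y][x], image[y][x + 1], image[y + 1][x], image[y + 1][x + 1])
            strength = {name: abs(sum(f * a for f, a in zip(coeffs, block)))
                        for name, coeffs in FILTERS.items()}
            best = max(strength, key=strength.get)
            blocks += 1
            if strength[best] >= threshold:  # weaker blocks count as "no edge"
                counts[best] += 1
    return {name: count / max(blocks, 1) for name, count in counts.items()}


image = [
    [255, 0, 255, 255, 10, 10],
    [255, 0, 0, 0, 10, 10],
]
# One vertical-edge block, one horizontal-edge block and one flat block
print(edge_histogram(image))
```

Comparing two such histograms with any of the distances from the previous section gives an edge-based similarity score.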

Most shape descriptors are unable to handle the full variety of shape variations found in nature. Natural objects can be seen from different angles, and their shapes can be rotated, scaled, skewed, stretched, deformed or affected by noise. It is generally recognized that an effective shape representation should be invariant to rotation, translation and scaling. It should also be invariant, or at least robust, to affine and perspective transforms, to cope with skew, stretching and different views of objects. Shape descriptors generally fall into two groups:

1. Contour-based shape descriptors
2. Region-based shape descriptors

Contour-based shape descriptors employ only shape boundary information and capture boundary features, whereas region-based shape descriptors make use of all the pixel information within the shape region (Kaur and Banga 2011).
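The two descriptor families can be illustrated with one classic example each, chosen here for illustration rather than taken from the cited paper: the first two Hu moment invariants, which are region based because they use every pixel of the shape, and Fourier descriptors, which are contour based because they use only the ordered boundary points. Both are normalised so that translation, rotation and scale have little or no effect, which is the invariance property discussed above. The function names and sample shapes are hypothetical.

```python
import cmath
import math


def hu_first_two(region):
    """First two Hu moment invariants of a binary region given as (x, y) pixels."""
    n = len(region)  # zeroth moment m00 of a binary region
    cx = sum(x for x, _ in region) / n
    cy = sum(y for _, y in region) / n

    def eta(p, q):
        # Central moment mu_pq, normalised by m00 ** (1 + (p + q) / 2) for scale invariance
        mu = sum((x - cx) ** p * (y - cy) ** q for x, y in region)
        return mu / n ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return e20 + e02, (e20 - e02) ** 2 + 4 * e11 ** 2


def fourier_descriptor(contour, count=3):
    """Normalised Fourier descriptor of an ordered boundary given as (x, y) points."""
    z = [complex(x, y) for x, y in contour]
    n = len(z)
    coeffs = [sum(z[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n))
              for u in range(n)]
    # Skip coeffs[0] (position), keep magnitudes (rotation and start point),
    # and divide by |coeffs[1]| (scale)
    return [abs(c) / abs(coeffs[1]) for c in coeffs[1:count + 1]]


shape = [(0, 0), (0, 1), (0, 2), (1, 2)]          # an L-shaped region
turned = [(5 - y, 5 + x) for x, y in shape]       # rotated 90 degrees, then shifted
print(hu_first_two(shape), hu_first_two(turned))  # equal up to rounding

outline = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]  # boundary of a 2 x 1 rectangle
warp = 3 * cmath.exp(0.5j)                                  # scale by 3, rotate 0.5 rad
moved = [(w.real, w.imag) for w in (warp * complex(x, y) + (4 - 2j) for x, y in outline)]
print(fourier_descriptor(outline), fourier_descriptor(moved))  # equal up to rounding
```

Neither descriptor handles skew or perspective, which illustrates the limitation noted above: invariance to rotation, translation and scale is achievable, while robustness to affine and perspective changes needs further work.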

SUMMARY AND DISCUSSION

Which features and representations should be used for image retrieval depends on the application domain. Combining content based image retrieval techniques with concept based image retrieval techniques can improve overall retrieval performance.
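One simple way to combine the two families is late fusion: compute a concept-based score from an image's tags and a content-based score from its visual distance to a query image, then rank by a weighted sum. The sketch below is a hypothetical illustration; the keyword-overlap score, the `1 / (1 + distance)` conversion, the equal default weight and the function names are assumptions, not a method proposed in the cited work.

```python
def keyword_score(query_terms, tags):
    # Concept-based part: fraction of the query terms found in the image's tags
    terms = set(query_terms)
    return len(terms & set(tags)) / len(terms)


def fused_score(text_score, visual_distance, weight=0.5):
    # Content-based part: turn a distance (0 = identical) into a similarity in (0, 1]
    visual_similarity = 1.0 / (1.0 + visual_distance)
    return weight * text_score + (1 - weight) * visual_similarity


def rank(query_terms, candidates, weight=0.5):
    # candidates maps an image name to (tags, visual distance to the query image)
    scores = {name: fused_score(keyword_score(query_terms, tags), dist, weight)
              for name, (tags, dist) in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


candidates = {
    "a": (["elephant", "flower"], 3.0),  # fully tagged, visually far from the query
    "b": ([], 0.0),                      # untagged, visually identical
    "c": (["elephant"], 0.0),            # partly tagged, visually identical
}
print(rank(["elephant", "flower"], candidates))  # ['c', 'a', 'b']
```

The untagged image `b` would be unreachable by keywords alone, yet it still receives a score here through its visual similarity, which is the gap CBIR is meant to fill.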



REFERENCES

1. Kato, Toshikazu. "Database architecture for content-based image retrieval." Proceedings of SPIE Image Storage and Retrieval Systems. 1992.

2. Kaur, Simardeep, and V K Banga. "Content Based Image Retrieval." International Conference on Advances in Electrical and Electronics Engineering, 2011.

3. Khutwad, Harshada Anand, and Ravindra Jinadatta Vaidya. "Content Based Image Retrieval." International Journal of Image Processing and Vision Sciences (ISSN Print: 2278 – 1110) Vol 2, no. 1 (2013).

4. Kodituwakku, Saluka Ranasinghe, and S Selvarajah. "Analysis and Comparison of Texture Features for Content Based Image Retrieval." International Journal of Latest Trends in Computing (E-ISSN: 2045-5364) Vol 2, no. 1 (March 2011).

5. Singh, Garima, and Priyanka Bansal Minu. "Content Based Image Retrieval." International Journal of Innovative Research and Studies (ISSN: 2319-9725) Vol 2, no. 7 (July 2013).

6. Singha, Manimala, and K Hemachandran. "Content Based Image Retrieval using Color and Texture." Signal & Image Processing: An International Journal (SIPIJ) Vol 3, no. 1 (2012).

Table - Comparison of Image Retrieval Approaches

Concept Based Image Retrieval
Uses: metadata
Advantages: speed
Disadvantages: not every image has complete metadata; creating the metadata database requires time-consuming manual labor

Content Based Image Retrieval - Low Level Features

Color (Histogram)
Uses: RGB, HSV
Advantages: speed; low demand for memory space; not sensitive to image transformations
Disadvantages: does not consider local color information

Color (Color Layout)
Uses: local color features
Advantages: performance
Disadvantages: computation and storage are more expensive than in the region-based method

Color (Region)
Uses: segmentation
Advantages: performance

Texture (Wavelet Transform and Gabor Transform)
Uses: texture statistic features and structure features
Disadvantages: variations in textons can cause confusion during search

Shape (Contour-based shape descriptors)
Uses: shape boundary information
Disadvantages: unable to accurately address the variety of shape variations

Shape (Region-based shape descriptors)
Uses: regional pixel information
Disadvantages: unable to accurately address the variety of shape variations
