
Genetic algorithm-based relevance feedback for image retrieval using local similarity patterns

Zoran Stejić a,*, Yasufumi Takama a,b, Kaoru Hirota a

a Department of Computational Intelligence and Systems Science (c/o Hirota Lab.), Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, 4259 Nagatsuta, Midori-ward, Yokohama 226-8502, Japan

b PREST, Japan Science and Technology Corporation (JST), Tokyo, Japan

Received 13 January 2002; accepted 17 February 2002

Abstract

Local similarity pattern (LSP) is proposed as a new method for computing image similarity. Similarity of a pair of images is expressed in terms of similarities of the corresponding image regions, obtained by the uniform partitioning of the image area. Different from the existing methods, each region-wise similarity is computed using a different combination of image features (color, shape, and texture). In addition, a method for optimizing the LSP-based similarity computation, based on genetic algorithm, is proposed, and incorporated in the relevance feedback mechanism, allowing the user to automatically specify LSP-based queries. LSP is evaluated on five test databases totalling around 2500 images of various sorts. Compared with both the conventional and the relevance feedback methods for computing image similarity, LSP brings on average over 11% increase in the retrieval precision. Results suggest that the proposed LSP method, allowing comparison of different image regions using different similarity criteria, is more suited for modeling the human perception of image similarity than the existing methods.

© 2002 Elsevier Science Ltd. All rights reserved.

Keywords: Image retrieval; Image similarity; Relevance feedback; Genetic algorithm

1. Introduction

During the last decade, exponential growth in the number of digital images on the Internet, in various image collections and databases, has led to the rapid growth of the image retrieval field,

*Corresponding author. Tel.: +81-45-924-5682; fax: +81-45-924-5676.

E-mail addresses: [email protected] (Z. Stejić), [email protected] (Y. Takama), [email protected] (K. Hirota).

0306-4573/03/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved.

PII: S0306-4573(02)00024-9

Information Processing and Management 39 (2003) 1–23
www.elsevier.com/locate/infoproman


and, as a consequence, the development of a number of image retrieval systems (Del Bimbo, 1999; Lew, 2001; Smeulders, Worring, Santini, Gupta, & Jain, 2000).

The conventional image retrieval process essentially consists of four steps, generally present in all of the existing systems (Del Bimbo, 1999; Lew, 2001; Smeulders et al., 2000):

1. Querying: the user enters the query image to the system, expressing the user's information need.

2. Similarity computation: the system computes the similarity between the query image and all the database images.

3. Retrieval: the system retrieves the database images most similar to the user's query image and presents them to the user.

4. Relevance feedback: the user evaluates the retrieved images as more or less relevant to the query; based on that, the system adapts the parameters of the similarity computation method and returns to Step 2 (Similarity computation).

The retrieval process, for a given query image, finishes when the user is satisfied with the retrieved images.

In the querying step, according to the most widespread query-by-example paradigm (Del Bimbo, 1999; Lew, 2001; Smeulders et al., 2000), the user either browses the image database and chooses the appropriate image, or draws the query image. Regarding the retrieval step, the main issues are related to the data structures and access methods allowing efficient organization and rapid retrieval of large volumes of image data (Del Bimbo, 1999; Smeulders et al., 2000).

The focus of the existing image retrieval research, however, is on the similarity computation step and the relevance feedback step, which are mutually dependent and closely related (Smeulders et al., 2000). Namely, assuming that the query image accurately expresses the user's information need, the main problem is in modeling the user's understanding of image similarity. In other words, to retrieve the images that the user expects in response to a query, the retrieval system must approximate the user's similarity criteria when comparing the query image to the database images.

The similarity computation step models image similarity based on combinations of various features (e.g., color, shape, and texture) extracted from images (Del Bimbo, 1999; Smeulders et al., 2000). However, since human perception of image similarity is both subjective and context-dependent (Santini & Jain, 1997; Santini, Gupta, & Jain, 2001), the relevance feedback step, involving the user's interaction, is necessary for the system to adapt the similarity computation method to each user, and infer the optimal similarity criteria for a given query image (Rui & Huang, 2001; Rui, Huang, Ortega, & Mehrotra, 1998).

The focus of this paper is on the similarity computation and relevance feedback steps as well:

• Similarity computation: a new method for computing image similarity is proposed, based on the idea that distinguishing different objects in the image requires different similarity criteria for each object. Consequently, when comparing images, image regions corresponding to different objects are compared using different combinations of image features. The proposed method expresses the similarity of a pair of images in terms of similarities of the corresponding image regions, and allows different regions to be compared using different similarity criteria.


• Relevance feedback: a genetic algorithm (GA) based method is proposed for finding an optimal assignment of similarity criteria to image regions, and incorporated in the relevance feedback mechanism, taking the burden of the explicit query specification off the user.

The proposed method is called the local similarity pattern (LSP) method. It addresses the problem of the existing methods for image similarity computation, which do not allow different image regions to be compared using different similarity criteria (Brandt, Laaksonen, & Oja, 2000; Chan & King, 1999a,b; Laaksonen, Oja, Koskela, & Brandt, 2000; Li, Wang, & Wiederhold, 2000; Rui & Huang, 2001; Rui et al., 1998; Stricker & Orengo, 1995; Wang, Li, & Wiederhold, 2001). This, despite the flexibility of the existing methods in choosing the optimal similarity criteria, prevents the specification of similarity criteria that are complex, yet natural for a human, when comparing images (Section 2.1), resulting in low retrieval precision.

The proposed LSP method is experimentally evaluated through the comparison with the representative conventional (Brandt et al., 2000; Laaksonen et al., 2000; Li et al., 2000; Stricker & Orengo, 1995; Wang et al., 2001), as well as relevance feedback (Chan & King, 1999a,b; Rui & Huang, 2001; Rui et al., 1998) methods for computing image similarity, on five test databases, totalling around 2500 images of various sorts. In this paper, the term "conventional methods" refers to the methods not using the relevance feedback.

Section 2 gives background on the existing image similarity computation methods, while Section 3 proposes the LSP method as a new image similarity computation method. Section 4 proposes a GA-based method to optimize the LSP-based similarity computation, and incorporates it in the relevance feedback mechanism. Section 5 describes the image indexing, as well as the experimental performance evaluation of the proposed method.

2. Background on image similarity modeling

2.1. Human vs. computer similarity perception

For a human, the similarity of images usually means the similarity of objects appearing in the images. Therefore, most of the existing image retrieval systems model the image similarity at the object level (Li et al., 2000; Rui & Huang, 2001; Rui et al., 1998; Smeulders et al., 2000; Wang et al., 2001).

Research in neuroscience (Wandell, 1995) and computer vision (Garcia, Fdez-Valdivia, Fdez-Vidal, & Rodriguez-Sanchez, 2001) confirms that, while different objects are primarily characterized by different visual features, a human judging image similarity does not always put an equal emphasis on all the features characterizing each object. For example, for a user searching for photos of skies, "cloudy, gray sky" and "clear, blue sky" are similar, no matter how different some of their visual features (e.g., color) are.

For a computer, an object appearing in the image reduces to a set of pixels, i.e., an image region, meaning that the similarity of objects corresponds to the similarity of image regions containing the objects (Smeulders et al., 2000). This is an underlying assumption of all of the image retrieval systems mentioned earlier. As stated in (Wang et al., 2001), "Region-based retrieval systems attempt to overcome the deficiencies of [. . .] by representing images at the object level."


2.1.1. Image similarity and image features

For modeling the image similarity, conventional retrieval systems, e.g., (Brandt et al., 2000; Laaksonen et al., 2000; Stricker & Orengo, 1995), use simple image features (like color, shape, and texture), resulting in color similarity, shape similarity, etc. However, as the number and variety of images in a database grows, the discriminative power of simple features becomes insufficient (Vailaya, Jain, & Zhang, 1998). Therefore, more advanced systems, e.g., (Wang et al., 2001), use combinations of features (e.g., color–shape similarity), with relative importance of each feature either fixed (Li et al., 2000; Wang et al., 2001), specified by the user (Del Bimbo, 1999; Smeulders et al., 2000), or interactively inferred by the system through the relevance feedback (Chan & King, 1999a,b; Rui & Huang, 2001; Rui et al., 1998).

Techniques like relevance feedback allow arbitrarily complex combinations of image features to be used for computing image similarity, while not imposing any burden on the user regarding the choice of features or their relative importance (Chan & King, 1999a,b; Rui & Huang, 2001; Rui et al., 1998). However, even in the systems using the relevance feedback, once the optimal combination of image features is inferred for a given query image, the same similarity criteria are applied to the whole image area (or, in some cases, a part of the image, i.e., the region of interest, see Fig. 1). This problem is recognized in a recent survey containing 53 references related to the relevance feedback techniques in the image retrieval (Zhou & Huang, 2001): "Most relevance feedback schemes are designed to deal with global image features only, which apparently is not the best choice."

Fig. 1. Model of image similarity used in the existing methods, compared with the proposed LSP.

Summarizing, the existing methods for computing image similarity do not allow different image regions to be compared using different similarity criteria. This, despite the flexibility in choosing the optimal similarity criteria, prevents the specification of similarity criteria that are complex, yet natural for a human, when comparing images. For example, given a photo of a car taken on a sunny day, color similarity is sufficient (and, moreover, optimal) for identifying regions similar to the "clear, blue sky" above the car, but for identifying regions similar to the "car" itself, shape and other features are necessary. Since, for different image regions, i.e., objects corresponding to them, feature combinations having sufficient discriminatory power can be very different (Vailaya et al., 1998), applying the same similarity criteria, no matter how optimal, to the whole image area limits the ability of the existing systems to model the human perception of image similarity.

2.1.2. Assumptions about image similarity

Summarizing, from the preceding discussion we draw the following assumptions about the image similarity, on which the proposed similarity computation method is based:

• Human perception of image similarity is based on the similarity of objects appearing in the image.

• In order to distinguish different image objects, different similarity criteria are necessary for each object.

2.2. Image similarity computation methods

In the following, we survey the existing conventional and relevance feedback methods for computing image similarity. As mentioned in Section 1, in this paper, the term "conventional method" is used to denote a method not using the relevance feedback.

2.2.1. Conventional methods

Color moments method (Stricker & Orengo, 1995) models image similarity in terms of similarity of color distributions. An image is represented in the HSV color space, and each color channel (H, S, and V) is interpreted as a probability distribution. For each of the three color channels, first, second, and third central moments (average, variance, and skewness) are computed. The obtained nine values, arranged into a feature vector, are used to represent the image. Distance between a pair of feature vectors, corresponding to image (dis)similarity, is computed using the weighted Euclidean distance. Color moments method is shown to outperform the color histogram method (Del Bimbo, 1999; Smeulders et al., 2000), widely used in image retrieval.

Edge-direction histogram method (Brandt et al., 2000; Vailaya & Jain, 1998) models image similarity in terms of similarity of distributions of edge directions. Sobel edge detector is applied to each color channel, in order to identify the edge pixels, together with the corresponding edge directions (from 0° to 360°), quantized into eight levels (of 45° each). The results from different color channels are combined into an edge-direction histogram, showing the distribution of edge directions over the eight quantized ranges. Distance between a pair of edge-direction histograms is computed using the city-block distance (Del Bimbo, 1999). Edge-direction histogram method is frequently used for modeling the shape similarity of images (Brandt et al., 2000; Vailaya & Jain, 1998).

Texture neighborhood method (Laaksonen et al., 2000) models image similarity in terms of similarity of distributions of pixels' brightness. An image is converted into gray-scale, and brightness of every pixel's eight-neighborhood is examined, while the estimated probabilities (over the whole image) for each neighbor being brighter than the center pixel are used as features. Eight neighbor pixels give eight probability values arranged into a feature vector. Distance between a pair of feature vectors is computed using the city-block distance (Del Bimbo, 1999). Texture neighborhood method is used for modeling the texture similarity of images.

SIMPLIcity is an advanced image retrieval system (Li et al., 2000; Wang et al., 2001), which models image similarity in terms of similarities of the corresponding image regions, obtained by the image segmentation. An image is segmented using the k-means clustering algorithm, based on color and frequency features. Image similarity is computed using the integrated region matching (IRM) method (Li et al., 2000), which integrates properties of all the regions in the images, and is aimed at reducing the influence of inaccurate segmentation. Features extracted from regions, through the wavelet transform, are color, shape, and texture. When computing region similarity, color and texture features are more emphasized than shape features, based on heuristics. Region similarity with respect to each feature is computed using the Euclidean distance. SIMPLIcity system is shown to significantly outperform the methods based on simple image features.
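As an illustration of the color moments description above, the following is a minimal sketch, not the implementation of Stricker and Orengo. It assumes that a region is given as a list of (H, S, V) tuples, and it uses the raw third central moment for "skewness" (the normalization is not specified in the text, so this is an assumption). The function names `color_moments` and `weighted_euclidean` are hypothetical.

```python
from math import sqrt


def color_moments(pixels):
    """Return 9 values: (mean, variance, third central moment) per HSV channel.

    `pixels` is a non-empty list of (h, s, v) tuples (assumed input format).
    """
    n = len(pixels)
    features = []
    for channel in range(3):
        values = [p[channel] for p in pixels]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        # Assumption: unnormalized third central moment stands in for skewness.
        third = sum((v - mean) ** 3 for v in values) / n
        features.extend([mean, variance, third])
    return features


def weighted_euclidean(a, b, weights):
    """Weighted Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))
```

The weights passed to `weighted_euclidean` are left to the caller, since the text does not state how they are chosen.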

2.2.2. Relevance feedback methods

Relevance feedback methods exploit the context, defined by the query image and a set of relevant images chosen by the user, to infer an optimal combination of image features used for the image similarity computation (Chan & King, 1999a,b; Rui & Huang, 2001; Rui et al., 1998). Image similarity is expressed as a weighted sum of feature-wise similarities, where an arbitrary number of image features can be employed. Feature-wise similarities are computed using conventional methods, described earlier. Relevance feedback is used to assign a weight (in range [0, 1]), i.e., the relative importance, to each feature.

Standard deviation-based method (Rui & Huang, 2001; Rui et al., 1998) is based on the assumption that, if all the relevant images have similar values for a given feature, then that feature is a good indicator of the user's information need. On the other hand, if the values for a feature are very different among the relevant images, then that feature is not a good indicator. Similar values of a given feature imply that the standard deviation (and consequently variance as well) of that feature, computed over all the relevant images, is small. Accordingly, for each feature, variance is computed over all the relevant images, and the weight of a feature is set to be the inverse of the corresponding variance. The smaller the variance, the bigger the weight, and vice versa. Standard deviation-based method is a representative relevance feedback method, both in effectiveness and in efficiency (Lew, 2001).

Genetic algorithm-based method (Chan & King, 1999a,b) uses GA to find an optimal assignment of the weights to the image features. The objective is to maximize the "ranking score function", i.e., to maximize the average rank of the relevant images. This, in turn, means maximizing the retrieval precision, since the higher the relevant images are ranked, the higher is the precision.

The GA-based method, while slightly more effective, is significantly slower than the standard deviation-based method (Section 5.2). However, different from the standard deviation-based method, it does not directly utilize the information about the relevant images (i.e., their features), which makes it less sensitive to user's inconsistency in choosing the relevant images.

The applications of GA to relevance feedback in image retrieval are very rare: a recent survey containing 53 references related to the relevance feedback techniques in the image retrieval mentions none (Zhou & Huang, 2001).
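The inverse-variance weighting of the standard deviation-based method can be sketched as follows. This is an illustrative sketch only: the small constant `eps` (to avoid division by zero) and the normalization of the weights so that they sum to one, and hence lie in [0, 1], are assumptions not taken from the cited papers, and the function name is hypothetical.

```python
def inverse_variance_weights(relevant_features, eps=1e-9):
    """Weight each feature by the inverse of its variance over the relevant images.

    `relevant_features` is a list of equal-length feature vectors, one per
    relevant image. `eps` and the final normalization are assumptions.
    """
    n = len(relevant_features)
    dims = len(relevant_features[0])
    raw = []
    for d in range(dims):
        values = [vec[d] for vec in relevant_features]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        raw.append(1.0 / (variance + eps))  # small variance -> large weight
    total = sum(raw)
    return [w / total for w in raw]
```

A feature whose values agree across all relevant images thus dominates the weighted sum, which matches the stated intuition that such a feature is a good indicator of the user's information need.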


2.2.3. Relation of the proposed method to the existing ones

The proposed method expresses image similarity in terms of region-wise similarities, similar to the SIMPLIcity system (Li et al., 2000; Wang et al., 2001). The difference is that, in the proposed method, image regions are obtained by the uniform partitioning of the image area, while in the SIMPLIcity system image segmentation is used. The computation of region-wise similarities also differs, since the proposed method uses a different feature combination for each image region, while in the SIMPLIcity system all regions are compared using the same similarity criteria.

The difference between the proposed method and the relevance feedback methods is that the relevance feedback methods do not partition the image area into regions, but apply the same similarity criteria to the whole image area. Furthermore, the proposed GA-based relevance feedback approach is different from the GA-based method (Chan & King, 1999a,b), since it uses only a finite number of feature combinations (Section 3), thus significantly reducing the size of the search space, without negative effect on the performance (Section 5.2).

3. Local similarity pattern

Starting from the assumptions about the image similarity, elaborated in the previous section, we propose the LSP method, which expresses image similarity in terms of similarities of the corresponding image regions, and in addition allows different image regions to be compared using different combinations of image features.

Variables and functions used for the formalization of the LSP method are summarized in Tables 1 and 2, respectively, and explained in the following.

Table 1
Variables used for formalizing the LSP method

Set        Symbol   Element   Size
Images     $I$      $i_j$     $n_I$
Regions    $R$      $r_j$     $n_R$
Features   $F$      $f_j$     $n_F$

Table 2
Functions used for formalizing the LSP method

Name                 Symbol   Mapping
Feature assignment   $F_R$    $R \to F$
Region similarity    $D_R$    $I \times I \times R \times F \to [0, 1]$
Image similarity     $D_I$    $I \times I \to [0, 1]$

Set of images $I$ represents the image database. Set of regions $R$ contains $N \times N$ ($= n_R$) regions obtained by the uniform partitioning of the image area. Set of features $F$ contains image features that are extracted from regions and used for computing region-wise similarity. Feature denotes either a simple image feature, e.g., color or texture, or an arbitrary combination of image features, e.g., color–shape.

Feature assignment function $F_R$, given an image region $r$, assigns an image feature $F_R(r)$ to that region (Fig. 2). Region similarity function $D_R$ returns the similarity degree $D_R(i_1, i_2, r, f)$ of a pair of images $i_1$ and $i_2$, with respect to image region $r$, and using image feature $f$. Finally, image similarity function $D_I$ returns the similarity degree $D_I(i_1, i_2)$ of a pair of images $i_1$ and $i_2$, being the arithmetic average of region-wise similarities:

$$D_I(i_1, i_2) = \frac{1}{n_R} \sum_{i=1}^{n_R} D_R(i_1, i_2, r_i, F_R(r_i)) \qquad (1)$$

The expression for image similarity function $D_I$ depends on the set of regions $R$ and the feature assignment function $F_R$. Consequently, LSP is defined as a structure containing $R$ and $F_R$:

$$\mathrm{LSP} \equiv (R, F_R) \qquad (2)$$

In short, LSP is the assignment of image features to image regions, which are obtained by uniform partitioning of the image area.

Despite the uniform partitioning of the image area, LSP is able to identify even the image regions containing objects of irregular (i.e., non-rectangular) shape. For example, as Fig. 3 illustrates, the "sky" region in the template image is identified by forming a patch of "color" blocks in the upper half of the image, corresponding to the "sky".

In the ideal case, LSP partitions the image area precisely into regions (segments) containing objects like "sky", etc. A unique combination of image features characterizes each image region, and, in turn, the object(s) contained within that region, which corresponds to human perception and enables the images containing similar objects to be identified and retrieved.
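To make the structure in Eq. (2) and the averaging in Eq. (1) concrete, the following is a minimal sketch. The class name `LSP`, the representation of the feature assignment function as a dictionary, and the caller-supplied `region_similarity` callable (standing in for the region similarity function of Table 2) are illustrative assumptions, not part of the original formalization.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Sequence


@dataclass(frozen=True)
class LSP:
    """An LSP as a pair (R, F_R): the regions and the feature assigned to each."""
    regions: Sequence[Hashable]        # set of regions R
    feature_of: Dict[Hashable, str]    # feature assignment F_R: region -> feature


def image_similarity(
    lsp: LSP,
    region_similarity: Callable[[object, object, Hashable, str], float],
    image_a: object,
    image_b: object,
) -> float:
    """Arithmetic average of region-wise similarities, as in Eq. (1)."""
    total = sum(
        region_similarity(image_a, image_b, r, lsp.feature_of[r])
        for r in lsp.regions
    )
    return total / len(lsp.regions)
```

Keeping the region similarity as a parameter reflects the fact that each region may be compared with a different feature; only the averaging step is fixed by Eq. (1).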

3.1. LSP-based image queries

As described, LSP is proposed for computing similarity between the query image and the database images. However, each query image, depending on the context in which it is used, can have many different interpretations, with each interpretation requiring different similarity criteria for retrieving similar images (Santini & Jain, 1997; Santini et al., 2001). For example, the same photo of a landscape showing a lake in front of a mountain can, in one context, mean that the user is looking for photos of lakes, while, in a different context, the user might be looking for photos of mountains.

Fig. 2. Feature assignment function for LSP method.

Without specifying the intended interpretation of a query image, it is not possible to define the image similarity criteria. Therefore, according to our definition, a well-defined query consists of the query image together with the intended interpretation (i.e., the corresponding similarity criteria).

The idea behind LSP, as a structure, is to represent the intended interpretation (and the associated similarity criteria) of the query image in a given context. The LSP-based query we propose consists of the query image, which the user selects, and the LSP, which the system automatically infers based on the user's relevance feedback. Of course, an experienced user could as well manually specify the LSP.

The essential point is that, in order to specify the information need completely, the user's query must include both the query image and its intended interpretation, i.e., similarity criteria.

3.2. LSP implementation issues

LSP method can represent complex image similarity criteria and, using them, approximate human perception of image similarity. However, for a user not familiar with the relation of image similarity to image partitioning and image features, manually specifying the optimal LSP, given a query image, is difficult. Furthermore, the number of possible assignments of image features to image regions is huge, making it impossible to try out all the possible LSPs and find the best one.

For example, uniformly partitioning the image area into 7 × 7 regions, and employing six types of image features (color, shape, texture, color–shape, color–texture, and shape–texture) gives $6^{7 \times 7} \approx 1.3 \times 10^{38}$ possible LSPs. Furthermore, the space of LSPs is highly discontinuous, meaning that two LSPs differing in only a few regions can correspond to very different similarity criteria.

Fig. 3. Example template image segmented into regions containing objects like "sky", "house", and "grass", together with the corresponding LSP. Each region is characterized by a different image feature, which is reflected by the LSP as well.

The size and high discontinuity of the space of LSPs have motivated us to use a GA for finding the optimal LSP. In Section 4, we propose a GA-based approach to solve the problem of assigning image features to image regions in an optimal way. GA actually realizes the feature assignment function $F_R$.
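The size of the LSP search space quoted above follows from assigning one of the available features independently to each region. The short sketch below simply reproduces that arithmetic; the function name `lsp_count` is hypothetical.

```python
def lsp_count(n, n_features=6):
    """Number of distinct LSPs for an n x n partitioning: n_features ** (n * n)."""
    return n_features ** (n * n)


# Search-space sizes for the partitionings used in the experiments (N in {3, 5, 7}).
sizes = {n: lsp_count(n) for n in (3, 5, 7)}
largest = format(sizes[7], ".1e")  # 6 ** 49 in scientific notation
```

Even the smallest partitioning (3 × 3) already admits about ten million assignments, which is why exhaustive search is not considered.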

4. GA-based approach to relevance feedback using LSP

This section proposes a GA-based approach to optimize the LSP-based similarity computation, and incorporate it in the relevance feedback mechanism.

4.1. Image retrieval as optimization problem

Before GA can be employed to find an optimal assignment of image features to the image regions of the LSP, the retrieval process using LSP-based queries must be formalized as an optimization problem. Variables used for the formalization are summarized in Table 3, and explained in the following.

Table 3
Variables used for formalizing the retrieval process as optimization problem

Set                Symbol      Element   Size             Property
Images             $I$         $i_j$     $n_I$            –
Queries            $Q$         $q_j$     $n_Q$            $\subseteq I$
Relevant images    $A(q)$      $a_j$     $n_{A(q)}$       $\subseteq I$
LSPs               $S$         $s_j$     $n_S$            –
Retrieved images   $R(q, s)$   $r_j$     $n_{R(q,s)}$     $\subseteq I$

Set of images $I$ represents the image database. Set of queries $Q$ represents the query images chosen by the user. Set of relevant images $A(q)$ for query $q$ represents the correct answers to a query (i.e., the ground truth), as defined by the user. This set is initially, at the very beginning of the retrieval process, a singleton, containing only the query image itself, and gradually grows as the user evaluates the retrieved images through the relevance feedback. Set of LSPs $S$ represents the proposed LSP, according to which the system computes the similarity between the query image and database images. Finally, set of retrieved images $R(q, s)$ for query $q$, and with respect to LSP $s$, represents the images retrieved by the system in response to the query.

Based on the preceding definitions, retrieval precision for query $q$ and LSP $s$ is defined as:

$$P(q, s) = \frac{|R(q, s) \cap A(q)|}{n_{A(q)}} \in [0, 1] \qquad (3)$$

representing the ratio of the relevant images that the system retrieved, among the highest-ranked $n_{A(q)}$ images.

At this point it is necessary to notice that, in response to query $q$ and using LSP $s$, the system generates the ranking of all database images (i.e., $n_{R(q,s)} = n_I$), according to the similarity to the query image (using the image similarity function $D_I$, Eq. (1)). However, only the highest-ranked $n_{A(q)}$ images are considered, since the images most similar to the query image, i.e., top-ranked images, are the most important ones for the user.

Following these definitions, the objective is to maximize the retrieval precision $P(q, s)$, with respect to LSP $s$, for a given query $q$:

$$\max_{s \in S} P(q, s) \quad \forall q \in Q \qquad (4)$$

Maximizing $P(q, s)$ with respect to LSP $s$ means searching for an LSP that, given a query image $q$, results in the highest ratio of the retrieved relevant images, i.e., best approximates the user's similarity criteria, in the context defined by the query image and the set of relevant images.
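The precision in Eq. (3) can be sketched as follows, under the reading given in the text that only the highest-ranked images, as many as there are relevant images, are considered. The function name and the representation of a ranking as a list of image identifiers (most similar first) are assumptions made for illustration.

```python
def retrieval_precision(ranking, relevant):
    """Fraction of relevant images among the top-k of the ranking, k = len(relevant).

    `ranking` lists image identifiers from most to least similar to the query;
    `relevant` is the non-empty collection of relevant images A(q).
    """
    relevant_set = set(relevant)
    k = len(relevant_set)
    top_k = ranking[:k]
    return len(relevant_set.intersection(top_k)) / k
```

For example, with three relevant images of which two appear among the top three ranked images, the precision is 2/3.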

4.2. GA for optimizing LSP-based image queries

Given a user's query image, a GA is used to automatically generate an LSP, and complete an LSP-based query (Section 3). The objective is to generate an LSP which closely approximates the user's perception of image similarity, and results in optimal retrieval precision, in a user-defined sense. The underlying mathematical model was presented in Section 4.1.

4.2.1. Genetic algorithms

GA is a domain-independent problem-solving method that attempts to find a sub-optimal or optimal solution to a problem by genetically breeding the population of individuals, where each individual represents a possible solution to a given problem (Goldberg, 1989). Each individual has an associated fitness value, which expresses the quality of the corresponding solution to a problem. Starting with a population of randomly created individuals, GA progressively breeds a population of individuals over a series of generations using natural selection, crossover (recombination), mutation, and other genetic operations. Individuals in the population, i.e., chromosomes, are represented as fixed-length character strings over a given alphabet.

Solving a problem using GA requires specification of: (1) chromosome coding, (2) fitness measure, and (3) GA parameters (Goldberg, 1989).

4.2.2. GA for generating LSP

As described in Section 3, we propose a GA-based solution to automatically generate an LSP. The task of the GA is to realize the feature assignment function $F_R$ (Section 3), i.e., to assign image features to the image regions obtained by the uniform partitioning of the image area (Fig. 4). In that way, GA defines which image features are used when comparing the corresponding regions of a pair of images (by the region similarity function $D_R$, Section 3). Six (combinations of) image features are employed (set $F$, Section 3): color, shape, texture, color–shape, color–texture, and shape–texture (Section 5.1).

4.2.3. Chromosome coding

A chromosome represents the image area uniformly partitioned into $N \times N$ regions ($n_R = N \times N$ is the size of the set of regions $R$, Section 3). Each gene of a chromosome corresponds to one image region, and stores the type of image feature $f$ assigned to that region ($f \in F$, Section 3). The length of a chromosome is $N \times N$ genes. In the experiments, $N \in \{3, 5, 7\}$ was used (Section 5.1). A chromosome is illustrated in Fig. 4.
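A chromosome as described above can be sketched as a flat list with one gene per region. The row-major ordering of regions, the string labels for the six features, and the helper names are illustrative assumptions; the text does not fix a particular encoding.

```python
import random

# The six (combinations of) image features named in the text.
FEATURES = ("color", "shape", "texture",
            "color-shape", "color-texture", "shape-texture")


def random_chromosome(n, rng):
    """Create a chromosome of n * n genes, each gene holding one feature label."""
    return [rng.choice(FEATURES) for _ in range(n * n)]


def gene_index(row, col, n):
    """Map a region at (row, col) to its gene position (row-major order assumed)."""
    return row * n + col


def feature_at(chromosome, row, col, n):
    """Feature assigned to the region at (row, col), i.e. the value of F_R there."""
    return chromosome[gene_index(row, col, n)]
```

With this encoding a chromosome is exactly a tabulated feature assignment function, so evaluating its fitness reduces to computing the retrieval precision of the corresponding LSP-based query.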


4.2.4. Fitness measure

Since a chromosome corresponds to an LSP, fitness of a chromosome is expressed as the retrieval precision of the corresponding LSP-based query (Eq. (3)).

4.2.5. GA parameters

We adopted conventional GA parameters (Chan & King, 1999a,b; Goldberg, 1989). Selection is roulette-wheel. Crossover is uniform, with probability 0.6. Mutation is standard, with probability 0.1. Population size is 50 chromosomes. Length of the evolution is 250 generations.
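A GA with the stated parameters might be sketched as below. This is a generic sketch under several assumptions that the text does not settle: the mutation probability is applied per gene, uniform crossover swaps each gene with probability 0.5, the best chromosome seen so far is tracked and returned, and the fitness function is supplied by the caller (in the paper's setting it would be the retrieval precision of Eq. (3)). All function names are hypothetical.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    if sum(fitnesses) <= 0:
        return rng.choice(population)  # fall back to uniform choice
    return rng.choices(population, weights=fitnesses, k=1)[0]


def uniform_crossover(a, b, rng):
    """Swap each gene between the two parents with probability 0.5 (assumed)."""
    child_a, child_b = list(a), list(b)
    for i in range(len(child_a)):
        if rng.random() < 0.5:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


def mutate(chromosome, alphabet, p_mut, rng):
    """Replace each gene by a random symbol with probability p_mut (per gene, assumed)."""
    return [rng.choice(alphabet) if rng.random() < p_mut else g for g in chromosome]


def evolve(fitness, alphabet, length, rng,
           pop_size=50, generations=250, p_cross=0.6, p_mut=0.1):
    """Run the GA and return the best chromosome encountered."""
    population = [[rng.choice(alphabet) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        fitnesses = [fitness(c) for c in population]
        next_population = []
        while len(next_population) < pop_size:
            a = roulette_select(population, fitnesses, rng)
            b = roulette_select(population, fitnesses, rng)
            if rng.random() < p_cross:
                a, b = uniform_crossover(a, b, rng)
            next_population.append(mutate(a, alphabet, p_mut, rng))
            next_population.append(mutate(b, alphabet, p_mut, rng))
        population = next_population[:pop_size]
        best = max(population + [best], key=fitness)  # keep best-so-far (assumed)
    return best
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing parameter settings.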

5. LSP method implementation and performance evaluation

This section describes the image indexing used for the LSP-based similarity computation (Section 5.1), as well as the systematic performance evaluation of the proposed method (Section 5.2).

5.1. Image indexing for LSP method

The LSP method allows arbitrarily many image features and their combinations to be assigned to the image regions, and used for the image similarity computation (Section 3). In principle, the more features are available, the more complex image similarity criteria can be expressed, and the better the user's perception of image similarity can be approximated.

Since most of the image features used in conventional retrieval systems (e.g., shape, texture) do not have a meaning at the level of a single image pixel, we define the basic unit of image comparison to be an image region, i.e., a rectangular array of image pixels, which is a common approach in image retrieval (Del Bimbo, 1999). Accordingly, each database image is uniformly partitioned into N × N regions (set R, Section 3), and image features are extracted from each region, making up the image index. The number of regions (N × N) is chosen to ensure that most of the regions in isolation do not correspond to more than a single object shown in the image. In the experiments (Section 5.2), we used N ∈ {3, 5, 7}, depending on the test database.

Fig. 4. Proposed GA for generating LSP by assigning image features to image regions in an optimal way.

Regarding the image features extracted from regions (set F, Section 3), we have chosen the three most commonly used in image retrieval (Del Bimbo, 1999; Smeulders et al., 2000), as well as their combinations: color, shape, texture, color–shape, color–texture, and shape–texture (Section 4.2). Color features are represented by color moments (Stricker & Orengo, 1995), resulting in a nine-dimensional (9D) feature vector. Shape features are represented by an edge-direction histogram (Brandt et al., 2000), resulting in an eight-dimensional (8D) feature vector. Texture features are represented by the texture neighborhood (Laaksonen et al., 2000), resulting in an 8D feature vector. The details about each method are given in Section 2.2.

The distance between a pair of feature vectors, which expresses region-wise image similarity with respect to a given feature (function DR, Section 3), is computed using the weighted Euclidean distance (Stricker & Orengo, 1995) for color moments, and the city-block distance (Del Bimbo, 1999) for the edge-direction histogram and the texture neighborhood (Section 2.2). For a combination of features, e.g., color–shape, the region-wise similarity is computed as the arithmetic average of the individual (i.e., color and shape) similarities.
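The region-wise distance computation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the weights of the weighted Euclidean distance are not given in this section, so unit weights are used by default; the feature-wise distances are assumed to be on comparable scales when averaged; and the final averaging over regions in `lsp_distance` is an assumption, since the aggregation over regions is defined in Section 3. All function and variable names are hypothetical.

```python
import math


def weighted_euclidean(u, v, weights=None):
    """Weighted Euclidean distance; unit weights are an assumption of this sketch."""
    if weights is None:
        weights = [1.0] * len(u)
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))


def city_block(u, v):
    """City-block (L1) distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(u, v))


# Distance function per simple feature, as stated in Section 5.1.
SIMPLE_DISTANCE = {
    "color": weighted_euclidean,   # 9-D color moments
    "shape": city_block,           # 8-D edge-direction histogram
    "texture": city_block,         # 8-D texture neighborhood
}


def region_distance(feature, region_a, region_b):
    """Distance between two regions for a simple or combined feature.

    `feature` is e.g. "color" or "color-shape"; each region is a dict mapping
    a simple feature name to its feature vector. A combined feature is scored
    as the arithmetic average over its parts.
    """
    parts = feature.split("-")
    values = [SIMPLE_DISTANCE[p](region_a[p], region_b[p]) for p in parts]
    return sum(values) / len(values)


def lsp_distance(pattern, image_a, image_b):
    """Average the region-wise distances selected by an LSP.

    `pattern` lists one feature name per region; averaging over regions is an
    assumption here, since the aggregation is defined in Section 3.
    """
    per_region = [region_distance(f, ra, rb)
                  for f, ra, rb in zip(pattern, image_a, image_b)]
    return sum(per_region) / len(per_region)
```

Here an image is a list of N × N region dictionaries, and `pattern` plays the role of a decoded chromosome, so different regions of the same image pair can be compared using different features.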

5.2. Experiments and performance evaluation

Test databases. The performance of the proposed LSP method is systematically evaluated on five test databases, totalling around 2500 images:

Vistex-60 database. The database contains 60 color images of resolution 128 × 128 pixels, showing "real-world" scenes, each with multiple textures. The database is partitioned into 10 categories, with between 2 and 12 images each. The source is (MIT Media Lab, 2001), directory pub/FLAT/scene128×128/.

Vistex-167 database. The database contains 167 color images of resolution 128 × 128 pixels, showing homogeneous textures. The database is partitioned into 19 categories, with between 3 and 20 images each. The source is (MIT Media Lab, 2001), directory pub/FLAT/128×128/.

Brodatz-208 database. The database contains 208 gray-scale images of resolution 128 × 128 pixels, showing homogeneous textures. The database is partitioned into 13 categories, each with 16 images. Each category corresponds to a 512 × 512 image from the Brodatz collection (Brodatz, 1966), partitioned into 16 non-overlapping regions of 128 × 128 pixels. The source for the 13 Brodatz texture images is (USC SIPI, 2001).

Corel-1000-A database. The database contains 1000 color photographs of resolution 384 × 256 pixels, covering a wide range of semantic categories, from natural scenes to artificial objects (Li et al., 2000; Wang et al., 2001). The database is partitioned into 10 categories, each with 100 photographs. The source is (Corel Corporation, 2000).

Corel-1000-B database. The database contains 1000 color photographs of resolution 384 × 256 pixels, showing natural scenes (Stejić, Iyoda, Takama, & Hirota, 2001). The database is partitioned into 10 categories, each with 100 photographs. The source is (Corel Corporation, 2000).

All five test databases originate from well-known image collections used for the evaluation of image retrieval systems (Li et al., 2000; Rui & Huang, 2001; Wang et al., 2001). The partitioning of each database into semantic categories is determined by the creators of the database and reflects the human perception of image similarity.

The Vistex-60, Vistex-167, and Brodatz-208 databases are small in size, but diverse in content, including both color and gray-scale images, with artificial and natural motifs. Each of the three databases contains a relatively large number of image categories, which is appropriate for testing the image categorization performance.

The Corel-1000-A and Corel-1000-B databases are medium in size, with each image category containing a large number of images, and a large diversity of images within the category. The Corel-1000-B database, which focuses only on natural scenes (like vegetation and landscapes), represents a more difficult categorization problem than the Corel-1000-A database, which covers a wider range of categories.

5.2.1. Performance measure

As a performance measure, the retrieval precision is used, being, along with the recall, the most frequently used measure of the retrieval system performance (Del Bimbo, 1999; Smeulders et al., 2000).

Evaluation of the retrieval precision is performed so that each image, in each test database, is used as a query (set Q, Table 3). For each query image (q ∈ Q), relevant images (set A(q), Table 3) are considered to be those, and only those, which belong to the same category as the query image (Li et al., 2000; Rui & Huang, 2001; Wang et al., 2001). Based on this, the retrieval precision is computed for each query image (Eq. (3)). Finally, the average retrieval precision is computed for each category, as well as for the whole test database.

The reason why the retrieval recall is not evaluated is that, according to our definition (Eq. (3)), the precision and the recall are equal. Namely, precision is usually defined as the fraction of the retrieved images which are relevant, while recall is the fraction of the relevant images which are retrieved (Del Bimbo, 1999). However, as the "retrieved images", we always consider only the top-ranked images, whose number corresponds to the number of the relevant images for a given query (Li et al., 2000; Wang et al., 2001). Therefore, the number of retrieved images always equals the number of relevant images, making the precision equal to the recall.
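The argument that precision and recall coincide can be checked with a short Python sketch. It assumes that the ranked list contains database image identifiers ordered by decreasing similarity to the query, and that the number of retrieved images is cut off at |A(q)|; the function name and the toy identifiers are hypothetical, and whether the query image itself is part of the ranking is not specified here.

```python
def precision_and_recall(ranked_ids, relevant_ids):
    """Precision and recall when the top |A(q)| ranked images are retrieved.

    `ranked_ids` is the database ordered by decreasing similarity to the query;
    `relevant_ids` is the set A(q) of images in the query's category.
    """
    relevant = set(relevant_ids)
    k = len(relevant)                       # number of images retrieved
    if k == 0 or len(ranked_ids) < k:
        raise ValueError("need a non-empty A(q) and at least |A(q)| ranked images")
    retrieved = ranked_ids[:k]
    hits = sum(1 for image_id in retrieved if image_id in relevant)
    precision = hits / len(retrieved)       # fraction of retrieved that are relevant
    recall = hits / len(relevant)           # fraction of relevant that are retrieved
    return precision, recall


if __name__ == "__main__":
    # Three relevant images; two of them appear among the top three results.
    print(precision_and_recall(["a", "x", "b", "c", "y"], {"a", "b", "c"}))
```

Because both denominators equal |A(q)|, the two values are identical for every query, which is why only precision is reported.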

5.2.2. Methods compared with the LSP method

The retrieval precision of the proposed LSP method is evaluated through comparison with both the conventional and the relevance feedback methods, described in Section 2.2.

Conventional methods include three methods based on the most commonly used image features (Section 2.2): color (Stricker & Orengo, 1995), shape (Brandt et al., 2000), and texture (Laaksonen et al., 2000), as well as their combinations: color–shape, color–texture, and shape–texture. These six methods are also used in the LSP method, for computing the region-wise image similarity (Section 5.1). In addition, conventional methods include the method employed in SIMPLIcity, an advanced image retrieval system (Li et al., 2000; Wang et al., 2001). SIMPLIcity uses a wavelet-based feature extraction method, and the integrated region matching technique for image similarity computation (Section 2.2).

Relevance feedback methods include the standard deviation-based method (Rui & Huang, 2001; Rui et al., 1998), and the GA-based method (Chan & King, 1999a,b). Both methods interactively infer the relative importance (i.e., weights) of the image features used for the image similarity computation (Section 2.2). The image features used in both methods are the same as those used in the LSP method: color, shape, and texture (Section 5.1).

The six conventional methods, based on simple image features, serve as a basic benchmark for the comparison. Comparison with the SIMPLIcity system shows whether the simple features, when combined through the LSP, can compete with the more complex image features.

The standard deviation-based method is chosen for being a representative relevance feedback method, both in effectiveness and in efficiency (Section 2.2). Finally, the GA-based method is chosen for the comparison since it uses the same optimization method, GA, as the proposed LSP method.

In total, nine image similarity computation methods are used for the comparison with the proposed LSP method. Including the three resolution variations of the LSP method (Section 5.1), this gives 12 methods used in the experiments. Since each image, in each test database, is used as a query, altogether 29,220 queries (12 methods × 2435 images) are executed, based on which the results are reported.
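The query count quoted above follows directly from the database sizes given in Section 5.2. The small Python check below reproduces the arithmetic; the variable names are chosen for this example only.

```python
# Number of images per test database, as listed in Section 5.2.
DATABASE_SIZES = {
    "Vistex-60": 60,
    "Vistex-167": 167,
    "Brodatz-208": 208,
    "Corel-1000-A": 1000,
    "Corel-1000-B": 1000,
}

N_METHODS = 9 + 3   # nine existing methods plus three LSP resolutions

total_images = sum(DATABASE_SIZES.values())
total_queries = N_METHODS * total_images

print(total_images, total_queries)   # prints: 2435 29220
```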

5.2.3. Retrieval precision evaluation

Results of the comparison are shown in Fig. 5 (Vistex-60 database), Fig. 6 (Vistex-167 database), Fig. 7 (Brodatz-208 database), Fig. 8 (Corel-1000-A database), and Fig. 9 (Corel-1000-B database).

Fig. 5. Category-wise average retrieval precision of the proposed LSP method, compared with conventional (a), and relevance feedback (b and c) methods, on Vistex-60 database (60 images, 10 categories).

Figs. 10 and 11 illustrate the retrieval results of the LSP method, compared with the best of the existing methods used for the comparison, the GA-based method, on the Brodatz-208 and Corel-1000-A databases, respectively.

In the case of the six conventional methods based on simple image features, the results are reported only for the method with the best performance, which is the shape–texture method for the Brodatz-208 database, and the color–texture method for the other test databases. Comparison with the SIMPLIcity system is done only for the Corel-1000-A database, based on the data reported in (Li et al., 2000).

As Table 4 summarizes, the proposed LSP method brings, on average, an increase of over 11% (absolute, i.e., in percentage points) in the retrieval precision, compared with both the conventional and the relevance feedback methods.

The fact that the LSP has outperformed the existing relevance feedback methods suggests that, for the image similarity computation closely approximating the human perception, the spatial distribution of image features over the image area is more relevant than the distribution of relative importance (i.e., weights) among the image features.

The results also suggest that the difference in performance between the LSP method and the existing methods grows as the size of the database, and the number of relevant images, increase. This makes the LSP method suitable for retrieval from large-scale image databases.


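Fig. 6. Category-wise average retrieval precision of the proposed LSP method, compared with conventional (a), and relevance feedback (b and c) methods, on Vistex-167 database (167 images, 19 categories).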


5.2.4. LSP at different resolutions

We have also examined how the performance of the LSP method depends on the number of image regions making up the LSP, i.e., the resolution of the uniform partitioning of the image area (set of regions R, Section 3). Fig. 12 shows the comparison of the LSPs composed of 3 × 3, 5 × 5, and 7 × 7 image regions, respectively, on the Vistex-60 and Corel-1000-A test databases.

The results show that, for the small database (Vistex-60), increasing the resolution slightly decreases the performance of the LSP method, while for the big database (Corel-1000-A), the effect is the opposite. The reason is that, in the small database, images within each category are homogeneous, meaning that a simple LSP is sufficient to capture the human perception of the image similarity. Increasing the resolution of the LSP, in this case, requires more time to find the optimal, simple LSP, resulting in the performance decrease. On the contrary, for the big database, with heterogeneous images within each category, a complex LSP is necessary to capture the image similarity, implying that the increase in the resolution results in the performance increase.


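Fig. 7. Category-wise average retrieval precision of the proposed LSP method, compared with conventional (a), and relevance feedback (b and c) methods, on Brodatz-208 database (208 images, 13 categories).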


An increase in the resolution of the LSP also increases the computation time. A comparison of the computation times of the LSPs at the three resolutions (Table 5) shows that the computation time grows with the number of regions making up the LSP, although less than proportionally: the 5 × 5 and 7 × 7 LSPs take about 1.9 and 2.9 times as long as the 3 × 3 LSP, respectively.

5.2.5. Computation time evaluation

The computation time of the proposed LSP method is evaluated through the comparison with the described conventional and relevance feedback methods. We consider the time necessary to process a single query, i.e., (a) to find the (sub)optimal parameters of the image similarity model, and (b) to compute the similarity between the query image and all the database images (Section 1).

Since for the methods not using the relevance feedback only (b) is applicable, we focus on the comparison of the proposed method to the two relevance feedback techniques. We evaluate the three methods on the Corel-1000-A database, and average the computation time over all the images in the database, used as queries.

The number of iterations necessary for each method to find the (sub)optimal solution is determined experimentally. For the standard deviation-based method, 30 iterations are used, while for the GA-based method and the proposed LSP method, 250 iterations are used (Section 4.2). Preliminary experiments using a larger number of iterations (up to 3000 for the two methods using GA) have shown no significant increase in the performance for any of the three methods, which motivated our choice.

Fig. 8. Category-wise average retrieval precision of the proposed LSP method, compared with conventional (a and b), and relevance feedback (c and d) methods, on Corel-1000-A database (1000 images, 10 categories).

Fig. 9. Category-wise average retrieval precision of the proposed LSP method, compared with conventional (a), and relevance feedback (b and c) methods, on Corel-1000-B database (1000 images, 10 categories).

Fig. 10. Retrieval results (12 top-ranked images) for the proposed LSP method (a), and the best of the existing methods used for the comparison (b), on Brodatz-208 database. The first image is the query.

Table 6 shows the average computation time necessary to process a single query, for the three methods. The proposed method is approximately 42 times slower than the standard deviation-based method, and 1.3 times slower than the GA-based method.

Due to the relatively long computation time, compared with the representative of the existing methods, the proposed method might be more suitable for filtering than for interactive retrieval applications. Namely, in a situation where the same query is used over a period of time, e.g., to filter images on the Internet, investing some more time to construct a query is worthwhile, provided that a significantly higher retrieval precision is guaranteed.

Fig. 11. Retrieval results (12 top-ranked images) for the proposed LSP method (a), and the best of the existing methods used for the comparison (b), on Corel-1000-A database. The first image is the query.

Table 4

Database-wise average retrieval precision of the proposed LSP method, compared with conventional (color–texture and SIMPLIcity) and relevance feedback (GA and standard deviation) methods, on the five test databases

Average precision (%)

Database        Color–texture*   SIMPLIcity   GA   S.D.   LSP
Vistex-60       48               –            59   50     72 (+13)
Vistex-167      41               –            51   49     62 (+11)
Brodatz-208     59               –            83   82     90 (+7)
Corel-1000-A    43               46           53   46     68 (+15)
Corel-1000-B    34               –            44   36     55 (+11)
Average         45               46           58   52     69 (+11)

Numbers in parentheses denote the difference between the LSP and the best of the other methods, for each database.
* In the case of Brodatz-208 database, shape–texture method is used for the comparison.
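The differences shown in parentheses in Table 4 can be recomputed from the table entries. The Python sketch below does this; the precision values are copied from Table 4, while the dictionary layout and the function name `lsp_gain` are assumptions of this example. The gains are differences in percentage points.

```python
# Average precision (%) per database, copied from Table 4.
TABLE_4 = {
    "Vistex-60":    {"Color-texture": 48, "GA": 59, "S.D.": 50, "LSP": 72},
    "Vistex-167":   {"Color-texture": 41, "GA": 51, "S.D.": 49, "LSP": 62},
    "Brodatz-208":  {"Shape-texture": 59, "GA": 83, "S.D.": 82, "LSP": 90},
    "Corel-1000-A": {"Color-texture": 43, "SIMPLIcity": 46, "GA": 53,
                     "S.D.": 46, "LSP": 68},
    "Corel-1000-B": {"Color-texture": 34, "GA": 44, "S.D.": 36, "LSP": 55},
}


def lsp_gain(row):
    """Difference between LSP and the best of the other methods in one row."""
    best_other = max(value for name, value in row.items() if name != "LSP")
    return row["LSP"] - best_other


gains = {database: lsp_gain(row) for database, row in TABLE_4.items()}
mean_gain = sum(gains.values()) / len(gains)

for database, gain in gains.items():
    print(f"{database}: +{gain}")
print(f"mean gain: {mean_gain:.1f} percentage points")
```

In every database the best competing method is the GA-based one, so the reported gains are effectively gains over GA-based relevance feedback.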


5.2.6. Conclusions drawn from experiments

Characteristics of the proposed LSP method, verified through the experiments, suggest the following conclusions about the image similarity computation:

• For the image similarity computation closely approximating the human perception, spatial distribution of image features over the image area is more relevant than the distribution of weights (i.e., relative importance) among the image features.

• When spatially distributed over the image area, even a small number of simple image features is sufficient for modeling the image similarity.

• The use of simple image features, enabled by the LSP method, makes the feature extraction both easier and faster, and simplifies the overall implementation of the similarity computation method.

Fig. 12. Category-wise average retrieval precision of the proposed LSP method, with the image area uniformly partitioned into 3 × 3, 5 × 5, and 7 × 7 regions, respectively, on Vistex-60 (a), and Corel-1000-A (b) databases.

Table 5

Computation time of the proposed LSP method, at three different resolutions, for a single query image (i.e., 250 iterations of the GA), on Corel-1000-A database (1000 images, 10 categories)

Method       Time (s)   Time ratio
3 × 3 LSP    25.1       1.0
5 × 5 LSP    47.2       1.9
7 × 7 LSP    72.4       2.9

Table 6

Average computation time necessary to process a single query, on Corel-1000-A database (1000 images, 10 categories)

Method               Time (s)   Time ratio
Standard deviation   0.6        1
GA                   19.5       33
LSP                  25.1       42
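The time ratios in Tables 5 and 6 can be re-derived from the measured times, which also makes it possible to compare the growth in computation time with the growth in the number of regions. The following Python sketch uses only the times reported in the two tables; the variable and function names are hypothetical. Note that 19.5 / 0.6 evaluates to 32.5, which Table 6 reports as 33.

```python
# Times in seconds per query on Corel-1000-A, copied from Tables 5 and 6.
TABLE_5 = {3: 25.1, 5: 47.2, 7: 72.4}            # N -> time for an N x N LSP
TABLE_6 = {"Standard deviation": 0.6, "GA": 19.5, "LSP": 25.1}


def time_ratios(times, baseline):
    """Divide every time by the time of the baseline entry."""
    return {key: t / times[baseline] for key, t in times.items()}


resolution_ratios = time_ratios(TABLE_5, 3)
region_ratios = {n: (n * n) / 9 for n in TABLE_5}   # region count relative to 3 x 3
method_ratios = time_ratios(TABLE_6, "Standard deviation")
lsp_vs_ga = TABLE_6["LSP"] / TABLE_6["GA"]

for n in TABLE_5:
    print(f"{n} x {n}: time ratio {resolution_ratios[n]:.1f}, "
          f"region ratio {region_ratios[n]:.1f}")
for name, ratio in method_ratios.items():
    print(f"{name}: {ratio:.1f} x standard deviation")
print(f"LSP vs GA: {lsp_vs_ga:.1f}")
```

For the 7 × 7 LSP the number of regions is about 5.4 times that of the 3 × 3 LSP, while the measured time is about 2.9 times as long, so the per-query cost rises more slowly than the region count in these measurements.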


5.2.7. Future research directions

The primary concern is the computation time, which has to be improved to make the proposed LSP method more usable for interactive retrieval tasks. Possibilities include employing an optimization method different from the GA, or combining the GA currently used with an alternative optimization method.

The proposed LSP method is extensible, being able to handle arbitrarily many image features and their combinations. The question is whether increasing the number of features, or increasing the complexity of features, has a significant effect on the performance of the method.

6. Conclusion

LSP is proposed as a new method for computing image similarity. Similarity of a pair of images is expressed in terms of similarities of the corresponding image regions, obtained by the uniform partitioning of the image area. Different from the existing methods, each region-wise similarity is computed using a different combination of image features (color, shape, and texture). In addition, a method for optimizing the LSP-based similarity computation, based on GA, is proposed, and incorporated in the relevance feedback mechanism, allowing the user to automatically specify LSP-based queries.

LSP is evaluated on five test databases totalling around 2500 images of various sorts. Compared with both the conventional and the relevance feedback methods for computing image similarity, LSP brings, on average, an increase of over 11% in the retrieval precision.

Results suggest that the proposed LSP method, allowing comparison of different image regions using different similarity criteria, is more suited for modeling the human perception of image similarity than the existing methods.

Acknowledgements

The authors are grateful to Prof. J.Z. Wang, at the Pennsylvania State University, PA, USA, for providing the test data (Corel-1000-A test database). The authors also thank Mr. Eduardo M. Iyoda, at the Tokyo Institute of Technology, Yokohama, Japan, for the valuable comments.

References

Brandt, S., Laaksonen, J., & Oja, E. (2000). Statistical shape features in content-based image retrieval. In Proceedings of the 15th International Conference on Pattern Recognition, ICPR-2000, Barcelona, Spain (Vol. 2) (pp. 1066–1069).

Brodatz, P. (1966). Textures: A photographic album for artists and designers. New York: Dover Publications.

Chan, D. Y. M., & King, I. (1999a). Weight assignment in dissimilarity function for Chinese cursive script character image retrieval using genetic algorithm. In Proceedings of the 4th International Workshop on Information Retrieval with Asian Languages, IRAL’99, Taipei, Taiwan (pp. 55–62).

Chan, D. Y. M., & King, I. (1999b). Genetic algorithm for weights assignment in dissimilarity function for trademark retrieval. In Lecture Notes in Computer Science (Vol. 1614). Proceedings of the 3rd International Conference on Visual Information Systems, VISUAL’99, Amsterdam, The Netherlands (pp. 557–565). Berlin: Springer Verlag.

Corel Corporation. (2000). Corel Gallery 3.0. Available: http://www3.corel.com/.



Del Bimbo, A. (1999). Visual information retrieval. San Francisco: Morgan Kaufmann Publishers, Inc.

Garcia, J. A., Fdez-Valdivia, J., Fdez-Vidal, X. R., & Rodriguez-Sanchez, R. (2001). Information theoretic measure for visual target distinctness. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(4), 362–383.

Goldberg, D. E. (1989). Genetic algorithms in search, optimization and machine learning. Reading, MA: Addison-Wesley.

Laaksonen, J., Oja, E., Koskela, M., & Brandt, S. (2000). Analyzing low-level visual features using content-based image retrieval. In Proceedings of the 7th International Conference on Neural Information Processing (ICONIP’00), Taejon, Korea (pp. 1333–1338).

Lew, M. S. (Ed.). (2001). Principles of visual information retrieval. London: Springer Verlag.

Li, J., Wang, J. Z., & Wiederhold, G. (2000). IRM: Integrated region matching for image retrieval. In Proceedings of the 8th ACM Multimedia Conference, Los Angeles, CA, USA (pp. 147–156).

Massachusetts Institute of Technology, Media Lab. (2001). Vision Texture Database. Available: ftp://whitechapel.media.mit.edu/pub/VisTex/.

Rui, Y., & Huang, T. S. (2001). Relevance feedback techniques in image retrieval. In M. S. Lew (Ed.), Principles of visual information retrieval (pp. 219–258) [Chap. 9]. London: Springer Verlag.

Rui, Y., Huang, T. S., Ortega, M., & Mehrotra, S. (1998). Relevance feedback: a power tool for interactive content-based image retrieval. IEEE Transactions on Circuits and Systems for Video Technology, 8(5), 644–655.

Santini, S., & Jain, R. (1997). Similarity is a geometer. Multimedia Tools and Applications, 5(3), 277–306.

Santini, S., Gupta, A., & Jain, R. (2001). Emergent semantics through interaction in image databases. IEEE Transactions on Knowledge and Data Engineering, 13(3), 337–351.

Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), 1349–1380.

Stejić, Z., Iyoda, E. M., Takama, Y., & Hirota, K. (2001). Automatic textual summarization of image database contents using combination of clustering and neural network techniques. In Proceedings of the 2nd International Conference on Intelligent Technologies, Intech 2001, Bangkok, Thailand (pp. 233–239).

Stricker, M., & Orengo, M. (1995). Similarity of color images. In Proceedings of IS&T and SPIE Storage and Retrieval of Image and Video Databases III, San Jose, CA, USA (pp. 381–392).

University of Southern California, Signal and Image Processing Institute. (2001). The USC-SIPI Image Database. Available: http://sipi.usc.edu/services/database/Database.html.

Vailaya, A., & Jain, A. K. (1998). Shape-based retrieval: a case study with trademark image databases. Pattern Recognition, 31(9), 1369–1390.

Vailaya, A., Jain, A. K., & Zhang, H. J. (1998). On image classification: city images vs. landscapes. Pattern Recognition, 31(12), 1921–1935.

Wandell, B. A. (1995). Foundations of vision. Sunderland, MA: Sinauer Associates.

Wang, J. Z., Li, J., & Wiederhold, G. (2001). SIMPLIcity: Semantics-sensitive Integrated Matching for Picture LIbraries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(9), 947–963.

Zhou, X. S., & Huang, T. S. (2001). Exploring the nature and variants of relevance feedback. In Proceedings of IEEE Workshop on Content-Based Access of Image and Video Libraries, in conjunction with IEEE Conference on Computer Vision and Pattern Recognition (CVPR’01), Hawaii, USA.

