
Multimed Tools Appl (2006) 31: 195–219
DOI 10.1007/s11042-006-0039-x

Web image retrieval using majority-based ranking approach

Gunhan Park · Yunju Baek · Heung-Kyu Lee

Published online: 30 September 2006
© Springer Science + Business Media, LLC 2006

Abstract Web image retrieval has characteristics different from typical content-based image retrieval: web images have associated textual cues. However, a web image retrieval system often yields undesirable results, because it uses limited text information such as surrounding text, URLs, and image filenames. In this paper, we propose a new approach to retrieval which uses the image content of retrieved results without relying on assistance from the user. Our basic hypothesis is that more popular images have a higher probability of being the ones that the user wishes to retrieve. According to this hypothesis, we propose a retrieval approach that is based on a majority of the images under consideration. We define four methods for finding the visual features of the majority of images: (1) the majority-first method, (2) the centroid-of-all method, (3) the centroid-of-top-K method, and (4) the centroid-of-largest-cluster method. In addition, we implement a graph/picture classifier for improving the effectiveness of web image retrieval. We evaluate the retrieval effectiveness of both our methods and conventional ones by using precision and recall graphs. Experimental results show that the proposed methods are more effective than conventional keyword-based retrieval methods.

G. Park (B) Multimedia Search Team, R&D Center, NHN Corp., 6th Floor Chorim Bldg., 6–3 Sunae-dong, Bundang-gu, Seongnam City, Gyeonggi-do, 463–825, Republic of Korea
e-mail: [email protected]

Y. Baek Division of Computer Science and Engineering, Pusan National University, San-30, Jangjeon-dong, Keumjeong-gu, Busan, 609-735, Republic of Korea
e-mail: [email protected]

H.-K. Lee Division of Computer Science, Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, 373-1 Kusung-Dong, Yusong-Gu, Taejon, 305-701, Republic of Korea
e-mail: [email protected]


Keywords Web image retrieval · Content-based image retrieval · Image clustering · Image ranking · Graph/picture classifier

1 Introduction

The rapid growth of the web environment and advances in technology have enabled us to access and manage huge numbers of images easily. We often need to retrieve images effectively to perform assigned tasks and to make intelligent decisions. Various content-based image retrieval (CBIR) techniques have been developed in recent years to distinguish useful images from extraneous ones, and to retrieve desired sets of images from image collections.

Web image retrieval differs from typical CBIR systems in some ways. In general, web images have related text annotations that can be obtained from the web pages that contain them. Conventional web image retrieval systems utilize the text information of the images, and work as text (keyword)-based retrieval systems. Existing web image search engines, such as Google1, Excite2, Altavista3, and Naver4, allow users to search for images via a keyword interface. These systems deal with over 10 million images, and they all use keyword-based queries because keyword-based retrieval methods are easy to implement and fast. However, the results yielded by keyword-based image retrieval are often not those that are desired by the user, even if they are ranked highly by the search engine, because filenames and URLs can be misleading and surrounding text may not be informative [22].

In order to use image content in web image retrieval, methods based on CBIR have been developed in the literature. Some systems provide an input interface so that the user can provide relevance feedback. Images determined by relevance feedback are used to define an image that can be used in a further query based on CBIR, which results in the retrieval of images that better match the user's intentions [9, 22]. However, these methods use image content via user assistance after the initial keyword-based retrieval.

To improve the effectiveness of image retrieval without relying on assistance from the user, we propose a new retrieval method that uses the image content of the retrieved results. Our approach is based primarily on integrating the results of keyword-based retrieval with image content by analyzing the retrieved results. First, the proposed approach determines candidates using keywords, and then automatically re-ranks the images using visual features of the retrieved results.

The desirable images resulting from the keyword-based search tend to have similar characteristics. We assume that the more popular an image is, the greater is the probability that it will be among the ones that the user wishes to retrieve. Based on this hypothesis, we propose a ranking mechanism based on the visual features of the majority of images. We define four retrieval methods to find the visual features from the retrieved results. Two methods use clustering results, and the other two use the average vector of the retrieved images as the visual features of the majority of images.

1 http://www.google.com
2 http://www.excite.com
3 http://www.altavista.com
4 http://www.naver.com is one of the most popular search engines in Korea; it retrieves relevant images from over 10 million images on the web.

For example, when we want to find “tiger” images on the web, we first retrieve images using conventional keyword-based retrieval. Then we compute the average, or centroid, values of image features such as the color histogram, GLCM (Gray Level Co-occurrence Matrix) features, and the edge direction histogram. Using the distance from these representative values, we re-rank the images. By re-calculating the similarity between the query keyword and the images, our approach can improve retrieval precision within the top-N ranks.

The advantage of our approach is that we can use information about the images beyond the text. In general, although there are many correct images in the results of text-based retrieval, they are often ranked low. In these cases, our approach improves the results through our re-ranking method based on the visual features of the majority of images. The weak point of our approach is that it cannot improve retrieval effectiveness when only a few correct images are contained within the top-N results. However, this is an open problem in CBIR, and we will continue working toward a complete solution.

In addition, to reduce negative effects on finding the visual features of the majority of images, we classify special image groups. We design a graph/picture classifier based on an SVM (Support Vector Machine) over wavelet coefficients. It helps us eliminate negative (biased) effects on finding correct answers in practical applications. In our experiments, we show that the proposed retrieval methods combined with the classifier improve retrieval effectiveness significantly compared to the conventional approach.

The paper is organized as follows. In Section 2, we briefly summarize related work on web image retrieval. Our approach is described in Section 3. We describe image features and clustering methods in Section 4. In Section 5, we present experimental results and discussions. Conclusions are stated in the last section.

2 Related work

In recent years, a significant amount of research has been carried out on CBIR systems, most of which has concentrated on feature extraction from an image, e.g., QBIC [7], VisualSeek [24], SIMPLicity [33], Netra [16], and Blobworld [4]. Compared to these systems, web image retrieval systems utilize a different attribute: textual cues. Several systems have been developed for web image retrieval.

PictoSeek [10] indexes images collected from the web. First, the system uses pure visual information, and then uses textual information to classify the images. A similar system, Webseek [24], makes classifications that rely, to some degree, on assistance from the user. The system builds categories, searches for images within each category, and provides category browsing and search by example. Webseer [9] retrieves images using keywords and additional image information that gives the size and format of the image, together with simple classification information (e.g., graph, portrait, computer-generated graphics, close-up, number of faces, etc.). The ImageRover [22] system allows the user to specify the image via keywords, and an example image via relevance feedback. The ImageRover approach is most similar in spirit to that of


WebSeer. However, ImageRover differs in that it allows searches of web images that are based directly on image content. ImageRover also proposed a method combining textual and visual cues using LSI (Latent Semantic Indexing). Mukherjea and Cho [18] proposed a method for determining the semantics of images on the web using an approach similar to that of ImageRover. They used web page analysis to assign keywords to images based on HTML pages, and to determine similar images based on the assigned text. Mukherjea et al. [17] also developed a result visualization environment that allows the user to organize search results using text and image clustering (user-friendly visualization).

We can summarize the basic mechanism of previous web image retrieval research as follows: (1) the system retrieves the images using keywords or simple information about an image (excluding image content such as color, texture, and shape); (2) the system provides an interface that allows the selection of relevant images from the first set of retrieved results (a relevance feedback mechanism); (3) the system retrieves the images using the selected images and/or keywords; (4) the system refines the results by repeating steps 2 and 3.

As shown by previous research, keywords may help to guide the search, and constitute important data for effective web image retrieval. Unfortunately, keywords may not describe image content accurately or completely. Information about image content taken directly from the image must also be added to retrieval processing. Hence, in this paper, we propose a new approach that improves the effectiveness of image retrieval by using the visual features of the majority of images. Our approach differs from previous ones in that we concentrate on using image content automatically, without user assistance or pre-defined categorization. In the following sections, we describe our approach, and the image features and clustering methods used in this paper.

3 Majority-based web image ranking approach

3.1 Problems of keyword-based retrieval

In this section, we describe a new approach to web image retrieval that uses image content. We can occasionally find irrelevant images in the top ranks of keyword-based retrieval, because the ranking function is based only on the existence of keywords. For example, when we search for “car” or “tiger” images in the Google system, we get wrong images at very high ranks, as shown in figure 1. Our approach implements the idea of eliminating these images from the high ranks. Moreover, when there are many correct images within some scope (200 in this research), the ranks produced by keyword-based retrieval are not appropriate for the correct answers. Our approach alleviates this problem, and improves retrieval effectiveness by re-ordering the images using image features.

3.2 Basic concept of our approach

Our approach differs from relevance feedback. The difference from the previous relevance feedback mechanism is that we re-rank the results without assistance from the user (i.e., our approach selects the proper visual features automatically).


Fig. 1 Undesirable images in high ranks

As shown in figure 2, our approach is applied after keyword-based retrieval, and a relevance feedback mechanism can be applied after our method.

For our approach, we can classify the images into two categories: a group of images that are related to a query keyword, and a group of images that are not. As shown in figure 3, the images related to a query keyword tend to be very similar to each other, while the wrong images have differing characteristics. Based on this property, we can assume that images relevant to the query have features that distinguish them from irrelevant images. According to this idea, our approach uses the visual features of the majority of images to determine the ranks of images after keyword-based retrieval. To find the visual features of the majority of images, we define two different concepts: one uses clustering (groups of similar features), and the other uses the average of the image features (computed average values of image features).

Figure 4 shows the architecture and scope of our approach. We first retrieve the images with the keyword-based retrieval that is commonly used by web image retrieval systems; next, we classify graph/picture images to eliminate biased images (features); then the results are re-ranked with the proposed ranking methods. The proposed ranking methods are based on computing the visual features of the majority of images. To determine these features, we define two methods (methods 1 and 4 in figure 4) based on cluster analysis, and two methods (methods 2 and 3 in figure 4) based on an average of image features. In the next section, we explain the ranking methods and the construction of the graph/picture classifier.

Fig. 2 Our approach is a re-ranking mechanism applied to the results retrieved by keyword-based retrieval; in the pipeline, keyword-based retrieval is followed by our approach, which may be followed by relevance feedback (query image selection) and general content-based image retrieval


Fig. 3 Tendency of returned images: images of the same concept have similar attributes to each other, while images of different concepts have differing attributes

3.3 Retrieval methods

The basic idea underlying the design of our system is that we identify the characteristics of images relevant to a query. We analyzed the results of keyword-based retrieval, and found that relevant images have similar attributes, while irrelevant images have divergent attributes. Our basic hypothesis is that more popular images on the web have a higher probability of being those that the user wishes to retrieve. Based on this hypothesis, we propose four methods that use image content for ranking. In brief, the proposed retrieval approach is as follows: we analyze the top-N retrieval results, classify the graph/picture images, and then re-rank the images according

Fig. 4 Overview of the proposed approach: keywords extracted from the web image DB (images and HTML documents) populate a meta-data DB; the user query is answered by text-based image retrieval, and the top-N retrieved images are analyzed by the graph/picture classifier, then passed either to the HACM (methods 1 and 4) or to query feature vector generation (methods 2 and 3) to produce the new ranking results; this pipeline after text retrieval is the scope of our approach


to the proposed retrieval methods based on the visual features of the majority of images. In our methods, we define the centroid C(AI) of an image set AI as:

C(AI) = ( Σ_{v ∈ AI} v ) / |AI|    (1)

where |AI| is the size of AI and v is the feature vector of an image. The retrieval methods proposed in this paper are as follows. As mentioned above, two methods (methods 1 and 4) are based on clustering, and two methods (methods 2 and 3) are based on an average value. S(I, I′) (Eq. (10)) is the similarity between two image feature vectors.

• Method 1 (majority-first method): This method uses the clustering property of the retrieved images. We partition the retrieved images using the HACM, and then order the clusters by size: the largest cluster is ranked first, and the sequence of clusters follows decreasing cluster size. After determining the cluster order, we rank the images within each cluster by the distance from the image to the centroid of the cluster. The similarity between an image I and the query keyword(s) K is defined as follows,

S′(I, K) = RI ∗ w + S(I, C(AI)) (2)

where RI is the rank of the cluster that contains image I, w is a constant satisfying w > MAX{S(I, I′)} (e.g., w = 1000), and AI is the image set of the cluster that contains image I.

• Method 2 (centroid-of-all method): This method uses the centroid of all the images retrieved by the keywords. We calculate the centroid as the average of the retrieved images (Eq. (1)). Using this centroid as a query vector (a feature vector of a query image), the system turns into conventional CBIR: it re-ranks the images by their similarity to this feature vector. The similarity between an image I and the query keyword(s) K is defined as follows,

S′(I, K) = S(I, C(Atop−all)) (3)

where Atop−all is the image set of all retrieved images. In this paper, we fix the number of images at 200; therefore, Atop−all is identical to Atop−200.

• Method 3 (centroid-of-top-K method): This method uses the centroid of the K top-ranked images. Since there are many undesirable images in the retrieved results, we select only some of the top-ranked images; we used K = 20 in the experiments. As in method 2, the centroid is used as a query vector to re-rank the results. The similarity between an image I and the query keyword(s) K is defined as follows,

S′(I, K) = S(I, C(Atop−20)) (4)

where Atop−20 is the image set of the top 20 images.

• Method 4 (centroid-of-largest-cluster method): In the fourth method, we use the centroid of the largest cluster as a query vector for image searching. We assume that the original rank is not important in this method, in contrast to methods 2


and 3. The similarity between an image I and the query keyword(s) K is defined as follows,

S′(I, K) = S(I, C(Alargest−cluster)) (5)

where Alargest−cluster is the image set of the largest cluster.

We define the new similarity measure S′(I, K) as a similarity between image I and keyword K. As mentioned above, S′(I, K) is based on the visual features of the majority of images, without user assistance during retrieval. The basic philosophy of the combination of visual and textual features is that we use additional evidence (visual features) to determine the ranking of images. Previous systems (such as google.com) use only keyword matching for image retrieval.
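The four ranking methods above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names are ours, feature vectors are assumed to be pre-computed, and plain city-block distance (Eq. (6)) stands in for the combined similarity S(I, I′) of Eq. (10).

```python
import numpy as np

def centroid(features):
    """Eq. (1): the centroid of an image set is the mean of its feature vectors."""
    return np.mean(features, axis=0)

def rerank_centroid_of_top_k(features, k):
    """Methods 2 and 3: re-rank all images by distance to the centroid of the
    top-k keyword-retrieval results (k = len(features) gives method 2,
    centroid-of-all; k = 20 gives method 3). Returns indices in new rank order."""
    c = centroid(features[:k])
    dists = np.abs(features - c).sum(axis=1)  # city-block distance, Eq. (6)
    return np.argsort(dists)

def rerank_majority_first(features, labels):
    """Method 1: order clusters by decreasing size (the role of the constant w
    in Eq. (2) is to keep clusters separated); within each cluster, order
    images by distance to the cluster centroid. `labels` gives a cluster
    id per image, e.g. from the HACM of Section 4.5."""
    labels = np.asarray(labels)
    sizes = {c: int((labels == c).sum()) for c in set(labels.tolist())}
    order = []
    for c in sorted(sizes, key=sizes.get, reverse=True):
        idx = np.flatnonzero(labels == c)
        d = np.abs(features[idx] - centroid(features[idx])).sum(axis=1)
        order.extend(idx[np.argsort(d)].tolist())
    return np.array(order)
```

Method 4 follows the same pattern as methods 2 and 3, but takes the centroid of the largest cluster (rather than of the top-k images) as the query vector.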

3.4 Classification of graph/picture images

In the web environment, there are many different types of image, some of which have distinct characteristics, such as icons, simple drawings, and simple graphs. The classification of images is essential for their appropriate interpretation. Recently, in CBIR systems, various classification methods have been proposed: city-landscape [29], indoor-outdoor [27], and graph-picture [15]. In particular, graph/picture classification is important in web applications because graph images have characteristics distinct from natural images, but are quite common in the web environment [1, 12, 33]. In many cases, graphs are not images that the user wishes to retrieve. Hence, graph/picture classification is of great practical importance for web image retrieval.

In this paper, we define “graph” and “picture” as follows: “graph” designates images with a large background and little meaningful information (e.g., lines, characters, etc.), while “picture” designates continuous-tone images (e.g., nature, people, objects, non-drawings, etc.). Image classification can further improve the effectiveness of retrieval, because users typically wish to retrieve images of a particular type when they search. It is a simple classification approach, but we believe that it is valuable for finding the images that users wish to retrieve. In our algorithm, we apply this classifier after retrieving images based on keywords, as shown in figure 4. The classifier prevents a set of graph images from becoming the largest cluster or skewing the average value of the image features.

In this paper, we design and use a simple and novel scheme for classifying graphs and pictures using SVMs (Support Vector Machines) based on wavelet coefficients. We tested the classifier using 2,200 images, and achieved over 97.5% accuracy. The detailed implementation and performance of the classifier are described in Section 5.2.
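The classifier pipeline of figure 8 can be sketched as follows. This is only a schematic stand-in with assumed names and thresholds: one level of Haar LH coefficients is extracted, and a single "fraction of near-zero coefficients" statistic with a fixed cutoff replaces the trained SVM of the paper (graphs, with their large flat backgrounds, produce mostly zero LH coefficients).

```python
import numpy as np

def haar_lh(img):
    """One level of the 2-D Haar transform; returns the LH subband
    (low-pass over row pairs, high-pass over column pairs)."""
    img = np.asarray(img, float)
    rows_lo = (img[0::2, :] + img[1::2, :]) / 2.0       # low-pass along rows
    return (rows_lo[:, 0::2] - rows_lo[:, 1::2]) / 2.0  # high-pass along columns

def flat_fraction(coeffs, eps=1.0):
    """Hypothetical scalar feature: the fraction of near-zero LH coefficients.
    In the paper the LH coefficient vectors feed an SVM; this one statistic
    stands in for that learned decision."""
    return float((np.abs(coeffs) < eps).mean())

def classify(img, thresh=0.8):
    """Sketch of the graph/picture decision: 'graph' if the image is mostly
    flat in the LH subband (thresh is an assumed value, not from the paper)."""
    return "graph" if flat_fraction(haar_lh(img)) > thresh else "picture"
```

A real reimplementation would train an SVM (e.g. with a standard ML library) on the LH coefficient vectors of a labeled graph/picture collection, as the paper does with 2,200 images.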

4 Image features and clustering methods

Most of the research on CBIR has focused on techniques for representing images with visual features. A variety of image features, such as color, shape and texture, have been used in the literature. In a typical image retrieval model, an image is represented as a vector in an n-dimensional vector space as follows: I = (f1, f2, f3, ..., fn), where fi is an element of the image feature vector.


Fig. 5 Example of color histogram feature

In this paper, we use the following image features: a color histogram, GLCM features, and an edge direction histogram. Each feature represents one type of image characteristic: the color histogram represents color attributes [26], the GLCM represents the properties of gray values [2], and the edge direction histogram represents the shape of edges [13]. These features complement each other in that, taken together, they compensate for each other's weak points. In this section, we also describe the clustering methods and a method for integrating and normalizing these features.

Fig. 6 Example of GLCM feature: the original image is converted to a gray image, a co-occurrence matrix is computed for each displacement (d, θ), each matrix is divided by the number of pixel pairs (relative frequencies, for normalization with respect to image size), and five features are computed for each matrix


Table 1 Five features computed from the GLCM

Features             Equations
Energy               E(d, θ) = Σ_{i=0}^{K} Σ_{j=0}^{K} [P(i, j; d, θ)]²
Entropy              H(d, θ) = −Σ_{i=0}^{K} Σ_{j=0}^{K} P(i, j; d, θ) log P(i, j; d, θ)
Correlation          C(d, θ) = Σ_{i=0}^{K} Σ_{j=0}^{K} (i − μx)(j − μy) P(i, j; d, θ) / (σx σy)
Inertia (Contrast)   I(d, θ) = Σ_{i=0}^{K} Σ_{j=0}^{K} (i − j)² P(i, j; d, θ)
Local Homogeneity    L(d, θ) = Σ_{i=0}^{K} Σ_{j=0}^{K} [1 / (1 + (i − j)²)] P(i, j; d, θ)

where μx, μy are the means, and σx, σy the standard deviations, of x and y respectively, and K is the number of gray levels.

4.1 Color histogram

Color is an important attribute for describing image content. A color histogram, which represents the proportion of specific colors in an image, has been widely used among color representation methods. The use of a color histogram in CBIR has been found to provide reasonable retrieval effectiveness when we use the HSV (Hue, Saturation, Value) color space, 128 quantization levels, and the histogram intersection as a similarity function. The HSV color model is the most frequently used model for CBIR, because it matches human perception well. The histogram intersection is calculated as follows: H(I, I′) = Σ_{i=1}^{n} min(fi, f′i) / Σ_{i=1}^{n} f′i. If two images have the same size, the histogram intersection is equivalent to the sum of absolute differences, or city-block metric [26]. In this paper, we use the city-block metric for similarity computation. The city-block distance is defined as follows:

Dcolor(I, I′) = Σ_{i=1}^{n} |fi − f′i|    (6)

Fig. 7 Example of EDH feature: the original image, the edges of the image, and the resulting edge direction histogram


An example of the color histogram is shown in figure 5.
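As a concrete sketch of this similarity computation, the following Python builds a normalized 128-bin histogram from an HSV image and compares two histograms with the city-block metric of Eq. (6). The 8x4x4 split of the 128 bins across H, S, and V is our assumption; the paper only states 128 quantization levels.

```python
import numpy as np

def hsv_histogram(hsv, bins=(8, 4, 4)):
    """128-bin color histogram of an HSV image with channels in [0, 1).
    Normalized by pixel count so histograms of different-size images
    are comparable (making intersection and city-block equivalent)."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    idx = (np.minimum((h * bins[0]).astype(int), bins[0] - 1) * bins[1] * bins[2]
           + np.minimum((s * bins[1]).astype(int), bins[1] - 1) * bins[2]
           + np.minimum((v * bins[2]).astype(int), bins[2] - 1))
    hist = np.bincount(idx.ravel(), minlength=bins[0] * bins[1] * bins[2])
    return hist / idx.size

def city_block(f, f2):
    """Eq. (6): sum of absolute differences between two feature vectors."""
    return float(np.abs(np.asarray(f) - np.asarray(f2)).sum())
```

Two images with entirely disjoint color bins reach the maximum city-block distance of 2.0 between normalized histograms, while identical images give 0.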

4.2 Gray level co-occurrence matrix features

The textural features of an image are represented by the properties of its gray values. Texture has different characteristics from color, so integrating color and textural features results in more effective image retrieval than using color alone. We use GLCM (Gray Level Co-occurrence Matrix) features, a simple and effective method for representing texture [11]. The GLCM represents the direction of, and distance between, gray values of an image.

The GLCM can be specified by a matrix of the relative frequencies Pij with which two neighboring pixels separated by distance d occur in the image, one with gray level i and the other with gray level j [11]. In other words, each entry (i, j) in the GLCM corresponds to the number of occurrences of the pair of gray levels i and j at distance d in the image [21]. The GLCM P(i, j; d, θ) is defined as follows:

P(i, j; d, θ) = #{[(k, l), (m, n)] | θ = acos((k − m)/d), 0 ≤ θ ≤ 180°, d = √((k − m)² + (l − n)²), I(k, l) = i, I(m, n) = j}    (7)

where # denotes the number of elements in the set. The value of Eq. (7) depends on the size of the image. To normalize this value, P(i, j; d, θ) is divided by the number of pixel pairs for the given d and θ. An example of GLCM computation is shown in figure 6.

In order to use the information contained in the GLCM, we use the five features that are used most frequently in the literature: energy, entropy, correlation, inertia, and local homogeneity [2, 5]. Definitions of these features are given in Table 1.

To simplify and reduce the dimension of the feature vector, we quantize the gray levels into 16 steps (textural properties are preserved by this operation); then we compute the five features for d = 1, 2, 3, 4, 5 and θ = 0°, 45°, 90°, 135°. Finally, we obtain a 100-dimensional feature vector. To compute the similarity of these features, we calculate the sum of absolute differences (Eq. (6)).
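A minimal sketch of the co-occurrence matrix and the Table 1 features might look as follows; the per-θ displacement offsets follow the usual convention, the function names are ours, and correlation is guarded against zero variance.

```python
import numpy as np

def glcm(q, d=1, theta=0, levels=16):
    """Normalized co-occurrence matrix of a gray image already quantized
    to `levels` steps, for displacement (d, theta in degrees)."""
    offsets = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}
    dy, dx = offsets[theta]
    P = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                P[q[y, x], q[y2, x2]] += 1
    n = P.sum()
    return P / n if n else P

def glcm_features(P):
    """The five Table 1 features of one normalized co-occurrence matrix."""
    i, j = np.indices(P.shape)
    nz = P[P > 0]
    mx, my = (i * P).sum(), (j * P).sum()
    sx = np.sqrt((((i - mx) ** 2) * P).sum())
    sy = np.sqrt((((j - my) ** 2) * P).sum())
    return {
        "energy": ((P ** 2).sum()),
        "entropy": float(-(nz * np.log(nz)).sum()),
        "correlation": ((((i - mx) * (j - my)) * P).sum() / (sx * sy)
                        if sx > 0 and sy > 0 else 0.0),
        "inertia": (((i - j) ** 2) * P).sum(),
        "homogeneity": ((P / (1 + (i - j) ** 2)).sum()),
    }
```

Concatenating the five features over all 20 (d, θ) combinations yields the paper's 100-dimensional texture vector.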

4.3 Edge direction histogram

We define the attributes of the edges of shapes in an image by means of an edge direction histogram [13, 29]. This method differs from other methods of shape representation and has the advantage that we do not need to segment the image. For the edge direction histogram, 73 bins are defined to represent the edges of shapes in an image; the first 72 bins represent edge directions and the last bin represents non-edge pixels. To compensate for different image sizes, we use the following normalization method [29]:

H(i) = H0(i) / ne, i ∈ [0, ..., 71];    H(72) = H0(72) / np    (8)

where H0(i) is the count in bin i, ne is the total number of edge points, and np is the total number of pixels in the image.


Table 2 Lance-Williams parameters

HACM                 αi                           β
Group average link   mi/(mi + mj)                 0
Ward's method        (mi + mk)/(mi + mj + mk)     −mk/(mi + mj + mk)

We use the Canny edge detector [20] to extract the edges from an image. To compute the similarity of these features, we calculate the sum of absolute differences (Eq. (6)). An example of the edge direction histogram is shown in figure 7.
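The 73-bin histogram and the normalization of Eq. (8) can be sketched as below. Simple central-difference gradients with an assumed magnitude threshold stand in for the Canny detector used in the paper.

```python
import numpy as np

def edge_direction_histogram(gray, edge_thresh=50.0):
    """73-bin EDH: bins 0..71 cover edge directions in 5-degree steps,
    bin 72 counts non-edge pixels; normalization follows Eq. (8).
    edge_thresh is an assumed gradient-magnitude cutoff."""
    g = np.asarray(gray, float)
    gy, gx = np.gradient(g)                 # gradients along rows, columns
    mag = np.hypot(gx, gy)
    ang = (np.degrees(np.arctan2(gy, gx)) + 360.0) % 360.0
    edges = mag > edge_thresh
    hist = np.zeros(73)
    for b in np.minimum((ang[edges] / 5.0).astype(int), 71):
        hist[b] += 1
    ne = int(edges.sum())
    if ne:
        hist[:72] /= ne                     # edge bins: divide by edge count
    hist[72] = (g.size - ne) / g.size       # non-edge bin: divide by pixel count
    return hist
```

On a pure vertical step edge, all edge energy falls into the 0-degree bin, and the edge bins sum to 1 by construction.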

4.4 Normalization and integration of features

Since different features generate different ranges of similarity values, a normalization method should be applied to each similarity computation. We normalize each similarity by the Min/Max normalization (linear scaling) method as follows:

N(I, I′) = (D(I, I′) − min(D(I, I′))) / (max(D(I, I′)) − min(D(I, I′)))    (9)

After normalizing the similarities, the total similarity between the query and an image in the data collection is calculated as a weighted sum of the similarities provided by each feature. The equation combining the similarities is defined as follows:

S(I, I′) = ω1 Ncolor(I, I′) + ω2 Nglcm(I, I′) + ω3 Nedge(I, I′)    (10)

where S(I, I′) is the weighted sum of similarities; Ncolor(I, I′) is the normalized similarity of the color histogram, Nglcm(I, I′) of the GLCM features, and Nedge(I, I′) of the edge direction histogram. ω1, ω2 and ω3 are weighting factors that adjust the relative importance of the image features. We chose ω1 = 1.0, ω2 = 0.2, and ω3 = 0.2 for the experiments in this paper. Combining the features with these weights is more effective than using each feature alone.

Fig. 8 Graph/picture classifier (training: image DB (graph/picture) → Haar wavelet LH coefficient vectors → SVM training; classification: image → Haar wavelet LH coefficient vectors → SVM classifier → graph or picture)


Table 3 Graph/picture classifier performance

               Accuracy   Precision/Recall   # of SVs   Computation time
Training set   97.75%     98.5/98.8%         104        0.011 sec
Test set       97.8%      100/97.8%          104        0.01 sec

4.5 Clustering methods

Clustering techniques for improving retrieval effectiveness have been proposed in the information retrieval literature [14, 28], and also in CBIR [6, 19]. We use clustering methods to group the images and to select representative image features for our approach (methods 1 and 4).

Hierarchical agglomerative clustering methods (HACM) are known to be effective in information retrieval applications. The cluster structure resulting from an HACM is a tree structure: a dendrogram. There are several methods for making a

Fig. 9 Comparison of retrieval effectiveness: Precision/Recall graphs


Fig. 10 Comparison of retrieval effectiveness : Precision/Recall graphs

tree structure for the HACM. We use the group average link and Ward's method. The comparative efficacy of each method is well documented in the literature [31, 32, 34].

There are three approaches to implementing a general HACM; among them, we use the stored-matrix approach. We first calculate an N-by-N matrix containing all pairwise dissimilarity values using the similarity function (Eq. (10)). The Lance-Williams update formula makes it possible to re-calculate the dissimilarity between cluster centers using only the stored values. Eq. (11) shows the update formula for the two clustering methods, and Table 2 shows its parameters [8].

d(Ci∪Cj, Ck) = αi d(Ci, Ck) + αj d(Cj, Ck) + β d(Ci, Cj)    (11)

where d(Ci, Cj) is the distance between clusters Ci and Cj. A general algorithm for constructing the image clusters is as follows:

1. Calculate the similarities between the query image and the N top initially retrieved images.

2. Make the similarity matrix (N by N) for the N images.

3. Identify the two closest images and combine them into a cluster.


Fig. 11 Comparison of retrieval effectiveness: different grouping methods

4. Update the similarity matrix using Eq. (11).

5. If more than one cluster remains, return to step (3).
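Steps (1)–(5) above can be sketched as follows. This is a simplified stored-matrix implementation using the group average link parameters (αi = |Ci|/(|Ci|+|Cj|), αj = |Cj|/(|Ci|+|Cj|), β = 0) in the Lance-Williams update; the function and variable names are illustrative, not the authors' code:

```python
import numpy as np

def hac_group_average(dist: np.ndarray) -> list:
    """Stored-matrix agglomerative clustering with the Lance-Williams update
    for group average link. dist is the N-by-N dissimilarity matrix; returns
    the merge history as (cluster_i, cluster_j, distance) tuples."""
    d = dist.astype(float).copy()
    n = d.shape[0]
    size = {i: 1 for i in range(n)}     # current cluster sizes
    active = set(range(n))              # clusters not yet merged away
    merges = []
    while len(active) > 1:
        # step (3): find the two closest active clusters
        i, j, best = None, None, np.inf
        for a in active:
            for b in active:
                if a < b and d[a, b] < best:
                    i, j, best = a, b, d[a, b]
        # step (4): Lance-Williams update of the stored distances
        ai = size[i] / (size[i] + size[j])
        aj = size[j] / (size[i] + size[j])
        for k in active - {i, j}:
            d[i, k] = d[k, i] = ai * d[i, k] + aj * d[j, k]
        size[i] += size[j]
        active.remove(j)                # cluster j is absorbed into i
        merges.append((i, j, best))
    return merges
```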

5 Experimental results

5.1 Experimental environments : test collections

We conducted experiments using images retrieved from Google and Naver via keyword queries (e.g., tiger, car, and sea). We gathered the top 200 images from the results of each search engine for the experiments, and evaluated the effectiveness by comparing precision and recall values. Precision and recall are calculated as follows:

Precision = (number of relevant retrieved images) / (total number of retrieved images)

Recall = (number of relevant retrieved images) / (number of all relevant images)    (12)
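Eq. (12) can be computed as follows (a minimal sketch; image identifiers here are arbitrary placeholders):

```python
def precision_recall(retrieved, relevant):
    """Eq. (12): precision = |retrieved ∩ relevant| / |retrieved|,
    recall = |retrieved ∩ relevant| / |relevant|."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)
```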

Table 4 The number of clusters for each threshold

                   Threshold
Data Set           1.6  1.7  1.8  1.9  2.0  2.1  2.2
Tiger (Naver)       39   33   25   16   14    9    6
Car (Naver)         49   37   26   20   12    9    4
Sea (Naver)         45   37   23   15   11    7    3
Soccer (Naver)      46   33   25   15    8    7    5
Forest (Naver)      47   36   29   23   16   11    6
Sunrising (Naver)   35   25   17   13    6    5    2
Tiger (Google)      53   37   30   23   16   12    9
Car (Google)        49   38   26   19   14   11    7


Fig. 12 Retrieval effectiveness according to cluster thresholds: Query "tiger" (data from Google)

For the evaluation, we manually marked the top 200 images as relevant or irrelevant. Naver basically uses text annotations for image retrieval in a web image album service that is maintained by user input, and Google analyzes and uses the text on the page adjacent to the image.5 The images acquired for our experiments are subject to change, but the tendency of the results is similar to our experimental data.

5.2 Implementation and performance of the classifier

To classify images on the web, we define the graph/picture classifier. In this section, we describe the results of the classifier and the detailed implementation method. We used images from the web to train the classifier (1,000 images for picture, 200 images for graph), and we examined the performance of the classifier over 1,000 picture images.

The block diagram of our graph/picture classification method is shown in figure 8a. First, we calculate the wavelet coefficients using a Haar transform [25], and then train the classifier with a wavelet coefficient histogram (specifically, the LH band).

Wavelet coefficients in the high-frequency bands of natural images tend to follow a particular distribution (e.g., Laplacian) [15], and this yields important information for classifying graph/picture images. A typical distribution of wavelet coefficients is shown in figure 8b. As shown in figure 8b, the characteristics of the two histograms are very different, and we can design the classifier using this difference between the vectors. We compute the histogram after shifting the values of the coefficients, because they can be negative.
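The LH-band histogram feature can be sketched as follows. This is an illustration only: the exact sub-band convention, bin count, and coefficient shift used by the authors are not specified in the text, so those choices are assumptions here.

```python
import numpy as np

def haar_lh_histogram(img: np.ndarray, bins: int = 64) -> np.ndarray:
    """One-level Haar transform of a grayscale image; returns a normalized
    histogram of the LH (detail) sub-band. The histogram range is shifted
    symmetrically so that negative coefficients fall in valid bins."""
    img = img.astype(float)
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]                                # even dimensions
    # low-pass along columns, then high-pass along rows (one LH convention)
    lo_cols = (img[:, 0::2] + img[:, 1::2]) / 2.0
    lh = (lo_cols[0::2, :] - lo_cols[1::2, :]) / 2.0
    # symmetric range covers the negative coefficients of 8-bit images
    hist, _ = np.histogram(lh, bins=bins, range=(-128, 128))
    return hist / hist.sum()                         # normalize to a distribution
```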

For the classifier, we use SVMs, which show significant improvement over other machine-learning algorithms in various classification areas [3]. First, we define the training data as belonging to either a positive (picture) or negative (graph) class. The detailed derivation may be found in [3, 30]. We used linear and non-linear SVMs

5http://www.google.com/help/faq_images.html


Fig. 13 The retrieval results using “tiger” (data from Naver)

with a polynomial kernel (with d = 2). We show only the results of the linear SVM, because the results are identical in the two cases. Table 3 shows the classification accuracy and computation time. The results indicate that our method has a high recognition


Fig. 14 The retrieved results using “tiger” (data from Google)

ratio (over 97.5%) and a small computation time (0.01 sec for 1,000 images) compared to the previous method [15]. An advantage of this classifier is that the influence of biased artifacts on the image features can be reduced (e.g., graph images have very


Fig. 15 The retrieved results using “car” (data from Google)

large white areas), and we can also find very dark or bright images with abnormal lighting. This classification and elimination of images is very important for practical applications on the web, such as an icon elimination procedure.
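The paper trains SVMs [3, 30] on the histogram vectors. As a rough stand-in for that step, the following is a minimal linear SVM trained by Pegasos-style subgradient descent; it is not the authors' implementation, the bias term is omitted for simplicity, and the synthetic data in the usage below stands in for real LH-histogram vectors:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Minimal linear SVM via Pegasos-style subgradient descent.
    X: (n, d) feature vectors (e.g., LH-coefficient histograms),
    y: labels in {-1, +1} (picture = +1, graph = -1)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                 # decreasing step size
            if y[i] * (X[i] @ w) < 1.0:           # hinge loss is active
                w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1.0 - eta * lam) * w         # regularization shrink only
    return w

def classify(w, x):
    """Sign of the decision function: +1 (picture) or -1 (graph)."""
    return 1 if x @ w >= 0 else -1
```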


Fig. 16 The retrieved results using “sea” (data from Naver)

5.3 Retrieval results

The goal of the experiment is to evaluate the retrieval effectiveness of the proposed methods. The results of the experiments are shown in figures 9 and 10. The


results show precision/recall graphs for the initial method (keyword-only retrieval) and our four methods: majority-first, centroid-of-largest-cluster, centroid-of-all, and centroid-of-top-20. For the two methods based on the HACM, we report the results using group average link clustering, as it was the more effective of the two.

Figures 9 and 10 show that the proposed methods improve retrieval effectiveness compared to keyword-based retrieval across different types of queries. The centroid-of-largest-cluster and majority-first methods are the most useful among the four suggested methods; the other methods include more wrong images when calculating the centroid. Thus, if done properly, clustering helps to extract separate clusters that represent different visual classes of images. In order to evaluate the effect of each cluster, we experimented with different grouping methods, such as centroid-of-top-m-clusters (m ≤ 3). Figure 11 shows the retrieval effectiveness of several such methods: the centroid-of-2nd-largest-cluster, centroid-of-1st+2nd-largest-cluster, centroid-of-3rd-largest-cluster, and centroid-of-1st+2nd+3rd-largest-cluster methods.

As shown in figure 11, the centroid-of-largest-cluster method performs best in our experiments. The 1st+2nd- and 1st+2nd+3rd-largest-cluster methods also show relatively good results. Accordingly, we conclude that the largest cluster represents the characteristics of the image set. Meanwhile, the centroid-of-2nd- and centroid-of-3rd-largest-cluster methods did not improve retrieval effectiveness significantly; in some cases, they show worse results than the keyword-only method. Therefore, as described in this paper, the method using the popularity of images shows better performance than the other methods.
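The centroid-of-largest-cluster re-ranking can be sketched as follows (a simplified illustration under the assumption that cluster labels have already been produced by the HACM of section 4.5; Euclidean distance stands in for the combined similarity of Eq. (10)):

```python
import numpy as np

def rerank_by_largest_cluster(features, cluster_labels):
    """Re-rank initially retrieved images by distance to the centroid of
    the largest cluster, i.e., the 'majority' of the results.
    features: (n, d) feature vectors; cluster_labels: length-n cluster ids.
    Returns image indices ordered from most to least majority-like."""
    labels, counts = np.unique(cluster_labels, return_counts=True)
    largest = labels[np.argmax(counts)]
    centroid = features[cluster_labels == largest].mean(axis=0)
    dists = np.linalg.norm(features - centroid, axis=1)
    return np.argsort(dists)  # smallest distance = highest rank
```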

Retrieval examples of the initial method and our methods are shown in figures 13, 14, 15, and 16. We used the centroid-of-largest-cluster method and group average link clustering for these examples. It is clear that images relevant to the keyword query are ranked higher in the proposed methods than in the initial method.

It should be noted that the proposed method is significantly more effective than the initial method. As shown in the experimental results, when the top 200 results contain well-distinguished images, the retrieval effectiveness improves greatly (figure 9a); in contrast, for poorly distinguished images in the results, the effectiveness does not improve much relative to the initial method (figure 10b).

This algorithm requires additional computation time for constructing the clusters, but the additional time is not critical (0.08 seconds for clustering, 0.02 seconds for ordering) because it operates on a small number (200) of images. We performed the experiments on Red Hat Linux 7.2 and a Pentium III 800 MHz system.

As described in section 4.5, we used the HACM for clustering in our experiments. In the HACM, we can define a threshold that determines how the clusters are divided. Table 4 shows the threshold versus the number of clusters. For example, if we set the threshold to 1.9, the numbers of clusters are 16, 20, 15, 15, 23, 13, 23, and 19 for the data sets, respectively.
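Counting clusters at a given threshold amounts to stopping the agglomeration once the closest pair of clusters is farther apart than the threshold. A self-contained sketch using the group average link update (illustrative names, not the authors' code):

```python
import numpy as np

def num_clusters_at_threshold(dist: np.ndarray, threshold: float) -> int:
    """Run group-average-link merging on the N-by-N dissimilarity matrix,
    stopping when the closest remaining pair exceeds the threshold; returns
    the number of clusters left (cf. Table 4)."""
    d = dist.astype(float).copy()
    n = d.shape[0]
    size = [1] * n
    active = set(range(n))
    while len(active) > 1:
        i, j, best = None, None, np.inf
        for a in active:
            for b in active:
                if a < b and d[a, b] < best:
                    i, j, best = a, b, d[a, b]
        if best > threshold:          # no pair close enough: stop merging
            break
        ai = size[i] / (size[i] + size[j])
        aj = size[j] / (size[i] + size[j])
        for k in active - {i, j}:
            d[i, k] = d[k, i] = ai * d[i, k] + aj * d[j, k]
        size[i] += size[j]
        active.remove(j)
    return len(active)
```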

Figure 12 shows the retrieval effectiveness according to the cluster threshold. As shown in the figure, retrieval effectiveness is related to the threshold, and we obtained a reasonable threshold via experiments.

The advantage of our approach is that it solves the problem of many wrong images appearing in the top ranks even when there are many correct images in the results. However, our approach has a limitation when the


results do not have similar visual attributes, or when there are very few correct images in the results. For example, a query for "John Jacobson" images returns many wrong images in the top ranks via keyword-based retrieval, and our approach also cannot bring the correct images into the top ranks, because the results of the first retrieval do not exhibit the majority property. This is an open problem in CBIR. To address this limitation, we will develop another approach with user interaction (e.g., the user can select the intended group, or other relevance feedback mechanisms).

6 Conclusions

In this paper, we have proposed a new approach to web image retrieval using majority-based ranking methods. This approach has the advantage that it can use image content in determining the rank of web images. We first determined candidates using keywords, and then automatically re-ranked the images based on a majority of the images. We defined four methods for finding the visual features of the majority of images. We also designed and implemented a graph/picture classifier to detect simple drawings, graphs, tables, and text images. We compared the four proposed methods to a keyword-based retrieval method in our experiments. Experimental results show that the proposed approach, especially with the centroid-of-largest-cluster method, is more effective than the initial method, which uses only textual information. Since our approach can use additional information gained from the retrieved images, we believe that majority-based ranking will be more effective in enhancing image retrieval than general keyword-based retrieval methods. In future work, we intend to develop other classification techniques that will improve the proposed approach by grouping images semantically.

Acknowledgments This work is financially supported by the Ministry of Education and Human Resources Development (MOE), the Ministry of Commerce, Industry and Energy (MOCIE) and the Ministry of Labor (MOLAB) through the fostering project of the Lab of Excellency. It is also supported by the Korea Science and Engineering Foundation (KOSEF) through the Advanced Information Technology Research Center (AITrc).

References

1. Athitsos V, Swain MJ, Frankel C (1997) Distinguishing photographs and graphics on the world wide web. In: Proceedings of IEEE Workshop on Content-Based Access of Image and Video Libraries, pp 10-16

2. Ballard DH, Brown CM (1982) Computer Vision. Prentice-Hall Inc., pp 181-189

3. Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2(2):121-167

4. Carson C, Belongie S, Greenspan H, Malik J (2002) Blobworld: Image segmentation using expectation-maximization and its application to image querying. IEEE Trans Pattern Anal Mach Intell 24(8):1026-1038

5. Celebi E, Alpkocak A (2000) Clustering of texture features for content-based image retrieval. In: Proceedings of International Conference on Advances in Information Systems (ADVIS 2000), pp 216-225

6. Chen Y, Wang JZ, Krovetz R (2003) Content-based image retrieval by clustering. In: 5th ACM SIGMM International Workshop on Multimedia Information Retrieval, pp 193-200

7. Faloutsos C, Barber R, Flickner M, Hafner J, Niblack W, Petkovic D, Equitz W (1994) Efficient and effective querying by image content. Journal of Intelligent Information Systems 3(3/4):231-262

8. Frakes WB, Baeza-Yates R (1992) Information Retrieval: Data Structures & Algorithms. Prentice Hall, pp 419-443

9. Frankel C, Swain MJ, Athitsos V (1996) WebSeer: An image search engine for the world wide web. University of Chicago Technical Report TR-96-14

10. Gevers T, Smeulders AWM (1997) Pictoseek: A content-based image search engine for the WWW. In: Proceedings of International Conference on Visual Information Systems, pp 93-100

11. Haralick RM, Shapiro LG (1992) Computer and Robot Vision. Addison-Wesley, pp 453-470

12. Hartmann A, Lienhart R (2002) Automatic classification of images on the web. In: Proceedings of Storage and Retrieval for Media Databases 2002, vol SPIE 4676, pp 31-40

13. Jain AK, Vailaya A (1996) Image retrieval using color and shape. Pattern Recognition 29(8):1233-1244

14. Lee KS, Park YC, Choi KS (2001) Re-ranking model based on document clusters. Inf Process Manag 37(1):1-14

15. Li J, Gray RM (2000) Context-based multiscale classification of document images using wavelet coefficient distributions. IEEE Trans Image Process 9(9):1604-1616

16. Ma WY, Manjunath BS (1999) NeTra: A toolbox for navigating large image databases. ACM Multimedia Syst 7(3):184-198

17. Mukherjea S, Hirata K, Hara Y (1998) Using clustering and visualization for refining the results of a WWW image search engine. In: Proceedings of Workshop on New Paradigms in Information Visualization and Manipulation (NPIV 1998), pp 29-35

18. Mukherjea S, Cho J (1999) Automatically determining semantics for world wide web multimedia information retrieval. J Vis Lang Comput 10(6):585-606

19. Park G, Baek Y, Lee HK (2002) A ranking algorithm using dynamic clustering for content-based image retrieval. In: Proceedings of the International Conference on Image and Video Retrieval (CIVR 2002), pp 316-324

20. Parker JR (1997) Algorithms for Image Processing and Computer Vision. John Wiley & Sons, pp 23-53

21. Partio M, Cramariuc B, Gabbouj M, Visa A (2002) Rock texture retrieval using gray level co-occurrence matrix. In: Proceedings of the 5th Nordic Signal Processing Symposium (NORSIG 2002), Norway

22. Sclaroff S, La Cascia M, Sethi S, Taycher L (1999) Unifying textual and visual cues for content-based image retrieval on the world wide web. Comput Vis Image Underst 75(1/2):86-98

23. Serrano N, Savakis A, Luo J (2002) A computationally efficient approach to indoor/outdoor scene classification. In: Proceedings of International Conference on Pattern Recognition (ICPR 2002), vol 4, pp 146-149

24. Smith JR (1997) Integrated Spatial and Feature Image Systems: Retrieval, Analysis and Compression. Doctoral dissertation, Columbia University

25. Stollnitz EJ, DeRose TD, Salesin DH (1995) Wavelets for computer graphics: A primer, part 1. IEEE Comput Graph Appl 15(3):76-84

26. Swain MJ, Ballard DH (1991) Color indexing. Int J Comput Vis 7(1):11-32

27. Szummer M, Picard RW (1998) Indoor-outdoor image classification. In: Proceedings of IEEE International Workshop on Content-Based Access of Image and Video Databases, pp 42-51

28. Tombros A, Villa R, Van Rijsbergen CJ (2002) The effectiveness of query-specific hierarchic clustering in information retrieval. Inf Process Manag 38(4):559-582

29. Vailaya A, Jain AK, Zhang H-J (1998) On image classification: city images vs. landscapes. Pattern Recogn 31(12):1921-1936

30. Vapnik VN (1999) The Nature of Statistical Learning Theory. Springer-Verlag, pp 123-169

31. Voorhees EM (1985) The cluster hypothesis revisited. In: Proceedings of 8th ACM SIGIR International Conference on Research and Development in Information Retrieval, pp 188-196

32. Voorhees EM (1986) Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Technical Report TR 86-765, Department of Computer Science, Cornell University

33. Wang JZ, Li J, Wiederhold G (2001) SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. IEEE Trans Pattern Anal Mach Intell 23(9):947-963

34. Willett P (1988) Recent trends in hierarchic document clustering: A critical review. Inf Process Manag 24(5):577-587


Gunhan Park received his B.S., M.S., and Ph.D. degrees in Computer Science from the Korea Advanced Institute of Science and Technology (KAIST), Korea, in 1994, 1996, and 2004, respectively. From 2001 to 2004, he was a research engineer at Search Solution Co., where he worked on developing a content-based image retrieval engine. He was also a senior engineer at the Digital Media R&D Center, Samsung Electronics Co., from 2004 to July 2006. Since August 2006, he has been leading the multimedia search team at NHN Co. His current research interests include multimedia information retrieval, computer vision, and image processing for improving web image/music/video retrieval performance.

Yunju Baek received his B.S., M.S., and Ph.D. degrees in computer science from the Korea Advanced Institute of Science and Technology (KAIST), Korea, in 1990, 1992, and 1997, respectively. From 1999 to 2002 he was the Chief Technology Officer of NHN Co., a major internet portal company in Korea. Since 2003 he has been an assistant professor in the Department of Computer Science and Engineering, Pusan National University. His research interests include multimedia information retrieval, internet multimedia applications, and real-time embedded computing.


Heung-Kyu Lee received the B.S. degree in electronics engineering from Seoul National University, Seoul, Korea, in 1978, and the M.S. and Ph.D. degrees in Computer Science from the Korea Advanced Institute of Science and Technology (KAIST), in 1981 and 1984, respectively. Since 1986 he has been a professor in the Department of Computer Science, KAIST, Korea ([email protected]). He is an author/coauthor of more than 100 international journal and conference papers. He has been a reviewer for many international journals, including the Journal of Electronic Imaging, Real-Time Imaging, and IEEE Transactions on Circuits and Systems for Video Technology. He was also program chairman of many international conferences, including the International Workshop on Digital Watermarking (IWDW) in 2004 and the IEEE International Conference on Real-Time Systems and Applications. He is now a general director of the Center of Fusion Technology for Security (CFTS). His major interests are digital watermarking/fingerprinting, media forensics, and steganography.