


Contents lists available at SciVerse ScienceDirect

Journal of Visual Languages and Computing

Journal of Visual Languages and Computing 24 (2013) 53–67

1045-926X/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.jvlc.2012.10.008

Corresponding author. Tel.: +571 3165000.
E-mail address: [email protected] (F.A. Gonzalez).
MindLab-Bioingenium Research Group, Universidad Nacional de Colombia, Bogotá, Colombia.

journal homepage: www.elsevier.com/locate/jvlc

A kernel-based framework for image collection exploration

Jorge E. Camargo, Juan C. Caicedo, Fabio A. Gonzalez

Departamento de Ingeniería de Sistemas e Industrial, Universidad Nacional de Colombia, Bogotá, Of. 114 Edif. 453 (Aulas de Ingeniería), Colombia

Article info

Article history:
Received 18 August 2011
Received in revised form 30 August 2012
Accepted 29 October 2012
Available online 16 November 2012

This paper has been recommended for acceptance by S.-K. Chang.

Keywords:
Image collection exploration
Exploratory search
Summarization
Visualization
Kernel methods

Abstract

While search engines have been a successful tool for searching text information, image search systems still face challenges. The keyword-based query paradigm used in image collection systems, which has been successful in text retrieval, may not be useful in scenarios where the user has no precise way to express a visual query. Image collection exploration is a new paradigm in which users interact with the image collection to discover useful and relevant pictures. This paper proposes a framework for the construction of an image collection exploration system based on kernel methods, which offers a mathematically strong basis to address each stage of an image collection exploration system: image representation, summarization, visualization and interaction. In particular, our approach emphasizes a semantic representation of images using kernel functions, which can be seamlessly harnessed across all system components. Experiments were conducted with real users to verify the effectiveness and efficiency of the proposed strategy.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Image collections are growing very fast due to the easy access to low-cost digital cameras and other image-capturing devices. For instance, Flickr, a photo-sharing system on the web, receives about 5000 new photos per minute. Finding useful information in these large image collections is a very difficult task. Current approaches to image search are based on keywords or image examples provided by users [1], requiring a clear goal during the search process. These systems are of little use when all that is required is to understand the structure of an image collection, i.e., to understand what kinds of images are available in the collection and what the relationships between them are. For instance, in a personal photo collection that has been built over years, someone may want to explore the contents of the entire collection to identify meaningful patterns, special events, remember occasions and re-discover images that were archived a long time ago. This task becomes challenging when the collection comprises thousands of images that are shared between multiple users, as in a scientific research community, in medical imaging training, or in social networks and web communities. A system to explore and understand an image collection could raise awareness of all the useful material archived in these repositories.

Building an exploration system involves several technical problems in processing and organizing visual information, following a strategy for image collection exploration. We define an image collection exploration system as the composition of four main components: content representation, summarization, visualization and interaction. Content representation is the ability of the system to encode and recognize visual configurations in images. This is usually done using features that discriminate different images and match similar ones. Summarization consists in selecting an iconic portion of the repository that represents a set of images to build an overview that allows the user to see example images of potential


contents. These example images should be representative pictures of semantic subsets in the collection, assuming the existence of subsets that share meaning and/or visual relationships. Visualization consists in organizing a group of images on the screen following a metaphor that represents the relationships between them. Interaction is the stage in which the user interacts with the system. At each step of the exploration process, the user performs actions to give feedback to the system, and the system learns from this feedback to refine the search results.

Previous works that address the image collection exploration problem focus on parts of the whole problem. The retrieval community makes special efforts on image content representation to build systems that correctly match similar images [2]. However, these systems are oriented to support specific query paradigms rather than to offer exploration capabilities. Other works that concentrate on summarizing and visualizing image collections [3–7] usually build their systems on top of very basic image representations and may not evaluate interaction functionalities. Research efforts in interactive search systems may come close to integrating the whole pipeline [8]; however, they delegate the major responsibility to the interaction component and do not explore dependencies between the different subsystems. A holistic view of the problem may provide insight into which parts of the whole problem contribute to a better experience for users.

In this paper, we propose the design of an image collection exploration system that provides an alternative way to access image repositories. The proposed system allows users to explore image collections in an intuitive way: the system visualizes image results using a two-dimensional metaphor that exploits image similarity, summarizes image results, and learns from user interaction in order to refine the user's search needs. In addition, we propose a framework, completely based on kernel methods, to model the computational methods behind each stage of the system. In this framework, it is possible to use any valid image kernel, which enables the system to use discriminative and expressive content representations. In order to harness the power of kernel functions to build meaningful representations, we also propose the construction of a semantic kernel that combines different visual features according to category information to better discriminate between different classes of images. This representation is exploited in each component of the system to generate relevant summarizations, useful visualizations and appropriate interaction responses.

The contributions of this work are: (1) the design and implementation of a complete image exploration system that includes all four components: a common content representation, summarization, visualization and interaction. To the best of our knowledge, this is the first work that presents a fully integrated system for image exploration. (2) A semantic image representation based on kernel functions that integrates category information of images together with visual features. This representation is the result of an optimization procedure that selects the best combination of visual features with respect to a category discrimination criterion. (3) To take advantage of the semantic image representation provided by kernel functions, the proposed computational framework is entirely based on kernel methods. The framework thus exploits non-linear patterns in the collection and separates the representation from the algorithms.

This paper is organized as follows: Section 2 reviews related work; Section 3 presents materials and methods; Section 4 presents the experimental evaluation; and finally, Section 5 presents the concluding remarks.

2. Related work

2.1. Image representation

In computer vision applications, visual features are usually organized in a feature vector, in such a way that images are represented as points in $\mathbb{R}^n$ (where $n$ is the number of features). Kernel methods are also popular in computer vision to analyze and recognize images and objects. Kernel representations are useful to find non-linear patterns in data using the kernel trick, which is a way of mapping observations into an inner product space, without explicitly computing the mapping, expecting that non-linear patterns in the original representation space map to linear patterns in the kernel-induced space [9]. Intuitively, kernel functions provide a similarity relationship between the objects being processed. Kernel functions have been successfully used in a wide range of pattern analysis problems since they provide a general framework to decouple data representation and learning algorithms.

Kernels on images have been widely proposed: the histogram intersection kernel [10], graph kernels [11] and the pyramidal kernel [12]. In this work, we use the histogram intersection kernel (HIK) [13]; however, the proposed modular strategy can be applied to any kind of valid image kernel. We have chosen the HIK since it has exhibited excellent performance when image features are represented by histograms. HIK has been successful in different domains such as pedestrian detection [14,15], scene recognition [16], pattern recognition [12,17], and image recognition [18,19], among others. Recently, the HIK has attracted a lot of attention in the computer vision community, not only because of its excellent performance as a similarity measure but also because of its fast evaluation speed [20].

Besides the use of a powerful kernel for image analysis, our work extends the representation of images to include semantic information extracted from image categories. Usually, kernel functions for images are used for classification tasks, i.e., in supervised learning settings. However, the problem considered here is not a classification problem but an image exploration system. In that respect, we need to incorporate semantic information directly into the representation, which is achieved using optimization strategies on kernel functions.

2.2. Image collection summarization

In general, keyword-based search systems for images may be a restrictive query strategy for users, especially when specific visual compositions are required. In such cases, additional query tools are useful. Interactive search systems


for images have been studied in recent years to allow users to give the system additional hints about the images that should be retrieved. The use of example images as queries may provide a good head start for finding related images, so this is one such potential hint [1]. However, finding a good query image is a serious problem in itself.

To overcome this problem, the system may help the user by showing a set of images from the collection, allowing the selection of interesting images to conduct an exploration. The first problem of this approach is the selection of a proper set of images to show to the user. Simple strategies may be used, such as picking a fixed number of random pictures from the database. But that can be very frustrating for users if the set of images does not contain anything representative, and asking for more random pictures might be hopeless. Summarization is a more promising strategy, which builds on the extraction of meaningful patterns from the image collection to select representative images in an automatic and informed way.

In most cases, the summarization problem is approached as an unsupervised learning problem. Typically, image clusters are identified in the collection and representative images from each cluster are chosen to compose the summary. Previous works that address the construction of image collection summaries are based on different methodologies, from clustering methods [21,22] to similarity pyramids [23], including graph methods [5,24], neural networks [25], formal concept analysis [26] and kernel methods [27]. Most of these works use low-level visual features for the construction of summaries, and it is well known that the quality of the resulting clusters depends on the underlying content representation: the more semantic the representation, the higher the quality of the summary. In our work, we make a special effort to build an image representation that incorporates semantic knowledge for the construction of summaries.

2.3. Image collection visualization

After a summary has been built, the next challenge is to show the summary to the users. Again, simple strategies can be employed, such as sorting the set of representative images in a sequential list. However, to take advantage of the ability of human beings to interpret visual information, a more powerful visualization can be constructed, in such a way that relationships between images may be easily identified. The organization of images in a two-dimensional map is a useful metaphor for achieving that. Thus, projecting the high-dimensional representation of images onto the screen actually helps to reveal groups of images that share similar contents.

Methods like Multidimensional Scaling (MDS) [28], Principal Component Analysis (PCA) [29], Isometric Mapping (Isomap) [30], Locally Linear Embedding (LLE) [31], and combinations of them, have been used to generate the navigation map. Some works focus on non-linear dimensionality reduction, also known as manifold learning. Weinberger et al. [32], for example, address the problem of learning a kernel matrix for non-linear dimensionality reduction, trying to answer the question of how to choose a kernel that best preserves the original structure. In [33], kernel Isomap is proposed, which addresses generalization and topological stability problems of the original Isomap.

Nguyen et al. [8] use some of the mentioned methods and focus on how to organize the obtained projection taking into account a trade-off among image collection overview (summary), visibility (occlusion) and structure preservation (dimensionality reduction). Janjusevic et al. [34] address the problem of how to optimize the limited space available to visualize an image collection. In [35], the authors propose a modification of MDS that solves the overlapping and occlusion problems, using a regular grid structure to relocate images. Chen et al. [36] propose a pathfinder-network-scaling technique that uses a similarity measure based on color, layout and texture.

Liu et al. [37] use a browsing strategy based on a one-page overview and a task-driven attention model in order to optimize the visualization space. Users can interact with the overview through a slider bar that allows them to adjust the image overlapping. Porta [38] proposes different non-conventional methods for visualizing and exploring large collections of images, using metaphors such as cubes, snow, snakes, volcanos and funnels, among others.

Most of these algorithms and strategies try to preserve image relationships from a low-level feature space in a two-dimensional map. However, once again, the underlying representation in an exploration system needs to capture semantic relationships among images to help the user visualize meaningful patterns. In our work, we develop a common framework that exploits the same image representation used to build summaries to also organize meaningful visualizations.

2.4. User interaction and feedback

When designing image search systems, the problem of the semantic gap has to be addressed, i.e., managing the difference between low-level image representations and high-level human semantic perception. One successful strategy is modeling human interaction, i.e., processing the user's feedback. In an interactive framework for image search, the user is an active element of the exploration process. The most popular model to involve the user in the process is the well-known Relevance Feedback (RF) mechanism, which is widely used in text retrieval.

RF was introduced in the mid-1960s by the text retrieval community to improve retrieval effectiveness. RF is a process of automatically adjusting an existing query using the user's feedback. In the context of images, this mechanism is known as Interactive Content-Based Image Retrieval (ICBIR), where the image query is refined in an iterative process to model the user's needs. Basically, this model consists of a process where images are shown to users, who select which ones are relevant to the query, and the system returns a new result based on this feedback. During this iterative process, the ICBIR system learns the correspondence between low-level features and high-level concepts.

Fig. 1. Image collection exploration framework.

The most representative works in this area can be organized as follows: (1) Query point movement [39–41], where the query is moved in the search space toward relevant images according to the user's feedback. (2) Active learning [42–44], where the system trains a Support Vector Machine (SVM) to select the images closest to the SVM boundary in order to show the user a new set of images. (3) The ostensive model [45,2], where users do not have to rank or label images; instead, the user only selects an image from the retrieved set and a new set is returned in response. In this model, the query is built from the user's interaction with the collection over time, and is calculated as a weighted sum of the features of the current image and the images previously selected during navigation. (4) Long-term learning [46], where all the interaction with the system across different sessions is recorded and used to train a learning model.
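For intuition, query point movement is often realized with a Rocchio-style update: the query vector is pulled toward the mean of the relevant images and pushed away from the mean of the non-relevant ones. The sketch below is a minimal illustration of that classic text-retrieval form, not necessarily the exact formulation of [39–41]; the function name and weights are our own illustrative choices.

```python
import numpy as np

def query_point_movement(q, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: move the query toward relevant images
    and away from non-relevant ones (illustrative weights)."""
    q_new = alpha * np.asarray(q, dtype=float)
    if len(relevant):
        q_new += beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q_new -= gamma * np.mean(nonrelevant, axis=0)
    return q_new

# Toy 2-D feature vectors for a query, two relevant images and one non-relevant image
q = np.array([0.5, 0.5])
relevant = np.array([[1.0, 0.0], [0.8, 0.2]])
nonrelevant = np.array([[0.0, 1.0]])
print(query_point_movement(q, relevant, nonrelevant))  # [1.175 0.325]
```

The updated query has moved toward the first feature dimension, which the relevant examples share.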

In our work, we formulate an RF mechanism that is incorporated into the proposed general framework. Our strategy takes advantage of the interaction process on top of a common kernel representation, which has been semantically adapted from the very beginning, providing an additional layer of abstraction to tackle the semantic gap.

2.5. Integrated exploration systems

We consider that an integrated system for image collection exploration should involve four important components: (1) an appropriate image content representation; (2) a summarization component to prevent the user from reviewing many pages; (3) a visualization strategy to organize exploration results in a meaningful way; and (4) an interactive mechanism that allows users to lead the system toward the desired results. Most of the works reviewed in this section focus on one or two of these components, making contributions to the understanding of important, but independent, problems.

In this work, we propose a system that integrates all of these components in a seamless framework, which allows us to study the exploration process as a whole. Our approach exploits the ability of kernel methods to build expressive content representations for images and, at the same time, allows the methods to take advantage of non-linear patterns in the representation. This basically allows the system and the user to manipulate a common content representation of images.

2.6. Datasets and evaluation

In the literature, we find different works that address particular stages of the image collection exploration process, which use the Corel dataset for summarization [47,48], visualization [8,49,34], and interaction [50,51]. Other image datasets, such as Flickr-based datasets, have recently been used, but also for evaluating standalone exploration tasks: visualization [27,52], summarization [53,54], and interaction [55,56]. Another important characteristic of these works is that they do not use a standard evaluation framework, as is the case with the TREC evaluation benchmark in content-based image retrieval tasks, where a complete process is defined to evaluate the performance of a new algorithm. This phenomenon can be attributed to the fact that in image collection exploration there are many aspects that make the evaluation process difficult and challenging. A recent survey of interactive image search [57] describes in detail the difficulty of evaluating image collection exploration systems. The authors argue that evaluation of image collection exploration systems is in its ''infancy'' and that ''some works provide no evaluation, because they present a novel idea and only show a proof of concept.''

3. Materials and methods

The proposed framework is illustrated in Fig. 1, which shows the integrated system working on top of a common kernel representation of images. The first component supports the whole framework by providing a powerful representation of images using kernel functions. The following components, i.e., summarization, visualization and relevance feedback, are formulated as kernel algorithms to exploit non-linear patterns in the representation.

In this section, we start by describing the process of building a discriminative image representation that includes visual features and semantic information from image categories. Then, the algorithms for summarization and visualization are described. Finally, the relevance feedback mechanism is formulated, completing the kernel-based framework for image collection exploration.

3.1. Image representation

The aim of the feature extraction process is to identify and extract relevant information from the image that allows discrimination of different image classes. Feature extraction approaches are based on the calculation of objective content measures related to visual patterns such as colors, textures and edges, among others. In general, images can be represented as a set of histograms for different visual features. In this work, we extract the gray histogram, RGB histogram (color) [58], Sobel histogram (edges) [59], and Local Binary Patterns (LBP) histogram (texture) [60].
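To make the feature extraction step concrete, the following is a minimal sketch of one such feature, the normalized gray-level histogram, using NumPy. The function name and the bin count are our own illustrative choices, not the paper's.

```python
import numpy as np

def gray_histogram(img, bins=8):
    """Normalized gray-level histogram of an image given as a 2-D uint8 array."""
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    return h / h.sum()

# Toy 2x4 "image" with gray values in [0, 255]
img = np.array([[0, 10, 200, 250], [5, 15, 210, 255]], dtype=np.uint8)
print(gray_histogram(img))  # [0.5 0. 0. 0. 0. 0. 0.25 0.25]
```

Half of the pixels fall in the darkest bin and a quarter each in the two brightest bins, so the histogram summarizes the gray-level distribution regardless of pixel positions.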

Although histograms can be seen as feature vectors, they have particular properties that can be exploited by a similarity function. Consider $h$ a histogram with $n$ bins, associated with one of the visual features. The histogram intersection kernel between two histograms is defined as

$$k_\cap(h_i, h_j) = \sum_{l=1}^{n} \min(h_i(l), h_j(l)),$$

where $h_i$, $h_j$ are histograms and $\min(\cdot,\cdot)$ is the minimum value between two histogram bins. Intuitively, this kernel function is


capturing the notion of common area between both histograms. Note that, if the input histograms have $n$ bins, the histogram intersection kernel is actually embedding these objects into a feature space of a dimensionality many times greater than $n$.
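The bin-wise minimum is straightforward to compute directly. Below is a minimal sketch assuming normalized histograms; the function name is ours, not the paper's.

```python
import numpy as np

def histogram_intersection_kernel(hi, hj):
    """Histogram intersection kernel: sum of bin-wise minima."""
    hi = np.asarray(hi, dtype=float)
    hj = np.asarray(hj, dtype=float)
    return np.minimum(hi, hj).sum()

# Two toy 4-bin normalized histograms (e.g., gray-level distributions)
h1 = np.array([0.5, 0.3, 0.1, 0.1])
h2 = np.array([0.4, 0.4, 0.1, 0.1])
print(histogram_intersection_kernel(h1, h2))  # the "common area" under both histograms
```

For normalized histograms the value lies in [0, 1], reaching 1 only when the two histograms are identical, which matches the common-area intuition above.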

In this work, four different histograms were calculated for each image. Using $k_\cap$ and these four visual features, we obtain four different kernel functions that will be used for building a new kernel. A kernel function using just one low-level feature provides a similarity notion based on a particular aspect of visual perception. For instance, the RGB histogram feature is able to indicate whether two images have similar color distributions. However, we aim to design a kernel function that provides a better notion of image similarity according to prior information. We construct the new kernel function using a linear combination of the kernels associated with the individual features as follows:

$$K_\alpha = \sum_{i=1}^{N} \alpha_i K_i, \qquad (1)$$

The simplest combination is obtained by assigning equal weights to all basis kernel functions, so the new kernel induces a representation space with all visual features. However, depending on the particular class, some features may be more or less important. For instance, in a class where images share the same color distribution, the RGB feature will not be a good discriminant, so other features such as texture and edges will be more suitable. Kandola et al. [61] proposed the kernel alignment strategy in the context of supervised learning to combine different visual features in an optimal way with respect to a domain-knowledge target (ideal kernel). The empirical kernel alignment is a similarity measure between two kernel functions, calculated over a data sample. If $K_\alpha$ and $K_t$ are the kernel matrices associated with kernel functions $k_\alpha$ and $k_t$ on a data sample $S$, the kernel alignment measure is defined as

$$A_S(K_\alpha, K_t) = \frac{\langle K_\alpha, K_t \rangle_F}{\sqrt{\langle K_\alpha, K_\alpha \rangle_F}\,\sqrt{\langle K_t, K_t \rangle_F}}, \qquad (2)$$

where $\langle \cdot,\cdot \rangle_F$ is the Frobenius inner product, defined as $\langle A, B \rangle_F = \sum_i \sum_j A_{ij} B_{ij}$, and $K_\alpha$ is the linear combination of basis kernels, that is, the combination of all visual features given by $k_\alpha(x,y) = \sum_f \alpha_f\, k_\cap(h_f(x), h_f(y))$, where $h_f(x)$ is the $f$-th feature histogram of image $x$ and $\alpha$ is a weighting vector.

Fig. 2. Image feature extraction process.

The definition of a target kernel function $K_t$, i.e., an ideal kernel with explicit domain knowledge, is done using labels associated with each image that are extracted from prior information (class labels). It is given by the explicit classification of images for a particular class, using $y_n$ as the label vector associated with the $n$-th class, in which $y_n(x) = 1$ if the image $x$ is an example of the $n$-th class and $y_n(x) = -1$ otherwise. Then, $K_t = y y'$ is the kernel matrix associated with the target for a particular class. This configuration only considers the two-class case. We need to build a new kernel function that takes into account the information of all classes simultaneously (the multi-class case).
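Eq. (2) can be evaluated directly on small example kernel matrices. The sketch below uses our own function names (the paper does not prescribe an implementation) and the two-class target $K_t = y y'$ described above.

```python
import numpy as np

def frobenius_inner(A, B):
    # <A, B>_F = sum_ij A_ij * B_ij
    return float(np.sum(A * B))

def alignment(Ka, Kt):
    """Empirical kernel alignment A_S(Ka, Kt), Eq. (2)."""
    return frobenius_inner(Ka, Kt) / np.sqrt(
        frobenius_inner(Ka, Ka) * frobenius_inner(Kt, Kt))

# Ideal two-class target kernel K_t = y y' for labels y in {+1, -1}
y = np.array([1.0, 1.0, -1.0, -1.0])
Kt = np.outer(y, y)

# A kernel that perfectly matches the class structure aligns to 1.0
print(alignment(Kt, Kt))          # 1.0
# An identity kernel ("every image only similar to itself") aligns much lower
print(alignment(np.eye(4), Kt))   # 0.5
```

The alignment is 1 exactly when the candidate kernel reproduces the block structure of the target, which is what the weight optimization below tries to approach.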

Vert [62] proposes a strategy that addresses the multi-class problem in the context of multi-class classification. We adapted that strategy to the context of image collection summarization. The author proposes to build the ideal kernel matrix as follows:

$$K_t(x, x') = \begin{cases} 1 & \text{if } y = y' \\ -1/(Q-1) & \text{otherwise} \end{cases}, \qquad (3)$$

where $Q$ is the number of classes. $K_t$ is, by construction, a valid kernel, and we will call it the ideal kernel. Under some regularity assumptions on $K_\alpha$, the alignment function is differentiable with respect to $\alpha$. Under this assumption, we can use a gradient ascent algorithm to maximize the alignment between the combined kernel and the ideal kernel as follows:

$$\alpha^* = \operatorname*{argmax}_{\alpha \in \Theta} \; A_S(K_\alpha, K_t). \qquad (4)$$

The kernel alignment strategy has previously been used in supervised learning and classification problems. We use it for both supervised and unsupervised learning in the context of image collection exploration. Fig. 2 shows the proposed kernel alignment process.
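As an illustrative sketch of the optimization in Eq. (4), the code below maximizes the alignment of a weighted kernel combination by gradient ascent. It uses a numerical (finite-difference) gradient rather than the analytic gradient the authors presumably use, and clipping the weights to non-negative values is our simplifying assumption; all names are ours.

```python
import numpy as np

def alignment(Ka, Kt):
    f = lambda A, B: float(np.sum(A * B))
    return f(Ka, Kt) / np.sqrt(f(Ka, Ka) * f(Kt, Kt))

def align_weights(basis_kernels, Kt, steps=200, lr=0.1):
    """Gradient ascent on the alignment of K_alpha = sum_i alpha_i K_i (Eq. (4)),
    using a finite-difference gradient for simplicity."""
    alpha = np.ones(len(basis_kernels)) / len(basis_kernels)
    eps = 1e-6
    for _ in range(steps):
        Ka = sum(a * K for a, K in zip(alpha, basis_kernels))
        base = alignment(Ka, Kt)
        grad = np.zeros_like(alpha)
        for i in range(len(alpha)):
            a_plus = alpha.copy()
            a_plus[i] += eps
            Ka_p = sum(a * K for a, K in zip(a_plus, basis_kernels))
            grad[i] = (alignment(Ka_p, Kt) - base) / eps
        alpha = np.clip(alpha + lr * grad, 0.0, None)  # keep weights non-negative
    return alpha

# Two basis kernels: one matching the class structure, one uninformative
y = np.array([1.0, 1.0, -1.0, -1.0])
Kt = np.outer(y, y)
K_id = np.eye(4)                      # "every image only similar to itself"
alpha = align_weights([Kt, K_id], Kt)
print(alpha[0] > alpha[1])            # True: the informative kernel gets more weight
```

The optimization drives the weight of the uninformative kernel toward zero, which mirrors the paper's motivation: features that do not help discriminate the classes should contribute little to the combined kernel.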

3.2. Summarization

In large collections of images, it is not possible to show all images at once to the user due to the limitations of screen devices. Therefore, it is necessary to provide a mechanism that summarizes the image collection. This summary represents an overview of the dataset and allows the user to start the navigation process. A good summary corresponds to a representative set of samples from the collection, i.e., a set that includes prototypical images from the different categories present in the collection. The most common strategies to build image collection summaries are based on clustering algorithms [63]. Fig. 3 depicts the process used for summarizing an image collection. In the proposed framework, kernel k-means [9] was used to summarize the collection. As input, the clustering algorithm receives the aligned kernel matrix discussed previously. The parameter k of the algorithm can be set according to the number of semantic categories of the image collection. In this paper, we propose to analyze the entropy of the summary to determine a value that yields a good summary. Our hypothesis is that the larger the number of image examples, the more descriptive the image summary is.

Fig. 3. Summarization process based on the kernel matrix.

Fig. 4. Two-dimensional visualization. Similar images are projected close to each other in the visualization space.

J.E. Camargo et al. / Journal of Visual Languages and Computing 24 (2013) 53–67
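A minimal sketch of kernel k-means over a precomputed kernel matrix follows. This is our illustration, not the authors' code; it relies on the standard expansion of the feature-space distance to a cluster centroid, which needs only kernel evaluations.

```python
import numpy as np

def kernel_kmeans(K, k, iters=20, seed=0):
    """Kernel k-means on a precomputed kernel matrix K (n x n).
    Each point is assigned to the cluster whose feature-space
    centroid is closest, using only kernel evaluations."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)
    for _ in range(iters):
        dist = np.zeros((n, k))
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:
                dist[:, c] = np.inf
                continue
            # ||phi(x) - mu_c||^2 = K(x,x) - 2*mean_j K(x,j) + mean_{j,l} K(j,l)
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

A summary can then be formed by taking, from each cluster, the image closest to its feature-space centroid (the medoid).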

3.3. Visualization

Given a set of images, visualization consists of organizing those images in a two-dimensional coordinate system. Hence, the goal is to assign a two-dimensional coordinate to each image given the kernel function. In this work, we use kernel principal component analysis (KPCA) [64] to find a representation of the image collection in a low-dimensional space. Images are represented by their projections on the two principal components. This produces a two-dimensional representation that is expected to preserve, to some extent, the similarity relationship, i.e., two similar images are expected to be projected to the same region. Fig. 4 shows a two-dimensional visualization of 2500 images.

The aligned kernel matrix K_α obtained in Section 3.1 is given as input to the algorithm, which generates a set of two-dimensional coordinates, one for each image. Fig. 5 shows the general process to build a two-dimensional visualization from the kernel matrix.
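The projection step can be sketched as follows, assuming the usual KPCA recipe of double-centering the kernel matrix and keeping the two leading eigenvectors; the function name is ours.

```python
import numpy as np

def kpca_2d(K):
    """Project items to 2-D with kernel PCA: center the kernel matrix,
    take the two leading eigenvectors, scale by sqrt of eigenvalues."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                       # double-centering in feature space
    vals, vecs = np.linalg.eigh(Kc)      # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:2]     # two largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```

With a linear kernel this reduces to ordinary PCA, so for intrinsically two-dimensional data the projection preserves pairwise distances exactly.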

3.4. Interaction

Exploratory image collection search involves several tasks, some performed by the user and others by the system. The complete interaction loop is shown in Fig. 6. First, the system selects a representative image subset of the collection. Then, the user determines which images are relevant. Next, the system captures this feedback and reformulates the query using a relevance feedback (RF) model. Then, the system ranks the image set and selects the r most relevant images in order to reduce the search space. Thus, the system computes a new summary taking as input


Fig. 5. Visualization process based on the kernel matrix.

Fig. 6. Interaction process. In each iteration the query is refined with the user's feedback.

those images. Finally, the user determines whether the task has been completed; otherwise, the process is repeated until the user's interests have been met.

In our strategy, the search space is reduced at each feedback iteration. In the first stage, the system shows an image collection summary that represents the complete image collection. When the user selects a set of relevant and non-relevant images for the first time, the system estimates the position of the new adjusted query and discards the (l − r) farthest images from that position, with l being the size of the image collection and r the number of relevant images considered in this first iteration. Then, using only the r most relevant images, a new summary is built to help the user iteratively concentrate on the kind of images that she/he is interested in. Note that the r parameter must be dynamic as the user interacts with the system, becoming smaller at each iteration. This is equivalent to reducing the search space to a hypersphere centered at the query position, with a radius that decreases each time. At the beginning, the radius covers the complete image collection, since the user is exploring its contents. In the end, it is expected that the user has access to a reduced set of highly relevant images, since the provided feedback has relocated the center of the hypersphere to an interesting region. We explored two different functions to reduce the search space at each iteration: (1) a constant percentage of the current image set, r_{t+1} = s·r_t, and (2) an exponentially decreasing size, r_{t+1} = exp(−s·r_t). The first function models a more exploratory task with slow convergence, compared to the second, which leads the user to a small image set faster. In both cases, the user may control the s parameter, which defines the actual convergence speed.
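The filtering step can be sketched as follows (our illustration; `filter_top_r` and `schedule_constant` are hypothetical names). Given similarity scores of all images to the current query, the system keeps the r most similar ones and discards the (l − r) farthest; the constant-fraction schedule then shrinks r for the next iteration.

```python
import numpy as np

def filter_top_r(scores, r):
    """Indices of the r images most similar to the query,
    i.e. the (l - r) farthest ones are discarded."""
    return np.argsort(scores)[::-1][:r]

def schedule_constant(r_t, s):
    """r_{t+1} = s * r_t: keep a constant fraction of the current set."""
    return max(1, int(round(s * r_t)))
```

The exponential schedule of the text can be plugged in the same way; as written it collapses the set very quickly, which matches its description as the faster-converging option.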

3.4.1. Kernel-based relevance feedback model

Primal model. Rocchio [65] proposed a relevance feedback model to learn from the user's feedback. In this model, results are shown to the user, who selects which documents are relevant and which are not. This task is performed iteratively, so that in each iteration the query moves closer to the positive feedback and away from the negative feedback. The model assumes a vector-space representation, i.e., both documents and queries belong to a real-valued vector space. The Rocchio formula is expressed as follows:

$$q_{t+1} = \alpha q_t + \frac{\beta}{R}\sum_{j=1}^{R} d_j - \frac{\gamma}{NR}\sum_{j=1}^{NR} d_j, \qquad (5)$$

where t denotes the tth iteration, q_t is the query vector from the previous iteration, d_j is the jth document vector from the relevant or non-relevant set, R is the number of relevant documents, NR is the number of non-relevant documents, and the parameters α, β, γ weight the importance of each component in the formula. Once the new query has been reformulated using this model, the search algorithm must be triggered to identify the most relevant documents. This model has been widely used by the information retrieval community due to its simplicity, speed and easy implementation. It is important to note that this model depends on the vector representation. In the proposed framework, each image may be represented as a complex object with structured data that can be compared to other objects using a similarity measure or kernel function. Consequently, the original Rocchio formula cannot be applied directly in the proposed model.
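For a vector-space representation, Eq. (5) amounts to the following update (a minimal sketch with our own function name and default weights):

```python
import numpy as np

def rocchio(q_t, relevant, non_relevant, alpha=1.0, beta=0.8, gamma=0.2):
    """Primal Rocchio update of Eq. (5): move the query toward the
    centroid of the relevant documents and away from the centroid
    of the non-relevant ones."""
    q = alpha * q_t
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(non_relevant):
        q = q - gamma * np.mean(non_relevant, axis=0)
    return q
```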

Dual model. Formally, a kernel function is a function k that, for all x, z in a certain set X (the problem space), satisfies

$$k(x,z) = \langle \phi(x), \phi(z) \rangle, \qquad (6)$$


where φ is a mapping from X to an inner product space F (the feature space), φ : x → φ(x) ∈ F.

The function k is a kernel function with F its corresponding feature space. This means that we can compute the inner product between the projections of two points into the feature space without explicitly evaluating their coordinates [9]. In the context of information retrieval, the problem space X corresponds to the original data, and the feature space F corresponds to its representation. This means that the similarity between documents and queries must be calculated in the feature space, and this is exactly what the kernel function does. Following this idea, we formulate Eq. (5) in terms of points in the feature space and define its computation in terms of kernel functions.

First of all, we need to compute the center of mass of the relevant and of the non-relevant images. The center of mass in the feature space can be computed as the vector $\phi_S = (1/\ell)\sum_{j=1}^{\ell} \phi(x_j)$. So we can compute the dot product between the relevant vectors and their center of mass as follows:

$$\langle \phi(x), \phi_{rel}\rangle = \left\langle \phi(x),\ \frac{1}{R}\sum_{j=1}^{R}\phi(y_j)\right\rangle = \frac{1}{R}\sum_{j=1}^{R} k(x, y_j), \qquad (7)$$

where $\{y_1, \ldots, y_R\}$ is the set of relevant images and $\phi_{rel}$ is its center of mass. Using the same analysis for the non-relevant images,

$$\langle \phi(x), \phi_{nonrel}\rangle = \left\langle \phi(x),\ \frac{1}{NR}\sum_{j=1}^{NR}\phi(z_j)\right\rangle = \frac{1}{NR}\sum_{j=1}^{NR} k(x, z_j), \qquad (8)$$

where $\{z_1, \ldots, z_{NR}\}$ is the set of non-relevant images and $\phi_{nonrel}$ is its center of mass. Therefore, the similarity between an image x and the centers of mass of the relevant and non-relevant items can be expressed in terms of the kernel function calculated between the relevant/non-relevant images and all the images involved in the current iteration. The Rocchio model transforms the user's query according to the user's feedback, resulting in a new query vector $q_{t+1}$. Since we do not have an explicit vector representation for the query, let us introduce the notation $\phi(q)$ as the representation of the user's query in the feature space. Now, the Rocchio model can be expressed in terms of dot products in the feature space:

$$\langle\phi(x_i), \phi(q_{t+1})\rangle = \alpha\,\langle\phi(x_i), \phi(q_t)\rangle + \left\langle \phi(x_i),\ \frac{\beta}{R}\sum_{j=1}^{R}\phi(x_j)\right\rangle - \left\langle \phi(x_i),\ \frac{\gamma}{NR}\sum_{j=1}^{NR}\phi(x_j)\right\rangle,$$

where $x_i$ is a given image and t is the tth iteration. According to Eq. (6), we can rewrite the last equation in terms of the kernel function between the query and each ith document in the collection:

$$k(x_i, q_{t+1}) = \alpha\, k(x_i, q_t) + \beta\, k(x_i, r_s) - \gamma\, k(x_i, nr_s), \qquad (9)$$

where $k(x_i, r_s) = (1/R)\sum_{j=1}^{R} k(x_i, x_j)$ and $k(x_i, nr_s) = (1/NR)\sum_{j=1}^{NR} k(x_i, x_j)$, according to Eqs. (7) and (8), respectively. Note that, even though we do not have an explicit representation of the reformulated user's query, we can calculate the similarity between any image in the collection and the new adjusted query. This similarity is used as a score function to select the most relevant images.
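Eq. (9) can be computed for all images at once from the precomputed kernel matrix. The sketch below is our illustration, with hypothetical names; `prev_scores` plays the role of k(x_i, q_t) from the previous iteration.

```python
import numpy as np

def kernel_rocchio_scores(K, prev_scores, rel_idx, nonrel_idx,
                          alpha=1.0, beta=0.8, gamma=0.2):
    """Dual Rocchio of Eq. (9): score k(x_i, q_{t+1}) for every image,
    using only the precomputed kernel matrix K."""
    k_rel = K[:, rel_idx].mean(axis=1) if len(rel_idx) else 0.0
    k_nonrel = K[:, nonrel_idx].mean(axis=1) if len(nonrel_idx) else 0.0
    return alpha * prev_scores + beta * k_rel - gamma * k_nonrel
```

Ranking the resulting scores and keeping the top r entries implements the search-space reduction described above.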

4. Experimental evaluation

4.1. Dataset

The experiments were conducted on a subset of 25 classes from the Corel image database, each with 100 items, leading to a collection of 2500 images. This dataset has been extensively used to evaluate performance in conventional image retrieval tasks, but not in a complete image collection exploration system; therefore, we propose a general experimental evaluation setup for image collection exploration systems that uses this dataset. Our results are not directly comparable with previous works reported in the literature, since those works use heterogeneous experimental setups that measure the performance of specific standalone image collection exploration tasks. In this paper, we propose an evaluation setup that can be used in future works to evaluate an image collection exploration system as an overall process.

4.2. Aligned kernel matrix

For the experimental evaluation, the following image features are used: gray histogram, RGB color histogram, Sobel histogram (borders) and local binary patterns (texture). As described in Section 3.1, an optimal linear combination of feature kernels is found by optimizing the function described in Eq. (4). The optimization was performed with a gradient ascent algorithm that started with random α values, with step size η = 0.1, ∇α_i = 0.1 and 100 iterations.

The optimal kernel matrix can be computed with the α obtained after the optimization process:

$$K_\alpha = 0.1537\, K_{gray} + 0.0507\, K_{lbp} + 0.1023\, K_{sobel} + 0.1537\, K_{rgb}. \qquad (10)$$

Note that the color kernels have the highest weights in the linear combination. This means that colors have different distributions among the classes in the Corel dataset, and thus help to discriminate contents. In contrast, the texture kernel (LBP) has the lowest weight, which indicates that texture is not a good class discriminant in this dataset.

4.3. Summarization and visualization

A good summary is a representative set of samples from the collection, i.e., a set that includes prototypical images from the different categories present in the collection. Based on this idea, we define a supervised summarization quality measure that makes use of the image labels. This measure corresponds to the entropy of the summary and is calculated as follows:

$$H_{summary} = -\sum_{i=1}^{C} \frac{\#C_i}{k}\,\log_2\!\left(\frac{\#C_i}{k}\right), \qquad (11)$$

where C is the number of classes, $M = \{m_1, \ldots, m_k\}$ is the set of k medoids obtained in the clustering process, and


$\#C_i = |\{m_j \in M : m_j \in C_i\}|$ is the number of medoids in M that belong to class $C_i$. The quantity $\#C_i/k$ represents the proportion of samples in the summary that belong to class $C_i$. The maximum entropy is obtained when this value is the same for all classes, i.e., $\forall i,\ \#C_i/k = 1/C$; in this case, all classes are equally represented in the summary. The maximum entropy depends on the number of classes: $H_{summary} = \log_2(C)$. In this experimental setup, $\log_2(25) \approx 4.64$. With this measure defined, we aim to assess the quality of the summaries generated by the following kernel functions: an ideal kernel function using Eq. (3), which will have the maximum entropy since it has the a priori class label information; a baseline kernel function defined as a combination of the base kernel functions (RGB, Sobel, LBP and Gray) with equal weights; and the aligned kernel built as suggested in Section 3.

Fig. 7. Entropy vs. number of centroids (average over 100 runs). The kernel that incorporates domain knowledge (aligned kernel) outperforms the baseline kernel.

Fig. 7 shows the quality of the three summaries: ideal kernel, baseline kernel, and aligned kernel. Results show that the aligned kernel outperforms the baseline, indicating that the proposed method improves the quality of the summary. All three kernels increase the summary entropy as the number of medoids increases; with k = 50 medoids the summary entropy is close to the maximum. The minimum of this parameter is related to the number of representative images in the summary. In our experiments, we know a priori that there are 25 classes, but results show that we reach a meaningful discrimination when k = 50, that is, twice the number of classes. Entropy is a measure that can be useful to automatically find k in a collection where a priori class information is unknown. The entropy obtained for different values of k shows that the minimum value of this parameter can be found through entropy analysis of the summarization process.
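The entropy measure of Eq. (11) is straightforward to compute from the class labels of the selected medoids (a minimal sketch with our own function name):

```python
import numpy as np

def summary_entropy(medoid_classes, num_classes):
    """Entropy of a summary (Eq. (11)): how evenly the k medoids
    cover the classes. Maximum value is log2(num_classes)."""
    k = len(medoid_classes)
    counts = np.bincount(medoid_classes, minlength=num_classes)
    p = counts / k
    p = p[p > 0]  # 0 * log2(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))
```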

Fig. 8 shows the visualization of the entire dataset using KPCA, and a visualization of the automatically generated summary is shown in Fig. 9.

4.4. Interaction

We implemented a prototype system with a web-enabled user interface. The system was deployed on a computer with a 3.0 GHz Pentium D processor and 4 GB of RAM. A screenshot is shown in Fig. 10.

The summarization algorithm was empirically set to find 100 representative images from the results in each iteration. The system allows users to select relevant images in each iteration and provides a search button to request an update of the image collection visualization. We aim to identify the way in which the relevance feedback parameters influence the response of the system in a category search task. The purpose of this task is to find as many images as possible of a particular category.

Fig. 9. 2D visualization of the most representative images in the collection.

Fig. 10. Screenshot of the system. The user selects relevant images by double clicking on them.

The system evaluation involved 10 master's students, aged 25–30, who had experience with search engines. We met each participant separately and followed this procedure: (1) each user was given a training session on the system; (2) each user was asked to search for images related to one of the 25 possible categories, with one random image generated as the starting point; (3) each user had to complete at least 10 iterations in the relevance feedback loop; (4) in each iteration, the number of images of the target category was recorded in the server log. Experiments were grouped in three particular evaluations: weighting impact, filtering behavior and global performance.

4.4.1. Weighting impact

This evaluation aims to analyze the system response using different values of the parameters α, β and γ. These parameters weight, respectively, the importance of the selections made in previous iterations, the relevant examples selected by the user, and the non-relevant images. Figs. 11 and 12 show the visualization of one of the experiments at iterations 3 and 4, respectively. The larger the number of iterations, the higher the number of images of the target category. For this experiment, we calculated the recall of the first 10 iterations, as shown in Fig. 13. The setting β = 0.4 and γ = 0.6 gives non-relevant images more importance in the definition of the user's query. The plot associated with these parameters showed the lowest recall, since the query is pushed in the direction of the negative examples and the user obtains more non-relevant images. Its zig-zag shape indicates that the interaction with the user has a contrary effect, since the relevant images selected by the user have less importance than the larger number of non-relevant ones. With β = 0.8 and γ = 0.2, we give more importance to the positive examples selected by the user.

Fig. 11. Visualization of the class buses at iteration 3. Some images are visually similar to one another but they do not belong to the buses category.

Surprisingly, this configuration does not lead to the best results, because of the lack of discrimination between positive and negative examples. When β and γ were set to the same weight, the performance reached its maximum value.

4.4.2. Filtering behavior

We also investigated the filtering behavior to identify the influence of the r parameter in the exploration process, which is related to the number of similar images considered when selecting the nearest images to the query at each iteration. Instead of using a decreasing function for this parameter, we fixed it to constant values to analyze its impact on the convergence of the categorical search task. We fixed the category buses for the weighting impact and filtering behavior tasks, and evaluated its performance at each iteration. The lowest performance is obtained with large values of r, because the search space remains very large across iterations and the summary includes images of non-relevant categories. The other extreme is to reduce the search space to a number of images smaller than the total number of relevant items, that is, r = 50 in our experiments. In that case, good results are obtained in the first iterations, since the search space is smaller than appropriate. It is interesting to highlight that the parameter should be set near the size of the categories, in our case r = 100; see Fig. 14.

4.4.3. Interaction performance

Finally, the global performance evaluation was conducted with a fixed configuration of the parameters analyzed in the previous two evaluations, selecting the values that led to the best configuration. In the global performance experiments, the user was asked to search for images from a randomly assigned category. Fig. 15 shows recall plots for several categories. Note that finding images may be more or less difficult depending on the underlying target category. For instance, buses is easier than wildcats: in the former, only two iterations are needed to reach 80% recall, while in the latter only 15% is obtained. Fig. 16 shows the average recall for all categories in the collection, which increases as the number of iterations increases.

Fig. 13. Recall vs. iterations for the buses category. Each curve corresponds to a different combination of values for the β and γ parameters.

Fig. 14. Recall vs. iterations for the buses category. Each curve corresponds to a different value of r, the number of shown images.

Fig. 12. Visualization of the class buses at iteration 4. Some images are visually similar to one another but they do not belong to the buses category.

5. Conclusions and future work

This paper presented a fully functional model for image collection exploration. The main contribution is the proposal of a modular framework completely based on kernel methods, covering image representation, summarization, visualization and interaction. Kernel methods have shown to be a powerful tool to model image semantics and to build better similarity measures. Traditionally, image collection exploration approaches use a feature vector representation to model image content, following the standard practice in text retrieval. This


Fig. 15. Recall at the first 10 iterations. The class buses reaches the highest recall, whilst wildcats has the lowest.

Fig. 16. Average recall for all classes at the first 10 iterations.


feature vector is then used to calculate a standard similarity measure such as cosine similarity. However, in the case of images this is not necessarily the best way to proceed; image similarity calculation can involve more complex procedures that combine different visual features. The proposed framework exclusively uses image similarity values, which are modeled as kernels, so the proposed approach can be applied to any kind of valid image kernel. This characteristic allowed us to formulate an image similarity learning process that optimally combines different visual kernels to generate a kernel that better reflects the semantics of the collection. The paper also contributed the formulation of a kernelized form of the Rocchio model, which fits naturally in a kernel-based strategy. This kernelized relevance feedback model can be used as a baseline in comparisons with future kernel relevance feedback models.

We conducted experiments using an image collection on the order of thousands of images, and the results demonstrate the potential of the proposed strategy to provide effective exploratory access to it. Our future work includes the design of extensions to efficiently process

very large image collections, with potentially millions of images. Another important line of research, which we foresee as future work, is the design of kernels that integrate multiple sources of information associated with images, such as attached texts, geo-tags, temporal data and so on, which will furnish the system with more criteria to organize images when users interact with it.

Acknowledgments

This work was funded by COLCIENCIAS project number 202010017297, Sistema para la Recuperacion de Imagenes Medicas Utilizando Indexacion Multimodal, RES. 2465 de 2011.

References

[1] R. Datta, D. Joshi, J. Li, J.Z. Wang, Image retrieval: ideas, influences, and trends of the new age, ACM Computing Surveys 40 (2) (2008) 1–60.

[2] D. Heesch, A survey of browsing models for content based image retrieval, Multimedia Tools and Applications 40 (2) (2008) 261–284.

[3] J. Li, J.H. Lim, Q. Tian, Automatic summarization for personal digital photos, in: Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia, vol. 3, 2003, pp. 1536–1540.

[4] K. Borner, Extracting and visualizing semantic structures in retrieval results for browsing, in: DL '00: Proceedings of the Fifth ACM Conference on Digital Libraries, ACM, New York, NY, USA, 2000, pp. 234–235.

[5] D. Cai, X. He, Z. Li, W.-Y. Ma, J.-R. Wen, Hierarchical clustering of WWW image search results using visual, textual and link information, in: Proceedings of the 12th Annual ACM International Conference on Multimedia, 2004, pp. 952–959.

[6] C. Chen, G. Gagaudakis, P. Rosin, Content-based image visualization, in: IEEE International Conference on Information Visualization (IV 2000), 2000.

[7] Y. Chen, J.Z. Wang, R. Krovetz, CLUE: cluster-based retrieval of images by unsupervised learning, IEEE Transactions on Image Processing 14 (8) (2005) 1187–1201.

[8] G.P. Nguyen, M. Worring, Interactive access to large image collections using similarity-based visualization, Journal of Visual Languages & Computing 19 (2) (2008) 203–224.

[9] J. Shawe-Taylor, N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.

[10] A. Barla, F. Odone, A. Verri, Histogram intersection kernel for image classification, in: Proceedings of the 2003 International Conference on Image Processing (ICIP 2003), vol. 3, 2003, pp. III-513–16.

[11] Z. Harchaoui, F. Bach, Image classification with segmentation graph kernels, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), 2007, pp. 1–8.

[12] S. Lazebnik, C. Schmid, J. Ponce, Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, in: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2006, pp. 2169–2178.

[13] M.J. Swain, D.H. Ballard, Color indexing, International Journal of Computer Vision 7 (1991) 11–32.

[14] S. Maji, A. Berg, J. Malik, Classification using intersection kernel support vector machines is efficient, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008, pp. 1–8.

[15] Y. Zhi-qin, S. Song-zhi, L. Shao-zi, Research on branch and bound for pedestrian detection, in: 2011 IEEE International Conference on Computer Science and Automation Engineering (CSAE), vol. 2, 2011, pp. 366–370.

[16] J. Wu, J. Rehg, Beyond the Euclidean distance: creating effective visual codebooks using the histogram intersection kernel, in: 2009 IEEE 12th International Conference on Computer Vision, 2009, pp. 630–637.

[17] J. Almeida, N.J. Leite, R. da S. Torres, VISON: video summarization for online applications, Pattern Recognition Letters 33 (4) (2012) 397–409.


[18] S. Boughorbel, J.-P. Tarel, N. Boujemaa, Generalized histogram intersection kernel for image recognition, in: IEEE International Conference on Image Processing (ICIP 2005), vol. 3, 2005, pp. III-161–4.

[19] L. Tang, G. Hamarneh, T. Bressmann, A machine learning approach to tongue motion analysis in 2D ultrasound image sequences, in: K. Suzuki, F. Wang, D. Shen, P. Yan (Eds.), Machine Learning in Medical Imaging, Lecture Notes in Computer Science, vol. 7009, Springer, Berlin/Heidelberg, 2011, pp. 151–158.

[20] J. Wu, A fast dual method for HIK SVM learning, in: Proceedings of the 11th European Conference on Computer Vision: Part II, ECCV '10, Springer-Verlag, Berlin, Heidelberg, 2010, pp. 552–565.

[21] D. Stan, I.K. Sethi, eID: a system for exploration of image databases, Information Processing and Management 39 (3) (2003) 335–361.

[22] I. Simon, N. Snavely, S.M. Seitz, Scene summarization for online image collections, in: IEEE 11th International Conference on Computer Vision (ICCV 2007), 2007, pp. 1–8.

[23] J.-Y. Chen, C.A. Bouman, J.C. Dalton, Hierarchical browsing and search of large image databases, IEEE Transactions on Image Processing 9 (3) (2000) 442–455.

[24] B. Gao, T.-Y. Liu, T. Qin, X. Zheng, Q.-S. Cheng, W.-Y. Ma, Web image clustering by consistent utilization of visual features and surrounding texts, in: MULTIMEDIA '05: Proceedings of the 13th Annual ACM International Conference on Multimedia, ACM, New York, NY, USA, 2005, pp. 112–121.

[25] D. Deng, Content-based image collection summarization and comparison using self-organizing maps, Pattern Recognition 40 (2) (2007) 718–727.

[26] H. Nobuhara, A lattice structure visualization by formal concept analysis and its application to huge image database, in: IEEE/ICME International Conference on Complex Medical Engineering (CME 2007), 2007, pp. 448–452.

[27] J. Fan, Y. Gao, H. Luo, D.A. Keim, Z. Li, A novel approach to enable semantic and visual image summarization for exploratory image search, in: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, ACM, New York, NY, USA, 2008, pp. 358–365.

[28] W.S. Torgerson, Multidimensional scaling: I. Theory and method, Psychometrika 17 (4) (1952) 401–419.

[29] K. Pearson, On lines and planes of closest fit to systems of points in space, Philosophical Magazine 2 (6) (1901) 559–572.

[30] J.B. Tenenbaum, V. de Silva, J.C. Langford, A global geometric framework for nonlinear dimensionality reduction, Science 290 (2000) 2319–2323.

[31] S.T. Roweis, L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500) (2000) 2323–2326.

[32] K.Q. Weinberger, F. Sha, L.K. Saul, Learning a kernel matrix for nonlinear dimensionality reduction, in: ICML '04: Proceedings of the Twenty-first International Conference on Machine Learning, ACM, New York, NY, USA, 2004, p. 106.

[33] H. Choi, S. Choi, Robust kernel Isomap, Pattern Recognition 40 (3) (2007) 853–862.

[34] T. Janjusevic, E. Izquierdo, Layout methods for intuitive partitioning of visualization space, in: 12th International Conference Information Visualisation (IV '08), 2008, pp. 88–93.

[35] G. Schaefer, S. Ruszala, Image database navigation on a hierarchical MDS grid, 2006.

[36] C. Chen, G. Gagaudakis, P. Rosin, Similarity-based image browsing, in: Proceedings of the 16th IFIP World Computer Congress (International Conference on Intelligent Information Processing), Beijing, China, 2000, pp. 206–213.

[37] B. Liu, W. Wang, J. Duan, Z. Wang, B. Shi, Subsequence similarity search under time shifting, in: Information and Communication Technologies (ICTTA '06), vol. 2, 2006, pp. 2935–2940.

[38] M. Porta, Browsing large collections of images through unconventional visualization techniques, in: AVI '06: Proceedings of the Working Conference on Advanced Visual Interfaces, ACM Press, New York, NY, USA, 2006, pp. 440–444.

[39] Y. Rui, T.S. Huang, M. Ortega, S. Mehrotra, Relevance feedback: a power tool for interactive content-based image retrieval, IEEE Transactions on Circuits and Systems for Video Technology 8 (5) (1998) 644–655.

[40] C. Nastar, M. Mitschke, C. Meilhac, Efficient query refinement forimage retrieval, in: CVPR ’98: Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition,IEEE Computer Society, Washington, DC, USA, 1998, p. 547.

[41] K. Porkaew, S. Mehrotra, M. Ortega, Query reformulation forcontent based multimedia retrieval in Mars, in: ICMCS ’99: Pro-ceedings of the IEEE International Conference on Multimedia

Computing and Systems, IEEE Computer Society, Washington, DC,USA, 1999, p. 747.

[42] G.P. Nguyen, M. Worring, Optimization of interactive visual-similarity-based search, ACM Transactions on Multimedia Comput-ing, Communications and Applications 4 (1) (2008) 1–23.

[43] M.Y. Chen, M. Christel, A. Hauptmann, H. Wactlar, Putting activelearning into multimedia applications: dynamic definition andrefinement of concept classifiers, in: Proceedings of the 13thAnnual ACM International Conference on Multimedia, ACM, NewYork, NY, USA, 2005, pp. 902–911.

[44] P.G. Matthieu, M. Cord, RETINAL: an active learning strategy forimage category retrieval, in: Proceedings of the IEEE Conference onImage Processing, Singapore, 2004, pp. 2219–2222.

[45] I. Campbell, The Ostensive Model of Developing Information Needs,Ph.D. Thesis, University of Glasgow, 2000.

[46] J. Li, N.M. Allinson, Long-term learning in content-based imageretrieval, International Journal of Imaging Systems and Technology18 (2–3) (2008) 160–169.

[47] J. Fan, Y. Gao, H. Luo, Hierarchical classification for automatic image annotation, in: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '07, ACM, New York, NY, USA, 2007, pp. 111–118.

[48] J.J. Foo, J. Zobel, R. Sinha, Clustering near-duplicate images in large collections, in: Proceedings of the International Workshop on Multimedia Information Retrieval, MIR '07, ACM, New York, NY, USA, 2007, pp. 21–30.

[49] K. Sawase, H. Nobuhara, B. Bede, Visualizing huge image databases by formal concept analysis, in: A. Bargiela, W. Pedrycz (Eds.), Human-Centric Information Processing Through Granular Modelling, Studies in Computational Intelligence, vol. 182, Springer, Berlin/Heidelberg, 2009, pp. 351–373.

[50] P.H. Gosselin, M. Cord, RETIN AL: an active learning strategy for image category retrieval, in: 2004 International Conference on Image Processing, ICIP '04, vol. 4, 2004, pp. 2219–2222.

[51] D. Liu, X.-S. Hua, L. Yang, H.-J. Zhang, Multiple-instance active learning for image categorization, in: B. Huet, A. Smeaton, K. Mayer-Patel, Y. Avrithis (Eds.), Advances in Multimedia Modeling, Lecture Notes in Computer Science, vol. 5371, Springer, Berlin/Heidelberg, 2009, pp. 239–249.

[52] R. Raguram, S. Lazebnik, Computing iconic summaries of general visual concepts.

[53] T. Berg, A. Berg, Finding iconic images, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009, 2009, pp. 1–8.

[54] P.-A. Moellic, J.-E. Haugeard, G. Pitel, Image clustering based on a shared nearest neighbors approach for tagged collections, in: Proceedings of the 2008 International Conference on Content-based Image and Video Retrieval, CIVR '08, ACM, New York, NY, USA, 2008, pp. 269–278.

[55] R. Tronci, G. Murgia, M. Pili, L. Piras, G. Giacinto, ImageHunter: a novel tool for relevance feedback in content based image retrieval, in: C. Lai, G. Semeraro, E. Vargiu (Eds.), New Challenges in Distributed Information Filtering and Retrieval, Studies in Computational Intelligence, vol. 439, Springer, Berlin/Heidelberg, 2013, pp. 53–70.

[56] G. Wang, D. Hoiem, D. Forsyth, Learning image similarity from Flickr groups using fast kernel machines, IEEE Transactions on Pattern Analysis and Machine Intelligence 99 (2012) 1.

[57] B. Thomee, M. Lew, Interactive search in image retrieval: a survey, International Journal of Multimedia Information Retrieval 1 (2012) 71–86.

[58] S. Siggelkow, Feature Histograms for Content-Based Image Retrieval, Ph.D. Thesis, Albert-Ludwigs-Universität Freiburg, 2002.

[59] I.E. Sobel, Camera Models and Machine Perception, Ph.D. Thesis, Stanford University, Stanford, CA, USA, AAI7102831, 1970.

[60] A flexible image database system for content-based retrieval, Computer Vision and Image Understanding (special issue on content-based access for image and video libraries) 75 (1/2) (1999) 175–195.

[61] J. Kandola, J. Shawe-Taylor, N. Cristianini, Optimizing Kernel Alignment over Combinations of Kernels, Technical Report, Department of Computer Science, Royal Holloway, University of London, UK, 2002.

[62] R. Vert, Designing a M-SVM Kernel for Protein Secondary Structure Prediction, Master's Thesis, DEA Informatique de Lorraine, 2002.

J.E. Camargo et al. / Journal of Visual Languages and Computing 24 (2013) 53–67 67

[63] A. Gomi, R. Miyazaki, T. Itoh, J. Li, CAT: a hierarchical image browser using a rectangle packing technique, in: 12th International Conference on Information Visualisation, IV '08, 2008, pp. 82–87.

[64] B. Schölkopf, A. Smola, K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation 10 (5) (1998) 1299–1319.

[65] J. Rocchio, Relevance feedback in information retrieval, in: G. Salton (Ed.), The SMART Retrieval System: Experiments in Automatic Document Processing, Prentice Hall, 1971, pp. 313–323.