[Studies in Computational Intelligence] Advanced Methods for Computational Collective Intelligence, Volume 457


A Multi-agent Model for Image Browsing and Retrieval

Hue Cao Hong(1,2), Guillaume Chiron(3), and Alain Boucher(1)

(1) IFI, MSI team; IRD, UMI 209 UMMISCO; Vietnam National University, 42 Ta Quang Buu, Hanoi, Vietnam
(2) Hanoi Pedagogical University No. 2, Xuan Hoa, Phuc Yen, Vinh Phuc, Vietnam
(3) L3I, University of La Rochelle, 17042 La Rochelle cedex 1, France

Abstract. This paper presents a new and original model for image browsing and retrieval based on a reactive multi-agent system oriented toward visualization and user interaction. Each agent represents an image. This model simplifies the problem of mapping a high-dimensional feature space onto a 2D screen interface and allows intuitive user interaction. Within a unified and local model, as opposed to global traditional CBIR, we present how agents can interact through an attraction/repulsion model. These forces are computed based on the visual and textual similarities between an agent and its neighbors. This single model supports several tasks, such as image browsing and retrieval, single/multiple querying, and relevance feedback with positive/negative examples, all with heterogeneous data (image visual content and text keywords). Specific adjustments are proposed to allow this model to work with large image databases. Preliminary results on two image databases show the feasibility of this model compared with traditional CBIR.

Keywords: Multi-Agent System, content-based image retrieval, attraction/repulsion forces.

N.T. Nguyen et al. (Eds.): Advanced Methods for Computational Collective Intelligence, SCI 457, pp. 117-126. DOI: 10.1007/978-3-642-34300-1_11, © Springer-Verlag Berlin Heidelberg 2013.

1 Introduction

Content-based image retrieval (CBIR) is one of the major approaches for image retrieval and it has drawn significant attention in the past decade. CBIR uses the image visual content, extracted using various features, for searching and retrieving images. In doing so, all images are represented in a feature space, which is often high-dimensional. Most CBIR systems return the most similar images in a 1D linear format, sorted by decreasing similarity [11], which does not provide a good basis for user interaction.

In recent years, several methods have been used in CBIR for mapping a high-dimensional feature space into a 2D space for visualization, with good results for visualization itself, but less for interacting with the result data. Some specific similarity measures are used for visualization, like the Earth Mover's Distance (EMD) [14], for representing the distribution of images on a 2D screen. Principal Component Analysis (PCA) [7] is a basic, linear approach used for dimension reduction in image retrieval. However, with PCA only linear correlations can be detected and exploited, and it does not necessarily preserve the mutual relations between different data items [11]. Multi-dimensional scaling (MDS) [13] is a nonlinear transformation method to project a data set into a smaller space. MDS attempts to preserve the original relationships of the high-dimensional space in the low-dimensional projection, and it provides a more precise representation of the relationships between images in the feature space compared to PCA. However, MDS has a very large computational cost due to its quadratic dependence. Self-organizing maps (SOM) [16] are a neural algorithm for visualizing and interpreting large high-dimensional data sets. SOM is efficient in handling large datasets [15] and it is robust even when the data set is noisy [5]. However, the structure of the SOM neural network and the number of neurons in the Kohonen layer need to be specified a priori, and determining the number of clusters is not trivial when the characteristics of the data set are not known in advance [3]. With all these methods, the user can browse an image database in a more intuitive way, but interaction remains limited, especially when retrieval is based on heterogeneous features, like image features and text features, which are difficult to show together and efficiently on a 2D screen [8]. Nowadays, the problem is still unsolved.
To cope with the drawbacks of these existing techniques, we explore in this paper a new and innovative approach to this problem: a multi-agent model for image browsing and retrieval. Multi-agent systems are used in CBIR mostly for distributed retrieval [10], and less for the problems of visualization and interaction. But interesting models can be found for other types of applications, on which we base our approach. For example, Renault has proposed a dynamic attraction/repulsion multi-agent coupling model [12] for the application of e-mail classification. It aims to organize e-mails in a 2D space according to similarity, where each e-mail is represented by an agent and there is no need to specify axes or how to organize information. The model allows agents to communicate with each other through virtual pheromones and to collectively self-organize in a 2D space. With few constraints, the system can organize (cluster/classify) information and let the user intuitively interact with it. We keep the idea of using reactive agents with attraction/repulsion forces for organizing data in space and adapt it for image browsing and retrieval. To facilitate 2D display and interaction, in our model each image of the database is represented by an agent which moves in a 2D environment. Each agent interacts with other agents/images using forces. These forces are computed according to the similarities between an agent/image and its neighbors. This model allows the user to easily interact with images using an intuitive interface. The user can select (or deselect) one or many agents/images to perform a query and can organize these queries by moving them in the environment. The main drawback of this kind of technique is the computation time required for the multi-agent model to organize itself, and we propose some ways to cope with this problem.

The rest of this paper is organized as follows. In the next section, we present in detail our main contributions, which are the multi-agent model for image browsing and retrieval, the force computation model for images, and the techniques to work with large image databases. Then, Section 3 shows some qualitative and quantitative results of our model. Finally, Section 4 gives a conclusion and some future work.

2 System Overview

2.1 Global System Model

The image database is modeled using a multi-agent system. Each agent represents an image and bases its actions on the feature vector of that image. The agents move freely in a 2D environment which has no pre-defined axes or meaning (Figure 1). They are reactive and only react to outside stimuli sent by other agents. Each agent interacts with others through forces, which can be attraction or repulsion. Forces are computed between two agents based on the similarity of their feature vectors. Thus, agents are attracted to similar ones and repulsed from non-similar ones. This local behavior between two agents produces, at the global level, a self-organization (like clustering) of all images in the 2D space.

An image query-by-example in this model is simply an image/agent clicked by the user. An agent-query is static (it does not move), but the user can move it where he wants on the screen (to organize the result display). Except for that, the agent-query behaves the same as the others, and still produces attraction/repulsion forces toward other agents. Multiple queries are possible simply by clicking multiple images. Text queries (keywords) are given by adding an agent-text, representing the keywords, to the system. The forces are then computed using text similarities, when annotations are available for the agent-images. Queries can be positive (normal queries) or negative (forces are inverted). This simple mechanism reproduces the relevance feedback behavior used in traditional CBIR systems, where the user can give positive or negative feedback to the system.

Fig. 1. Dynamic evolution of the system with an image query (in green in the middle). (a) The model is initialized with random placement of the images in the environment. An image is selected (in green) as a positive query and placed in the center. (b) System state after 50 time steps. (c) After 300 time steps.

The major advantage of this model is its simplicity, allowing a very simple and intuitive model to reproduce complex behaviors observed in traditional CBIR systems, like browsing, querying by text and/or image, interacting with the user, doing relevance feedback, mapping high-dimensional and heterogeneous features into a 2D space, etc. Moreover, where several different complex algorithms are needed to implement all these behaviors in a traditional CBIR system, in this new model they are simply consequences of the basic initial model. In the following sections we detail all these aspects.
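To make this global model concrete, the following is a minimal data-structure sketch of what it implies, not the authors' implementation: one agent per database image carrying a 2D position, a feature vector, optional keywords and query flags, and an environment that runs one reactive update per agent and time step. All names (Agent, Environment, react) are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Agent:
    """One agent per database image: a 2D position plus the image's descriptors."""
    position: List[float]                               # (x, y) in the 2D environment
    features: List[float]                               # visual feature vector
    keywords: List[str] = field(default_factory=list)   # text annotations, may be empty
    is_query: bool = False                              # query agents do not move
    is_negative: bool = False                           # negative queries invert their forces
    last_force: List[float] = field(default_factory=lambda: [0.0, 0.0])  # kept for inertia

@dataclass
class Environment:
    """Holds all agents and runs one reactive update per agent and time step."""
    agents: List[Agent]

    def step(self) -> None:
        # One time step = one force-computation loop over every agent.
        for agent in self.agents:
            if agent.is_query:
                continue          # agent-queries stay where the user placed them
            react(agent, self.agents)

def react(agent: Agent, all_agents: List[Agent]) -> None:
    """Placeholder for the force model of Section 2.2: pick neighbors,
    sum their attraction/repulsion forces and move accordingly."""
    pass
```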
2.2 Agent Interaction Model (Force Computation Model)

A force applied between two agents can be attractive or repulsive and is represented by a vector (magnitude and direction). If two agents are similar (image content and/or text/keyword similarities), they are attracted, and if two agents are different (non-similar), they are repulsed from each other. As presented above, at each time step(1), an agent interacts with its neighbors, receiving forces from them and moving reactively. The first step in computing forces is therefore to define the neighbor list of an agent, which evolves dynamically as the agent moves. Experiments showed that taking as neighbors the agents within a specific radius in space increases computation time and reduces both the convergence speed and the quality of the results. We prefer instead to select N agents randomly(2) among all agents, which allows us to cope with these drawbacks. Thus, a neighborhood is defined as a temporary (one time step) relationship between two agents that will interact (react through forces), rather than being based on spatial proximity. Once the neighbor list of an agent is known, the agent can simply compute the forces it receives from all these neighbors and react according to them (Figure 2).

Fig. 2. Force computation. Agent A chooses (randomly) N neighbors (2 in this example) among all existing agents. The global force applied on agent A is the sum of the (repulsion) force between agents A and B and the (attraction) force between agents A and C.

(1) In our implementation, agents are executed in pseudo-parallelism managed by a scheduler, and each agent is executed for a small time unit (corresponding to one force-computation loop), so one time step in the model corresponds in reality to one iteration loop over all agents.

(2) N = 20 in our experimentations.
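A minimal sketch of this neighbor-selection step, assuming the N = 20 random neighbors reported in footnote (2); the function name pick_neighbors is an illustrative choice, not from the paper.

```python
import random
from typing import List

N_NEIGHBORS = 20  # value used in the paper's experiments (footnote 2)

def pick_neighbors(agent, all_agents: List, n: int = N_NEIGHBORS) -> List:
    """Temporary, one-time-step neighborhood: n agents drawn uniformly at random,
    independently of spatial proximity."""
    candidates = [a for a in all_agents if a is not agent]
    return random.sample(candidates, min(n, len(candidates)))
```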
Image-based forces. The visual similarity S between two agent-images is calculated using the Squared Chord similarity [4]:

S = \sum_{i=1}^{n} (\sqrt{v_i} - \sqrt{w_i})^2

where v = (v_1, v_2, ..., v_n) and w = (w_1, w_2, ..., w_n) are the image feature vectors of the two images v and w. For N neighbors, an agent will calculate N similarities. All these similarities are sorted in ascending order so that they can be normalized. No global normalization is done, so each agent has to normalize its computations locally, based on its current information. Each similarity is normalized as follows:

S_i = S_{min} + i \cdot \frac{S_{max} - S_{min}}{N - 1} - \frac{S_{min} + S_{max}}{2}    (1)

where S_i is the similarity of the i-th ranked neighbor after sorting, i varying from 0 to N - 1, and S_{min} and S_{max} are the minimum and maximum similarities over all neighbors. The spatial distance D_i between two neighbor agents is calculated for two points P(x, y) and Q(s, t) using the chessboard distance: D(P, Q) = max(|x - s|, |y - t|). Similarly to the similarities, the distances are normalized using: D_i = D_i - (D_{min} + D_{max})/2.

Both normalizations (similarity and distance) produce negative and positive values, and the final force between an agent and a neighbor is given by Table 1. "Weak" and "strong" correspond to constant values, calibrated for the environment, that are used to scale the force, while "attraction" and "repulsion" correspond to the sign of the force. The force magnitude between two agents is given by:

F = C \cdot \frac{S_i}{D_i}    (2)

where C is a factor that can be 1 or 3, corresponding to weak and strong forces (Table 1).

Table 1. Magnitude and direction of a force according to the similarity and the distance between two agents

  Similarity   Distance   Force
  -            -          strong repulsion
  +            +          strong attraction
  -            +          weak repulsion
  +            -          weak attraction

Text-based forces. The text similarity between an agent-image and an agent-text(3) depends on the number of keywords of each agent and the number of common keywords between the two agents: S_i = nb_common / nb_minimum, where nb_common is the number of common keywords between the two agents and nb_minimum is the smaller number of keywords owned by either agent (i.e. if agent A has 3 keywords and agent B has 5 keywords, then nb_minimum = 3). Text similarity can only be computed if both agents have keywords, either as a text query (agent-text) or as image annotations (agent-image). As for image-based forces, the type and direction of the force between two agents follows Table 1 and the magnitude of the force is given by F = C S_i / D_i (see above for details).

(3) Currently, the only agent-text is for text queries (comprising one or more keywords) given by the user, while agent-images can be queries or database images without distinction.
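The following sketch assembles the formulas above into runnable form. It is an interpretation, not the authors' code: Table 1 is read as "the sign of the normalized similarity gives the direction (positive = attraction), matching signs of similarity and distance give a strong force (C = 3), opposite signs a weak one (C = 1)", and a small epsilon guards the division of Eq. (2). Function names are illustrative.

```python
import math
from typing import List, Sequence

def squared_chord(v: Sequence[float], w: Sequence[float]) -> float:
    """Squared Chord measure [4] between two feature vectors."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(v, w))

def chessboard(p: Sequence[float], q: Sequence[float]) -> float:
    """Chessboard (Chebyshev) distance between two 2D positions."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def rank_normalize(values: List[float]) -> List[float]:
    """Eq. (1): replace each value by a rank-based, locally centered value.
    Results are returned in the original (unsorted) order."""
    n = len(values)
    vmin, vmax = min(values), max(values)
    order = sorted(range(n), key=lambda k: values[k])   # ascending ranks
    step = (vmax - vmin) / (n - 1) if n > 1 else 0.0
    out = [0.0] * n
    for rank, k in enumerate(order):
        out[k] = vmin + rank * step - (vmin + vmax) / 2.0
    return out

def center_distances(values: List[float]) -> List[float]:
    """Distance normalization: D_i - (D_min + D_max) / 2."""
    dmin, dmax = min(values), max(values)
    return [d - (dmin + dmax) / 2.0 for d in values]

def pairwise_force(s_i: float, d_i: float, eps: float = 1e-6) -> float:
    """Eq. (2) combined with Table 1 (interpreted as described in the text above):
    C = 3 when the signs of s_i and d_i match (strong), C = 1 otherwise (weak);
    the sign of s_i gives the direction (positive = attraction, negative = repulsion)."""
    strong = (s_i >= 0) == (d_i >= 0)
    c = 3.0 if strong else 1.0
    magnitude = c * abs(s_i) / max(abs(d_i), eps)
    return magnitude if s_i >= 0 else -magnitude

def text_similarity(kw_a: List[str], kw_b: List[str]) -> float:
    """nb_common / nb_minimum, defined only when both agents have keywords."""
    if not kw_a or not kw_b:
        raise ValueError("both agents need keywords")
    common = len(set(kw_a) & set(kw_b))
    return common / min(len(kw_a), len(kw_b))
```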
Agent global force. The final global force for an agent is simply the vector sum of all the forces between that agent and its neighbors (Figure 2). The agent movement is induced by this final global force, applied directly to the agent, plus an inertial force. The goal of the inertial force is to keep the results of previous computations. Various values have been tried to determine the inertial force at time step t; the best choice is to keep 80% of the final global force of time step (t - 1).

2.3 Human Interaction Model

A motivation that led to the creation of this system is to provide easy and intuitive user interaction. For searching and retrieving, the user can give an image query-by-example just by clicking on the wanted image. The corresponding agent-image is then marked as a query. In the model, the query-agent is managed like all other agents except that it stops moving in the environment, letting all the other agent-images organize themselves around it. This query can only be moved (dragged on the screen) by the user, allowing him to structure the screen space as he wants. To perform a multiple query search, the user just needs to select multiple images and organize them on the screen (Figure 3a). By clicking on agent-queries, the user can mark them as positive or negative. A negative query translates into an inversion of the concerned forces. Attraction forces become repulsion and repulsion forces be...
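For illustration, here is a minimal sketch of one agent's movement step under the model described above: per-neighbor forces are inverted when the neighbor is a negative query, summed into the global force, combined with 80% of the previous global force as inertia, and applied to the position unless the agent is itself a static query. The names and the vector layout are assumptions consistent with the earlier sketches, not the authors' implementation.

```python
from typing import List, Tuple

INERTIA = 0.8  # keep 80% of the previous time step's global force (Section 2.2)

Vector = Tuple[float, float]

def movement_step(agent_position: List[float],
                  previous_global_force: Vector,
                  per_neighbor_forces: List[Tuple[Vector, bool]],
                  agent_is_query: bool) -> Tuple[List[float], Vector]:
    """One reactive step for one agent.

    per_neighbor_forces holds, for each neighbor, the force vector it applies
    on this agent and a flag telling whether that neighbor is a negative query
    (in which case the force is inverted: attraction becomes repulsion)."""
    fx, fy = 0.0, 0.0
    for (dx, dy), neighbor_is_negative in per_neighbor_forces:
        if neighbor_is_negative:
            dx, dy = -dx, -dy
        fx += dx
        fy += dy
    # Inertial term: keep 80% of the final global force of time step t-1.
    fx += INERTIA * previous_global_force[0]
    fy += INERTIA * previous_global_force[1]
    if agent_is_query:
        new_position = agent_position          # query agents stay where the user put them
    else:
        new_position = [agent_position[0] + fx, agent_position[1] + fy]
    return new_position, (fx, fy)
```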