
A Multi-agent Model for Image Browsing and Retrieval

Hue Cao Hong1,2, Guillaume Chiron3, and Alain Boucher1

1 IFI, MSI team; IRD, UMI 209 UMMISCO; Vietnam National University, 42 Ta Quang Buu, Hanoi, Vietnam

2 Hanoi Pedagogical University N02, Xuan Hoa, Phuc Yen, Vinh Phuc, Vietnam
3 L3I, University of La Rochelle, 17042 La Rochelle cedex 1, France

Abstract. This paper presents a new and original model for image browsing and retrieval based on a reactive multi-agent system oriented toward visualization and user interaction. Each agent represents an image. This model simplifies the problem of mapping a high-dimensional feature space onto a 2D screen interface and allows intuitive user interaction. Within a unified and local model, as opposed to global traditional CBIR, we present how agents can interact through an attraction/repulsion model. These forces are computed based on the visual and textual similarities between an agent and its neighbors. This unique model allows several tasks to be performed, like image browsing and retrieval, single/multiple querying, and relevance feedback with positive/negative examples, all with heterogeneous data (image visual content and text keywords). Specific adjustments are proposed to allow this model to work with large image databases. Preliminary results on two image databases show the feasibility of this model compared with traditional CBIR.

Keywords: Multi-Agent System, content-based image retrieval, attraction/repulsion forces.

1 Introduction

Content-based image retrieval (CBIR) is one of the major approaches for image retrieval and it has drawn significant attention in the past decade. CBIR uses the image visual content, extracted using various features, for searching and retrieving images. Doing so, all images are represented in a feature space, which is often high-dimensional. Most CBIR systems return the most similar images in a 1D linear format, following decreasing similarities [11], which does not provide a good basis for user interaction.

N.T. Nguyen et al. (Eds.): Adv. Methods for Comput. Collective Intelligence, SCI 457, pp. 117–126. DOI: 10.1007/978-3-642-34300-1_11 © Springer-Verlag Berlin Heidelberg 2013

In recent years, several methods have been used in CBIR for mapping a high-dimensional feature space into a 2D space for visualization, with good results on visualization, but less on interacting with the resulting data. Some specific similarity measures are used for visualization, like the Earth Mover's Distance (EMD) [14], for representing the distribution of images on a 2D screen. Principal Component Analysis (PCA) [7] is a basic and linear approach used for dimension reduction in image retrieval. However, with PCA only linear correlations can be detected and exploited, and it does not necessarily best preserve the mutual relations between different data items [11]. Multi-dimensional scaling (MDS) [13] is a nonlinear transformation method for projecting a data set into a smaller space. MDS attempts to preserve the original relationships of the high-dimensional space in the low-dimensional projection. MDS provides a more precise representation of the relationships between images in the feature space compared to PCA. However, MDS has a very large computational cost due to its quadratic complexity. Self-organizing maps (SOM) [16] are a neural algorithm for visualizing and interpreting large high-dimensional data sets. SOM is efficient in handling large datasets [15] and it is robust even when the data set is noisy [5]. However, the structure of the SOM neural network and the number of neurons in the Kohonen layer need to be specified a priori. Determining the number of clusters is not trivial when the characteristics of the data set are usually not known a priori [3]. With all these methods, the user can browse an image database in a more intuitive way, but interaction is still limited, especially when retrieval is based on heterogeneous features, like image features and text features, which are difficult to show together efficiently on a 2D screen [8]. Nowadays, the problem is still unsolved.

To cope with the drawbacks of the existing techniques, we explore in this paper a new and innovative approach to this problem: a multi-agent model for image browsing and retrieval. Multi-agent systems are used in CBIR mostly for distributed retrieval [10], and less for the problems of visualization and interaction. But interesting models can be found in other types of applications, on which we base our approach. For example, Renault has proposed a dynamic attraction/repulsion multi-agent model [12] for the application of e-mail classification. It aims to organize e-mails in a 2D space according to similarity, where each e-mail is represented by an agent and there is no need to specify axes or decide how to organize the information. The model allows agents to communicate with each other through virtual pheromones and to collectively self-organize in a 2D space. Without many constraints, the system can organize (cluster/classify) information and let the user intuitively interact with it. We keep the idea of using reactive agents with attraction/repulsion forces for organizing data in space and adapt it for image browsing and retrieval. To facilitate 2D display and interaction, in our model each image of the database is represented by an agent which moves in a 2D environment. Each agent interacts with other agents/images through forces. These forces are computed according to the similarities between an agent/image and its neighbors. This model allows the user to easily interact with images using an intuitive interface. The user can select (or deselect) one or many agent(s)/image(s) to perform a query and can organize these queries by moving them in the environment. The main drawback of this kind of technique is the computation time needed for the multi-agent model to organize itself, and we propose some ways to cope with this problem.


The rest of this paper is organized as follows. In the next section, we present in detail our main contributions: the multi-agent model for image browsing and retrieval, the force computation model for images, and the techniques for working with large image databases. Then, Section 3 shows some qualitative and quantitative results of our model. Finally, Section 4 gives a conclusion and some future work.

2 System Overview

2.1 Global System Model

The image database is modeled using a multi-agent system. Each agent represents an image and bases its actions on the feature vector of that image. The agents move freely in a 2D environment which has no pre-defined axes or meaning (Figure 1). They are reactive and only react to outside stimuli sent by other agents. Each agent interacts with others through forces, which can be attraction or repulsion. Forces are computed between two agents based on the similarity of their feature vectors. Thus, agents are attracted to similar ones and repulsed from non-similar ones. This local behavior between two agents produces at the global level an auto-organization (like clustering) of all images in the 2D space.
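This reactive scheme can be sketched in a few lines of Python (a minimal illustration under our own naming; none of this code comes from the paper):

```python
import random

class ImageAgent:
    """One agent per image: holds the image's feature vector and a 2D position."""
    def __init__(self, features):
        self.features = features
        # No pre-defined axes or meaning: start at a random position
        self.x = random.uniform(0.0, 100.0)
        self.y = random.uniform(0.0, 100.0)

    def step(self, neighbors, pair_force):
        """React to outside stimuli: sum the attraction/repulsion
        forces received from the neighbors, then move accordingly."""
        fx = fy = 0.0
        for other in neighbors:
            dfx, dfy = pair_force(self, other)
            fx += dfx
            fy += dfy
        self.x += fx
        self.y += fy
```

Iterating `step` over all agents is what produces the global auto-organization; no agent ever holds a global view of the database.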

An image query-by-example in this model is simply an image/agent clicked by the user. An agent-query is static (it does not move), but the user can move it where he wants on the screen (to organize the result display). Except for that, this agent-query behaves the same as the others, and still produces attraction/repulsion forces toward other agents. Multiple queries are possible simply by clicking multiple images. Text queries (keywords) are given by adding an agent-text, representing the keyword, to the system. The forces are then computed using text similarities, when annotations are available for agent-images. Queries can be positive (normal queries) or negative (forces are inverted). This simple mechanism reproduces the relevance feedback behavior used in traditional CBIR systems, where the user can indicate positive or negative feedback to the system.


Fig. 1. Dynamic evolution of the system with an image query (in green in the middle). (a) The model is initialized with random placement of the images in the environment. An image is selected (in green) as a positive query and placed in the center. (b) System state after 50 time steps. (c) After 300 time steps.


The major advantage of this model is its simplicity, allowing complex behaviors observed in traditional CBIR systems to be reproduced within a very simple and intuitive model: browsing, querying by text and/or image, interacting with the user, doing relevance feedback, mapping high-dimensional and heterogeneous features into a 2D space, etc. Moreover, where several different complex algorithms are needed to implement all these behaviors in a traditional CBIR system, in this new model they are just consequences of the basic initial model. In the following sections we detail all these aspects.

2.2 Agent Interaction Model (Force Computation Model)

A force applied between two agents can be attractive or repulsive and is represented by a vector (magnitude and direction). If two agents are similar (image content and/or text/keyword similarities), they are attracted to each other, and if two agents are different (non-similar), they are repulsed from each other. As presented above, at each time step¹, an agent interacts with its neighbors, getting forces from them and moving reactively. The first step in computing forces is therefore to define the neighbor list for an agent, which evolves dynamically as the agent moves. Our experiments showed that taking the neighbors within a specific radius in space increases computation time and reduces both convergence speed and result quality. We prefer instead to select N agents randomly² among all, which allows us to cope with these drawbacks. Thus, a neighborhood is defined as a temporary (one time step) relationship between two agents that will interact (react through forces), and it is not based on spatial proximity. Once the neighbor list for an agent is known, the agent can simply compute the forces it receives from all these neighbors and react according to them (Figure 2).
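The per-step random neighborhood described above could be implemented as follows (a sketch; the function name is ours):

```python
import random

def pick_neighbors(agent, all_agents, n=20):
    """Temporary one-step neighborhood: N agents drawn uniformly at
    random (N = 20 in the paper's experiments), excluding the agent
    itself and independent of spatial proximity."""
    others = [a for a in all_agents if a is not agent]
    return random.sample(others, min(n, len(others)))
```

Because the draw is repeated at every time step, each agent eventually samples the whole population, which is what lets a purely local rule produce a global organization.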

Fig. 2. Force computation. Agent A chooses (randomly) N neighbors (2 in this example) among all existing agents. The global force applied on agent A is the sum of the (repulsion) force between agents A and B and the (attraction) force between agents A and C.

1 In our implementation, agents are executed in pseudo-parallelism managed by a scheduler, and each agent is executed for a small time unit (corresponding to one force computation loop), so one time step in the model corresponds in reality to one iteration loop over all agents.

2 N = 20 in our experiments.


Image-based forces. The visual similarity S between two agent-images is calculated using the Squared Chord similarity [4]:

S = Σ_{i=1..n} (√v_i − √w_i)²

where v = (v_1, v_2, ..., v_n) and w = (w_1, w_2, ..., w_n) are the image feature vectors for the two images v and w. For N neighbors, an agent will calculate N similarities. All these similarities are sorted in ascending order, in order to be able to normalize them. No global normalization is done, so each agent has to locally normalize its computations based on its current information. Each similarity is normalized following:

S_i = S_min + i · (S_max − S_min) / (N − 1) − (S_min + S_max) / 2    (1)

where S_i is the similarity of the i-th ranked neighbor after sorting, i varying from 0 to N − 1, and S_min and S_max are the minimum and maximum similarities over all neighbors. The spatial distance D_i between two neighbor agents is calculated for two points P(x, y) and Q(s, t) using the chessboard distance: D(P, Q) = max(|x − s|, |y − t|). Similarly to the similarities, distances are normalized using: D_i = D_i − (D_min + D_max) / 2.
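These computations can be sketched directly in Python (function names are ours; Eq. (1) is applied to the sorted list, producing evenly spaced values centred on zero):

```python
import math

def squared_chord(v, w):
    # Squared Chord similarity: S = sum_i (sqrt(v_i) - sqrt(w_i))^2
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(v, w))

def normalize_similarities(sims):
    """Eq. (1): sort ascending, then map the i-th ranked value to
    S_min + i*(S_max - S_min)/(N-1) - (S_min + S_max)/2.
    Local, per-agent normalization; no global statistics needed."""
    sims = sorted(sims)
    s_min, s_max = sims[0], sims[-1]
    n = len(sims)
    mid = (s_min + s_max) / 2.0
    return [s_min + i * (s_max - s_min) / (n - 1) - mid for i in range(n)]

def chessboard_distance(p, q):
    # D(P, Q) = max(|x - s|, |y - t|)
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def normalize_distances(dists):
    # D_i <- D_i - (D_min + D_max) / 2, centring distances on zero
    mid = (min(dists) + max(dists)) / 2.0
    return [d - mid for d in dists]
```

Note that both normalizations deliberately produce negative as well as positive values; the signs are what drive the attraction/repulsion decision in Table 1.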

Both normalizations (similarity and distance) produce negative and positive values, and the final force between an agent and a neighbor is given by Table 1. Weak and strong correspond to constant values, which can be calibrated for the environment and are used to scale the force, while attraction and repulsion correspond to the sign of the force. The force magnitude between two agents is given by:

F = C · S_i / D_i    (2)

where C is a factor that can be 1 or 3, corresponding to weak and strong forces respectively (Table 1).

Table 1. Magnitude and direction of a force according to the similarity and the distance between two agents

Similarity  Distance  Force
−           −         strong repulsion
+           +         strong attraction
−           +         weak repulsion
+           −         weak attraction
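Combining Table 1 with Eq. (2) gives a small decision rule; a possible sketch (our naming, and the sign convention positive = attraction is our assumption) is:

```python
WEAK_C, STRONG_C = 1.0, 3.0  # the factor C in Eq. (2)

def pair_force_magnitude(sim, dist):
    """Signed scalar force from Table 1 and Eq. (2), F = C * S_i / D_i,
    for zero-centred sim and dist (dist must be non-zero).
    Assumed convention: positive = attraction, negative = repulsion."""
    same_sign = (sim > 0) == (dist > 0)
    c = STRONG_C if same_sign else WEAK_C  # -/- and +/+ are the strong cases
    attract = sim > 0                      # similar attracts, dissimilar repels
    magnitude = abs(c * sim / dist)
    return magnitude if attract else -magnitude
```

The four branches of the conditional reproduce the four rows of Table 1: strong repulsion (−/−), strong attraction (+/+), weak repulsion (−/+), weak attraction (+/−).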

Text-based forces. The text similarity between an agent-image and an agent-text³ depends on the number of keywords of each agent and the number of common keywords between the two agents: S_i = nb_common / nb_minimum, where nb_common is the number of common keywords between the two agents and nb_minimum is the minimum number of keywords owned by either agent (i.e. if agent A has 3 keywords and agent B has 5 keywords, then nb_minimum = 3).

3 Currently, the only agent-texts are text queries (comprising one or more keywords) given by the user, while agent-images can be queries or database images without distinction.


Text similarity can only be computed if both agents have keywords, either as a text query (agent-text) or as image annotations (agent-image). As for image-based forces, the type and direction of the force between two agents follows Table 1 and the magnitude of the force is given by F = C · S_i / D_i (see above for details).
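The keyword-based similarity can be written in a few lines (a sketch; returning `None` when the similarity is undefined is our own convention):

```python
def text_similarity(keywords_a, keywords_b):
    """S = nb_common / nb_minimum; defined only when both agents
    carry at least one keyword (text query or image annotation)."""
    a, b = set(keywords_a), set(keywords_b)
    if not a or not b:
        return None  # no text-based force can be computed
    return len(a & b) / min(len(a), len(b))
```

Dividing by the smaller keyword set means a short query fully contained in a richly annotated image still scores 1.0, which suits the asymmetry between short text queries and image annotations.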

Agent global force. The final global force for an agent is simply the vector sum of all forces between that agent and its neighbors (Figure 2). The agent movement is induced by this final global force applied directly to the agent, added to an inertial force. The goal of the inertial force is to keep the results of previous computations. Various values have been tested experimentally for the inertial force at time step t; the best choice is to keep 80% of the final global force of time step (t − 1).
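The summation with the inertial term can be sketched as follows (our naming; forces are 2D vectors as (fx, fy) pairs):

```python
def global_force(forces, prev_force, inertia=0.8):
    """Vector sum of the per-neighbor forces plus 80% of the previous
    time step's global force (the inertial term kept from t - 1)."""
    fx = sum(f[0] for f in forces) + inertia * prev_force[0]
    fy = sum(f[1] for f in forces) + inertia * prev_force[1]
    return (fx, fy)
```

The inertial term acts as a simple exponential smoothing of the motion, damping the oscillations caused by the random re-sampling of neighbors at each step.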

2.3 Human Interaction Model

A motivation behind the creation of this system is to provide easy and intuitive user interaction. For searching and retrieving, the user can give an image query-by-example just by clicking on the wanted image. The corresponding agent-image is then marked as a query. In the model, the query-agent is managed like all other agents except that it stops moving in the environment, letting all the other agent-images organize themselves around it. This query can only be moved (dragged on the screen) by the user, allowing him to structure the screen space as he wants. To perform a multiple-query search, the user just needs to select multiple images and organize them on the screen (Figure 3a). By clicking on agent-queries, the user can mark them as positive or negative. A negative query translates into an inversion of the concerned forces: attraction forces become repulsion and repulsion forces become attraction. This behavior makes it easy to perform relevance feedback, with positive and negative feedback (queries). Images similar to a negative query move away from it. The user can deselect a query, i.e. let the agent return to its normal moving behavior. Just by clicking on the agent-images, the user can perform different traditional tasks for browsing, querying and retrieving images, and selecting positive/negative feedback. The force computation model is the same for all these tasks, with the only two exceptions that a query-agent does not translate its forces into motion and that the forces can be inverted if the agent is marked as a negative query.

Performing text queries is similar, except that text-agents do not exist at the beginning, so the user needs to create them. A text query allows the user to search images by text (one or several keywords), if images have a priori annotations. An interface allows the user to give one or several keywords that will be assigned to the newly created text-agent (Figure 3b). After that, the process is the same: this text-agent can be moved anywhere by the user, be positive or negative, and multiple text-agents can be created.

Without a query, all agents move in the environment until they converge to a stable state. This stable state varies depending on the initial (random) positions of the images. With one or several queries, the system converges to the state desired by the user.



Fig. 3. Different types of queries. (a) Multiple image queries, positive (2 green rectangles in opposite corners) and negative (red rectangle in the center). (b) Text query (in the center).

2.4 Working with Large Image Databases

A large image database may contain thousands or even millions of images. Computing all similarities to rank all images can be very slow, and not very useful. In our visualization-centered model, we are not interested in showing all images and computing all similarities, but only in showing a limited number of images which are the most interesting for the user. We state the hypothesis that a user can see a maximum of 500 or 1000 images at the same time on the screen (and even this number is considered high). If too many images are displayed on the screen, they occlude each other and the user might lose the important ones (those similar to his queries). Starting from this hypothesis, the system does not need to perform all computations, but only a limited set of them. Some adjustments are made to the model to take this into account and limit the number of computations.

The global environment is bigger than the part of the environment visible on the screen. A distinction is made between the agents currently visible on the screen and the others. Precision and efficiency are needed for visible agents, but not for the others. Visible agents recompute their forces with their neighbors often, while non-visible agents do this 15 to 20 times less often⁴; the longer an agent stays in the non-visible space, the lower its priority. The random neighbor selection function only takes visible agents into account for the force computation. This lets agents react mostly to visible images, without completely neglecting non-visible ones. Forces applied to non-visible agents are bigger (the value of a strong force in the model described previously is bigger for them). Consequently, the positions of non-visible agents are not accurate; what matters is to promote movement between the visible and non-visible sections of the environment as necessary. This dual representation significantly reduces the computation time.

4 In our implementation of pseudo-parallelism, this is done by creating two agent lists for the scheduler: one list of visible agents with a high priority of execution, and one list of non-visible agents with a low priority. Agents can change from one list to the other depending on their current position.
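The two-list scheduling idea can be sketched as follows (our naming and our choice of a fixed period; the paper only states that non-visible agents run 15 to 20 times less often):

```python
def due_agents(visible, hidden, step, hidden_period=20):
    """Two-list scheduler sketch: visible agents are executed at every
    time step, non-visible agents only once every `hidden_period` steps."""
    due = list(visible)
    if step % hidden_period == 0:
        due.extend(hidden)  # low-priority list runs rarely
    return due
```

In a full implementation, each agent returned by `due_agents` would then pick its random neighbors from the visible list only, matching the restriction described above.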

3 Results

We are mainly interested in evaluating the feasibility of our multi-agent model. To do so, we use two image databases. The first is the Wang database [6], containing 1000 images separated into 10 classes (100 images/class). In our experiments, we use the class name as a textual annotation for each image (1 keyword/image). The second is the Corel30k database [2], containing 31,695 images divided into 320 classes (approximately 100 images/class). The images are also annotated using a total of 5587 words, with each image having from 1 to 5 keywords. In our experiments, we use the annotation keywords instead of the classes (a richer description), but we do not use the words having too few examples (<10 images/word), because these words are not generic enough to evaluate the results. For all images, we use color and texture visual features. We compute the RGB color histogram compressed into 48 bins (16 per color) and co-occurrence matrices with 4 features: energy, entropy, contrast and inverse difference moment. For each image, we thus use a visual feature vector of 52 values and a textual vector of 1-5 keywords. For computing all results, we let the system converge completely (after 3000 time steps) to obtain stable results. The Euclidean distance is then used to rank all images with respect to the query and produce the shown results.
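The 48-bin color feature (16 bins per channel, concatenated) could be computed along these lines (an illustrative sketch under our own naming; the paper does not give its implementation):

```python
def rgb_histogram_48(pixels):
    """48-bin colour feature: 16 bins per channel, concatenated
    (R bins 0-15, G bins 16-31, B bins 32-47), normalized by pixel count.
    `pixels` is a sequence of (r, g, b) tuples with values 0-255."""
    hist = [0.0] * 48
    for r, g, b in pixels:
        hist[r * 16 // 256] += 1.0        # R channel, bins 0-15
        hist[16 + g * 16 // 256] += 1.0   # G channel, bins 16-31
        hist[32 + b * 16 // 256] += 1.0   # B channel, bins 32-47
    n = len(pixels)
    return [h / n for h in hist] if n else hist
```

Appending the 4 co-occurrence statistics (energy, entropy, contrast, inverse difference moment) to these 48 values yields the 52-value visual feature vector used in the experiments.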

Figure 4 shows the results of our system for both image databases using recall and precision curves, common evaluation tools in information retrieval. These results are similar to, close to but not better than, those of more traditional CBIR systems [1,9]. These preliminary results validate the feasibility of our model as a potential new CBIR method and encourage us to pursue improving it.

Our experiments on giving different priorities to visible and non-visible agents (Section 2.4) show no visible difference in terms of results, but they show a significant reduction (more than 3 times) of the computation time. Using a Dell Vostro 1310 (CPU Intel Core 2 Duo, 2 GHz, RAM 2 GB) with Ubuntu 10.04 on the Corel30k database, and measuring after 3000 time steps (more than needed for full convergence of the agents to a stable result), the system takes 5m35s without the priority mechanism and 1m46s with this mechanism implemented. This last time still seems very high, but one has to remember that we aim at an interactive and visually intuitive system. Partial convergence⁵ occurs well before that time (at around half of it), which allows the user to benefit from the results earlier.

5 Partial convergence in such a dynamic model means that agents are roughly at their final positions, but still oscillating; the exact distances and ranking with respect to the query can still vary, but not by much.



Fig. 4. Recall/precision results for our multi-agent model. (a) Results obtained for each individual class of the Wang database. (b) Global results for the Corel30k database, using only text-based or only content-based retrieval.

4 Conclusion and Future Work

In this paper we introduce a new approach to image browsing and retrieval based on a multi-agent model. It is a dynamic model where similarities between images are computed locally and over time. From a random placement of all images, convergence is obtained through the interactions between the agents and their neighbors. This model is not intended for real-time computation, but rather for interactive and visually intuitive image retrieval. The main advantage of this model is its simplicity: a unique model can aggregate and unify several independent existing behaviors in CBIR, like single or multiple querying, text or image queries, and relevance feedback with positive and negative examples. Moreover, it simplifies the mapping of a high-dimensional feature space onto a 2D interface, with an auto-adaptive local model, and it allows easy human interaction, both for browsing and retrieval. The preliminary results validate the feasibility of this model, but it still needs to be improved.

More image features can be added, as in most CBIR methods, but our future work will concentrate on the force computation model, trying to simplify it even further to reduce computation time. Preliminary experiments show us that an integer-based computation model can be almost as accurate as a floating-point model, thanks to the local agent interactions and the incremental (converging over time) nature of the model. We also aim at adding more heterogeneous features (like time stamps or image metadata) while keeping the system visually intuitive for the user. One potential application of this model is the supervised annotation of large image databases, where we need to minimize the number of user interactions and to maximize the number of annotated images.

References

1. Boucher, A., Dang, T.H., Le, T.L.: Classification vs recherche d'information: vers une caracterisation des bases d'images. In: 12emes Rencontres de la Societe Francophone de Classification (SFC), Montreal, Canada (2005) (in French)

2. Carneiro, G., Chan, A.B., Moreno, P.J., Vasconcelos, N.: Supervised Learning of Semantic Classes for Image Annotation and Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(3), 394–410 (2007)

3. Forgac, M.R.: Decreasing the Feature Space Dimension by Kohonen Self-Organizing Maps. In: 2nd Slovakian-Hungarian Joint Symposium on Applied Machine Intelligence, Herlany, Slovakia (2004)

4. Hu, R., Ruger, S., Song, D., Liu, H., Huang, Z.: Dissimilarity measures for content-based image retrieval. In: 2008 IEEE International Conference on Multimedia and Expo (ICME), Hannover, Germany (2008)

5. Mangiameli, P., Chen, S.K., West, D.: A comparison of SOM neural network and hierarchical clustering methods. European Journal of Operational Research 93(2), 402–417 (1996)

6. Li, J., Wang, J.Z.: Automatic linguistic indexing of pictures by a statistical modeling approach. IEEE Transactions on Pattern Analysis and Machine Intelligence 25(9), 1075–1088 (2003)

7. Moghaddam, B., Tian, Q., Lesh, N., Shen, C., Huang, T.S.: Visualization & User-Modeling for Browsing Personal Photo Libraries. International Journal of Computer Vision 56(1/2), 109–130 (2004)

8. Nguyen, N.V., Boucher, A., Ogier, J.M., Tabbone, S.: Region-Based Semi-automatic Annotation Using the Bag of Words Representation of the Keywords. In: 5th International Conference on Image and Graphics (ICIG), pp. 422–427 (2009)

9. Nguyen, N.V.: Keyword Visual Representation for Interactive Image Retrieval and Image Annotation. PhD thesis, University of La Rochelle, France (2011) (in French)

10. Picard, D., Cord, M., Revel, A.: CBIR in distributed databases using a multi-agent system. In: IEEE International Conference on Image Processing (ICIP) (2006)

11. Plant, W., Schaefer, G.: Visualising Image Databases. In: IEEE International Workshop on Multimedia Signal Processing, pp. 1–6 (2009)

12. Renault, V.: Organisation de Societe d'Agents pour la Visualisation d'Informations Dynamiques. PhD thesis, University Paris 6, France (2001) (in French)

13. Rubner, Y., Guibas, L.J., Tomasi, C.: The earth mover's distance, multi-dimensional scaling, and color-based image retrieval. In: ARPA Image Understanding Workshop, pp. 661–668 (1997)

14. Rubner, Y., Tomasi, C., Guibas, L.J.: The Earth Mover's Distance as a Metric for Image Retrieval. International Journal of Computer Vision 40(2), 99–121 (2000)

15. Xiao, X., Dow, E.R., Eberhart, R., Miled, Z.B., Oppelt, R.J.: Gene Clustering Using Self-Organizing Maps and Particle Swarm Optimization. In: IEEE International Workshop on High Performance Computational Biology (2003)

16. Laaksonen, J., Koskela, M., Oja, E.: PicSOM – Self-organizing image retrieval with MPEG-7 content descriptors. IEEE Transactions on Neural Networks 13(4), 841–853 (2002)

