# [IEEE 2010 International Conference on Artificial Intelligence and Computational Intelligence (AICI) - Sanya, China (2010.10.23-2010.10.24)] 2010 International Conference on Artificial Intelligence and Computational Intelligence - Image Auto-annotation with Graph Learning

Post on 16-Feb-2017


<ul><li><p>Image Auto-annotation With Graph Learning </p>
<p>Yu Tang Guo, Department of Computer Science and Technology </p>
<p>Hefei Normal University, Hefei, China </p>
<p>aieyt@ah.edu.cn </p>
<p>Bin Luo, School of Computer Science and Technology </p>
<p>Anhui University, Hefei, China </p>
<p>Luobinahu@yahoo.com.cn </p>
<p>Abstract: Integrating contextual information is important for improving the performance of automatic image annotation, and graph-based representations allow such information to be incorporated. In this paper, we propose a graph-based approach to automatic image annotation that models both feature similarities and semantic relations in a single graph. Annotation quality is enhanced by graph link weighting techniques based on inverse document frequency and on word similarity derived from co-occurrence relations in the training set. Exploiting the linear correlation and the block-wise, community-like structure of the modeled graph, we divide the graph into several subgraphs and approximate the high-rank adjacency matrix of the graph with low-rank matrices, so that images can be annotated quickly. Experimental results on the Corel image datasets show the effectiveness of the proposed approach. </p>
<p>Index Terms: image annotation, graph learning, random walk with restart, fast solution </p>
<p>I. INTRODUCTION </p>
<p>With the advent of digital imagery, the number of digital images has been growing rapidly, and an efficient image retrieval system is desirable: given a large database, we may need, for example, to find the images that contain tigers or, given an unseen image, to find the keywords that best describe its content. </p>
<p>Early Content-Based Image Retrieval (CBIR) systems were based solely on indexing low-level visual features such as color histograms, textures, shapes and spatial layout. However, visual similarity is not semantic similarity. 
There is a gap between low-level visual features and semantic meaning. This so-called semantic gap is the major problem that most CBIR approaches need to solve. </p>
<p>One way to bridge the semantic gap is to index images using semantic features, such as keywords, that describe the content of the image. The majority of automatic image annotation systems use machine learning approaches to find correlations between image visual features and the words used to annotate images in a training set. The learnt correlations can then be used to annotate new images. Automatic image annotation is therefore an important and challenging task: it narrows the semantic gap in content-based image retrieval, and it lets image retrieval build on today's powerful pure-text retrieval techniques. </p>
<p>II. RELATED WORKS </p>
<p>There has been much research on this subject. Mori et al. proposed a Co-occurrence Model [1] in which they looked at the co-occurrence of words with image regions created using a regular grid. Duygulu et al. proposed translation models [2], in which images are described using a vocabulary of blobs. First, regions are created using a segmentation algorithm such as normalized cuts. For each region, features are computed, and blobs are generated by clustering the image features of these regions across images. Each image is then represented by a certain number of these blobs, and the semantic concepts are translated into the image blobs. Another approach, using Cross-Media Relevance Models (CMRM), was introduced by Jeon et al. [3]. Here the joint distribution of blobs and words is learned from a training set of annotated images. The model assumes that a set of keywords is related to the set of blobs in an image, rather than a one-to-one correspondence between blob-tokens and keywords. 
Carneiro and Vasconcelos presented Supervised Multiclass Labeling (SML), which is based on optimization and statistical classification under a minimum error rate criterion [4-5]. Domestic scholars have also carried out relevant studies on automatic image annotation. To achieve concept-based indexing for image annotation, the paper [6] mapped low-level image features to high-level semantic features using a multi-class classifier based on support vector machines (SVM). </p>
<p>In recent years, graph-based automatic image annotation methods have been proposed. In [7], a graph-based approach is presented in which image visual content and keywords are incorporated by manifold ranking. A nearest spanning chain procedure is used to derive the similarity between images and keywords in an adaptive graph-like structure. After a random walk with restart is performed on the graph, candidate keywords are output and inconsistent ones are filtered out by a word correlation procedure. In [8], Pan et al. proposed the Gcap algorithm, which models relationships between images and words by connecting them in an undirected graph. Image nodes are linked to each other based on their similarity measured in the image feature space, while image and word nodes are linked based on the prior knowledge provided by the human-annotated images of the training set. </p>
<p>2010 International Conference on Artificial Intelligence and Computational Intelligence</p>
<p>978-0-7695-4225-6/10 $26.00 © 2010 IEEE DOI 10.1109/AICI.2010.171</p></li><li>
<p>To annotate a new image, one appends it to the most similar images of the trained graph and performs a random walk with restart to estimate the steady-state probability of each annotation word for the image to be annotated. 
</p>
<p>Gcap is a graph-based learning method for automatic image annotation, but it has some limitations. In Gcap, only the similarity of image regions is used. </p>
<p>Image semantics are vague, complex and abstract, so low-level features alone are not enough to describe them; related content within an image must be combined to improve annotation accuracy. In this paper, a graph-based image annotation algorithm is proposed. The proposed approach models the relationships between images and words with an undirected graph. Semantic information is extracted from paired nodes, and inverse document frequency (IDF) [9] is used to amend the edge weights between an image node and its keywords, which overcomes the bias caused by high-frequency words in the traditional method and effectively improves annotation performance. Based on an analysis of the graph structure, a fast solution algorithm is proposed. </p>
<p>III. PROPOSED METHOD OVERVIEW </p>
<p>A. Model relationships among images and words with graph </p>
<p>Let \(T = \{i_1, i_2, \ldots, i_n\}\) be the training set of images, where each training image in \(T\) is represented by its visual feature \(f_i\), and let \(w = \{w_1, w_2, \ldots, w_l\}\) be the list of keywords, where \(w_i\) \((i = 1, 2, \ldots, l)\) is the \(i\)-th word in the list. We use an undirected graph \(G = \langle V, E \rangle\) to represent the relationships among images and words. Each image node is linked to its \(k\) nearest neighboring image nodes based on their similarity measured in the image feature space; the edge weight is denoted \(sim(f_i, f_j)\). Image and word nodes are linked according to the prior knowledge provided by the human-annotated images of the training set; the edge weight is denoted \(sim(i, w_i)\). Similarly, each word node is linked to its \(k\) nearest neighboring word nodes based on their similarity measure; the edge weight is denoted \(sim(w_i, w_j)\). The resulting graph is shown in Fig. 1. 
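</p>
<p>The graph structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy features, the annotations, the node naming ("img", "word") and the helper names nearest_neighbours and build_graph are all hypothetical, plain Euclidean distance stands in for the feature-space similarity, and edge weights and word-to-word links are left out because they depend on similarity measures defined separately. </p>

```python
import math

def nearest_neighbours(features, idx, k):
    """Indices of the k feature vectors closest to features[idx] (Euclidean)."""
    dists = sorted((math.dist(features[idx], f), j)
                   for j, f in enumerate(features) if j != idx)
    return [j for _, j in dists[:k]]

def build_graph(features, annotations, vocabulary, k=1):
    """Undirected graph as {node: set of neighbours}.

    Nodes are ("img", i) and ("word", w). Only image-image kNN links and
    image-word annotation links are created; weights are omitted.
    """
    graph = {("img", i): set() for i in range(len(features))}
    graph.update({("word", w): set() for w in vocabulary})

    def link(a, b):
        # Undirected edge: store it in both directions.
        graph[a].add(b)
        graph[b].add(a)

    for i in range(len(features)):
        for j in nearest_neighbours(features, i, k):
            link(("img", i), ("img", j))
        for w in annotations[i]:
            link(("img", i), ("word", w))
    return graph

# Hypothetical toy data: three images with 2-D features and their keywords.
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
annotations = [{"sky", "water"}, {"sky"}, {"tiger"}]
graph = build_graph(features, annotations, {"sky", "water", "tiger"}, k=1)
```

<p>In this toy graph, image 0 and image 1 are linked as nearest neighbours and share the word node "sky", which is the kind of path a random walk can later follow from an image to candidate keywords. 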
</p>
<p>Fig. 1. Diagram of relationships among images and words with graph </p>
<p>We adopt uniform Local Binary Pattern (LBP) image features to represent each image. The LBP feature is a compact texture descriptor in which each comparison between a center pixel and one of its surrounding neighbors is encoded as a bit in an LBP code. LBP codes are computed for every pixel and accumulated into a histogram that represents the whole image. </p>
<p>The weight between two images is given by equation (1): </p>
<p>\[ sim(f_i, f_j) = \begin{cases} \exp\left(-\|f_i - f_j\|^{2} / \sigma\right) & \text{if } f_i \text{ is a KNN of } f_j \\ 0 & \text{otherwise} \end{cases} \tag{1} \]</p>
<p>with \(\sigma\) a scale parameter. </p>
<p>The similarity between keywords is obtained from co-occurrence statistics in the data set [10], as expressed by formula (2): </p>
<p>\[ sim(w_i, w_j) = \frac{N(w_i, w_j)}{\min(N(w_i), N(w_j))} \tag{2} \]</p>
<p>where \(N(w_i, w_j)\) is the co-occurrence frequency of words \(w_i\) and \(w_j\), and \(N(w_i)\) and \(N(w_j)\) are the occurrence frequencies of \(w_i\) and \(w_j\) in the training set, respectively. </p>
<p>According to the similarity \(sim(w_i, w_j)\), two word nodes are connected by an edge if and only if they are \(k\) nearest neighbors. </p>
<p>Word frequencies in the training set vary widely; for example, in the Corel image database the words Water, Sky and Tree appear far more often than Race or Canoe. These high-frequency words dominate the similarity matrix, so they affect the final annotation regardless of the image being annotated. </p>
<p>To reduce the bias introduced by high-frequency words, we use inverse document frequency (IDF) to amend the edge weights between an image node and its annotation words. </p>
<p>In image annotation, an image is equivalent to a document and its annotation words are equivalent to the keywords in the document. 
\(df(w_i)\) represents the number of occurrences of the word \(w_i\) in the training set. We use formula (3) to calculate the edge weights between an image node and its annotation words: </p>
<p>\[ sim(i, w_j) = \left((1 - \lambda) + \lambda \log\frac{1 + |w|}{df(w_j)}\right) \delta(i, w_j) \tag{3} \]</p>
<p>where \(|w|\) is the size of the annotation vocabulary; \(\delta(i, w_j)\) equals 1 when \(w_j\) is an annotation word of image \(i\) and 0 otherwise; and \(\lambda\) is the smoothing factor used to adjust the weight of high-frequency and low-frequency words in the training set. </p>
<p>B. Annotating a new image </p>
<p>To annotate a new image, one appends it to its \(K\) nearest neighbor images in the trained graph and performs a random walk with restart (RWR) [11]. RWR is an MRF process started from the initial node; it is iterated as equation (4) shows: </p>
<p>\[ R_{n+1} = cAR_n + (1 - c)Y \tag{4} \]</p></li><li>
<p>where \(R\) is an \(N\)-dimensional vector representing the transition states in the MRF for all nodes, \(N\) is the number of nodes, and \(A\) is the adjacency matrix of graph \(G\). \((1 - c)\) is the probability that the walk returns to the initial node at each step. \(Y\) is an \(N\)-dimensional vector whose elements are all zero except for a 1 at the position corresponding to the initial node. To ensure the convergence of equation (4), \(A\) is normalized along its columns and the elements of \(R\) are constrained to sum to one. Once equation (4) has converged, the word nodes are sorted in descending order of their corresponding components of \(R\). The first \(n\) words (e.g. \(n = 5\)) are the annotation of the image. </p>
<p>C. Preprocessing </p>
<p>During annotation, equation (4) is iterated until it converges. Convergence usually takes a long time; especially in an image annotation system with a large training dataset, much time is spent running RWR. This is because the time complexity of RWR is proportional to the number of iterations and to the size of the graph. A fast solution is therefore urgently needed. 
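</p>
<p>The iteration in equation (4) can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions rather than the paper's code: the four-node toy graph, the value c = 0.85, the tolerance and the function names column_normalize and random_walk_with_restart are all hypothetical. </p>

```python
def column_normalize(A):
    """Scale each column of a square matrix (list of rows) to sum to 1."""
    n = len(A)
    col_sums = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [[A[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def random_walk_with_restart(A, start, c=0.85, tol=1e-10, max_iter=1000):
    """Iterate R <- c*A*R + (1-c)*Y until the L1 change drops below tol."""
    n = len(A)
    # Y is zero everywhere except for a 1 at the initial (query) node.
    Y = [1.0 if i == start else 0.0 for i in range(n)]
    R = Y[:]
    for _ in range(max_iter):
        R_next = [c * sum(A[i][j] * R[j] for j in range(n)) + (1 - c) * Y[i]
                  for i in range(n)]
        if sum(abs(a - b) for a, b in zip(R_next, R)) < tol:
            return R_next
        R = R_next
    return R

# Toy graph (assumed): node 0 = query image, node 1 = neighbour image,
# nodes 2 and 3 = word nodes attached to the neighbour image.
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 1],
             [0, 1, 0, 0],
             [0, 1, 0, 0]]
scores = random_walk_with_restart(column_normalize(adjacency), start=0)
```

<p>Each pass costs one matrix-vector product over the whole graph, and many passes are needed before the change falls below the tolerance, which is the cost that a non-iterative solution aims to avoid. </p>
<p>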
Based on a structural analysis of the graph, a fast algorithm is proposed in this paper. While keeping comparable annotation accuracy, our algorithm avoids iterative computation and obtains results efficiently. </p>
<p>Equation (4) can be reformulated as: </p>
<p>\[ R_{n+1} = (1 - c)(I - cA)^{-1} Y \tag{5} \]</p>
<p>By setting \(Q = I - cA\), equation (5) can be rewritten as: </p>
<p>\[ R_{n+1} = (1 - c) Q^{-1} Y \tag{6} \]</p>
<p>Equation (6) is a closed form of RWR. In practice, however, it cannot be evaluated directly, because the space and time complexity of computing \(Q^{-1}\) are \(O(n^2)\) and \(O(n^3)\), respectively. For an image annotation system that processes large amounts of data, direct computation of \(Q^{-1}\) is infeasible. </p>
<p>We noticed that matrix \(A\) has two prominent properties. First, its rows and columns are linearly correlated. Second, it consists of sparse and dense block regions. Based on these two properties of the modeled graph, and balancing the accuracy and speed of image annotation, we propose a fast algorithm to approximate the RWR solution. </p>
<p>First, according to the first property, dimension reduction is used to obtain a low-rank matrix that approximates \(Q^{-1}\). Based on the second property, graph \(G\) can be subdivided into \(k\) subgraphs. The adjacency matrix of each subgraph is denoted \(A_{1,i}\) \((i = 1, 2, \ldots, k)\), and the adjacency matrix among subgraphs is denoted \(A_2\). Computing the inverse \(Q^{-1}\) of the high-rank matrix is thereby reduced to computing the inverses of several low-rank matrices. The detailed algorithm is as follows: </p>
<p>Suppose \(V\) is the node set of graph \(G\), with size \(n\). Normalize \(A\) with Laplacian normalization: </p>
<p>\[ L = D^{-1/2} A D^{-1/2} \tag{7} \]</p>
<p>where \(D_{ii} = \sum_{k=1}^{n} A_{ik}\) and \(D_{ij} = 0\) for \(i \neq j\). According to the K-way normalized segmentation algorithm [12], graph \(G\) is subdivided into \(k\) subgraphs and \(L\) is decomposed into the sum of two matrices, as equation (8) shows. 
</p>
<p>\[ L = L_1 + L_2 \tag{8} \]</p>
<p>where \(L_2\) is the adjacency matrix among all subgraphs and \(L_1\) is represented as: </p>
<p>\[ L_1 = \begin{pmatrix} L_{1,1} & 0 & \cdots & 0 \\ 0 & L_{1,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & L_{1,k} \end{pmatrix} \tag{9} \]</p>
<p>\(L_{1,i}\) \((i = 1, 2, \ldots, k)\) is the adjacency matrix of the \(i\)-th subgraph. \(L_2\) is decomposed by eigenvalue decomposition as \(L_2 = USV\). Then: </p>
<p>\[ L = L_1 + USV \tag{10} \]</p>
<p>Hence </p>
<p>\[ (I - cL)^{-1} = (I - cL_1 - cUSV)^{-1} \tag{11} \]</p>
<p>Based on the Sherman-Morrison lemma [13] combined with equation (10), and writing \(Q_1 = I - cL_1\), we obtain: </p>
<p>\[ Q^{-1} = (I - cL)^{-1} = (I - cL_1 - cUSV)^{-1} = Q_1^{-1} + cQ_1^{-1}U\left(S^{-1} - cVQ_1^{-1}U\right)^{-1}VQ_1^{-1} \]</p>
<p>According to equation (6), </p>
<p>\[ R_{n+1} = (1 - c)Q^{-1}Y = (1 - c)\left(Q_1^{-1} + cQ_1^{-1}U\left(S^{-1} - cVQ_1^{-1}U\right)^{-1}VQ_1^{-1}\right)Y \tag{12} \]</p>
<p>Since \(L_1\) is a block-diagonal matrix, \(Q_1^{-1}\) can be rewritten as: </p>
<p>\[ Q_1^{-1} = (I - cL_1)^{-1} = \begin{pmatrix} Q_{1,1}^{-1} & 0 & \cdots & 0 \\ 0 & Q_{1,2}^{-1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Q_{1,k}^{-1} \end{pmatrix} \tag{13} \]</p>
<p>where \(Q_{1,i}^{-1} = (I - cL_{1,i})^{-1}\). </p>
<p>According to equations (12) and (13), computing the inverse \(Q^{-1}\) of the high-rank matrix \(Q\) is reduced to computing the low-rank matrices \(Q_{1,i}^{-1}\) \((i = 1, 2, \ldots, k)\) and \(\left(S^{-1} - cVQ_1^{-1}U\right)^{-1}\). If \(Q_{1,i}^{-1}\) \((i = 1, 2, \ldots, k)\) and \(\left(S^{-1} - cVQ_1^{-1}U\right)^{-1}\) are computed and stored in advance, \(Q^{-1}\), and hence \(R\), can be computed immediately. </p></li><li>
<p>D. Complexity analysis </p>
<p>The complexity of the proposed algorithm consists of the offline preprocessing and the online computation of \(R\). We discuss both the time complexity and the space complexity as follows: </p>
<p>1) The time complexity of the online computation of \(R\): according to equation (12), the computation of \(R\) can be decomposed into 6 matrix multiplications. </p>
<p>The online computation of \(R\) can meet the requirement of real-time output, since \(Q_1^{-1}\), \(\left(S^{-1} - cVQ_1^{-1}U\right)^{-1}\) and \(V\) have already been calculated and stored during the offline preprocessing stage. </p>
<p>2...</p></li></ul>
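<p>The offline/online split behind equations (12) and (13) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the two 2x2 blocks, the value c = 0.5 and all helper names are hypothetical, and the factors U, S, V are written by hand for a single cross-subgraph edge instead of being obtained from an eigenvalue decomposition (the identity holds for any factorization with invertible S). </p>

```python
def matmul(X, Y):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def combine(X, Y, alpha=1.0, beta=1.0):
    """Element-wise alpha*X + beta*Y."""
    return [[alpha * x + beta * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(M)
    aug = [list(map(float, row)) + e for row, e in zip(M, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def block_diag(blocks):
    """Assemble square blocks into one block-diagonal matrix."""
    n = sum(len(b) for b in blocks)
    out = [[0.0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                out[offset + i][offset + j] = v
        offset += len(b)
    return out

c = 0.5                                                      # assumed value
blocks = [[[0.0, 0.7], [0.7, 0.0]], [[0.0, 0.7], [0.7, 0.0]]]  # L_{1,1}, L_{1,2}
L1 = block_diag(blocks)
U = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]         # picks nodes 1 and 2
S = [[0.0, 0.5], [0.5, 0.0]]                                 # cross-edge weight 0.5
V = [list(col) for col in zip(*U)]                           # V = U^T here

# Offline: invert each block separately (eq. 13) and the small core matrix.
Q1_inv = block_diag([inverse(combine(identity(len(b)), b, 1.0, -c)) for b in blocks])
core = inverse(combine(inverse(S), matmul(V, matmul(Q1_inv, U)), 1.0, -c))

def fast_rwr(Y):
    """Online step of eq. (12): matrix products with the stored factors only."""
    t = matmul(Q1_inv, Y)
    correction = matmul(Q1_inv, matmul(U, matmul(core, matmul(V, t))))
    return combine(t, correction, 1.0 - c, (1.0 - c) * c)

# Reference: invert the full matrix I - cL directly, as in eq. (6).
L = combine(L1, matmul(U, matmul(S, V)))
Y = [[1.0], [0.0], [0.0], [0.0]]
direct = [[(1.0 - c) * v[0]] for v in matmul(inverse(combine(identity(4), L, 1.0, -c)), Y)]
fast = fast_rwr(Y)
```

<p>Here fast and direct agree to numerical precision, while fast_rwr never inverts a matrix larger than a single block or the small core matrix; with a truncated decomposition of \(L_2\) the result would be an approximation instead. </p>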