An Algorithm of Object-based Image Retrieval Using Multiple Instance Learning

Chao Wen, Guohua Geng, and Xinyi Zhu

Fourth International Workshop on Advanced Computational Intelligence (IWACI), Wuhan, Hubei, China, October 19-21, 2011

Abstract— For the problem of object-based image retrieval, this paper presents a novel semi-supervised multiple instance learning algorithm. In the framework of multiple instance learning, the algorithm regards the whole image as a bag and the low-level visual features of its segmented regions as instances. First, the algorithm clusters the instances of two sets, one composed of the instances in positive bags and the other of the instances in negative bags, so as to find potential positive instances and the feature data of the bag structure. Their respective similarities are then measured by a radial basis function, and a coefficient α is introduced into the bag similarity measure as a trade-off between the two similarities. Experiments on the SIVAL dataset show that the algorithm is feasible and that its performance is superior to that of other algorithms.

I. INTRODUCTION

With the rapid development of multimedia technologies and the popularity of the Internet, the types and quantities of digital images have exploded, and image databases are growing ever larger. Content-based image retrieval (CBIR) technology has therefore received wide attention [1]. However, a user's search interest usually focuses on the objects in an image rather than on the entire image, so the results of traditional CBIR are often unsatisfactory. To address this problem, object-based image retrieval (OBIR) has been proposed. Since OBIR concerns images that contain specific objects and better matches users' search requirements, it has become a new research focus within CBIR [2].

If an image is segmented, the image itself and its segmented regions can be regarded as a bag and its instances, respectively. An image that contains the target object is considered a positive bag; otherwise it is a negative bag. By learning a classifier from the positive and negative image bags, OBIR becomes a multiple instance learning (MIL) problem [3]. In recent years, many MIL algorithms have been proposed. Zhang [3] and Rahmani [4] investigated OBIR via MIL based on the diverse density (DD) function. Chen [5], [6] proposed two classic MIL algorithms, DD-SVM and MILES, in which the instances are nonlinearly projected and embedded and a support vector machine (SVM) is employed to solve the MIL classification problem.

(Chao Wen, Guohua Geng, and Xinyi Zhu are with the School of Information Science and Technology, Northwest University, Xi'an 710069, China; e-mail: [email protected]. This work was supported by the National Natural Science Foundation of China under project code 60873094 and the Shaanxi Natural Science Foundation of Education under project code 2010JK852.)

The term multiple instance learning was coined by Dietterich et al. while investigating the problem of drug activity prediction. In MIL, the training set is composed of many bags, and each bag contains many instances. A bag is labeled positive if it contains at least one positive instance; otherwise it is labeled negative. The task is to learn a concept from the training set that correctly labels unseen bags. MIL has an inherent ambiguity problem, however, because it is hard to know which instance (or instances) in a positive bag is positive. Dietterich et al. used axis-parallel rectangles (APR) to solve the MIL problem [7].
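As a minimal illustration of this bag-labeling rule (our sketch, not code from the paper), the label of a bag is simply the disjunction of its instance labels:

```python
# Minimal sketch of the MIL bag-labeling rule: a bag is positive
# iff at least one of its instances is positive.

def bag_label(instance_labels):
    """Return 1 if any instance in the bag is positive, else 0."""
    return int(any(label == 1 for label in instance_labels))

# A bag with one positive instance is positive; an all-negative bag is not.
assert bag_label([0, 0, 1]) == 1
assert bag_label([0, 0, 0]) == 0
```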

Within the framework of multiple instance learning, this paper proposes a novel OBIR algorithm named KP-MIL, which finds positive instances with K-means clustering to solve MIL. First, the algorithm applies K-means clustering to the instance sets and finds the "actual" positive instances and the bag structure, so as to handle the ambiguity problem of MIL. Then the radial basis function (RBF) is used to measure the similarities of the positive instances and of the feature data of the bag structure, and a coefficient α is introduced into the bag similarity measure as a trade-off between the two similarities. Experiments on a testing dataset show that the algorithm is feasible.

The rest of this paper is organized as follows: Section II introduces our algorithm, Section III evaluates our approach on the SIVAL dataset, and Section IV concludes the paper.

II. THE PROPOSED METHOD

A. Problem Description

The APR [7] algorithm employs a greedy backfitting procedure to identify an axis-parallel rectangle that covers at least one instance from each positive bag. It then utilizes this APR to select the most discriminative features. Finally, kernel density estimation is exploited to improve generalization by expanding the bounds of the APR, so that new instances from positive bags fall inside the APR with high probability. The DD [3] algorithm regards each bag as a manifold composed of many instances, i.e., feature vectors. If a new bag is positive, it is believed to intersect all positive feature manifolds without intersecting any negative feature manifold. Intuitively, the diverse density at a point in feature space measures how many different positive bags have instances near that point and how far the negative instances are from that point. The task of MIL is thus transformed into a search for the point in feature space with maximum diverse density. Both the APR and DD algorithms imply that potential positive instances aggregate spatially. Based on this understanding, we first adopt a clustering algorithm to obtain the centers of potential positive instances and the representatives of potential positive instances in each bag. We then build a proper kernel function and use an SVM to classify and retrieve.

B. Obtaining Representatives of Potential Positive Instances

In MIL, assuming $s$ is the number of training images, we denote the image bags as $B_i$, $i = 1, \ldots, s$. $x_{ij}$ denotes an instance of bag $B_i$, where $x_{ij} \in B_i$, $j = 1, \ldots, l$, $l$ is the number of instances in $B_i$, and $y_i \in \{0, 1\}$ is the hidden label of $B_i$. Multiple instance learning uses the labeled training images $B_l = \{(B_1, y_1), \ldots, (B_s, y_s)\}$ to learn a classification function, so as to classify and retrieve the unlabeled images $B_u = \{B_1, \ldots, B_t\}$.

We arrange the instances of the positive bags to form the set $X^+$ and the instances of the negative bags to form the set $X^-$, and apply K-means clustering to each set separately. Clustering yields the positive instances' cluster centers $O^+ = \{O_1^+, O_2^+, \ldots, O_k^+\}$ and the negative instances' cluster centers $O^- = \{O_1^-, O_2^-, \ldots, O_k^-\}$. If an instance of $B_i$ falls within a positive cluster, it is a candidate positive instance of $B_i$, owing to the spatial aggregation of positive instances. Obviously, a positive cluster containing more instances is more likely to capture the "actual" positive concept; therefore, the positive cluster center with the maximum number of instances is taken as the potential positive instances' center. If exactly one instance of $B_i$ falls within this cluster, that instance is the potential positive instance of $B_i$. If no instance of $B_i$ falls within it, the instance nearest to the center is selected as the potential positive instance of $B_i$. If more than one instance of $B_i$ falls within it, the instance nearest to the center is selected as the representative of the potential positive instances of $B_i$. The vector $O = O^+ \cup O^-$, formed by clustering, expresses the data density. Thus we can estimate the bag structure by computing the distances between the instances and the centers in $O$; the bag structure is obtained from the minimum of these distances.

This paper employs the K-means algorithm for clustering. K-means [8], proposed by MacQueen, is a partition-based clustering algorithm that iterates a partition step and a cluster-center update step until convergence. Its core idea is to find $k$ cluster centers $\{O_1, O_2, \ldots, O_k\}$ such that each data point $X_i$ has minimum squared distance to its nearest cluster center $O_j$. The sum of squared distances is

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| X_i^{j} - O_j \right\|^2,$$

where $X_i^{j}$ denotes the $i$-th data point assigned to cluster $j$.
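For concreteness, a minimal sketch of this clustering step follows, assuming the instances are stacked in NumPy arrays and using scikit-learn's KMeans in place of a hand-rolled implementation; the function name and parameters are ours, not the paper's:

```python
import numpy as np
from sklearn.cluster import KMeans

def find_potential_positive_center(X_pos, X_neg, k):
    """Cluster the positive and negative instance sets and return the
    center of the largest positive cluster together with all 2k centers
    O = O+ U O- (used later for the bag-structure vectors)."""
    km_pos = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_pos)
    km_neg = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_neg)

    # Spatial-aggregation assumption: the positive cluster with the most
    # members is taken as the potential positive instances' center.
    counts = np.bincount(km_pos.labels_, minlength=k)
    center_pos = km_pos.cluster_centers_[np.argmax(counts)]

    # Stack O+ and O- into the 2k centers that express the data density.
    O = np.vstack([km_pos.cluster_centers_, km_neg.cluster_centers_])
    return center_pos, O
```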

C. Building the Kernel Function

Now we analyze how to build the kernel function. Given two bags $B_i$ and $B_j$, their instances are $B_i = \{x_{i1}, \ldots, x_{im}\}$ and $B_j = \{x_{j1}, \ldots, x_{jm}\}$. The K-means algorithm is applied to the sets $X^+$ and $X^-$, and the two sets' cluster centers are $O^+ = \{O_1^+, O_2^+, \ldots, O_k^+\}$ and $O^- = \{O_1^-, O_2^-, \ldots, O_k^-\}$. If $O_l^+$ contains the maximum number of positive instances, then $O_l^+$ is the potential positive instances' center. If $x_{ip} = \arg\min_{x_{it}} \|x_{it} - O_l^+\|$ and $x_{jq} = \arg\min_{x_{js}} \|x_{js} - O_l^+\|$, then $x_{ip}$ is the representative of the potential positive instances in $B_i$ and $x_{jq}$ is the representative of the potential positive instances in $B_j$. Since $O = O^+ \cup O^-$, the newly generated bag-structure vectors are $y_i^l = \min_{t=1,\ldots,m} \|x_{it} - O_l\|$ and $y_j^l = \min_{s=1,\ldots,m} \|x_{js} - O_l\|$ for $l = 1, \ldots, 2k$. Kernel functions $k_1$ and $k_2$ are then used to measure the similarities of $x_{ip}, x_{jq}$ and of $y_i^l, y_j^l$, respectively. Finally, a coefficient $\alpha$ is introduced into the bag similarity measure as the trade-off between the two similarities. The final bag similarity is defined as:

$$k_{ij} = \alpha\, k_1(x_{ip}, x_{jq}) + (1 - \alpha)\, k_2(y_i^l, y_j^l), \qquad 1 > \alpha > 0.$$
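A minimal sketch of this combined similarity, assuming RBF kernels for $k_1$ and $k_2$ (as the experiments in Section III use) and a hypothetical gamma parameter; the function names are ours:

```python
import numpy as np

def rbf(u, v, gamma=1.0):
    """RBF kernel k(u, v) = exp(-gamma * ||u - v||^2)."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def bag_similarity(x_ip, x_jq, y_i, y_j, alpha=0.4):
    """k_ij = alpha * k1(x_ip, x_jq) + (1 - alpha) * k2(y_i, y_j), where
    x_ip, x_jq are the representative positive instances of the two bags
    and y_i, y_j are their bag-structure (minimum-distance) vectors."""
    assert 0.0 < alpha < 1.0
    return alpha * rbf(x_ip, x_jq) + (1.0 - alpha) * rbf(y_i, y_j)
```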

Theorem 1: A linear combination of kernel functions with nonnegative coefficients is still a kernel function.

Proof: Suppose each $k_i$ is a kernel function and $\alpha_i \geq 0$, $i = 1, \ldots, n$; we need to prove that $k = \sum_i \alpha_i k_i$ is a kernel function. If $k_i$ is a kernel function, then its kernel matrix $M_i$ is symmetric and positive semi-definite; obviously $\sum_i \alpha_i M_i$ is still a symmetric positive semi-definite matrix. The proposition is proved.

From Theorem 1 we know that the bag similarity function $k$ defined above is still a kernel function. The procedure for obtaining the representatives of potential positive instances and building the kernel function is shown in Algorithm 1.

Algorithm 1: Building the kernel function

Input: Labeled training images $B_l = \{(B_1, y_1), \ldots, (B_s, y_s)\}$, unlabeled testing images $B_u = \{B_1, \ldots, B_t\}$, cluster number $K$, and coefficient $\alpha$;
Output: Kernel function $k$;
Initialize: $k = 0$.
Step 1: Arrange the positive bags $B_i$ (those labeled $y_i = 1$) to form the set $X^+$;
Step 2: Select $K$ instances $O^+ = \{x_i\}_{i=1}^{K}$ arbitrarily from $X^+$ as the initial cluster centers;
Step 3: Calculate the distances between each instance $x$ and the centers $O^+$, and allocate $x$ to the nearest cluster center;
Step 4: Employ the K-means update to compute $K$ new cluster centers;
Step 5: Compare with the previous $K$ cluster centers; if any center changed, go to Step 3, otherwise go to Step 6;
Step 6: Count the number of instances in each cluster; the center $O_l^+$ with the maximum number of instances is the potential positive instances' center;
Step 7: Let $B = B_l \cup B_u$ and calculate the distances between each $B_i$ and $O_l^+$; the instance $x_{ip} \in B_i$ nearest to $O_l^+$ is the potential positive instance;
Step 8: Similarly to the above steps, generate the negative bags' cluster centers $O^-$;
Step 9: Let $O = O^+ \cup O^-$ and calculate the distances between each $B_i \in B$ and $O_j \in O$; use the minimum distances $y_i^l = \min_{t=1,\ldots,m} \|x_{it} - O_l\|$, $l = 1, \ldots, 2k$, to form the bag-structure vector $y_i$;
Step 10: Select two bags $B_i$, $B_j$ from $B$ arbitrarily and compute the kernel $k_{ij} = \alpha\, k_1(x_{ip}, x_{jq}) + (1 - \alpha)\, k_2(y_i^l, y_j^l)$;
Step 11: Output the kernel matrix $k$.
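Putting the steps together, the following sketch shows how the kernel matrix over all bags might be assembled; it reuses the hypothetical bag_similarity helper from the previous sketch and represents each bag as an array of its instance vectors:

```python
import numpy as np

def build_kernel_matrix(bags, center_pos, O, alpha=0.4):
    """Assemble the Gram matrix of Algorithm 1 for a list of bags.

    bags: list of arrays, each of shape (m_i, d) holding a bag's instances.
    center_pos: potential positive instances' center (Step 6).
    O: stacked cluster centers O+ U O-, shape (2k, d) (Step 9).
    """
    reps, structs = [], []
    for bag in bags:
        # Step 7: representative = instance nearest to the positive center.
        d_pos = np.linalg.norm(bag - center_pos, axis=1)
        reps.append(bag[np.argmin(d_pos)])
        # Step 9: bag-structure vector = min distance to each center O_l.
        d_all = np.linalg.norm(bag[:, None, :] - O[None, :, :], axis=2)
        structs.append(d_all.min(axis=0))

    n = len(bags)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Step 10: combined bag similarity (bag_similarity from above).
            K[i, j] = bag_similarity(reps[i], reps[j],
                                     structs[i], structs[j], alpha)
    return K
```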

D. Classification Using SVM

Based on the idea of maximum margin, SVM classification finds the separating hyperplane with the largest margin. Its objective function is as follows:

$$\min_{\omega, b, \xi} \ \phi(\omega, \xi) = \frac{1}{2}\|\omega\|^2 + c \sum_i \xi_i \quad \text{subject to} \quad y_i \big( \omega^T \varphi(x_i) + b \big) \geq 1 - \xi_i \qquad (1)$$

In (1), $\{x_i, y_i\}$, $i = 1, \ldots, m$, are the training samples and $m$ is the total number of training samples; $\omega$ is the normal vector of the classification plane, $\xi_i$ is a slack variable, and $c$ is the penalty factor ($c = \infty$ corresponds to the linearly separable case, otherwise to the linearly inseparable case). By the dual transformation, the above optimization problem becomes:

$$\max_{\alpha} \ \psi(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \quad \text{subject to} \quad \sum_i \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq c \qquad (2)$$

In (2), $k$ is the kernel function. If the optimal solutions of (2) are $\alpha_i^*$ and $b^*$, the classification function is defined as:

$$f(x) = \mathrm{sgn}\Big( \sum_i \alpha_i^* y_i k(x_i, x) + b^* \Big) \qquad (3)$$
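Because the bag kernel of Section II-C arrives as a precomputed Gram matrix, the SVM of (1)-(3) can be trained on it directly. Below is a toy sketch using scikit-learn's SVC with kernel='precomputed' in place of the paper's LIBSVM setup; the data and the linear Gram matrix here are placeholders, not the bag kernel itself:

```python
import numpy as np
from sklearn.svm import SVC

# Toy demonstration with a linear Gram matrix standing in for the bag
# kernel of Algorithm 1; X here is any feature representation of the bags.
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(20, 5)), rng.normal(size=(5, 5))
y_train = np.sign(X_train[:, 0])          # dummy labels in {-1, +1}

K_train = X_train @ X_train.T             # (n_train, n_train)
K_test = X_test @ X_train.T               # (n_test, n_train)

clf = SVC(kernel='precomputed', C=1.0)    # C plays the role of c in (1)-(2)
clf.fit(K_train, y_train)                 # solves the dual problem (2)
y_pred = clf.predict(K_test)              # applies the decision function (3)
```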

We now summarize the proposed KP-MIL algorithm as follows:

Algorithm 2: KP-MIL

Input: Training images $B_l = \{(B_1, y_1), \ldots, (B_s, y_s)\}$, images awaiting retrieval $B_u = \{B_1, \ldots, B_t\}$, cluster number $K$, and coefficient $\alpha$;
Output: Labels $\{y_i\}_{i=1}^{t}$ for the images $B_u = \{B_1, \ldots, B_t\}$;
Step 1: Call Algorithm 1 to compute the kernel matrix $k$;
Step 2: Train the SVM classifier: (1) input the values of the cluster number $K$ and the coefficient $\alpha$; (2) obtain the solutions of (2);
Step 3: Use (3) to obtain the labels $\{y_i^*\}_{i=1}^{t} \in \{-1, +1\}$ for $B_u$.

III. EXPERIMENTS

A. Dataset and Environment

Our testing dataset is SIVAL, a benchmark that emphasizes the OBIR task. SIVAL includes 25 image categories with 60 images per category. The categories consist of complex objects photographed against 10 different, highly diverse backgrounds. The objects may occur anywhere in the image and typically occupy 10-15% of it. Each image in SIVAL is pre-segmented; a bag corresponds to an image, with its segmented regions as instances. Each bag contains more than 30 instances, and each instance is represented by a 30-dimensional feature vector [4].

LIBSVM is used for SVM classification, with the one-against-one approach. In each trial, 10 positive images are randomly selected from one category, and 10 negative background images are randomly selected from the other 24 categories. We use the RBF kernel for both $k_1$ and $k_2$.

B. Performance and Analysis

The KP-MIL algorithm requires presetting the cluster number $K$ and the coefficient $\alpha$. To verify the influence of $K$ and $\alpha$ on retrieval accuracy, we first fix $\alpha = 0.1$, vary $K$ from 1 to 9 (in steps of 1), and analyze the influence of $K$ on retrieval performance. We then fix the optimal value of $K$ so obtained and vary $\alpha$ from 0.1 to 0.9 (in steps of 0.1). The experiment uses images from the GreenTeaBox category and follows Algorithm 2 of Section II. Fig. 1 shows the average AUC values of the retrieval results with $\alpha = 0.1$ fixed and $K$ varied from 1 to 9.

Fig. 1. Influence of the changes of K (average AUC vs. K on the GreenTeaBox category, α = 0.1).

In Fig. 1, as K varies from 1 to 9, the AUC value ranges between 0.73 and 0.84, reaching its maximum of 0.84 at K = 5. This shows that K has a considerable influence on KP-MIL's performance. The main reason is that in OBIR an object often co-occurs with other specific objects. If the number of clusters equals the number of co-occurring objects, those objects occur at higher frequencies after clustering; therefore, after clustering, the cluster center containing the maximum number of instances corresponds to the potential positive instances' center.

Fig. 2 shows that when we fix K = 5 and vary α from 0.1 to 0.9, the AUC value ranges between 0.81 and 0.86, reaching its maximum of 0.86 at α = 0.4.


Fig. 2. Influence of the changes of α (average AUC vs. α on the GreenTeaBox category, K = 5).
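The two-stage sweep described above amounts to a simple grid search; here is a sketch assuming a hypothetical evaluate(K, alpha) routine that runs Algorithm 2 for one setting and returns the average AUC:

```python
import numpy as np

def sweep(evaluate):
    """Reproduce the two-stage sweep: fix alpha = 0.1 and vary K, then
    fix the best K and vary alpha. `evaluate(K, alpha)` is assumed to
    run Algorithm 2 and return an average AUC value."""
    aucs_k = {K: evaluate(K, 0.1) for K in range(1, 10)}   # K = 1..9
    best_K = max(aucs_k, key=aucs_k.get)                   # e.g., K = 5
    alphas = np.round(np.arange(0.1, 1.0, 0.1), 1)         # alpha = 0.1..0.9
    aucs_a = {a: evaluate(best_K, a) for a in alphas}
    best_alpha = max(aucs_a, key=aucs_a.get)               # e.g., alpha = 0.4
    return best_K, best_alpha
```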

To further verify the performance of KP-MIL (with K = 5, α = 0.4), we compare it with MILES and ACCIO!. The test follows the experimental method described above in Section III. Each trial is repeated 30 times, and the average AUC values are shown in Table I. In the table, the numbers 1 to 25 in the first column represent the SIVAL categories: "FabricSoftenerBox", "CheckeredScarf", "FeltFlowerRug", "CokeCan", "WD40Can", "AjaxOrange", "DirtyRunningShoe", "CandleWithHolder", "GoldMedal", "GreenTeaBox", "CardboardBox", "SpriteCan", "SmileyFaceDoll", "DirtyWorkGloves", "StripedNoteBook", "DataMiningBook", "BlueScrunge", "TranslucentBowl", "RapBook", "Apple", "GlazedWoodPot", "WoodRollingPin", "JuliesPot", "Banana", "Large-Spoon".

TABLE I
AVERAGE AUC VALUES WITH 95% CONFIDENCE INTERVALS OVER 30 TRIALS

No.   KP-MIL      MILES       ACCIO!
1     97.6 ± 0.3  97.1 ± 0.8  86.3 ± 3.3
2     95.2 ± 0.5  93.9 ± 1.3  90.2 ± 1.7
3     95.9 ± 0.9  94.1 ± 0.8  87.9 ± 1.9
4     93.6 ± 0.6  92.8 ± 0.9  81.8 ± 3.1
5     90.9 ± 0.9  88.5 ± 2.1  81.2 ± 2.5
6     91.3 ± 1.8  90.6 ± 2.5  76.8 ± 3.1
7     90.2 ± 1.3  85.6 ± 1.6  83.3 ± 1.8
8     85.6 ± 1.8  84.2 ± 2.2  69.8 ± 2.7
9     82.2 ± 1.4  80.9 ± 2.6  77.9 ± 2.4
10    84.9 ± 2.6  91.3 ± 2.3  86.3 ± 3.5
11    82.8 ± 1.2  82.2 ± 2.8  69.9 ± 2.7
12    82.6 ± 1.1  81.4 ± 2.1  71.1 ± 2.9
13    80.6 ± 1.9  77.8 ± 2.5  77.6 ± 3.8
14    82.3 ± 1.3  78.1 ± 3.5  65.4 ± 1.7
15    76.2 ± 2.1  73.1 ± 3.1  74.8 ± 3.7
16    72.8 ± 2.3  67.7 ± 2.6  71.2 ± 3.1
17    74.6 ± 2.1  73.6 ± 2.2  69.2 ± 3.7
18    73.8 ± 2.9  72.2 ± 3.6  77.8 ± 2.3
19    71.2 ± 2.1  69.1 ± 3.2  62.9 ± 3.2
20    70.1 ± 1.8  61.9 ± 2.5  63.8 ± 1.8
21    70.3 ± 2.2  68.3 ± 3.2  72.8 ± 2.1
22    70.2 ± 2.6  64.8 ± 2.4  62.4 ± 3.7
23    70.0 ± 3.5  78.8 ± 2.7  79.6 ± 2.7
24    67.9 ± 1.8  62.9 ± 2.9  66.1 ± 1.9
25    62.1 ± 1.6  58.9 ± 1.9  57.2 ± 2.7
Avg.  80.6        78.7        74.5

Boldfaced values in Table I denote the best classification results. Table I shows that KP-MIL is generally superior to the other MIL algorithms. Its good performance has two main reasons. First, the K-means clustering we employ can find the potential positive instances' centers and distinguish the representatives of potential positive instances from the other instances in the bags. Second, the RBF kernel used to measure bag similarity acts as a trade-off between the influences of the bag instances on the kernel. Although MILES maps each bag into a feature space defined by the instances of the training bags via an instance similarity measure, its performance is strongly affected by the spatial distribution of instances within bags; when that distribution is unfavorable, MILES performs poorly. Meanwhile, since ACCIO! is a DD-based algorithm, its optimization can fall into local extrema, so its final classification results are not accurate.

IV. CONCLUSION

We have presented a novel image retrieval algorithm named KP-MIL. In the framework of MIL, the algorithm uses K-means clustering to find positive instances and the feature data of the bag structure; their respective similarities are then measured by an RBF kernel function, and a coefficient α is introduced as the trade-off between the two similarities. Compared with traditional DD algorithms, KP-MIL has a stronger ability to find globally optimal solutions, with higher efficiency.

KP-MIL still leaves some open questions that could lead to future studies, for example, how to quickly find proper values for the cluster number K and the coefficient α, since the performance of KP-MIL is strongly influenced by both.

REFERENCES

[1] R. Datta, J. Li, and J. Z. Wang, "Content-based image retrieval: approaches and trends of the new age," Proceedings of the 7th International Workshop on Multimedia Information Retrieval, in conjunction with the ACM International Conference on Multimedia, Singapore, ACM Press, pp. 253-262, Nov. 2005.
[2] R. Datta, D. Joshi, J. Li, et al., "Image retrieval: ideas, influences, and trends of the new age," ACM Computing Surveys, vol. 40, no. 2, pp. 1-65, 2008.
[3] Q. Zhang and S. A. Goldman, "Content-based image retrieval using multiple-instance learning," Proceedings of the 19th International Conference on Machine Learning, Sydney, Australia, Morgan Kaufmann Publishers, pp. 682-689, 2002.
[4] R. Rahmani and S. A. Goldman, "Localized content-based image retrieval," Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, Singapore, ACM Press, pp. 227-236, 2005.
[5] Y. X. Chen and J. Z. Wang, "Image categorization by learning and reasoning with regions," Journal of Machine Learning Research, vol. 5, pp. 913-939, 2004.
[6] Y. X. Chen, J. B. Bi, and J. Z. Wang, "MILES: multiple-instance learning via embedded instance selection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 12, pp. 1931-1947, 2006.
[7] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez, "Solving the multiple instance problem with axis-parallel rectangles," Artificial Intelligence, vol. 89, no. 1-2, pp. 31-71, 1997.
[8] J. MacQueen, "Some methods for classification and analysis of multivariate observations," Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, University of California Press, pp. 281-297, 1967.


