
Page 1: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

CVPR 2008
James Philbin, Ondřej Chum, Michael Isard, Josef Sivic, Andrew Zisserman

[7] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. ICCV, 2007.

Page 2: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Outline

• Introduction
• Methods in this paper
• Experiment & Result
• Conclusion

Page 3: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Outline

• Introduction
• Methods in this paper
• Experiment & Result
• Conclusion

Page 4: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Introduction

• Goal
  – Specific object retrieval from an image database

• For large databases
  – This is achieved by systems inspired by text retrieval (visual words)

Page 5: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Flow

1. Get features
  – SIFT

2. Cluster
  – Approximate k-means

3. Feature quantization
  – Visual words
  – Soft-assignment (query)

4. Re-ranking
  – RANSAC

5. Query expansion
  – Average query expansion

Page 6: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Outline

• Introduction• Methods in this paper• Experiment & Result• Conclusion

Page 7: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Feature

• SIFT

Page 8: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Quantization (visual word)

• Point list = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
• Sorted list (by x-coordinate) = [(2,3), (4,7), (5,4), (7,2), (8,1), (9,6)]
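To make the quantization step concrete, here is a minimal sketch (not from the slides): the point list above is treated as a toy 2-D vocabulary, and a descriptor is hard-assigned to its single nearest cluster center. Real descriptors are 128-D SIFT vectors and the vocabulary has about 1M words, so the system searches approximately rather than with the exact k-d tree used here.

```python
import numpy as np
from scipy.spatial import cKDTree

# Toy 2-D "vocabulary": the point list from this slide, treated as cluster
# centers. The real system uses 128-D SIFT descriptors and ~1M visual words,
# searched approximately rather than with this exact k-d tree.
centers = np.array([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)], dtype=float)
tree = cKDTree(centers)

def hard_assign(descriptor):
    """Hard assignment: return the index of the single nearest visual word."""
    _, word_id = tree.query(descriptor, k=1)
    return int(word_id)

print(hard_assign(np.array([6.0, 3.0])))   # index of the nearest center
```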

Page 9: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Soft-assignment of visual words

• Matching two image features in a bag-of-visual-words model with hard-assignment
  – Yes if assigned to the same visual word
  – No otherwise

• Soft-assignment
  – A weighted combination of visual words

Page 10: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Soft-assignment of visual words

A–E represent cluster centers (visual words); points 1–4 are features

Page 11: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Soft-assignment of visual words

• Weight assigned to a visual word ∝ exp(−d² / 2σ²)
  – d is the distance from the cluster center to the descriptor
• In practice σ is chosen so that a substantial weight is only assigned to a few cells
• The essential parameters
  – the spatial scale σ
  – r, the number of nearest neighbors considered

Page 12: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Soft-assignment of visual words

• By assigning weights to the r nearest neighbors, the descriptor is represented by an r-vector of weights, which is then L1-normalized
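A minimal sketch of this soft-assignment step, assuming a pre-built k-d tree over the cluster centers (an exact stand-in for the paper's approximate search); the sigma value is illustrative rather than the paper's setting.

```python
import numpy as np
from scipy.spatial import cKDTree

def soft_assign(descriptor, tree, r=3, sigma=100.0):
    """Soft-assign a descriptor to its r nearest visual words.

    Each of the r nearest cluster centers gets a weight exp(-d^2 / (2*sigma^2));
    the resulting r-vector is L1-normalized. sigma=100.0 is an illustrative
    default, not the paper's setting.
    """
    dists, word_ids = tree.query(descriptor, k=r)
    weights = np.exp(-dists ** 2 / (2.0 * sigma ** 2))
    weights /= weights.sum()          # L1 normalization of the r-vector
    return word_ids, weights

# Example usage with a toy 2-D vocabulary (real descriptors are 128-D):
centers = np.array([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)], dtype=float)
tree = cKDTree(centers)
print(soft_assign(np.array([6.0, 3.0]), tree, r=3, sigma=2.0))
```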

Page 13: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

TF–IDF weighting

• Standard index architecture

Page 14: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

TF–IDF weighting

• tf (term frequency)
  – a document contains 100 words; 'a' appears 3 times
  – tf = 0.03 (3 / 100)

• idf (inverse document frequency)
  – 1,000 documents contain 'a'; the total number of documents is 10,000,000
  – idf = 9.21 ( ln(10,000,000 / 1,000) )

• tf-idf = 0.28 (0.03 × 9.21)

Page 15: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

TF–IDF weighting

• In this paper
  – For the term frequency (tf), we simply use the normalized weight value for each visual word
  – For the inverse document frequency (idf), we found that counting an occurrence of a visual word as one, no matter how small its weight, gave the best results
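A hedged sketch of this tf-idf variant (the data layout and names are assumptions, not the authors' code): tf is the accumulated soft-assignment weight of a visual word in an image, while idf treats a word as occurring in an image whenever its weight is nonzero, no matter how small.

```python
import numpy as np

def build_soft_tfidf(doc_word_weights, vocab_size):
    """doc_word_weights: one dict per image, mapping
    visual-word id -> accumulated soft-assignment weight."""
    n_docs = len(doc_word_weights)

    # idf: a word "occurs" in an image if it has any nonzero weight,
    # no matter how small (per the slide above).
    doc_freq = np.zeros(vocab_size)
    for weights in doc_word_weights:
        for w in weights:
            doc_freq[w] += 1
    idf = np.log(n_docs / np.maximum(doc_freq, 1))

    # tf: simply the (normalized) soft weight itself.
    return [{w: tf * idf[w] for w, tf in weights.items()}
            for weights in doc_word_weights]
```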

Page 16: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Re-ranking

• RANSAC
  – Affine transform Θ : Y = AX + b

• Algorithm
  1. Randomly choose n points
  2. Use the n points to estimate Θ
  3. Apply Θ to the remaining N − n points
  4. Count the inliers
  – Repeat steps 1–4 K times
  – Pick the best Θ
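The sketch below follows the generic steps listed above for the affine model Y = AX + b, assuming matched point arrays X and Y; it is an illustration only, not the single-correspondence RANSAC variant actually used in [15].

```python
import numpy as np

def fit_affine(X, Y):
    """Least-squares affine fit Y ~ A X + b from point correspondences."""
    ones = np.ones((X.shape[0], 1))
    M, *_ = np.linalg.lstsq(np.hstack([X, ones]), Y, rcond=None)
    return M[:2].T, M[2]                                    # A (2x2), b (2,)

def ransac_affine(X, Y, n=3, K=1000, tol=3.0, seed=0):
    """Generic RANSAC sketch for Theta: Y = A X + b.
    n = 3 point pairs determine a 2-D affine transform."""
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, 0
    for _ in range(K):
        idx = rng.choice(len(X), size=n, replace=False)     # 1. choose n points
        A, b = fit_affine(X[idx], Y[idx])                   # 2. estimate Theta
        err = np.linalg.norm(X @ A.T + b - Y, axis=1)       # 3. apply to all points
        inliers = int((err < tol).sum())                    # 4. count inliers
        if inliers > best_inliers:                          # keep the best Theta
            best_model, best_inliers = (A, b), inliers
    return best_model, best_inliers
```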

Page 17: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Re-ranking

• In this paper
  – Re-ranking uses not only the number of inlier correspondences, but also a cosine-based scoring function between the query and result vectors

Page 18: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Average query expansion

• Obtain the top (m < 50) spatially verified results of the original query

• Construct a new query from the average of these results
  – d_avg = (d0 + d1 + … + dm) / (m + 1)
  – where d0 is the normalized tf vector of the query region
  – di is the normalized tf vector of the i-th result

• Requery once
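A minimal sketch of the averaging step, assuming the query and the verified results are already available as normalized tf vectors (function and variable names are mine):

```python
import numpy as np

def average_query_expansion(d0, verified_results, m=50):
    """d0: normalized tf vector of the query region.
    verified_results: normalized tf vectors of the spatially verified
    results, in rank order (at most m of them are used)."""
    top = verified_results[:m]
    if len(top) == 0:                 # nothing verified: keep the original query
        return d0
    d_avg = (d0 + np.sum(top, axis=0)) / (len(top) + 1)
    return d_avg / np.linalg.norm(d_avg)   # renormalize, then requery once
```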

Page 19: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Outline

• Introduction
• Methods in this paper
• Experiment & Result
• Conclusion

Page 20: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Dataset

• Crawled from Flickr; high resolution (1024×768) images
• Oxford Buildings
  – About 5,062 images
  – 11 landmarks used as queries

• Paris
  – Used for quantization
  – 6,300 images

• Flickr1
  – 145 most popular tags
  – 99,782 images

Page 21: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Dataset

Page 22: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Dataset

• Query
  – 55 queries: 5 for each of the 11 landmarks

Page 23: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Baseline

• Follows the architecture of previous work [15]
• A visual vocabulary of 1M words is generated using approximate k-means

[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.
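As a sketch of how such a vocabulary can be built (simplified): standard Lloyd iterations whose assignment step goes through a k-d tree. The tree here is exact; the actual approximate k-means replaces it with a forest of randomized k-d trees, which is what makes a 1M-word vocabulary tractable.

```python
import numpy as np
from scipy.spatial import cKDTree

def approximate_kmeans(descriptors, k, n_iter=10, seed=0):
    """Lloyd-style k-means whose assignment step uses a k-d tree.
    The exact tree below is a stand-in for the randomized k-d forest
    (approximate nearest neighbours) used by the real system."""
    descriptors = np.asarray(descriptors, dtype=float)
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(n_iter):
        _, assign = cKDTree(centers).query(descriptors, k=1)   # assignment step
        for j in range(k):
            members = descriptors[assign == j]
            if len(members):
                centers[j] = members.mean(axis=0)               # update step
    return centers
```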

Page 24: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Evaluation

• Compute an Average Precision (AP) score for each of the 5 queries for a landmark
  – AP is the area under the precision–recall curve
  – Precision = RPI / TNIR
  – Recall = RPI / TNPC
  – RPI = retrieved positive images, TNIR = total number of images retrieved, TNPC = total number of positives in the corpus

• Average these to obtain a Mean Average Precision (mAP)

[Figure: example precision–recall curve]
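A small sketch of this scoring, assuming each query's retrieved list is reduced to 1/0 relevance flags in rank order; the real Oxford protocol additionally ignores 'junk' images, which is omitted here.

```python
import numpy as np

def average_precision(ranked_relevance, n_positive_total=None):
    """AP for one query: ranked_relevance holds 1/0 relevance flags for the
    retrieved images in rank order. Pass n_positive_total (TNPC) if the list
    does not cover every positive in the corpus."""
    rel = np.asarray(ranked_relevance, dtype=float)
    n_pos = n_positive_total if n_positive_total is not None else rel.sum()
    if n_pos == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / n_pos)

def mean_average_precision(per_query_relevance):
    """mAP over all queries (55 queries in the paper: 5 per landmark)."""
    return float(np.mean([average_precision(r) for r in per_query_relevance]))

# e.g. average_precision([1, 0, 1, 1, 0]) is roughly 0.81
```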

Page 25: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Evaluation

• Dataset
  – Oxford only (D1): 5,062 images
  – Oxford (D1) + Flickr1 (D2): 104,844 images

• Vector quantizers
  – Oxford or Paris

Page 26: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Result

[14] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Proc. CVPR, 2006.

[18] T. Tuytelaars and C. Schmid. Vector quantizing feature space with a regular lattice. In Proc. ICCV, 2007.

[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.

Parameter variation

Comparison with other methods

Page 27: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Result

Effect of vocabulary size

Spatial verification

Page 28: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Result

Query expansion

Scaling up to 100K images

Page 29: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Result

Page 30: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Result

ashmolean_3 goes from 0.626 AP to 0.874 AP; christ_church_5 increases from 0.333 AP to 0.813 AP

Page 31: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Outline

• Introduction
• Methods in this paper
• Experiment & Result
• Conclusion

Page 32: Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases

Conclusion

• A new method of visual word assignment was introduced
  – descriptor-space soft-assignment

• It reduces the descriptor information lost in the quantization step of previously published methods