
Page 1: Lec07 aggregation-and-retrieval-system

Image Analysis & Retrieval

CS/EE 5590 Special Topics (Class Ids: 44873, 44874)

Fall 2016, M/W 4-5:15pm@Bloch 0012

Lec 07

Feature Aggregation and Image Retrieval System

Zhu Li

Dept of CSEE, UMKC

Office: FH560E, Email: [email protected], Ph: x 2346.

http://l.web.umkc.edu/lizhu

p.1Image Analysis & Retrieval, 2016

Page 2: Lec07 aggregation-and-retrieval-system

Outline

Recap of Lecture 06: SIFT

Box Filter

Image Retrieval System

Why Aggregation ?

Aggregation Schemes

Summary

Image Analysis & Retrieval, 2016 p.2

Page 3: Lec07 aggregation-and-retrieval-system

Scale Space Theory - Lindeberg

Scale space response via the Laplacian of Gaussian (LoG); the scale is controlled by σ

Characteristic Scale:

Image Analysis & Retrieval, 2016 p.3

LoG: $\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}$, with Gaussian kernel $g = e^{-\frac{x^2+y^2}{2\sigma^2}}$

[Figure: LoG responses for a blob of radius r at scales σ = 0.8r, σ = 1.2r, σ = 2r; the characteristic scale is the σ at which the response peaks.]

Page 4: Lec07 aggregation-and-retrieval-system

SIFT

Use DoG to approximate LoG; separable Gaussian filter

Difference of images instead of difference of Gaussian kernels

Image Analysis & Retrieval, 2016 p.4

LoG

Scale space construction by Gaussian filtering and image differencing

Page 5: Lec07 aggregation-and-retrieval-system

Peak Strength & Edge Removal

Peak Strength: interpolate the true DoG response and pixel location by Taylor expansion

Edge Removal:

Re-do Harris-type detection to remove edge responses on a much reduced pixel set

Image Analysis & Retrieval, 2016 p.5

Page 6: Lec07 aggregation-and-retrieval-system

Rotation Invariance through Dominant Orientation Coding

Voting for the dominant orientation, weighted by a Gaussian window to give more emphasis to the gradients closer to the center

Image Analysis & Retrieval, 2016 p.6

Page 7: Lec07 aggregation-and-retrieval-system

SIFT Matching and Repeatability Prediction

SIFT Distance

Not all SIFT are created equal…

Peak strength (DoG response at interpolated position)

Image Analysis & Retrieval, 2016 p.7

Combined scale/peak strength pmf

$\frac{d(s_1^1, s_{k^*}^2)}{d(s_1^1, s_k^2)} \le \theta$

Page 8: Lec07 aggregation-and-retrieval-system

Box Filter – CABOX work

Basic Idea: Approximate DoG with linear combination of box filters

$\min_{\mathbf{h}}\ \|\mathbf{g} - B\,\mathbf{h}\|_{L_2}^2 + \lambda\,\|\mathbf{h}\|_{L_1}$

Solution by LASSO

Image Analysis & Retrieval, 2016 p.8

[Figure: the DoG kernel approximated as h1·(box 1) + h2·(box 2) + …]

Page 9: Lec07 aggregation-and-retrieval-system

Outline

Recap of Lecture 06: SIFT

Box Filter

Image Retrieval System

Why Aggregation ?

Aggregation Schemes

Summary

Image Analysis & Retrieval, 2016 p.9

Page 10: Lec07 aggregation-and-retrieval-system

Image Matching/Retrieval System

SIFT is a sub-image-level feature; we actually care more about how SIFT matches translate into image-level matching/retrieval accuracy

Suppose we can compute a single distance from a collection of features:

Then for a database of n images, we can compute an n x n distance matrix. This gives us full information about the performance of this feature/distance system

How to characterize the performance of such image matching and retrieval system ?

Image Analysis & Retrieval, 2016 p.10

$d(I_1, I_2) = \sum_k \alpha_k\, d(F_k^1, F_k^2)$

$D_{j,k} = d(I_j, I_k)$
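As a minimal sketch (assuming each image has already been reduced to a single global feature vector, e.g., by one of the aggregation schemes later in this lecture, and using plain Euclidean distance), the n x n distance matrix can be computed in Matlab as:

% minimal sketch: n images, each summarized by one d-dimensional aggregated feature
n = 100; d = 128;
F = rand(n, d);        % stand-in for the real aggregated features
D = pdist2(F, F);      % D(j,k) = d(I_j, I_k), Euclidean distance by default
% D is symmetric with a zero diagonal; thresholding D gives match decisions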

Page 11: Lec07 aggregation-and-retrieval-system

Thresholding for Matching

Basically, for any pair of images (documents, in IR jargon), we declare a match by thresholding the distance.

Then for each possible image pair (or the pairs we care about), for a given threshold t, there are 4 possible outcomes:
TP pair: {Ij, Ik} is a true matching pair, declared matching, d(Ij, Ik) < t;
FP pair: {Ij, Ik} is a true non-matching pair, but declared matching, d(Ij, Ik) < t;
TN pair: {Ij, Ik} is a true non-matching pair, declared non-matching, d(Ij, Ik) >= t;
FN pair: {Ij, Ik} is a true matching pair, but declared non-matching, d(Ij, Ik) >= t;

Image Analysis & Retrieval, 2016 p.11

$I_j, I_k$ are a match if $d(I_j, I_k) < t$; they are not a match otherwise

Page 12: Lec07 aggregation-and-retrieval-system

Matching System Performance

True Positive Rate (Recall): out of all true matching pairs, how many are retrieved, i.e., declared matching with distance < t

False Positive Rate: out of all true non-matching pairs, how many are falsely declared matching

Image Analysis & Retrieval, 2016 p.12

$TPR = \frac{tp}{tp + fn}$

$FPR = \frac{fp}{fp + tn}$

Page 13: Lec07 aggregation-and-retrieval-system

TPR-FPR

Definition:

TP rate = TP/(TP+FN)

FP rate = FP/(FP+TN)

From the actual value point of view

Image Analysis & Retrieval, 2016 p.13

Page 14: Lec07 aggregation-and-retrieval-system

ROC curve(1)

ROC = receiver operating characteristic

Y:TP rate

X:FP rate

Image Analysis & Retrieval, 2016 p.14

Page 15: Lec07 aggregation-and-retrieval-system

ROC curve(2)

Which method (A or B) is better? Compute the ROC area: the area under the ROC curve

Image Analysis & Retrieval, 2016 p.15

Page 16: Lec07 aggregation-and-retrieval-system

Precision, Recall, F-measure

Precision = TP/(TP + FP),

Recall = TP/(TP + FN)

F-measure = 2*(precision*recall)/(precision + recall)

Precision: the probability that a retrieved document is relevant.

Recall: the probability that a relevant document is retrieved in a search.
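A minimal sketch of these three measures from tp/fp/fn counts (the counts here are made up for illustration):

% hypothetical example counts from a thresholded distance matrix
tp = 80; fp = 20; fn = 10;
precision = tp / (tp + fp);                                   % 0.80
recall    = tp / (tp + fn);                                   % about 0.89
f_measure = 2 * (precision * recall) / (precision + recall);  % about 0.84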

Image Analysis & Retrieval, 2016 p.16

Page 17: Lec07 aggregation-and-retrieval-system

Matlab Implementation

We will compute all image pair distances D(j,k)

How do we compute the TPR-FPR plot? Understand that TPR and FPR are actually functions of the threshold t.

We just need to parameterize TPR(t) and FPR(t), and obtain operating points at meaningful thresholds, to generate the plot.

Matlab implementation: [tp, fp, tn, fn] = getPrecisionRecall(d0, d1, npt, dbg), where d0 holds the distances of the true matching pairs, d1 the distances of the non-matching pairs, and npt the number of threshold points

Image Analysis & Retrieval, 2016 p.17

% d0: distances of true matching pairs, d1: distances of non-matching pairs
% sweep npt thresholds between the smallest and largest observed distance
d_min = min(min(d0), min(d1));
d_max = max(max(d0), max(d1));
delta = (d_max - d_min) / npt;
for k = 1:npt
    thres = d_min + (k-1)*delta;
    tp(k) = length(find(d0 <= thres));   % matching pairs declared matching
    fp(k) = length(find(d1 <= thres));   % non-matching pairs declared matching
    tn(k) = length(find(d1 > thres));    % non-matching pairs declared non-matching
    fn(k) = length(find(d0 > thres));    % matching pairs declared non-matching
end
if dbg
    figure(22); grid on; hold on;
    plot(fp./(tn+fp), tp./(tp+fn), '.-r', 'DisplayName', 'tpr-fpr');
    legend();
end

Page 18: Lec07 aggregation-and-retrieval-system

TPR-FPR

Image matching performance is characterized by the function TPR(FPR)

Retrieval set: we want high precision; short list: high recall.

Image Analysis & Retrieval, 2016 p.18

Page 19: Lec07 aggregation-and-retrieval-system

Outline

Recap of Lecture 06: SIFT

Box Filter

Image Retrieval System

Why Aggregation ?

Aggregation Schemes

Summary

Image Analysis & Retrieval, 2016 p.19

Page 20: Lec07 aggregation-and-retrieval-system

Why Aggregation ?

What do (local) interest point features bring us? Scale and rotation invariance, in the form of an nk x d matrix:

Uncertainty in the number of detected features nk at query time

Permutations of the rows of the feature matrix represent the same image.

Problems: the representation is a variable-size set, so we are not able to draw decision boundaries;

Not directly indexable/hashable

Typically very high dimensionality

Image Analysis & Retrieval, 2016 p.20

$S_k = [x_k, y_k, \theta_k, \sigma_k, h_1, h_2, \ldots, h_{128}], \quad k = 1 \ldots n$

Page 21: Lec07 aggregation-and-retrieval-system

Decision Boundary in Matching

Can we have a decision boundary function for an interest-point-based representation?

Image Analysis & Retrieval, 2016 p.21


Page 22: Lec07 aggregation-and-retrieval-system

Curse of Dimensionality in Retrieval

What will feature dimensionality do to retrieval efficiency? Look at 99% locality per dimension, and the plot of the total volume covered: keeping 99% of the range in each dimension covers only 0.99^d of the total volume, which shrinks rapidly as d grows.

Matlab: showDimensionCurse.m
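A minimal sketch of this kind of plot (a guess at what showDimensionCurse.m computes, not the actual course script):

% hypothetical stand-in for showDimensionCurse.m
d = 1:128;                          % feature dimensionality
vol = 0.99.^d;                      % total volume covered at 99% locality per dimension
figure; plot(d, vol, '.-'); grid on;
xlabel('dimension d'); ylabel('fraction of volume covered');
title('curse of dimensionality: 0.99^d');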

Image Analysis & Retrieval, 2016 p.22


Page 23: Lec07 aggregation-and-retrieval-system

Aggregation – 30,000ft view

Bag of Words: compute k centroids in feature space, called visual words; compute a histogram over them
k x 1 feature, hard assignment

VLAD: compute centroids in feature space; compute the aggregated differences w.r.t. the centroids
k x d feature, soft assignment

Fisher Vector: compute a Gaussian Mixture Model (GMM) with 2nd-order info; compute the aggregated feature w.r.t. the mean and covariance of the GMM
2 x k x d feature

AKULA: adaptive centroids and feature count; improved with covariance?

Image Analysis & Retrieval, 2016 p.23


Page 24: Lec07 aggregation-and-retrieval-system

Visual Key Words: main idea

Extract some local features from a number of images …

Image Analysis & Retrieval, 2016 24

e.g., SIFT descriptor space: each point is 128-dimensional

Slide credit: D. Nister

Page 25: Lec07 aggregation-and-retrieval-system

Visual Key Words: main idea

Image Analysis & Retrieval, 2016 25Slide credit: D. Nister

Page 26: Lec07 aggregation-and-retrieval-system

Visual words: main idea

Image Analysis & Retrieval, 2016 26

Slide credit: D. Nister

Page 27: Lec07 aggregation-and-retrieval-system

Visual words: main idea

Image Analysis & Retrieval, 2016 27

Slide credit: D. Nister

Page 28: Lec07 aggregation-and-retrieval-system

Slide credit: D. Nister

Visual Key Words

Image Analysis & Retrieval, 2016 28

Each point is a local descriptor, e.g. a SIFT vector.

Page 29: Lec07 aggregation-and-retrieval-system

Slide credit: D. Nister

Image Analysis & Retrieval, 2016 29

Page 30: Lec07 aggregation-and-retrieval-system

Visual words

Example: each group of patches belongs to the same visual word

Image Analysis & Retrieval, 2016 30

Figure from Sivic & Zisserman, ICCV 2003

Page 31: Lec07 aggregation-and-retrieval-system

Visual words

Image Analysis & Retrieval, 2016 31

Source credit: K. Grauman, B. Leibe

• More recently used for describing scenes and objects for the sake of indexing or classification. Sivic & Zisserman 2003; Csurka, Bray, Dance, & Fan 2004; many others.

Page 32: Lec07 aggregation-and-retrieval-system

Object Bag of ‘words’

ICCV 2005 short course, L. Fei-Fei

Bag of Words

Image Analysis & Retrieval, 2016 32

Page 33: Lec07 aggregation-and-retrieval-system

BoW Examples

Illustration

Image Analysis & Retrieval, 2016 33

Page 34: Lec07 aggregation-and-retrieval-system

Bags of visual words

Summarize entire image based on its distribution (histogram) of word occurrences.

Analogous to bag of words representation commonly used for documents.
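As a minimal sketch (random data stands in for real SIFT descriptors and a learned codebook), a hard-assignment BoW histogram for one image can be computed as:

% minimal sketch: hard-assignment Bag-of-Words encoding
sift = rand(500, 128);                    % n x 128 local descriptors of one image
codebook = rand(64, 128);                 % k x 128 visual-word centroids (e.g., from kmeans)
dist = pdist2(sift, codebook);            % n x k distances to each visual word
[~, word] = min(dist, [], 2);             % nearest visual word for each descriptor
k = size(codebook, 1);
bow = accumarray(word, 1, [k, 1]);        % k x 1 histogram of word occurrences
bow = bow / sum(bow);                     % normalize to a distribution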

Image Analysis & Retrieval, 2016 34

Image credit: Fei-Fei Li

Page 35: Lec07 aggregation-and-retrieval-system

Texture Retrieval

Textons…

Image Analysis & Retrieval, 2016 35

Universal texton dictionary

histogram

Source: Lana Lazebnik

Page 36: Lec07 aggregation-and-retrieval-system

BoW Distance Metrics

Rank images by the normalized scalar product between their (possibly weighted) occurrence counts; this is a nearest-neighbor search for similar images.

Image Analysis & Retrieval, 2016 p.36

$\text{sim}(d_j, q) = \frac{d_j \cdot q}{\|d_j\|\,\|q\|}$, e.g., for the word-count vectors $d_j = [5\ 1\ 1\ 0]$ and $q = [1\ 8\ 1\ 4]$
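A minimal sketch of this normalized scalar product on the two example count vectors above:

dj = [5 1 1 0];  q = [1 8 1 4];
sim = dot(dj, q) / (norm(dj) * norm(q));   % cosine similarity, about 0.30 here
% identical word distributions would give a similarity of 1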

Page 37: Lec07 aggregation-and-retrieval-system

Inverted List

Image Retrieval via Inverted List

Image Analysis & Retrieval, 2016 37

Image credit: A. Zisserman

[Figure: inverted file – for each visual word number, the list of image numbers in which it occurs]

When will this give us a significant gain in efficiency?

Page 38: Lec07 aggregation-and-retrieval-system

Indexing local features: inverted file index

For text documents, an efficient way to find all pages on which a word occurs is to use an index…

We want to find all images in which a feature occurs.

We need to index each feature by the images in which it appears, and also keep the number of occurrences.
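A minimal sketch of such an inverted file (a toy index over hypothetical word ids; containers.Map is just one convenient choice):

% toy database: each cell holds one image's visual-word ids (hypothetical data)
imageWords = {[3 7 7 12], [7 9], [1 3 9 9 9]};
invIndex = containers.Map('KeyType', 'double', 'ValueType', 'any');
for img = 1:numel(imageWords)
    for w = unique(imageWords{img})
        occ = sum(imageWords{img} == w);     % # of occurrences of word w in image img
        entry = [img, occ];
        if isKey(invIndex, w)
            invIndex(w) = [invIndex(w); entry];
        else
            invIndex(w) = entry;
        end
    end
end
postings = invIndex(7);   % images containing visual word 7: one [image id, count] row each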

Image Analysis & Retrieval, 2016 38

Source credit : K. Grauman, B. Leibe

Page 39: Lec07 aggregation-and-retrieval-system

TF-IDF Weighting

Term Frequency – Inverse Document Frequency: describe an image by the frequency of each visual word within it, and down-weight words that appear often in the database (the standard weighting for text retrieval)

Image Analysis & Retrieval, 2016 p.39

$t_{id} = \frac{n_{id}}{n_d} \log \frac{N}{n_i}$

$n_{id}$: number of occurrences of word i in document d
$n_d$: number of words in document d
$n_i$: number of occurrences of word i in the whole database
$N$: total number of words in the database
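A minimal sketch of this weighting applied to a small database of BoW count histograms (the counts are made-up toy data):

% H: one row of visual-word counts per image (toy data)
H = [5 1 1 0; 1 8 1 4; 0 2 6 1];
n_d = sum(H, 2);                        % words in each document
n_i = sum(H, 1);                        % occurrences of each word in the whole database
N   = sum(H(:));                        % total number of words in the database
tf    = bsxfun(@rdivide, H, n_d);       % term frequency n_id / n_d
idf   = log(N ./ n_i);                  % inverse document frequency
tfidf = bsxfun(@times, tf, idf);        % tf-idf weighted descriptor, one row per image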

Page 40: Lec07 aggregation-and-retrieval-system

BoW Use Case with Spatial Localization

Collecting words within a query region

Image Analysis & Retrieval, 2016 40

Query region: pull out only the SIFT descriptors whose positions are within the polygon

Page 41: Lec07 aggregation-and-retrieval-system

Image Analysis & Retrieval, 2016 41

Page 42: Lec07 aggregation-and-retrieval-system

BoW Patch Search

Localizing the BoW representation

Image Analysis & Retrieval, 2016 42

Page 43: Lec07 aggregation-and-retrieval-system

Localization with BoW

Image Analysis & Retrieval, 2016 43

Page 44: Lec07 aggregation-and-retrieval-system

Hierarchical Assignment of Histogram

Tree construction:

Image Analysis & Retrieval, 2016 44

[Nister & Stewenius, CVPR’06]

Page 45: Lec07 aggregation-and-retrieval-system

Vocabulary Tree

Training: Filling the tree

Image Analysis & Retrieval, 2016 45

[Nister & Stewenius, CVPR’06]

Page 46: Lec07 aggregation-and-retrieval-system


Vocabulary Tree

Training: Filling the tree

Image Analysis & Retrieval, 2016 46Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 47: Lec07 aggregation-and-retrieval-system


Vocabulary Tree

Training: Filling the tree

Image Analysis & Retrieval, 2016 47Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Page 48: Lec07 aggregation-and-retrieval-system

Vocabulary Tree

Training: Filling the tree

Image Analysis & Retrieval, 2016 48

[Nister & Stewenius, CVPR’06]

Page 49: Lec07 aggregation-and-retrieval-system

Vocabulary Tree

Training: Filling the tree

Image Analysis & Retrieval, 2016 49

[Nister & Stewenius, CVPR’06]

Page 50: Lec07 aggregation-and-retrieval-system


Vocabulary Tree

Recognition

Image Analysis & Retrieval, 2016 50Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

RANSAC verification

Page 51: Lec07 aggregation-and-retrieval-system

Vocabulary Tree: Performance

Evaluated on large databases: indexing with up to 1M images

Online recognition for a database of 50,000 CD covers; retrieval in ~1s

Found experimentally that large vocabularies can be beneficial for recognition
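A minimal sketch of building such a vocabulary tree by recursive k-means (a simplified illustration under assumed branching factor and depth, not the authors' code):

function tree = buildVocabTree(desc, branch, depth)
% (save as buildVocabTree.m)
% desc: n x d local descriptors; branch: children per node; depth: remaining levels
tree = struct('centers', [], 'children', []);
if depth == 0 || size(desc, 1) < branch
    return;   % leaf node
end
[labels, centers] = kmeans(desc, branch, 'MaxIter', 50);
tree.centers = centers;              % branch x d centroids at this node
tree.children = cell(branch, 1);
for b = 1:branch
    tree.children{b} = buildVocabTree(desc(labels == b, :), branch, depth - 1);
end
end

Quantizing a new descriptor then just walks the tree, picking the nearest child centroid at each level, so lookup cost grows with branch x depth rather than with the full vocabulary size.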

Image Analysis & Retrieval, 2016 51

[Nister & Stewenius, CVPR’06]

Page 52: Lec07 aggregation-and-retrieval-system

Visual Word Vocabulary Size

Performance w.r.t. vocabulary size: larger vocabularies can be advantageous… but what happens if the vocabulary is too large?

Image Analysis & Retrieval, 2016 52

Page 53: Lec07 aggregation-and-retrieval-system

Bags of words: pros and cons

Good:
+ flexible to geometry / deformations / viewpoint
+ compact summary of image content
+ provides a vector representation for sets
+ the Inverted List implementation offers a practical solution against large repositories

Bad:
- loss of information at quantization and histogram generation
- the basic model ignores geometry – must verify afterwards, or encode it via features
- background and foreground are mixed when the bag covers the whole image
- interest points or sampling: no guarantee to capture object-level parts

Image Analysis & Retrieval, 2016 53Source credit : K. Grauman, B. Leibe

Page 54: Lec07 aggregation-and-retrieval-system

Can we improve BoW ?

• E.g. Why isn’t our Bag of Words classifier at 90% instead of 70%?

• Training Data

– Huge issue, but not necessarily a variable you can manipulate.

• Learning method

– BoW is on top of any feature scheme

• Representation

– Are we losing too much info in the process ?

Image Analysis & Retrieval, 2016 p.54

Page 55: Lec07 aggregation-and-retrieval-system

Standard Kmeans Bag of Words

BoW revisited

Image Analysis & Retrieval, 2016 p.55

http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

Page 56: Lec07 aggregation-and-retrieval-system

Motivation

Bag of Visual Words is only about counting the number of local descriptors assigned to each Voronoi region

Why not include other statistics/information?

Image Analysis & Retrieval, 2016 p.56

http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

Page 57: Lec07 aggregation-and-retrieval-system

Spatial Pooling

We already looked at the Spatial Pyramid / Spatial Pooling

Image Analysis & Retrieval, 2016 p.57

[Figure: spatial pyramid levels – level 0: 1x1, level 1: 2x2, level 2: 4x4]

Key takeaway: multiple assignment? soft assignment?

Page 58: Lec07 aggregation-and-retrieval-system

Motivation

Bag of Visual Words is only about counting the number of local descriptors assigned to each Voronoi region

Why not include other statistics? For instance:
• mean of local descriptors

Image Analysis & Retrieval, 2016 p.58

http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

Page 59: Lec07 aggregation-and-retrieval-system

Motivation

Bag of Visual Words is only about counting the number of local descriptors assigned to each Voronoi region

Why not include other statistics? For instance:
• mean of local descriptors

• (co)variance of local descriptors

Image Analysis & Retrieval, 2016 p.59

http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

Page 60: Lec07 aggregation-and-retrieval-system

Simple case: Soft Assignment

Called “Kernel codebook encoding” by Chatfield et al. 2011. Cast a weighted vote into the most similar clusters.

Image Analysis & Retrieval, 2016 p.60

Page 61: Lec07 aggregation-and-retrieval-system

Simple case: Soft Assignment

Called “Kernel codebook encoding” by Chatfield et al. 2011. Cast a weighted vote into the most similar clusters.

This is fast and easy to implement (try it for Project 3!) but it does have some downsides for image retrieval – the inverted file index becomes less sparse.
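A minimal sketch of one such soft assignment (a Gaussian/heat-kernel weighting; the bandwidth heuristic and toy data are assumptions):

% sift: n x d descriptors, codebook: k x d centroids (random toy data)
sift = rand(200, 128); codebook = rand(16, 128);
dist = pdist2(sift, codebook);             % n x k distances to the clusters
sigma = mean(dist(:));                     % bandwidth: a simple heuristic choice
w = exp(-dist.^2 / (2 * sigma^2));         % soft weight of each descriptor to each cluster
w = bsxfun(@rdivide, w, sum(w, 2));        % each descriptor's votes sum to 1
softBow = sum(w, 1)';                      % k x 1 soft histogram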

Image Analysis & Retrieval, 2016 p.61

Page 62: Lec07 aggregation-and-retrieval-system

A first example: the VLAD

Given a codebook $\{\mu_1, \ldots, \mu_K\}$, e.g. learned with K-means, and a set of local descriptors $\{x_t\}$:

• assign: $NN(x_t) = \arg\min_i \|x_t - \mu_i\|$

• compute: $v_i = \sum_{x_t:\,NN(x_t)=i} (x_t - \mu_i)$

• concatenate the $v_i$'s and normalize

Image Analysis & Retrieval, 2016 p.62

Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.

[Figure: descriptors x in the Voronoi cells of centroids $\mu_1 \ldots \mu_5$]

① assign each descriptor to its nearest centroid
② compute the residual $x - \mu_i$
③ $v_i$ = sum of the residuals $x - \mu_i$ over cell i
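A minimal plain-Matlab sketch of these three steps with hard assignment (toy random data; the VL_FEAT code two slides ahead uses a soft assignment instead):

% sift: n x d descriptors, codebook: k x d centroids (toy data)
sift = rand(500, 128); codebook = rand(16, 128);
[k, d] = size(codebook);
[~, nn] = min(pdist2(sift, codebook), [], 2);    % step 1: nearest centroid per descriptor
v = zeros(k, d);
for i = 1:k
    xi = sift(nn == i, :);                       % descriptors falling in cell i
    if ~isempty(xi)
        v(i, :) = sum(bsxfun(@minus, xi, codebook(i, :)), 1);   % steps 2-3: sum of residuals
    end
end
vlad = v(:) / norm(v(:));                        % concatenate the v_i's and L2-normalize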

Page 63: Lec07 aggregation-and-retrieval-system

A first example: the VLAD

A graphical representation of the resulting VLAD vectors:

Image Analysis & Retrieval, 2016 p.63

Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.

Page 64: Lec07 aggregation-and-retrieval-system

VL_FEAT Implementation

Matlab:

Image Analysis & Retrieval, 2016 p.64

function [vc] = vladSiftEncoding(sift, codebook)
dbg = 1;
if dbg
    if (0)  % init VL_FEAT, only need to do once
        run('../../tools/vlfeat-0.9.20/toolbox/vl_setup.m');
    end
    % debug mode: build test descriptors and a codebook from a sample image
    im = imread('../pics/flarsheim-2.jpg');
    [f, sift] = vl_sift(single(rgb2gray(im)));
    sift = single(sift');
    [indx, codebook] = kmeans(sift, 16);
    % make sift # smaller
    sift = sift(1:800, :);
end
[n, kd] = size(sift);
[m, kd] = size(codebook);
% compute assignment
dist = pdist2(codebook, sift);
mdist = mean(mean(dist));
% normalize the heat kernel s.t. mean dist is mapped to 0.5
a = -log(0.5) / mdist;
indx = exp(-a * dist);                 % m x n soft assignment weights
vc = vl_vlad(sift', codebook', indx);
if dbg
    figure(41); colormap(gray);
    subplot(2,2,1); imshow(im); title('image');
    subplot(2,2,2); imagesc(dist); title('m x n distance');
    subplot(2,2,3); imagesc(indx); title('m x n assignment');
    subplot(2,2,4); imagesc(reshape(vc, [m, kd])); title('vlad code');
end

Page 65: Lec07 aggregation-and-retrieval-system

VLAD Code

What are the tweaks? Codebook design

Soft assignment options

Image Analysis & Retrieval, 2016 p.65

Page 66: Lec07 aggregation-and-retrieval-system

References

Vocabulary Tree: David Nistér, Henrik Stewénius: Scalable Recognition with a Vocabulary Tree. CVPR (2) 2006: 2161-2168

VLAD: Hervé Jégou, Matthijs Douze, Cordelia Schmid: Improving Bag-of-Features for Large Scale Image Search. International Journal of Computer Vision 87(3): 316-336 (2010)

Fisher Vector: Florent Perronnin, Jorge Sánchez, Thomas Mensink: Improving the Fisher Kernel for Large-Scale Image Classification. ECCV (4) 2010: 143-156

AKULA: Abhishek Nagar, Zhu Li, Gaurav Srivastava, Kyungmo Park: AKULA – Adaptive Cluster Aggregation for Visual Search. DCC 2014: 13-22

Image Analysis & Retrieval, 2016 p.66

Page 67: Lec07 aggregation-and-retrieval-system

Lec 07 Summary

Image Retrieval System Metric
What are true positive, false positive, true negative, and false negative pairs?
What are precision, recall, and the F-score?

Why Aggregation?
Decision boundary
Indexing/Hashing

Bag of Words
A histogram whose bins are visual words
Variations: hierarchical assignment with a vocabulary tree
Implementation: Inverted List

VLAD
Richer encoding of the aggregated info
Soft assignment of features to codebook bins
Vectorized representation – no need for an inverted list

Image Analysis & Retrieval, 2016 p.67