8.2 bag of visual words16385/s17/slides/8.2_bag_of_visual_words.pdf · a collection of local...

49
Bag-of-Visual-Words 16-385 Computer Vision (Kris Kitani) Carnegie Mellon University

Upload: others

Post on 05-Feb-2020

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Bag-of-Visual-Words16-385 Computer Vision (Kris Kitani)

Carnegie Mellon University

Page 2: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

What object do these parts belong to?

Page 3: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

a collection of local features (bag-of-features)

An object as

Some local feature are very informative

• deals well with occlusion • scale invariant • rotation invariant

Page 4: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

(not so) crazy assumption

spatial information of local features can be ignored for object recognition (i.e., verification)

Page 5: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

Works pretty well for image-level classification

CalTech6 dataset

Page 6: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Bag-of-features

represent a data item (document, texture, image) as a histogram over features

Page 7: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Bag-of-features

an old idea(e.g., texture recognition and information retrieval)

represent a data item (document, texture, image) as a histogram over features

Page 8: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Texture recognition

Universal texton dictionary

histogram

Mori, Belongie and Malik, 2001Julesz, 1981

Page 9: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Vector Space ModelG. Salton. ‘Mathematics and Information Retrieval’ Journal of Documentation,1979

1 6 2 1 0 0 0 1

Tartan robot CHIMP CMU bio soft ankle sensor

0 4 0 1 4 5 3 2

Tartan robot CHIMP CMU bio soft ankle sensor

http://www.fodey.com/generators/newspaper/snippet.asp

Page 10: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

A document (datapoint) is a vector of counts over each word (feature)

What is the similarity between two documents?

vd = [n(w1,d) n(w2,d) · · · n(wT,d)]

n(·) counts the number of occurrences just a histogram over words

Page 11: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

A document (datapoint) is a vector of counts over each word (feature)

What is the similarity between two documents?

vd = [n(w1,d) n(w2,d) · · · n(wT,d)]

n(·) counts the number of occurrences just a histogram over words

Use any distance you want but the cosine distance is fast.

d(vi,vj) = cos ✓

=

vi · vj

kvikkvjk✓

vi

vj

Page 12: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

but not all words are created equal

Page 13: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

TF-IDF

weigh each word by a heuristic

Term Frequency Inverse Document Frequency

vd = [n(w1,d) n(w2,d) · · · n(wT,d)]

vd = [n(w1,d)↵1 n(w2,d)↵2 · · · n(wT,d)↵T ]

term frequency

inverse document frequency

n(wi,d)↵i = n(wi,d) log

⇢DP

d0 1[wi 2 d0]

(down-weights common terms)

Page 14: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Standard BOW pipeline (for image classification)

Page 15: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Dictionary Learning: Learn Visual Words using clustering

Encode: build Bags-of-Words (BOW) vectors

for each image

Classify: Train and test data using BOWs

Page 16: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Dictionary Learning: Learn Visual Words using clustering

1. extract features (e.g., SIFT) from images

Page 17: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Dictionary Learning: Learn Visual Words using clustering

2. Learn visual dictionary (e.g., K-means clustering)

Page 18: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Dictionary Learning: Learn Visual Words using clustering

Encode: build Bags-of-Words (BOW) vectors

for each image

Classify: Train and test data using BOWs

Page 19: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Encode: build Bags-of-Words (BOW) vectors

for each image

1. Quantization: image features gets associated to a visual word (nearest

cluster center)

Page 20: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Encode: build Bags-of-Words (BOW) vectors

for each image 2. Histogram: count the number of visual word

occurrences

Page 21: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Feature ExtractionWhat kinds of features can we extract?

Page 22: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

• Regular  grid  • Vogel  &  Schiele,  2003 • Fei-­‐Fei  &  Perona,  2005

• Interest  point  detector  • Csurka  et  al.  2004 • Fei-­‐Fei  &  Perona,  2005 • Sivic  et  al.  2005  

• Other  methods  • Random  sampling  (Vidal-­‐Naquet  &  

Ullman,  2002) • Segmentation-­‐based  patches  (Barnard  

et  al.  2003)

Page 23: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Normalize  patch

Detect  patches  [Mikojaczyk  and  Schmid  ’02]  

[Mata,  Chum,  Urban  &  Pajdla,  ’02]    

[Sivic  &  Zisserman,  ’03]

Compute  SIFT  descriptor  

           [Lowe’99]

Page 24: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Page 25: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Visual Vocabulary (coding and vector quantization)

Page 26: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

A vector quantizer takes a feature vector and maps it to the index of the nearest code

vector in a codebook

visual vocabulary = code book

visual word = code vector

The codebook is used for quantizing features

Alternative perspective…

Page 27: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Page 28: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Clustering

Page 29: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Clustering

…Visual  vocabulary

Page 30: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

K-means ClusteringGiven k:

1.Select initial centroids at random.

2.Assign each object to the cluster with the nearest centroid.

3.Compute each centroid as the mean of the objects assigned to it.

4.Repeat previous 2 steps until no change.

Page 31: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

1. Select initial centroids at random

Page 32: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

1. Select initial centroids at random

2. Assign each object to the cluster with the nearest centroid.

Page 33: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

1. Select initial centroids at random

2. Assign each object to the cluster with the nearest centroid.

3. Compute each centroid as the mean of the objects assigned to it (go to 2)

Page 34: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

1. Select initial centroids at random

2. Assign each object to the cluster with the nearest centroid.

3. Compute each centroid as the mean of the objects assigned to it (go to 2)

2. Assign each object to the cluster with the nearest centroid.

Page 35: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

1. Select initial centroids at random

2. Assign each object to the cluster with the nearest centroid.

3. Compute each centroid as the mean of the objects assigned to it (go to 2)

2. Assign each object to the cluster with the nearest centroid.

Repeat previous 2 steps until no change

Page 36: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

From what data should I learn the code book?

• Codebook can be learned on separate training set

• Provided the training set is sufficiently representative, the codebook will be “universal”

Page 37: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Example  visual  vocabulary

Fei-­‐Fei  et  al.  2005

Page 38: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Example codebook

Source: B. Leibe

Appearance codebook

Page 39: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Another codebook

Appearance codebook…

………

Source: B. Leibe

Page 40: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Visual vocabularies: Issues

• How to choose vocabulary size? • Too small: visual words not representative of all patches • Too large: quantization artifacts, overfitting

• Computational efficiency • Vocabulary trees

(Nister & Stewenius, 2006)

Page 41: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Histogram

Page 42: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

…..

freq

uency

codewords

Page 43: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Classification

Page 44: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Given the bag-of-features representations of images from different classes, learn a classifier using machine learning

(more on this soon)

Page 45: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Extension to bag-of-words models

Page 46: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

How can we encode the spatial layout?

All of these images have the same color histogram!

Page 47: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

Spatial Pyramid representation

level 0

Lazebnik, Schmid & Ponce (CVPR 2006)

Page 48: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

level 1level 0

Spatial Pyramid representation

Lazebnik, Schmid & Ponce (CVPR 2006)

Page 49: 8.2 Bag of Visual Words16385/s17/Slides/8.2_Bag_of_Visual_Words.pdf · a collection of local features (bag-of-features) An object as Some local feature are very informative • deals

level 1level 0

Spatial Pyramid representation

level 2

Lazebnik, Schmid & Ponce (CVPR 2006)