!lambertoballanandlorenzoseidenari!hands!on!advanced!bag.of.words!models!for!visual!recogni8on!!lambertoballanandlorenzoseidenari...
TRANSCRIPT
![Page 1: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/1.jpg)
Hands on Advanced Bag-‐of-‐Words Models for Visual Recogni8on
Lamberto Ballan and Lorenzo Seidenari MICC -‐ University of Florence
September 9, 2013 – Villa Doria D’Angri, Napoli (Italy)
-‐ The tutorial will start at 14:30 -‐ In the meanwhile please download the matlab code and images
from: h:ps://sites.google.com/site/iciap13handsonbow/ -‐ We have also some USB pendrives with the material
-‐ The starEng point is the exercises.m file (we provide you also the exercises_soluEons.m script)
![Page 2: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/2.jpg)
Hands on Advanced Bag-‐of-‐Words Models for Visual Recogni8on
Lamberto Ballan and Lorenzo Seidenari MICC -‐ University of Florence
![Page 3: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/3.jpg)
Outline of this tutorial • Introduc8on
• Visual recogniEon problem definiEon • Bag of Words models (BoW) • Main drawbacks and soluEons
• Session I (pracEcal session): Standard BoW pipeline • Feature sampling strategies • Codebook creaEon and feature quanEzaEon • Classifiers
• Session II (pracEcal session): Advanced BoW models for Visual recogniEon • Feature fusion • Modern feature representaEon: reconstrucEon based
approaches LLC • SpaEal pooling: max pooling, spaEal pyramid
![Page 4: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/4.jpg)
Visual Recogni8on
PredicEng the presence (absence) of an object in an image Does this image contains a church? [Where?]
![Page 5: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/5.jpg)
Visual Recogni8on
PredicEng the presence (absence) of an object in an image Does this image contains a church? [Where?]
Church
![Page 6: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/6.jpg)
Visual Recogni8on
Single instance versus category recogniEon Does this image contains «Santa Maria Del Fiore Cathedral»?
S.M. Del Fiore
![Page 7: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/7.jpg)
Visual Recogni8on
Single instance versus category recogniEon Does this image contain a face?
Face
![Page 8: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/8.jpg)
Visual Recogni8on
Single instance versus category recogniEon Does this image contain Barak Obama?
Obama
![Page 9: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/9.jpg)
Scale • Objects of different size
• PerspecEve
Visual Recogni8on Challenges
![Page 10: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/10.jpg)
Viewpoint • Object pose
Michelangelo’s David
Visual Recogni8on Challenges
![Page 11: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/11.jpg)
Occlusion • 3D scene layout
• ArEculated enEEes
Magri:e’s “The Son of Man”
Visual Recogni8on Challenges
![Page 12: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/12.jpg)
Clu:er
Visual Recogni8on Challenges
![Page 13: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/13.jpg)
Intra-‐class variaEon • All these are chairs
Visual Recogni8on Challenges
Image cred
it: L. Fei-‐Fei
![Page 14: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/14.jpg)
Visual Recogni8on Challenges
Wolf
Dog
Inter-‐class similarity • A dog can be very similar to a wolf
![Page 15: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/15.jpg)
Bag-‐of-‐Words models
• “Bag of Words” (BoW) model, combined with advanced classificaEon techniques, reaches state-‐of-‐the-‐art results
• The approach: – A text document is represented as an unordered collecEon of words,
disregarding grammar and word order; – Method ingredients are: vocabulary, word histograms, a classifier
is it something about medicine/biology ?
is it a document about business ?
• Text categoriza8on: the task is to assign a textual document to one or more categories based on its content
Image cred
it: L. Fei-‐Fei
![Page 16: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/16.jpg)
image of an “object” category bag of visual words
Same approach usable with visual data • An image can be treated as a document, and features extracted from the
image are considered as the "visual words"...
D1: “face” D2: “bike” D3: ”violin”
Vocabulary
(codewords)
Bag of (visual) Words: an image is represented as an unordered collecEon of visual words
Image cred
it: L. Fei-‐Fei
![Page 17: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/17.jpg)
Pipeline
1. Feature detecEon (sampling) and descripEon
2. Codebook formaEon and image representaEon
3. Learning and recogniEon
![Page 18: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/18.jpg)
Pipeline
1. Feature detecEon (sampling) and descripEon
2. Codebook formaEon and image representaEon
3. Learning and recogniEon
The focus of this tutorial
![Page 19: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/19.jpg)
Pipeline
… …
Feature Descrip8on
e.g. SIFT
Feature Sampling
Codebook Forma8on e.g. K-‐means
Image Representa8on
Learning and Recogni8on
Category label
![Page 20: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/20.jpg)
Pipeline
… …
Feature Descrip8on
e.g. SIFT
Feature Sampling
Codebook Forma8on e.g. K-‐means
Image Representa8on
Learning and Recogni8on
Category label
Exercise 1
Exercise 2
Exercises 3, 4
![Page 21: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/21.jpg)
Feature Sampling Interest operators (e.g. DoG, Harris, MSER …)
exercises.m extract_siL_features.m
![Page 22: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/22.jpg)
Feature Sampling Dense sampling (regular grid)
exercises.m extract_siL_features.m
![Page 23: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/23.jpg)
Codebook Forma8on
Feature Descrip8on
e.g. SIFT Codebook Forma8on
Clustering: K-‐means Exercise 1
exercises.m
exercises.m
TODO: Exercise 1 to complete the codebook formaEon
![Page 24: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/24.jpg)
Codebook Forma8on
Visual Word example 1: what’s inside a cluster
![Page 25: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/25.jpg)
Codebook Forma8on
Visual Word example 2: what’s inside a cluster
![Page 26: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/26.jpg)
Codebook Forma8on
Visual Word example 3: what’s inside a cluster
![Page 27: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/27.jpg)
Codebook Issues
How to choose vocabulary size? -‐ Too small: visual words not representaEve of all patches -‐ Too large: quanEzaEon arEfacts, overfijng ComputaEonal efficiency -‐ Vocabulary trees (Nister’06)
Lecture 14! - !!!
Fei-Fei Li!
Visual'vocabularies:'Issues'• How'to'choose'vocabulary'size?'
– Too'small:'visual'words'not'representa/ve'of'all'patches'– Too'large:'quan/za/on'ar/facts,'overfirng'
• Computa/onal'efficiency'– Vocabulary'trees''
(Nister'&'Stewenius,'2006)'
309Oct912'68'
Image cred
it: D. N
ister
![Page 28: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/28.jpg)
Bag-‐of-‐Words Representa8on
Quan8za8on: assign each feature to the most representaEve visual word
SIFT Hard assignment -‐ Nearest-‐neighbors assignment -‐ K-‐D tree search strategy
codebook
![Page 29: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/29.jpg)
Image Representa8on
… … Image
Representa8on
Exercise 2 Visual Codebook Histogram of Visual Words
TODO: Exercise 2 to represent images as BoW histograms
exercises.m
Once each feature is assigned to a visual word we can compute our image representaEon
![Page 30: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/30.jpg)
Image Representa8on
Compute histograms of visual word frequencies
codebook
visual words
freq
uency
![Page 31: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/31.jpg)
Learning and Recogni8on
Learn category models/classifiers from a training set DiscriminaEve methods (covered by this tutorial) -‐ k-‐NN -‐ SVM: linear and non-‐linear kernels (RBF, IntersecEon, …)
GeneraEve methods -‐ graphical models (pLSA, LDA, …)
![Page 32: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/32.jpg)
Discrimina8ve Classifiers
Category models
.
.
.
.
.
.
Class 1 Class 2
Model space
Test Image
?
![Page 33: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/33.jpg)
k-‐Nearest Neighbors Classifier
-‐ For a test image find the k closest points from training data
-‐ Labels of the k points vote to classify
Model space
Test Image
Works well if there is lots of data and the distance funcEon is good
TODO: Exercise 3, NN image classificaEon using Chi-‐2 distance
![Page 34: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/34.jpg)
SVM Classifier
Find hyperplane that maximizes the margin between the posiEve and negaEve examples
Model space
Test Image
-‐ Datasets that are linearly separable work out great
-‐ But what if the dataset is not linearly
separable? We can map it to a higher dimensional space (liLing)
TODO: Exercise 4, SVM image classificaEon using different pre-‐computed kernels
![Page 35: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/35.jpg)
score for class cow
• We should discount small changes in large feature values
Kernels for histograms
• Linear classificaEon with BoW histograms: – Each occurrence of a visual word index leads to same score increment – ClassificaEon score proporEonal to object size!
Slide cred
it: J. Verbe
ek
![Page 36: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/36.jpg)
• Hellinger and Chi-‐2 distances apply a discount to large values changes • Hellinger distance: element-‐wise square rooEng • Chi-‐2 distance between vectors
• DiscounEng effect of distances: L2 Hellinger Chi-2
Slide cred
it: J. Verbe
ek
![Page 37: !LambertoBallanandLorenzoSeidenari!Hands!on!Advanced!Bag.of.Words!Models!for!Visual!Recogni8on!!LambertoBallanandLorenzoSeidenari MICC!.!University!of!Florence!](https://reader036.vdocuments.mx/reader036/viewer/2022081410/609269c9cdbe2323123ca3a6/html5/thumbnails/37.jpg)
Experiments
• Two different datasets (both are subsets of Caltech-‐101) – 4 Object Categories: faces, airplanes, cars, motorbikes – 15 Object Categories: bonsai, buSerfly, crab, elephant, euphonium, faces,
grandpiano, joshuatree, leopards, lotus, motorbikes, schooner, stopsign, sunflower, watch
• Experimental protocol – for each class 30 images are selected for train and (up to) 50 for test – results are reported by measuring accuracy
Confusion matrices obtained on the 4 Objects dataset (using NN and linear SVM)