Lecture 03: Internet video search
![Page 1: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/1.jpg)
6 Location and context
![Page 2: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/2.jpg)
What makes a cow a cow?
Google knows because other people know
We think we know
“Because it has four legs.” But the fact of the matter is: not all cows show four legs, nor are they all brown …
How do you know?
![Page 3: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/3.jpg)
What is the object in the middle?
No segmentation … Not even the pixel values of the object …
![Page 4: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/4.jpg)
Where is evidence for an object?
Uijlings IJCV 2011
![Page 5: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/5.jpg)
Where is evidence for an object?
Uijlings IJCV 2011
![Page 6: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/6.jpg)
What is the visual extent of an object?
Uijlings IJCV 2012
![Page 7: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/7.jpg)
Where: exhaustive search
Look everywhere for the object window.
Imposes computational constraints on:
• Very many locations and windows (coarse grid/fixed aspect ratio)
• Evaluation cost per location (weak features/classifiers)
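To get a feeling for why exhaustive search forces a coarse grid and a fixed aspect ratio, the sketch below simply counts the candidate windows. The image size, window sizes, aspect ratios and strides are illustrative assumptions, not values from the lecture.

```python
def count_windows(img_w, img_h, sizes, aspect_ratios, stride):
    """Count the sliding windows that fit inside an img_w x img_h image."""
    total = 0
    for size in sizes:                      # window height in pixels
        for ratio in aspect_ratios:         # width / height
            win_h = size
            win_w = int(round(size * ratio))
            if win_w > img_w or win_h > img_h:
                continue                    # window does not fit at all
            nx = (img_w - win_w) // stride + 1
            ny = (img_h - win_h) // stride + 1
            total += nx * ny
    return total


# Hypothetical settings: a dense search versus a coarse grid with one aspect ratio.
dense = count_windows(500, 375, sizes=range(32, 376, 8),
                      aspect_ratios=[0.5, 1.0, 2.0], stride=1)
coarse = count_windows(500, 375, sizes=[64, 128, 256],
                       aspect_ratios=[1.0], stride=16)
print("dense windows:", dense)
print("coarse windows:", coarse)
```

Every one of these windows has to be scored by the classifier, which is why the per-window evaluation cost has to stay low in an exhaustive scheme.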
Impressive but takes long.
Viola IJCV 2004, Dalal CVPR 2005, Felzenszwalb PAMI 2010, Vedaldi ICCV 2009
![Page 8: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/8.jpg)
Where: the need for a hierarchy
An image is intrinsically hierarchical.
Gu CVPR 2009
![Page 9: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/9.jpg)
Selective search
Van de Sande ICCV 2011
Windows formed by hierarchical grouping. Adjacent grouping on color/texture/shape cues. Felzenszwalb 2004
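A minimal sketch of hierarchical grouping, under strong simplifying assumptions: the initial regions and their adjacency are given (in practice they would come from an over-segmentation such as Felzenszwalb 2004), and similarity is only a colour-histogram intersection rather than a combination of colour, texture and shape cues. All function and field names are hypothetical.

```python
from itertools import count


def hist_intersection(h1, h2):
    """Similarity of two normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def merge_boxes(b1, b2):
    """Tightest box (x0, y0, x1, y1) around two boxes."""
    return (min(b1[0], b2[0]), min(b1[1], b2[1]),
            max(b1[2], b2[2]), max(b1[3], b2[3]))


def hierarchical_grouping(regions, adjacency):
    """regions: {id: {"box": (x0, y0, x1, y1), "size": int, "hist": [float]}}
    adjacency: set of frozenset({id_a, id_b}) for touching regions.
    Returns the boxes of all regions created along the hierarchy."""
    regions = dict(regions)
    adjacency = set(adjacency)
    windows = [r["box"] for r in regions.values()]
    new_id = count(max(regions) + 1)
    while adjacency:
        # Greedily pick the most similar pair of adjacent regions.
        pair = max(adjacency, key=lambda p: hist_intersection(
            *(regions[i]["hist"] for i in p)))
        a, b = tuple(pair)
        ra, rb = regions[a], regions[b]
        size = ra["size"] + rb["size"]
        merged = {
            "box": merge_boxes(ra["box"], rb["box"]),
            "size": size,
            # Size-weighted average keeps the histogram normalised.
            "hist": [(ra["size"] * x + rb["size"] * y) / size
                     for x, y in zip(ra["hist"], rb["hist"])],
        }
        m = next(new_id)
        # Neighbours of a or b become neighbours of the merged region.
        rewired = set()
        for p in adjacency:
            if p == pair:
                continue
            q = frozenset(m if i in pair else i for i in p)
            if len(q) == 2:
                rewired.add(q)
        adjacency = rewired
        del regions[a], regions[b]
        regions[m] = merged
        windows.append(merged["box"])
    return windows


# Toy example: three regions in a row; the first two have similar colours.
toy_regions = {
    0: {"box": (0, 0, 10, 10), "size": 100, "hist": [1.0, 0.0]},
    1: {"box": (10, 0, 20, 10), "size": 100, "hist": [0.9, 0.1]},
    2: {"box": (20, 0, 30, 10), "size": 100, "hist": [0.0, 1.0]},
}
toy_adjacency = {frozenset({0, 1}), frozenset({1, 2})}
toy_windows = hierarchical_grouping(toy_regions, toy_adjacency)
print(toy_windows)
```

The windows come from every level of the hierarchy, so small and large objects are both covered with far fewer candidates than an exhaustive scan.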
![Page 10: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/10.jpg)
Selective search example
![Page 11: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/11.jpg)
Selective search example
![Page 12: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/12.jpg)
Average best overlap ~88%
… looks like this
High recall cat
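As a sketch of how an average best overlap figure can be computed: for each ground-truth box, take the best intersection-over-union with any proposed window, then average. The boxes below are made up for illustration, and the exact evaluation protocol behind the ~88% figure is not given on the slide.

```python
def iou(a, b):
    """Intersection over union of two boxes (x0, y0, x1, y1)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def average_best_overlap(gt_boxes, proposals):
    """Mean over ground-truth boxes of the best IoU with any proposal."""
    best = [max(iou(g, p) for p in proposals) for g in gt_boxes]
    return sum(best) / len(best)


# Hypothetical ground truth and proposals.
gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
props = [(0, 0, 10, 10), (20, 20, 30, 40), (5, 5, 15, 15)]
abo = average_best_overlap(gt, props)
print(abo)
```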
![Page 13: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/13.jpg)
Pairs of concepts
Uijlings ICCV demo 2012
![Page 14: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/14.jpg)
6. Conclusion
Selective search gives good localization. Localization needed to understand pairs of concepts.
![Page 15: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/15.jpg)
7 Data and metadata
http://bit.ly/visualsearchengines
![Page 16: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/16.jpg)
How many concepts?
Li Fei Fei slide. Biederman, Psychological Rev. 1987
![Page 17: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/17.jpg)
How many examples?
Once you are over 100 – 1000 examples, success is there.
![Page 18: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/18.jpg)
Russell IJCV 2008
LabelMe 290,000 object annotations
Amateur labeling
![Page 19: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/19.jpg)
Amateur labeling
![Page 20: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/20.jpg)
Amateur labeling
![Page 21: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/21.jpg)
Xirong Li, TMM 2009
Tag relevance by social annotation
Consistency in tagging between users on similar images.
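One way to turn this consistency into a score is neighbour voting: count how often a tag occurs among the visually nearest images and subtract what would be expected by chance. The sketch below is a simplified illustration under assumed inputs (tiny 2-D features, Euclidean distance, no per-user filtering); it is not claimed to be the exact method of the cited paper.

```python
import math


def tag_relevance(query_feat, query_tags, collection, k):
    """collection: list of (feature_vector, set_of_tags).
    Returns {tag: neighbour votes minus the expected number of votes}."""
    neighbours = sorted(collection,
                        key=lambda item: math.dist(query_feat, item[0]))[:k]
    n = len(collection)
    scores = {}
    for tag in query_tags:
        votes = sum(1 for _, tags in neighbours if tag in tags)
        # Expected votes if the k neighbours were drawn at random.
        prior = k * sum(1 for _, tags in collection if tag in tags) / n
        scores[tag] = votes - prior
    return scores


# Hypothetical collection: three snowy images close together, three others far away.
collection = [
    ([0.0, 0.1], {"snow", "winter"}),
    ([0.1, 0.0], {"snow"}),
    ([0.1, 0.1], {"snow", "ski"}),
    ([5.0, 5.0], {"beach"}),
    ([5.1, 5.0], {"beach", "rainbow"}),
    ([5.0, 5.2], {"city"}),
]
scores = tag_relevance([0.05, 0.05], ["snow", "rainbow"], collection, k=3)
print(scores)
```

A tag that the visual neighbours agree on scores above zero; a tag that none of them use scores below zero.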
![Page 22: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/22.jpg)
Tag relevance by social annotation
Pretty good for snow, not so good for rainbow.
![Page 23: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/23.jpg)
Social negative bootstrapping
Xirong Li ACM MM 2009
Negative images are as important as positive images to learn. Not just random negative images, but close ones.
• We want to learn positive examples from an expert, and obtain as many negative samples as we like for free from the web.
• We iteratively aim for the hardest negatives.
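The iterative idea can be sketched as a loop: train on the current negatives, score the remaining pool, and move the highest-scoring (most misclassified) negatives into the training set. The toy linear scorer, the data and all names below are assumptions for illustration; the cited work uses proper classifiers on real image features.

```python
import random


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train(pos, neg):
    """Toy linear scorer: direction from the negative mean to the positive mean."""
    mp, mn = mean(pos), mean(neg)
    w = [p - q for p, q in zip(mp, mn)]
    b = -sum(wi * (p + q) / 2 for wi, p, q in zip(w, mp, mn))
    return w, b


def score(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def negative_bootstrap(pos, neg_pool, rounds, per_round, seed=0):
    rng = random.Random(seed)
    pool = list(neg_pool)
    rng.shuffle(pool)
    # Round 0: start from random negatives.
    negatives, pool = pool[:per_round], pool[per_round:]
    model = train(pos, negatives)
    for _ in range(rounds):
        if not pool:
            break
        # Hardest negatives: those the current model scores most positive.
        pool.sort(key=lambda x: score(model, x), reverse=True)
        negatives += pool[:per_round]
        pool = pool[per_round:]
        model = train(pos, negatives)
    return model, negatives


# Hypothetical 2-D features: a few expert positives, many web negatives.
pos = [[2.0, 2.0], [2.2, 1.8], [1.9, 2.1]]
neg_pool = [[0.0, 0.0], [0.2, 0.1], [1.5, 1.6], [1.4, 1.7],
            [-1.0, -1.0], [0.1, -0.2], [1.6, 1.4], [-0.5, 0.3]]
model, used_negatives = negative_bootstrap(pos, neg_pool, rounds=2, per_round=2)
print(len(used_negatives), "negatives used")
```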
![Page 24: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/24.jpg)
Social negative bootstrapping
Xirong Li ICMR 2011
![Page 25: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/25.jpg)
Knowledge ontology ImageNet
![Page 26: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/26.jpg)
acknowledgement WordNet friends
Christiane Fellbaum, Princeton
Dan Osherson, Princeton
Kai Li, Princeton
Alex Berg, Columbia
Jia Deng, Princeton/Stanford
Hao Su, Stanford
![Page 27: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/27.jpg)
PASCAL VOC
The PASCAL Visual Object Classes (VOC).
500,000 images downloaded from Flickr.
Queries like “car”, “vehicle”, “street”, “downtown”.
10,000 objects, 25,000 labels.
Mark Everingham, Luc Van Gool, Chris Williams, John Winn, Andrew Zisserman
![Page 28: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/28.jpg)
7. Conclusion
Data is king. The data are beginning to reflect the human cognition capacity [at a basic level]. Harvesting social data requires advanced computer vision control.
![Page 29: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/29.jpg)
8 Performance
![Page 30: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/30.jpg)
PASCAL 2010
Aeroplane, Bicycle, Bird, Boat, Bottle, Bus, Car, Cat, Chair, Cow
![Page 31: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/31.jpg)
True Positives - Person
UOCTTI_LSVM_MDPM
NLPR_HOGLBP_MC_LCEGCHLC
NUS_HOGLBP_CTX_CLS_RESCORE_V2
![Page 32: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/32.jpg)
False Positives - Person
UOCTTI_LSVM_MDPM
NLPR_HOGLBP_MC_LCEGCHLC
NUS_HOGLBP_CTX_CLS_RESCORE_V2
![Page 33: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/33.jpg)
Non-birds & non-boats
Non-bird images: Highest ranked
Non-boat images: Highest ranked
Water texture and scene composition?
![Page 34: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/34.jpg)
Non-chair
![Page 35: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/35.jpg)
True Positives - Motorbike
MITUCLA_HIERARCHY
NLPR_HOGLBP_MC_LCEGCHLC
NUS_HOGLBP_CTX_CLS_RESCORE_V2
![Page 36: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/36.jpg)
False Positives - Motorbike
MITUCLA_HIERARCHY
NLPR_HOGLBP_MC_LCEGCHLC
NUS_HOGLBP_CTX_CLS_RESCORE_V2
![Page 37: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/37.jpg)
Object localization 2008-2010
Results on 2008 data improve for 2010 methods for all categories, by over 100% for some categories.
[Figure: bar chart of max AP (%), axis from 0 to 60, per category (aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train, tv/monitor) for the 2008, 2009 and 2010 methods.]
![Page 38: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/38.jpg)
TRECvid evaluation standard
![Page 39: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/39.jpg)
Concept detection
Aircraft
Beach
Mountain
People marching
Police/Security
Flower
![Page 40: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/40.jpg)
Measuring performance
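• Precision
• Recall
Precision and recall have an inverse relationship.
[Figure: the set of retrieved items, the set of relevant items, and their overlap, the set of relevant retrieved items, shown next to a ranked list of results 1 to 5.]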
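In terms of the three sets on this slide, precision is the fraction of retrieved items that are relevant and recall is the fraction of relevant items that are retrieved. The sketch below computes both, plus a non-interpolated average precision over a ranked list. The item names are made up, and the benchmarks discussed here may use a somewhat different AP protocol.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall from the retrieved and relevant sets."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant            # relevant retrieved items
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where a relevant item appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical ranked result list of five shots; "f" is relevant but never retrieved.
ranked = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
p, r = precision_recall(ranked, relevant)
ap = average_precision(ranked, relevant)
print(p, r, ap)
```

Retrieving more items can only raise recall but usually lowers precision, which is the inverse relationship noted on the slide.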
![Page 41: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/41.jpg)
UvA-MediaMill@TRECVID
• other systems
Snoek et al, TRECVID 04-10
![Page 42: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/42.jpg)
Performance doubled in just 3 years
• 36 concept detectors
Snoek & Smeulders, IEEE Computer 2010
Even when using training data of different origin, great progress. But the number of concepts is still limited.
![Page 43: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/43.jpg)
8. Conclusion
Impressive results and quickly improving per year. Very valuable competition. Best non-classes start to make sense!
![Page 44: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/44.jpg)
9 Speed
![Page 45: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/45.jpg)
SURF based on integral images
Integral images were introduced by Viola & Jones in the context of face detection: sliding windows are evaluated on integral images, which are accumulated left to right and top to bottom.
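A minimal sketch of the integral image idea in plain Python (no image library assumed): after one left-to-right, top-to-bottom pass, the sum over any rectangle needs only four lookups, regardless of the rectangle's size. The function names are illustrative.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1, in four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 3, 3))  # whole image
print(box_sum(ii, 1, 1, 3, 3))  # bottom-right 2x2 block
```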
![Page 46: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/46.jpg)
SURF principle
LREC 2004, 26 May 2004, Lisbon 47
Approximate the Gaussian second-order derivatives L_xx, L_yy, L_xy with box filters.
[Figure: the Gaussian derivative filters L_yy and L_xy next to their box-filter approximations.]
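For reference, these derivatives enter the detector through the Hessian matrix. The formula below follows the published SURF formulation rather than anything shown on the slide: $D_{xx}$, $D_{yy}$, $D_{xy}$ denote the box-filter responses (names not used in the lecture), and $w \approx 0.9$ is the weight reported in the SURF paper to balance the approximation.

```latex
\mathcal{H}(\mathbf{x}, \sigma) =
\begin{pmatrix}
L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\
L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma)
\end{pmatrix},
\qquad
\det\left(\mathcal{H}_{\text{approx}}\right) = D_{xx} D_{yy} - \left(w \, D_{xy}\right)^{2},
\quad w \approx 0.9
```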
![Page 47: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/47.jpg)
SURF speed
Computation time: 6 times faster than DoG (~100msec). Independent of filter scale.
![Page 48: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/48.jpg)
Dense descriptor extraction
Pixel-wise Responses Final Descriptor
Factor 16 speed improvement; another factor 2 by the use of matrix libraries.
![Page 49: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/49.jpg)
Projection: Random Forest
Binary decision trees
Moosmann et al. 2008
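A hedged sketch of projection with binary decision trees: each descriptor is dropped down every tree, and the leaf it reaches acts as its visual word. Here the trees are purely random (random feature, random threshold) to keep the example short; the cited work trains its trees, so this is an illustration of the mechanism only, with hypothetical names throughout.

```python
import random


def build_tree(depth, dim, rng, lo=0.0, hi=1.0):
    """Complete binary tree in heap order: one (feature, threshold) test per internal node."""
    return [(rng.randrange(dim), rng.uniform(lo, hi))
            for _ in range(2 ** depth - 1)]


def leaf_index(tree, depth, x):
    """Follow `depth` binary tests; return the leaf number in 0 .. 2**depth - 1."""
    node = 0
    for _ in range(depth):
        feature, threshold = tree[node]
        node = 2 * node + (1 if x[feature] < threshold else 2)
    return node - (2 ** depth - 1)


def project(descriptors, forest, depth):
    """Bag-of-words histogram: one bin per leaf per tree."""
    n_leaves = 2 ** depth
    hist = [0] * (len(forest) * n_leaves)
    for x in descriptors:
        for i, tree in enumerate(forest):
            hist[i * n_leaves + leaf_index(tree, depth, x)] += 1
    return hist


rng = random.Random(0)
depth, dim = 3, 8
forest = [build_tree(depth, dim, rng) for _ in range(4)]
descriptors = [[rng.random() for _ in range(dim)] for _ in range(50)]
hist = project(descriptors, forest, depth)
print(hist)
```

Each descriptor costs only `depth` comparisons per tree, instead of a distance computation to every codebook entry as in nearest-neighbour assignment.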
![Page 50: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/50.jpg)
Real-time bag of words
Descriptor extraction: D-SURF 2x2
Projection: pre-projection <empty>, actual projection Random Forest
Classification: SVM kernel RBF
Stage times (ms): 15, 10, 13
MAP: 0.370
Total computation time is 38 milliseconds per image.
26 frames per second on a normal PC for any 20 concepts.
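The numbers on this slide are mutually consistent, as the small check below shows. It assumes that 15, 10 and 13 are the per-stage times in milliseconds, in the order descriptor extraction, projection, classification; that mapping is inferred from the slide layout rather than stated.

```python
# Assumed per-stage timings in milliseconds (mapping to stages is inferred).
stage_ms = {
    "descriptor extraction": 15,
    "projection": 10,
    "classification": 13,
}

total_ms = sum(stage_ms.values())   # time per image
fps = 1000 / total_ms               # images per second

print(total_ms, "ms per image")
print(int(fps), "frames per second")
```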
![Page 51: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/51.jpg)
9. Conclusion
SURF is scale and rotation invariant.
Fast due to the use of integral images.
Download: http://www.vision.ee.ethz.ch/~surf/
D-SURF extraction is 6x faster than Dense-SIFT.
Projection using Random Forest is 50x faster than NN.
![Page 52: Lecture 03 internet video search](https://reader035.vdocuments.mx/reader035/viewer/2022081400/555a7672d8b42a972b8b5320/html5/thumbnails/52.jpg)
Internet Video Search: the beginning
[Figure: overview diagram with the labels: concept detection, telling stories, browsing, video, measuring, features, lexicon, learning.]