![Page 1: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/1.jpg)
Introduction to Recognition
CS5670: Intro to Computer Vision
Noah Snavely
mountain
building
tree
banner
vendor
people
street lamp
![Page 2: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/2.jpg)
Where we go from here
• What we know: Geometry
– What is the shape of the world?
– How does that shape appear in images?
– How can we infer that shape from one or more images?
• What’s next: Recognition
– What are we looking at?
![Page 3: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/3.jpg)
What do we mean by “object recognition”?
Next slides adapted from Li, Fergus, & Torralba’s excellent short course on category and object recognition
![Page 4: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/4.jpg)
Verification: is that a lamp?
![Page 5: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/5.jpg)
Detection: where are the people?
![Page 6: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/6.jpg)
Identification: is that Potala Palace?
![Page 7: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/7.jpg)
Object categorization
mountain
building
tree
banner
vendor
people
street lamp
![Page 8: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/8.jpg)
Scene and context categorization
• outdoor
• city
• …
![Page 9: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/9.jpg)
Activity / Event Recognition
what are these people doing?
![Page 10: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/10.jpg)
Object recognition: Is it really so hard?
This is a chair
Find the chair in this image
Output of normalized correlation
![Page 11: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/11.jpg)
Object recognition: Is it really so hard?
Find the chair in this image
Pretty much garbage
Simple template matching is not going to do the trick
![Page 12: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/12.jpg)
Object recognition: Is it really so hard?
Find the chair in this image
A “popular method is that of template matching, by point to point correlation of a model pattern with the image pattern. These techniques are inadequate for three-dimensional scene analysis for many reasons, such as occlusion, changes in viewing angle, and articulation of parts.” Nevatia & Binford, 1977.
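To make the "normalized correlation" on the chair slides concrete, here is a minimal sketch of template matching by zero-mean normalized cross-correlation. It is not the code used to produce the slides; the function names `normalized_correlation` and `best_match` are hypothetical, and images are assumed to be small 2D lists of grayscale values.

```python
import math

def normalized_correlation(image, template):
    """Slide `template` over `image`; return a 2D list of scores in [-1, 1].

    Both arguments are 2D lists of grayscale intensities (an assumption
    made for this sketch; real code would use arrays).
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])

    # Subtract the template mean once, and precompute its norm.
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_zero = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(v * v for v in t_zero))

    scores = []
    for y in range(ih - th + 1):
        row_scores = []
        for x in range(iw - tw + 1):
            # Flatten the image patch under the template at offset (y, x).
            patch = [image[y + dy][x + dx] for dy in range(th) for dx in range(tw)]
            p_mean = sum(patch) / len(patch)
            p_zero = [v - p_mean for v in patch]
            p_norm = math.sqrt(sum(v * v for v in p_zero))
            denom = t_norm * p_norm
            # A constant patch has no defined correlation; score it as 0.
            if denom > 0:
                score = sum(a * b for a, b in zip(t_zero, p_zero)) / denom
            else:
                score = 0.0
            row_scores.append(score)
        scores.append(row_scores)
    return scores

def best_match(scores):
    """Return the (row, col) offset with the highest correlation score."""
    return max(
        ((y, x) for y in range(len(scores)) for x in range(len(scores[0]))),
        key=lambda pos: scores[pos[0]][pos[1]],
    )
```

The score is invariant to a brightness offset and a positive contrast scaling of the patch, but the comparison is still pixel by pixel at a fixed size and orientation. That is why it breaks down under the occlusion, viewpoint change, and articulation that the quote lists.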
![Page 13: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/13.jpg)
Why not use SIFT matching for everything?
• Works well for object instances (or distinctive images such as logos)
• Not great for generic object categories
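As a rough illustration of what "SIFT matching" involves, the sketch below matches local descriptors by nearest neighbor and keeps only unambiguous matches using a ratio test. The ratio test and the 0.8 threshold are assumptions brought in for this example (the slide does not specify a matching rule), and the short vectors stand in for real 128-dimensional SIFT descriptors. The function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(query, reference, max_ratio=0.8):
    """Return (query_index, reference_index) pairs that pass the ratio test.

    A query descriptor is matched only if its nearest reference descriptor
    is clearly closer than the second nearest (distance ratio < max_ratio).
    """
    matches = []
    for qi, q in enumerate(query):
        # Sort reference descriptors by distance to this query descriptor.
        dists = sorted((euclidean(q, r), ri) for ri, r in enumerate(reference))
        if len(dists) < 2:
            continue  # the ratio test needs two candidates
        (d1, best), (d2, _) = dists[0], dists[1]
        if d2 > 0 and d1 / d2 < max_ratio:
            matches.append((qi, best))
    return matches
```

Matching of this kind rewards descriptors that recur almost exactly, which is the case for a specific object instance or a logo. Two different chairs rarely share near-identical local patches, so few matches survive the test for a generic category.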
![Page 14: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/14.jpg)
And it can get a lot harder
Brady, M. J., & Kersten, D. (2003). Bootstrapped learning of novel objects. J Vis, 3(6), 413-422
![Page 15: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/15.jpg)
Applications: Photography
![Page 16: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/16.jpg)
Applications: Shutter-free Photography
https://ai.googleblog.com/2019/04/take-your-best-selfie-automatically.html
(Also features “kiss detection”)
Take Your Best Selfie Automatically, with Photobooth on Pixel 3
![Page 17: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/17.jpg)
Applications: Assisted / autonomous driving
https://www.extremetech.com/extreme/226071-nvidia-goes-all-in-on-self-driving-cars-including-a-robotic-car-racing-league
![Page 18: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/18.jpg)
Applications: Photo organization
Source: Google Photos
![Page 19: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/19.jpg)
Applications: medical imaging
Dermatologist-level classification of skin cancer
https://cs.stanford.edu/people/esteva/nature/
![Page 20: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/20.jpg)
Why is this hard?
Variability:
• Camera position
• Illumination
• Shape parameters
Svetlana Lazebnik
![Page 21: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/21.jpg)
![Page 22: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/22.jpg)
Challenge: variable viewpoint
Michelangelo 1475-1564
![Page 23: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/23.jpg)
Challenge: variable illumination
image credit: J. Koenderink
![Page 24: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/24.jpg)
Challenge: scale
![Page 25: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/25.jpg)
Challenge: deformation
![Page 26: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/26.jpg)
Challenge: Occlusion
Magritte, 1957
![Page 27: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/27.jpg)
Challenge: background clutter
Kilmeny Niland. 1995
![Page 28: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/28.jpg)
Challenge: intra-class variations
Svetlana Lazebnik
![Page 29: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/29.jpg)
A brief history of image recognition
• What worked in 2011 (pre-deep-learning era in computer vision)
– Optical character recognition
– Face detection
– Instance-level recognition (what logo is this?)
– Pedestrian detection (sort of)
– … that’s about it
![Page 30: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/30.jpg)
A brief history of image recognition
• What works now, post-2012 (deep learning era)
– Robust object classification across thousands of object categories (outperforming humans)
“Spotted salamander”
![Page 31: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/31.jpg)
A brief history of image recognition
• What works now, post-2012 (deep learning era)
– Face recognition at scale
![Page 32: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/32.jpg)
A brief history of image recognition
• What works now, post-2012 (deep learning era)
– High-quality face synthesis (but not yet for completely general scenes)
![Page 33: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/33.jpg)
A Style-Based Generator Architecture for Generative Adversarial Networks
Tero Karras (NVIDIA), Samuli Laine (NVIDIA), Timo Aila (NVIDIA)
http://stylegan.xyz/paper
These people are not real – they were produced by our generator that allows control over different aspects of the image.
![Page 34: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/34.jpg)
What Matters in Recognition?
• Learning Techniques
– E.g. choice of classifier or inference method
• Representation
– Low level: SIFT, HoG, GIST, edges
– Mid level: Bag of words, sliding window, deformable model
– High level: Contextual dependence
– Deep learned features
• Data
– More is always better (as long as it is good data)
– Annotation is the hard part
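One of the mid-level representations listed above, bag of words, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the course's implementation: the vocabulary of "visual words" is assumed to be given (in practice it is typically obtained by clustering descriptors from training images), and the names `nearest_word` and `bag_of_words` are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to `descriptor` (squared L2)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - w) ** 2 for d, w in zip(descriptor, vocabulary[i])),
    )

def bag_of_words(descriptors, vocabulary):
    """Normalized histogram of visual-word counts for one image.

    Each local descriptor votes for its nearest visual word. The spatial
    layout of the descriptors is discarded, hence "bag".
    """
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]
```

The resulting fixed-length histogram is what a classifier (the "learning technique" in the first bullet) would take as input, so the choice of representation and the choice of classifier can be varied independently.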
![Page 35: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/35.jpg)
![Page 36: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/36.jpg)
24 Hrs in Photos
installation by Erik Kessels
http://www.kesselskramer.com/exhibitions/24-hrs-of-photos
![Page 37: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/37.jpg)
Data Sets
• ImageNet
– Huge, Crowdsourced, Hierarchical, Iconic objects
• PASCAL VOC
– Not Crowdsourced, bounding boxes, 20 categories
• SUN Scene Database, Places
– Not Crowdsourced, 397 (or 720) scene categories
• LabelMe (Overlaps with SUN)
– Sort of Crowdsourced, Segmentations, Open ended
• SUN Attribute database (Overlaps with SUN)
– Crowdsourced, 102 attributes for every scene
• OpenSurfaces
– Crowdsourced, materials
• Microsoft COCO
– Crowdsourced, large-scale objects
![Page 38: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/38.jpg)
Large Scale Visual Recognition Challenge (ILSVRC) 2010-2012
PASCAL VOC, for comparison: 20 object classes, 22,591 images
ILSVRC: 1000 object classes, 1,431,167 images
Dalmatian
http://image-net.org/challenges/LSVRC/{2010,2011,2012}
![Page 39: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/39.jpg)
Variety of object classes in ILSVRC
![Page 40: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/40.jpg)
Variety of object classes in ILSVRC
![Page 41: CS5670: Intro to Computer Vision · Introduction to Recognition CS5670: Intro to Computer Vision Noah Snavely mountain building tree banner vendor people street lamp](https://reader034.vdocuments.mx/reader034/viewer/2022042621/5f68d0bad57a7c48542aa4d7/html5/thumbnails/41.jpg)
Questions?