visual recognition: the big picture
![Page 1: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/1.jpg)
Visual Recognition: The Big Picture
Jitendra Malik
University of California at Berkeley
![Page 2: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/2.jpg)
The more you look, the more you see!
![Page 3: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/3.jpg)
PASCAL Visual Object Challenge
![Page 4: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/4.jpg)
We want to locate the object
[Figure: two examples, each showing an original image next to its segmentation]
![Page 5: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/5.jpg)
The Visually Tagged Human Project
And we want to detect and label parts.
![Page 6: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/6.jpg)
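Computer Vision Group, UC Berkeley
Categorization at Multiple Levels
[Figure: a tiger image annotated at several levels]
• Scene labels: outdoor, wildlife
• Region labels: Tiger, Grass, Water, Sand
• Labels on the tiger: tail, eye, legs, head, back, shadow, mouth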
![Page 7: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/7.jpg)
![Page 8: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/8.jpg)
Examples of Actions
• Movement and posture change
– run, walk, crawl, jump, hop, swim, skate, sit, stand, kneel, lie, dance (various), …
• Object manipulation
– pick, carry, hold, lift, throw, catch, push, pull, write, type, touch, hit, press, stroke, shake, stir, turn, eat, drink, cut, stab, kick, point, drive, bike, insert, extract, juggle, play musical instrument (various), …
• Conversational gesture
– point, …
• Sign Language
![Page 9: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/9.jpg)
We need to identify
• Objects
• Agents
• Relationships among objects with objects, objects with agents, agents with agents, …
• Events and Actions
![Page 10: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/10.jpg)
Taxonomy and Partonomy
• Taxonomy: e.g., cats are in the family Felidae, which in turn is in the class Mammalia
– Recognition can be at multiple levels of categorization, or be identification at the level of specific individuals, as in faces.
• Partonomy: Objects have parts, which have subparts, and so on. The human body contains the head, which in turn contains the eyes.
• These notions apply equally well to scenes and to activities.
• Psychologists have argued that there is a “basic level” at which categorization is fastest (Eleanor Rosch et al.).
• In a partonomy each level contributes useful information for recognition.
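The two hierarchies can be sketched as simple lookup tables. This is only an illustrative toy, not part of the original slides: the dictionary names and helper functions are assumptions, and the entries are limited to the examples mentioned above.

```python
# Toy data taken from the examples in the slide; names are illustrative.
TAXONOMY_PARENT = {
    "cat": "Felidae",
    "Felidae": "Mammalia",
}

PARTONOMY = {
    "human body": ["head"],
    "head": ["eyes"],
}


def category_chain(label):
    """Walk up the taxonomy from a label to its most general category."""
    chain = [label]
    while chain[-1] in TAXONOMY_PARENT:
        chain.append(TAXONOMY_PARENT[chain[-1]])
    return chain


def all_parts(obj):
    """Recursively list every part and subpart of an object."""
    parts = []
    for part in PARTONOMY.get(obj, []):
        parts.append(part)
        parts.extend(all_parts(part))
    return parts


print(category_chain("cat"))    # levels at which "cat" can be categorized
print(all_parts("human body"))  # parts and subparts of the body
```

Recognition "at multiple levels" then corresponds to reporting any entry of the chain, and each level of the part list is a candidate cue for recognizing the whole.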
![Page 11: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/11.jpg)
Visual Processing Areas
![Page 12: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/12.jpg)
Macaque Visual Areas
![Page 13: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/13.jpg)
Object Detection can be very fast
• On a task of judging animal vs. no animal, humans can make mostly correct saccades in 150 ms (Kirchner & Thorpe, 2006)
– This is comparable to the cumulative synaptic delay along the retina, LGN, V1, V2, V4, IT pathway.
– This does not rule out feedback, but it shows that feedforward processing alone is very powerful.
• Detection and categorization are practically simultaneous (Grill-Spector & Kanwisher, 2005)
![Page 14: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/14.jpg)
Hubel and Wiesel (1962) discovered orientation-sensitive neurons in V1
![Page 15: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/15.jpg)
These cells respond to edges and bars.
![Page 16: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/16.jpg)
![Page 17: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/17.jpg)
Orientation-based features were inspired by V1 (SIFT, GIST, HOG, GB, etc.)
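What the features listed above share is a histogram of local gradient orientations. The sketch below is a bare-bones version of that idea in plain Python. It is not the actual SIFT, GIST, HOG, or GB pipeline (those add cells, normalization, and interpolation), and the function name and bin count are assumptions.

```python
import math


def orientation_histogram(image, num_bins=8):
    """Histogram of gradient orientations, weighted by gradient magnitude.

    `image` is a list of equal-length rows of grayscale values.
    Orientations are unsigned (folded into 0 to 180 degrees).
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * num_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the image gradient.
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The final modulo guards against an angle that rounds to 180.
            bin_index = int(angle / (180.0 / num_bins)) % num_bins
            hist[bin_index] += magnitude
    return hist


# A vertical edge: the gradient points horizontally, so bin 0 gets all votes.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
print(orientation_histogram(vertical_edge))
```

Like an orientation-tuned V1 cell, each bin responds strongly only when the local structure has the matching orientation.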
![Page 18: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/18.jpg)
Attneave’s Cat (1954)
Line drawings convey most of the information
![Page 19: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/19.jpg)
Rolls et al. (2000)
![Page 20: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/20.jpg)
![Page 21: Visual Recognition: The Big Picture](https://reader035.vdocuments.mx/reader035/viewer/2022062310/56816367550346895dd43f1c/html5/thumbnails/21.jpg)
Convolutional Neural Networks (LeCun et al.)
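The basic building block of such a network can be sketched as a filter slid over the image, followed by a nonlinearity and spatial pooling. The plain-Python version below is a generic illustration, not the specific architecture of LeCun et al. The function names, the hand-set edge kernel, and the 2x2 pooling size are assumptions, and in a real network the kernels are learned.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses, zero out negative ones."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling; odd trailing rows/columns are dropped."""
    return [
        [
            max(feature_map[y][x], feature_map[y][x + 1],
                feature_map[y + 1][x], feature_map[y + 1][x + 1])
            for x in range(0, len(feature_map[0]) - 1, 2)
        ]
        for y in range(0, len(feature_map) - 1, 2)
    ]


# Hand-set kernel that responds to dark-to-bright vertical edges.
edge_kernel = [[-1, 1], [-1, 1]]
image = [[0, 0, 1, 1, 1] for _ in range(5)]
pooled = max_pool_2x2(relu(conv2d_valid(image, edge_kernel)))
print(pooled)
```

The filtering stage plays the role of the orientation-selective cells discussed earlier, while pooling gives tolerance to small shifts in position.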