cs 1674: intro to computer visionkovashka/cs1674_fa20/vision_01_intro.pdf · computer vision is not...
TRANSCRIPT
![Page 1: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/1.jpg)
CS 1674: Intro to Computer Vision
Introduction
Prof. Adriana KovashkaUniversity of Pittsburgh
August 20, 2020
![Page 2: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/2.jpg)
Course Info
• Course website: http://people.cs.pitt.edu/~kovashka/cs1674_fa20
• Instructor: Adriana Kovashka([email protected])
• Office: https://pitt.zoom.us/s/4168010698
• Class: Tue/Thu, 11:05am-12:20pm
• Office hours: Tue, Wed, Thu, 9am-11am
• Hangout room: (ID on Canvas)
![Page 3: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/3.jpg)
About the Instructor
Born 1985 in Sofia, Bulgaria
Got BA in 2008 atPomona College, CA(Computer Science & Media Studies)
Got PhD in 2014at University of Texas at Austin(Computer Vision)
![Page 4: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/4.jpg)
About the TA
• TA (grader): Mingda Zhang
• Office: Zoom TBD
• Office hours: TBD
– Do this Doodle by the end of Monday: https://doodle.com/poll/ri6ret4bmcucit4d
– Mark all times you’re available during any week
![Page 5: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/5.jpg)
Course Goals
• To learn the basics of low-level image analysis
• To learn about some classic and modern approaches to high-level computer vision tasks (e.g. object recognition)
• To get experience with vision techniques
• To learn/apply basic machine learning (a key component of modern computer vision)
• To think critically about vision approaches, and to see connections between works
![Page 6: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/6.jpg)
Textbooks
• Computer Vision: Algorithms and Applicationsby Richard Szeliski
• Visual Object Recognition by Kristen Graumanand Bastian Leibe
• More resources available on course webpage
• Your notes from class are your best study material, slides are not complete with notes
![Page 7: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/7.jpg)
Programming Language
• We’ll use Matlab
• It can be downloaded for free from MyPitt
• We’ll do a short tutorial; ask TA if you need further help
![Page 8: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/8.jpg)
Course Structure
• Lectures
• Near-daily quizzes (4% of total grade)
• Participation in class (8%)
• Weekly assignments: programming, essays (11x8%)
![Page 9: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/9.jpg)
Zoom Etiquette
• Please turn your camera on if you are joining class on Zoom
• Please send me email if you are unable to do so and explain the reason
![Page 10: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/10.jpg)
Health and Well-being
• Follow university policies
• Please be kind to others and be patient
• Wear your mask over your nose and mouth
• Please reach out to the instructor if you want to talk about anything privately, and don’t hesitate to ask for help
• Use hangout channel: talk to your peers about anything
![Page 11: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/11.jpg)
Warning #1
• This class is a lot of work
• I’ve opted for shorter, more manageable HW assignments, but there is a lot of them
• I expect you’d be spending 6-8 hours on homework each week
• … But you get to understand algorithms and concepts in detail!
![Page 12: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/12.jpg)
Warning #2
• Some parts will be hard and require that you pay close attention!
• We will have low-stakes near-daily quizzes to help you stay focused and help me know how you are doing
• Use instructor’s and TA’s office hours
• … You will learn a lot!
![Page 13: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/13.jpg)
Policies and Schedule
http://people.cs.pitt.edu/~kovashka/cs1674_fa20
Breakout: Read, discuss, come up with questions, report back
![Page 14: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/14.jpg)
Questions?
![Page 15: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/15.jpg)
Plan for Today
• Blitz introductions
• What is computer vision?– Why do we care?
– What are the challenges?
– What is recent research like?
• Overview of topics
• Review and tutorial– Linear algebra
– Matlab
![Page 16: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/16.jpg)
Blitz introductions (15 sec)
• What is your name?
• What one thing outside of school are you passionate about?
• What do you hope to get out of this class?
• Every time you speak, please remind me your name
![Page 17: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/17.jpg)
Computer Vision
![Page 18: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/18.jpg)
What is computer vision?
Done?
Kristen Grauman (adapted)
"We see with our brains, not with our eyes“ (Oliver Sacks and others)
![Page 19: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/19.jpg)
• Automatic understanding of images and video
– Algorithms and representations to allow a machine to
recognize objects, people, scenes, and activities
– Algorithms to mine, search, and interact with visual data
– Computing properties and navigating within the 3D world
using visual data
– Generating realistic synthetic visual data
Adapted from Kristen Grauman
What is computer vision?
![Page 20: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/20.jpg)
sky
water
Ferris
wheel
amusement park
Cedar Point
12 E
tree
tree
tree
carouseldeck
people waiting in line
ride
ride
ride
umbrellas
pedestrians
maxair
bench
tree
Lake Erie
people sitting on ride
Objects
Activities
Scenes
Locations
Text / writing
Faces
Gestures
Motions
Emotions…
The Wicked
Twister
Perception and interpretation
Kristen Grauman
How to do? What are the challenges?
![Page 21: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/21.jpg)
Visual search, organization
Image or video
archives
Query Relevant
content
Kristen Grauman
How to do? What are the challenges?
![Page 22: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/22.jpg)
Measurement
Real-time stereo Structure from motion
NASA Mars Rover
Pollefeys et al.
Multi-view stereo for
community photo collections
Goesele et al.
Slide credit: L. LazebnikHow to do? What are the challenges?
![Page 23: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/23.jpg)
Karras et al., “Progressive Growing of GANs for Improved Quality, Stability, and Variation”, ICLR 2018
Generation
How to do? What are the challenges?
![Page 24: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/24.jpg)
Related disciplines
Cognitive
science
Algorithms
Image
processing
Artificial
intelligence
GraphicsMachine
learningComputer
vision
Kristen Grauman
![Page 25: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/25.jpg)
Vision and graphics
ModelImages Vision
Graphics
Inverse problems: analysis and synthesis.
Kristen Grauman
![Page 26: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/26.jpg)
Why vision?• Images and video are everywhere!
Personal photo albums
Surveillance and security
Movies, news, sports
Medical and scientific images
Adapted from Lana Lazebnik
144k hours uploaded to YouTube daily
4.5 mil photos uploaded to Flickr daily
10 bil images indexed by Google
![Page 27: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/27.jpg)
• As image sources multiply, so do applications
– Relieve humans of boring, easy tasks
– Perception for robotics / autonomous agents
– Organize and give access to visual content
– Description of content for the visually impaired
– Human-computer interaction
– Fun applications (e.g. art styles to my photos)
Adapted from Kristen Grauman
Why vision?
Breakout: Come up with 3 specific applications,
discuss which ones most useful, report back.
![Page 28: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/28.jpg)
Things that work well
![Page 29: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/29.jpg)
Faces and digital cameras
Setting camera
focus via face
detection
Camera waits for
everyone to smile to
take a photo [Canon]
Kristen Grauman
![Page 30: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/30.jpg)
Face recognition
Devi Parikh
![Page 31: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/31.jpg)
Optical character recognition
https://www.mathworks.com/matlabcentral/fileexchange/18169-optical-character-recognition-ocr
![Page 32: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/32.jpg)
Snavely et al.
Kristen Grauman
Exploring photo collections
![Page 33: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/33.jpg)
Linking to info with a mobile device
kooaba
Situated search
Yeh et al., MIT
MSR Lincoln
Kristen Grauman
![Page 34: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/34.jpg)
Safety & security
Navigation,
driver safety Monitoring pool (Poseidon)
SurveillancePedestrian detection
MERL, Viola et al.Kristen Grauman
![Page 35: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/35.jpg)
Yong Jae Lee
Interactive systems
![Page 36: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/36.jpg)
Video-based interfaces
Human joystick
NewsBreaker Live
Assistive technology systems
Camera Mouse
Boston College
Kristen Grauman
YouTube Link
![Page 37: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/37.jpg)
Vision for medical & neuroimages
Image guided surgery
MIT AI Vision Group
fMRI data
Golland et al.
Kristen Grauman
![Page 38: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/38.jpg)
Things that need more work
The latest at CVPR, ICCV, ECCV
CVPR = IEEE/CVF Conference on Computer Vision and Pattern Recognition
ICCV = IEEE/CVF International Conference on Computer Vision
ECCV = European Conference on Computer Vision
![Page 39: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/39.jpg)
Our ability to detect objects has gone
from 34 mAP in 2008
to 73 mAP at 7 FPS (frames per second)
or 63 mAP at 45 FPS
in 2016
![Page 40: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/40.jpg)
Accurate object detection in real time
Pascal 2007 mAP Speed
DPM v5 33.7 .07 FPS 14 s/img
R-CNN 66.0 .05 FPS 20 s/img
Fast R-CNN 70.0 .5 FPS 2 s/img
Faster R-CNN 73.2 7 FPS 140 ms/img
YOLO 69.0 45 FPS 22 ms/img
2 feet
Redmon et al., “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016
![Page 41: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/41.jpg)
Redmon et al., CVPR 2016
Recognition in novel modalities
![Page 42: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/42.jpg)
Chen et al., CVPR 2017
Learning from weak supervision
![Page 43: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/43.jpg)
Doersch et al., ICCV 2015
Unsupervised learning
![Page 44: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/44.jpg)
A B
1 2 3
54
6 7 8Doersch et al., ICCV 2015
Unsupervised learning
![Page 45: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/45.jpg)
Bagautdinov et al., CVPR 2017
Understanding activities and intents
![Page 46: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/46.jpg)
Pirsiavash et al., ECCV 2014
Understanding how well activity performed
![Page 47: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/47.jpg)
Gao et al., CVPR 2018
Imagining motion in static images
![Page 48: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/48.jpg)
Park et al., CVPR 2016
Decoding physics from video
![Page 49: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/49.jpg)
Tapaswi et al., CVPR 2016
Understanding stories in film
![Page 50: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/50.jpg)
Vicol et al., CVPR 2018
Understanding stories in film
![Page 51: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/51.jpg)
Automatic Understanding of Image and Video Advertisements
- I should buy Volkswagen because it can hold a big bear.- I should buy VW SUV because it can fit anything and everything in it.- I should buy this car because it can hold everything I need.
What should the viewer do, and why should they do this?
Cars, automobiles
What’s being advertised in this image?
Amused, Creative, Impressed, Youthful, Conscious
What sentiments are provoked in the viewer?
Symbolism, Contrast, Straightforward, Transferred qualities
What strategies are used to persuade viewer?
We collect an advertisement dataset containing 64,832 images and 3,477 videos, each annotated by 3-5 human workers from Amazon Mechanical Turk.
Image
Topic 204,340 Strategy 20,000
Sentiment 102,340 Symbol 64,131
Q+A Pair 202,090 Slogan 11,130
Video
Topic 17,345 Fun/Exciting 15,380
Sentiment 17,345 English? 17,374
Q+A Pair 17,345 Effective 16,721
Atypical ObjectsSymbolism Culture/Memes
Understanding advertisements is more challenging than simply recognizing physical content from images, as ads employ a variety of strategies to persuade viewers.
Here are some sample annotations in our dataset.
Zaeem Hussain, Mingda Zhang, Xiaozhong Zhang, Keren Ye, Christopher Thomas,
Zuha Agha, Nathan Ong, Adriana Kovashka
University of Pittsburgh
More information available at http://cs.pitt.edu/~kovashka/ads
Hussein et al., CVPR 2017
![Page 52: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/52.jpg)
Das et al., CVPR 2018
Reasoning and acting:
Embodied question answering
![Page 53: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/53.jpg)
Radford et al., ICLR 2016
Reed et al., ICML 2016
Image generation
![Page 54: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/54.jpg)
Choi et al., CVPR 2018
Image generation
![Page 55: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/55.jpg)
Gatys et al., CVPR 2016DeepArt.io – try it for yourself!
Transferring art styles
![Page 56: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/56.jpg)
Vondrick and Torralba, CVPR 2017
Video generation
![Page 57: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/57.jpg)
Computer vision is not solved
• Deep learning makes excellent use of massive data (labeled for the task of interest?)
– But it’s hard to understand how it does so, makes it hard to fix when it doesn’t work well
– It doesn’t work well when massive data is not available and your task is different than tasks for which data is available
– We can recognize objects with 97% accuracy but reasoning about relationships and intent is harder
![Page 58: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/58.jpg)
YouTube link
Seeing AI
![Page 59: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/59.jpg)
Obstacles?
Kristen GraumanRead more about the history: Szeliski Sec. 1.2
![Page 60: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/60.jpg)
Why is vision difficult?
• Ill-posed problem: real world much more
complex than what we can measure in
images
– 3D → 2D
– Motion → static
• Impossible to literally “invert” image formation
process with limited information
– Need information outside of this particular image
to generalize what image portrays (e.g. to resolve
occlusion)
Adapted from Kristen Grauman
![Page 61: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/61.jpg)
What the computer gets
Adapted from Kristen Grauman and Lana Lazebnik
Why is this problematic?
![Page 62: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/62.jpg)
Challenges: many nuisance parameters
Illumination Object pose Clutter
ViewpointIntra-class
appearanceOcclusions
Kristen Grauman
Think again about the pixels…
![Page 63: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/63.jpg)
Challenges: intra-class variation
slide credit: Fei-Fei, Fergus & Torralba
CMOA Pittsburgh
![Page 64: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/64.jpg)
Challenges: importance of context
slide credit: Fei-Fei, Fergus & Torralba
![Page 65: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/65.jpg)
• Thousands to millions of pixels in an image
• 3,000-30,000 human recognizable object categories
• 30+ degrees of freedom in the pose of articulated
objects (humans)
• Billions of images indexed by Google Image Search
• 1.424 billion smart camera phones sold in 2015
• About half of the cerebral cortex in primates is
devoted to processing visual information [Felleman
and van Essen 1991]
Kristen Grauman
Challenges: Complexity
![Page 66: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/66.jpg)
Challenges: Limited supervision
MoreLess
Kristen Grauman
![Page 67: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/67.jpg)
Challenges: Vision requires reasoning
Antol et al., “VQA: Visual Question Answering”, ICCV 2015
![Page 68: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/68.jpg)
Evolution of datasets
• Challenging problem → active research area
PASCAL: 20 categories, 12k images
ImageNet: 22k categories, 14mil images
Microsoft COCO: 80 categories, 300k images
![Page 69: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/69.jpg)
Some Visual Recognition Problems: Why are they challenging?
![Page 70: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/70.jpg)
Recognition: What objects do you see?
carriagehorse
person
person
truck
street
building
table
balcony
car
![Page 71: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/71.jpg)
Detection: Where are the cars?
![Page 72: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/72.jpg)
Activity: What is this person doing?
![Page 73: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/73.jpg)
Scene: Is this an indoor scene?
![Page 74: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/74.jpg)
Instance: Which city? Which building?
![Page 75: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/75.jpg)
Visual question answering: Why is there a carriage in the street?
![Page 76: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/76.jpg)
What to do about variance? (albino koalas)
• You can’t always do anything about it
• Basic assumption: training and test set are sampled from the same distribution
• Practically speaking: Make sure training set is representative
• Tricks (e.g. data preprocessing) make learning stable
• You can try to artificially expand variance in training set
• You can detect outliers, report confidence of prediction
• Methods exist that explicitly try to handle out-of-distribution data, but it’s an open problem
![Page 77: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/77.jpg)
Problem with categorization (Borges' Animals)
“These ambiguities, redundancies and deficiencies recall those that Dr. Franz Kuhn attributes to a certain Chinese dictionary entitled The Celestial Emporium of Benevolent Knowledge. In its remote pages it is written that animals can be divided into (a) those belonging to the Emperor, (b) those that are embalmed, (c) those that are tame, (d) pigs, (e) sirens, (f) imaginary animals, (g) wild dogs, (h) those included in this classification, (i) those that are crazy-acting, (j) those that are uncountable, (k) those painted with the finest brush made of camel hair, (l) miscellaneous, (m) those which have just broken a vase, and (n) those which, from a distance, look like flies.“
Jorge Luis Borges, The Analytical Language of John Wilkins, https://www.entish.org/essays/Wilkins.html
![Page 78: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/78.jpg)
Overview of topics
![Page 79: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/79.jpg)
Features and filters
• Transforming and
describing images;
textures, colors, edgesKristen Grauman
![Page 80: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/80.jpg)
• Detecting distinctive + repeatable features
• Describing images with local statistics
Features and filters
![Page 81: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/81.jpg)
[fig from Shi et al]
• Clustering,
segmentation,
fitting; what parts
belong together?Kristen Grauman
Grouping and fitting
![Page 82: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/82.jpg)
Hartley and Zisserman
Lowe
• Multi-view geometry,
matching, invariant
features, stereo vision
Fei-Fei Li
Kristen Grauman
Multiple views
![Page 83: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/83.jpg)
Image categorization
• Fine-grained recognition
Visipedia ProjectSlide credit: D. Hoiem
![Page 84: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/84.jpg)
Image categorization
• Material recognition
[Bell et al. CVPR 2015]Slide credit: D. Hoiem
![Page 85: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/85.jpg)
Image categorization
• Image style recognition
[Karayev et al. BMVC 2014] Slide credit: D. Hoiem
![Page 86: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/86.jpg)
• Recognizing objects
and categories,
learning techniquesAdapted from Kristen Grauman
Visual recognition and SVMs
![Page 87: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/87.jpg)
Convolutional neural networks (CNNs)
• State-of-the-art on many recognition tasks
ImagePrediction
Yosinski et al., ICML DL workshop 2015
Krizhevsky et al., NIPS 2012
![Page 88: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/88.jpg)
Recurrent neural networks
• Sequence processing, e.g. question answering
Wu et al., CVPR 2016
![Page 89: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/89.jpg)
• Tracking objects, video analysis
Tomas Izo
Kristen Grauman
Motion and tracking
![Page 90: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/90.jpg)
Pose and actions
• Automatically annotating human pose (joints)
• Recognizing actions in first-person video
![Page 91: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/91.jpg)
H/T Kirk Pruhs
![Page 92: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/92.jpg)
Linear algebra review
See http://cs229.stanford.edu/section/cs229-linalg.pdf for more
![Page 93: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/93.jpg)
What are images? (in Matlab)
• Matlab treats images as matrices of numbers
• To proceed, let’s talk very briefly about how images are formed
![Page 94: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/94.jpg)
Image formation
Slide credit: Derek Hoiem
(film)
![Page 95: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/95.jpg)
Slide credit: Derek Hoiem, Steve Seitz
Digital images
• Sample the 2D space on a regular grid
• Quantize each sample (round to nearest integer)
![Page 96: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/96.jpg)
• Sample the 2D space on a regular grid
• Quantize each sample (round to nearest integer)
• What does quantizing signal look like?
• Image thus represented as a matrix of integer values.
Adapted from S. Seitz
2D
1D
5.9 4.6
Digital images
6 5
![Page 97: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/97.jpg)
Slide credit: Kristen Grauman
Digital color images
![Page 98: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/98.jpg)
R G B
Color images, RGB color space:
Split image into three channels
Digital color images
Adapted from Kristen Grauman
![Page 99: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/99.jpg)
Images in Matlab• Color images represented as a matrix with multiple channels (=1 if grayscale)
• Suppose we have a NxM RGB image called “im”
– im(1,1,1) = top-left pixel value in R-channel
– im(y, x, b) = y pixels down, x pixels to right in the bth channel
– im(N, M, 3) = bottom-right pixel in B-channel
• imread(filename) returns a uint8 image (values 0 to 255)
– Convert to double format with double or im2double
0.92 0.93 0.94 0.97 0.62 0.37 0.85 0.97 0.93 0.92 0.99
0.95 0.89 0.82 0.89 0.56 0.31 0.75 0.92 0.81 0.95 0.91
0.89 0.72 0.51 0.55 0.51 0.42 0.57 0.41 0.49 0.91 0.92
0.96 0.95 0.88 0.94 0.56 0.46 0.91 0.87 0.90 0.97 0.95
0.71 0.81 0.81 0.87 0.57 0.37 0.80 0.88 0.89 0.79 0.85
0.49 0.62 0.60 0.58 0.50 0.60 0.58 0.50 0.61 0.45 0.33
0.86 0.84 0.74 0.58 0.51 0.39 0.73 0.92 0.91 0.49 0.74
0.96 0.67 0.54 0.85 0.48 0.37 0.88 0.90 0.94 0.82 0.93
0.69 0.49 0.56 0.66 0.43 0.42 0.77 0.73 0.71 0.90 0.99
0.79 0.73 0.90 0.67 0.33 0.61 0.69 0.79 0.73 0.93 0.97
0.91 0.94 0.89 0.49 0.41 0.78 0.78 0.77 0.89 0.99 0.93
0.92 0.93 0.94 0.97 0.62 0.37 0.85 0.97 0.93 0.92 0.99
0.95 0.89 0.82 0.89 0.56 0.31 0.75 0.92 0.81 0.95 0.91
0.89 0.72 0.51 0.55 0.51 0.42 0.57 0.41 0.49 0.91 0.92
0.96 0.95 0.88 0.94 0.56 0.46 0.91 0.87 0.90 0.97 0.95
0.71 0.81 0.81 0.87 0.57 0.37 0.80 0.88 0.89 0.79 0.85
0.49 0.62 0.60 0.58 0.50 0.60 0.58 0.50 0.61 0.45 0.33
0.86 0.84 0.74 0.58 0.51 0.39 0.73 0.92 0.91 0.49 0.74
0.96 0.67 0.54 0.85 0.48 0.37 0.88 0.90 0.94 0.82 0.93
0.69 0.49 0.56 0.66 0.43 0.42 0.77 0.73 0.71 0.90 0.99
0.79 0.73 0.90 0.67 0.33 0.61 0.69 0.79 0.73 0.93 0.97
0.91 0.94 0.89 0.49 0.41 0.78 0.78 0.77 0.89 0.99 0.93
0.92 0.93 0.94 0.97 0.62 0.37 0.85 0.97 0.93 0.92 0.99
0.95 0.89 0.82 0.89 0.56 0.31 0.75 0.92 0.81 0.95 0.91
0.89 0.72 0.51 0.55 0.51 0.42 0.57 0.41 0.49 0.91 0.92
0.96 0.95 0.88 0.94 0.56 0.46 0.91 0.87 0.90 0.97 0.95
0.71 0.81 0.81 0.87 0.57 0.37 0.80 0.88 0.89 0.79 0.85
0.49 0.62 0.60 0.58 0.50 0.60 0.58 0.50 0.61 0.45 0.33
0.86 0.84 0.74 0.58 0.51 0.39 0.73 0.92 0.91 0.49 0.74
0.96 0.67 0.54 0.85 0.48 0.37 0.88 0.90 0.94 0.82 0.93
0.69 0.49 0.56 0.66 0.43 0.42 0.77 0.73 0.71 0.90 0.99
0.79 0.73 0.90 0.67 0.33 0.61 0.69 0.79 0.73 0.93 0.97
0.91 0.94 0.89 0.49 0.41 0.78 0.78 0.77 0.89 0.99 0.93
R
G
B
rowcolumn
Adapted from Derek Hoiem
![Page 100: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/100.jpg)
Vectors and Matrices
• Vectors and matrices are just collections of ordered numbers that represent something: movements in space, scaling factors, word counts, movie ratings, pixel brightnesses, etc.
• We’ll define some common uses and standard operations on them.
Fei-Fei Li 3
![Page 101: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/101.jpg)
Vector
• A column vector where
• A row vector where
denotes the transpose operation
Fei-Fei Li 101
![Page 102: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/102.jpg)
Vector
• You’ll want to keep track of the orientation of your vectors when programming in MATLAB.
• You can transpose a vector V in MATLAB by writing V’.
Fei-Fei Li 102
![Page 103: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/103.jpg)
Vectors have two main uses
• Vectors can represent an offset in 2D or 3D space
• Points are just vectors from the origin
Fei-Fei Li 103
• Data can also be treated as a vector
• Such vectors don’t have a geometric interpretation, but calculations like “distance” still have value
![Page 104: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/104.jpg)
Example: Feature representation
• A vector representing measurable characteristics of a data sample we have
• E.g. a glass of juice can be represented via its color = {yellow=1, red=2, green=3, purple=4} and taste = {sweet=1, sour=2}
• A given glass i can be represented as a vector: xi = [3 2] represents sour, green juice
• For D features, this defines a D-dimensional space where we can measure similarity between samples
![Page 105: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/105.jpg)
Example: Feature representation
0 1 2 3 4
2
1
color
taste
E.g. a glass of juice can be represented via its color = {yellow=1, red=2, green=3, purple=4} and taste = {sweet=1, sour=2}
x2 = [3 2]x1 = [1 2]
x3 = [1 1]
Breakout: How many ways to measure distance between the samples?
L2 distance:d(x1, x2) = sqrt(4+0)d(x1, x3) = sqrt(0+1)d(x2, x3) = sqrt(4+1)
L1 distance:d(x1, x2) = 2+0d(x1, x3) = 0+1d(x2, x3) = 2+1
![Page 106: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/106.jpg)
Euclidean distance
![Page 107: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/107.jpg)
• L1 norm
• L2 norm
• Lp norm (for real numbers p ≥ 1)
Norms
![Page 108: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/108.jpg)
Matrix
• A matrix is an array of numbers with size 𝑚 ↓ by 𝑛 →, i.e. m rows and n columns.
• If , we say that is square.
Fei-Fei Li 108
![Page 109: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/109.jpg)
Matrix Operations• Addition
– Can only add a matrix with matching dimensions, or a scalar.
• Scaling
28-Aug-20 109
![Page 110: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/110.jpg)
• Let X be an axb matrix, Y be an bxc matrix
• Then Z = X*Y is an axc matrix
• Second dimension of first matrix, and first dimension of second matrix have to be the same, for matrix multiplication to be possible
• Practice: Let X be an 10x5 matrix. Let’s factorize it into 3 matrices…
Matrix Multiplication
![Page 111: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/111.jpg)
Matrix Multiplication• The product AB is:
• Each entry in the result is (that row of A) dot product with (that column of B)
Fei-Fei Li 111
![Page 112: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/112.jpg)
Matrix Multiplication• Example:
Fei-Fei Li 112
– Each entry of the matrix product is made by taking the dot product of the corresponding row in the left matrix, with the corresponding column in the right one.
![Page 113: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/113.jpg)
Inner Product
• Multiply corresponding entries of two vectors and add up the result
• x·y is also |x||y|Cos( angle between x and y )
• If B is a unit vector, then A·B gives the length of A which lies in the direction of B (projection)
Fei-Fei Li 113
(if B is unit-length hence norm is 1)
![Page 114: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/114.jpg)
Different Types of Product
• x, y = column vectors (nx1)• X, Y = matrices (mxn)• x, y = scalars (1x1)
• xT y = x · y = inner product (1xn x nx1 = scalar)• x⊗ y = x yT = outer product (nx1 x 1xn = matrix)
• X * Y = matrix product• X .* Y = element-wise product
![Page 115: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/115.jpg)
Matrix Operations
• Transpose – flip matrix, so row 1 becomes column 1
• A useful identity:
Fei-Fei Li 115
![Page 116: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/116.jpg)
• MATLAB example:
Fei-Fei Li 116
Matrix Operations
>> x = A\B
x =
1.0000
-0.5000
![Page 117: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/117.jpg)
Matrix Operation Properties
• Matrix addition is commutative and associative – A + B = B + A
– A + (B + C) = (A + B) + C
• Matrix multiplication is associative and distributive but not commutative – A(B*C) = (A*B)C
– A(B + C) = A*B + A*C
– A*B != B*A
![Page 118: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/118.jpg)
Special Matrices
• Identity matrix I– Square matrix, 1’s along
diagonal, 0’s elsewhere– I ∙ [another matrix] = [that
matrix]
• Diagonal matrix– Square matrix with numbers
along diagonal, 0’s elsewhere– A diagonal ∙ [another matrix]
scales the rows of that matrix
Fei-Fei Li 118
![Page 119: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/119.jpg)
Special Matrices
• Symmetric matrix
Fei-Fei Li 119
![Page 120: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/120.jpg)
Matlab
![Page 121: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/121.jpg)
Matlab tutorial
http://www.cs.pitt.edu/~kovashka/cs1674_fa18/tutorial.m http://www.cs.pitt.edu/~kovashka/cs1674_fa18/myfunction.m
http://www.cs.pitt.edu/~kovashka/cs1674_fa18/myotherfunction.m
Please cover whatever we don’t finish at home.
![Page 122: CS 1674: Intro to Computer Visionkovashka/cs1674_fa20/vision_01_intro.pdf · Computer vision is not solved • Deep learning makes excellent use of massive data (labeled for the task](https://reader036.vdocuments.mx/reader036/viewer/2022062415/605b049836c6fc7138134ad0/html5/thumbnails/122.jpg)
Other tutorials and exercises
• https://people.cs.pitt.edu/~milos/courses/cs2750/Tutorial/
• http://www.math.udel.edu/~braun/M349/Matlab_probs2.pdf
• http://www.facstaff.bucknell.edu/maneval/help211/basicexercises.html
– Do Problems 1-8, 12
– Most also have solutions
– Ask the TA if you have any problems