Computer Vision
Lecture # 1: Introduction & Fundamentals
Introduction
• Area of research: Analysis of medical images/signals using image/signal processing and machine learning techniques
• Current Research Areas:
– Biomedical Image/Signal Analysis (Retina, Cardiac, Dental, EEG, Breath Sounds, etc.)
– Biometrics (Fingerprint, Dental, Retina, Dorsal hand veins, etc.)
*www.biomisa.org/usman
*www.biomisa.org
Text Book & References:
• David A. Forsyth and Jean Ponce, Computer Vision: A Modern Approach, 2002 Ed (available from local market)
• Class slides & selected research papers to be distributed by the instructor
• Mubarak Shah, Fundamentals of Computer Vision, 1997 (soft copy available online)
• Linda Shapiro and George Stockman, Computer Vision, 2000 (soft copy available online)
• Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 3rd Edition, 2009 (available from local market)
Course Information
• Course Material
– Lecture slides, assignments (computer/written), solutions to problems, projects, and announcements will be uploaded on the course web page.
http://biomisa.org/usman/computervision
Course Contents
• Introduction
• Camera
– Geometry and Transformations
– Camera Model and Parameters
– Camera Calibration
– Multiview Geometry and Stereopsis
• Segmentation
– K-means Algorithm
– Mean Shift Algorithm
– Background Subtraction
– Line Fitting by RANSAC
– Graph Cut
– Graph Theory Dynamic Programming
– Coherent Tensors
– Hyperspectral Images
• Image Registration
– MAC Filters
– Template Matching
– Hausdorff Distance
• Texture Analysis
– Gabor Filters
– Wavelets
– Oriented Pyramids (Gaussian and Laplacian)
– Spot and Bar Filters
– Laws' Texture Energy
– Synthesis
– Local Binary Patterns
• Tracking
– KLT
– Optical Flow
– Motion Vectors
– Kalman Filters
– MeanShift
• Classification
– Markov Models for Computer Vision
– Deep Learning
Prerequisites
• Linear algebra, basic calculus, and probability
• Experience with image processing or Matlab will help but is not necessary
CODE OF ETHICS
• All students must come to class on time (attendance will be taken in the first 5 to 10 minutes)
• Students should remain attentive during class and avoid using mobile phones, laptops, or other gadgets
• Obey all laws, the discipline code, rules, and community norms
• Respect peers, faculty, and staff through actions and speech
• Students should not sleep during class
• Bring writing material and books
• Class participation is encouraged
Policies
• No extensions in assignment deadlines.
• Quizzes will be unannounced.
• Exams will be closed book.
• Never cheat.
– “Better fail NOW or else will fail somewhere LATER in life”
• Plagiarism will also have strict penalties.
Adapted from What is Plagiarism PowerPoint: http://mciu.org/~spjvweb/plagiarism.ppt
Courtesy Dr. Khawar
Grading Policy
Sessional Exams: 25%
Quizzes (4-6): 8%
Computer and numerical assignments: 7%
Paper + Presentation: 10%
Project: 10%
Final Exam: 40%
What is computer vision?
• Automatic understanding of images and video
– Computing properties of the 3D world from visual data (measurement)
– Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities. (perception and interpretation)
Computer Vision
The goal is to emulate human vision (which is limited to the visual band of the electromagnetic (EM) spectrum), including learning and being able to make inferences and take actions based on visual inputs.
Why Computer Vision?
• An image is worth 1000 words
• Many biological systems rely on vision
• The world is 3D and dynamic
• Cameras and computers are cheap
• …
Overview
• Image Formation and Camera Geometry: Modeling and Calibration; Image rectification
• Segmentation: Impose some order on group of pixels to separate them from each other or infer shape information
• Processing on Single Image: Linear Filters; Edge detection; Texture
• Multiple Images: Multi-view geometry; Stereo imaging; Structure from motion
• Interpretation: Interpret objects using geometric information
• Recognition: Recognize objects using probabilistic techniques
Real World
Action
What is Computer Vision?
Given one or more images, extract properties of the 3D world:
- Traffic scene
- Number of vehicles
- Type of vehicles
- Location of closest obstacle
- Assessment of congestion
- Location of the scene captured
- …
[Figure: annotated photograph of an amusement park (Cedar Point, on Lake Erie). Labels include sky, water, tree, Ferris wheel, carousel, deck, ride, maxair, The Wicked Twister, umbrellas, bench, pedestrians, people waiting in line, people sitting on ride, and the text "12 E".]
Objects, Activities, Scenes, Locations, Text / writing, Faces, Gestures, Motions, Emotions, …
Vision for perception, interpretation
Related disciplines of computer vision:
• Cognitive science
• Algorithms
• Image processing
• Artificial intelligence
• Graphics
• Machine learning
Computer Vision and Nearby Fields
Derogatory summary of computer vision:
“Machine learning applied to visual data.”
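[Diagram: Real world vs. Digital world]
• Computer Vision: images, videos, sensor data… (real world) → model of the world (digital world)
• Computer Graphics: model of the world (digital world) → images, videos, interaction (real world)
• Question answering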
Why vision?
• Images and video are everywhere!
Personal photo albums
Surveillance and security
Movies, news, sports
Medical and scientific images
Slide credit: L. Lazebnik
Optical character recognition (OCR)
Digit recognition, AT&T labs
http://www.research.att.com/~yann/
Technology to convert scanned docs to text
• If you have a scanner, it probably came with OCR software
License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Examples: HCI
Try to make human computer interfaces more natural
Gesture recognition
Facial Expression Recognition
Lip reading
Examples: Sign Language/Gesture Recognition
British Sign Language Alphabet
Examples: Robotics
Safety and Security
Surveillance
Autonomous robots
Driver assistance
Monitoring pools (Poseidon)
Pedestrian detection [MERL, Viola et al.]
Face detection
• Almost all digital cameras detect faces
• Snapchat face filters
Smile detection
Sony Cyber-shot® T70 Digital Still Camera
Object recognition (in supermarkets)
How does it work? Think-Pair-Share
http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=1&p=1&f=G&l=50&d=PG01&S1=(Steven.IN.+AND+Kessel.IN.)&OS=IN/Steven+and+IN/Kessel&RS=(IN/Steven+AND+IN/Kessel)
Vision-based biometrics
“How the Afghan Girl was Identified by Her Iris Patterns”
Read the story (Wikipedia)
Login without a password…
Object recognition (in mobile phones)
e.g., Google Lens
3D from images
Building Rome in a Day: Agarwal et al. 2009
Human shape capture
Star Wars: Rogue One – Peter Cushing / Admiral Tarkin
Special effects: shape capture
Special effects: motion capture
Interactive Games: Kinect
• Object Recognition: http://www.youtube.com/watch?feature=iv&v=fQ59dXOo63o
• Mario: http://www.youtube.com/watch?v=8CTJL5lUjHg
• 3D: http://www.youtube.com/watch?v=7QrnwoO1-8A
• Robot: http://www.youtube.com/watch?v=w8BmgtMKFbY
Sports
Sportvision first down line
Nice explanation on www.howstuffworks.com
http://www.sportvision.com/video.html
Medical imaging
Image guided surgery
Grimson et al., MIT
3D imaging
MRI, CT
Autonomous cars: Uber bought CMU’s lab
Industrial robots
Vision-guided robots position nut runners on wheels
Vision in space
Vision systems (JPL) used for several tasks
• Panorama stitching
• 3D terrain modeling
• Obstacle detection, position tracking
• For more, read “Computer Vision on Mars” by Matthies et al.
NASA's Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.
Mobile robots
http://www.robocup.org/
NASA’s Mars Spirit Rover
http://en.wikipedia.org/wiki/Spirit_rover
Saxena et al. 2008
STAIR at Stanford
Augmented Reality and Virtual Reality
MS HoloLens, Oculus, Magic Leap,
ARCore / ARKit
Summary of Applications
Problem Domain | Application | Input Pattern | Output Class
Document Image Analysis | Optical Character Recognition | Document image | Characters/words
Document Classification | Internet search | Text document | Semantic categories
Document Classification | Junk mail filtering | Email | Junk/Non-junk
Multimedia Retrieval | Internet search | Video clip | Video genres
Speech Recognition | Telephone directory assistance | Speech waveform | Spoken words
Natural Language Processing | Information extraction | Sentence | Parts of speech
Biometric Recognition | Personal identification | Face, fingerprint, iris | Authorized users for access control
Medical | Computer aided diagnosis | Microscopic image | Healthy/cancerous cell
Military | Automatic target recognition | Infrared image | Target type
Industrial Automation | Fruit sorting | Images taken on conveyor belt | Grade of quality
Bioinformatics | Sequence analysis | DNA sequence | Known types of genes
Jitendra Malik, UC Berkeley
Three ‘R’s of Computer Vision
“[Further progress in] the classic problems of computational vision:
reconstruction
recognition
(re)organization
[requires us to study the interaction among these processes].”
Recognition, Reconstruction & Reorganization
[Diagram: Recognition, Reconstruction, and Reorganization as three connected nodes]
The Three R’s of Vision
Each of the 6 directed arcs in this diagram is a useful direction
of information flow
Superpixel assemblies as candidates
PASCAL Visual Object Challenge (Everingham et al)
How about the other direction…
Recognition Helps Reorganization
We train classifiers to predict top-down the pixels belonging to the object
Original detection
Search nearby
Regress boxes
Segment
Score
Actions and Attributes from Wholes and Parts
G. Gkioxari, R. Girshick & J. Malik
The Three R’s of Vision
We have explored category-specific 3D reconstruction.
Category Specific Object Reconstruction
Kar, Tulsiani, Carreira & Malik
Basis Shape Models
Results
The Three R’s of Vision
These ideas apply equally well in a video setting
Images
• Image classification: “Is there a dog in the image?”
• Object detection: “Is there a dog and where is it in the image?”
Video
• Action classification: “Is there a person diving in the video?”
• Action detection: “Is there a person diving and where is it in the video?”
Assignment 1: Image Filtering and Hybrid Images
• Implement image filtering to separate high and low frequencies.
• Combine high frequencies and low frequencies from different images to create a scale-dependent image.
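The two steps of Assignment 1 can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the assignment's reference solution: it assumes grayscale floating-point images of equal size, uses a Gaussian as the low-pass filter, and uses a single cutoff `sigma` for both images. The helper names (`gaussian_kernel_1d`, `gaussian_blur`, `hybrid_image`) are hypothetical.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalised 1D Gaussian kernel truncated at 3 standard deviations."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma):
    """Separable Gaussian low-pass filter with reflective padding."""
    k = gaussian_kernel_1d(sigma)
    radius = k.size // 2
    padded = np.pad(image, radius, mode="reflect")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def hybrid_image(img_a, img_b, sigma):
    """Low frequencies of img_a plus high frequencies of img_b."""
    low = gaussian_blur(img_a, sigma)            # low-pass
    high = img_b - gaussian_blur(img_b, sigma)   # high-pass = original - low-pass
    return low + high, low, high

# Random arrays stand in for two real photographs (assumption for the demo).
rng = np.random.default_rng(0)
img_a = rng.random((32, 32))
img_b = rng.random((32, 32))
hybrid, low, high = hybrid_image(img_a, img_b, sigma=2.0)
print(hybrid.shape)
```

Viewed from close up the high-frequency image dominates, while from a distance (or after downsampling) the low-frequency image dominates, which is what makes the result scale-dependent.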
Assignment 2: Local Feature Matching
• Implement interest point detector, SIFT-like local feature descriptor, and simple matching algorithm.
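As a rough sketch of the matching step only, the snippet below pairs descriptors by nearest neighbour and keeps a pair when the nearest-neighbour distance ratio test passes. The ratio test is one common choice for a "simple matching algorithm" and is an assumption here, as are the function name `match_features`, the threshold 0.8, and the toy 2D descriptors (real SIFT-like descriptors are much longer).

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.8):
    """Return (i, j) index pairs that pass the nearest-neighbour ratio test.

    desc_a: (N, D) descriptors from image A
    desc_b: (M, D) descriptors from image B, with M >= 2
    """
    # Pairwise Euclidean distances, shape (N, M).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    matches = []
    for i in range(dists.shape[0]):
        best, second = order[i, 0], order[i, 1]
        # Accept only if the best match is clearly better than the runner-up.
        if dists[i, best] < ratio * dists[i, second]:
            matches.append((i, int(best)))
    return matches

desc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
desc_b = np.array([[0.9, 0.1], [0.1, 0.9], [5.0, 5.0]])
matches = match_features(desc_a, desc_b)
print(matches)  # [(0, 0), (1, 1)]; the third descriptor is ambiguous and rejected
```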
Assignment 3: Scene Recognition with Bag of Words
• Quantize local features into a “vocabulary”, describe images as histograms of “visual words”, train classifiers to recognize scenes based on these histograms.
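The histogram step of Assignment 3 can be illustrated as follows. This sketch assumes the vocabulary (cluster centres) has already been computed, for example with k-means, and only shows how one image's descriptors become a normalised histogram of visual words. The function name `bag_of_words_histogram` and the toy 2D data are hypothetical.

```python
import numpy as np

def bag_of_words_histogram(descriptors, vocabulary):
    """Normalised histogram of nearest visual words for one image.

    descriptors: (N, D) local feature descriptors from the image
    vocabulary:  (K, D) visual words (cluster centres)
    """
    # Distance from every descriptor to every visual word, shape (N, K).
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = np.argmin(dists, axis=1)  # index of the closest word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

vocabulary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.1], [1.2, 0.8], [4.8, 5.1]])
hist = bag_of_words_histogram(descriptors, vocabulary)
print(hist)  # [0.25 0.5  0.25]
```

A classifier is then trained on one such histogram per image, so images with different numbers of local features still map to fixed-length vectors.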
Assignment 4: Convolutional Neural Nets
• Assignment 3 again, but with state-of-the-art methods.
Computer Vision Publications
• Journals
– IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI)
– International Journal of Computer Vision (IJCV)
– IEEE Trans. on Image Processing
– …
• Conferences
– International Conference on Computer Vision (ICCV), once every two years
– IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), once a year
– European Conference on Computer Vision (ECCV), once every two years
– …
Today’s Class
• PART II
– Transformation Matrix
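Basic Transformations
Translation: $(x' = x + x_0,\; y' = y + y_0,\; z' = z + z_0)$
(2D)
$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & x_0 \\ 0 & 1 & y_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$
(3D)
$$
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 & x_0 \\ 0 & 1 & 0 & y_0 \\ 0 & 0 & 1 & z_0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$
Images courtesy of Dr Imtiaz A Taj (MAJU)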
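As a minimal sketch of translation in homogeneous coordinates, the snippet below builds the translation matrix and applies it to a 2D and a 3D point. The helper name `translation_matrix` and the sample numbers are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def translation_matrix(offsets):
    """Homogeneous translation matrix of size (n+1) x (n+1) for an n-D offset."""
    offsets = np.asarray(offsets, dtype=float)
    n = offsets.size
    T = np.eye(n + 1)
    T[:n, n] = offsets  # the last column holds (x0, y0) or (x0, y0, z0)
    return T

# 2D: translate the point (2, 3) by (x0, y0) = (5, -1).
p2 = np.array([2.0, 3.0, 1.0])  # homogeneous coordinates [x, y, 1]
p2_moved = translation_matrix([5.0, -1.0]) @ p2

# 3D: translate the point (1, 2, 3) by (x0, y0, z0) = (10, 20, 30).
p3 = np.array([1.0, 2.0, 3.0, 1.0])
p3_moved = translation_matrix([10.0, 20.0, 30.0]) @ p3

print(p2_moved)  # [7. 2. 1.]
print(p3_moved)  # [11. 22. 33.  1.]
```

Writing translation as a matrix product is what lets it be chained with scaling and rotation by plain matrix multiplication.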
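Cartesian Coordinate System (Euclidean Geometry):
$$
W = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
$$
Homogeneous Coordinate System (Projective Geometry):
$$
W_h = \begin{bmatrix} kX \\ kY \\ kZ \\ k \end{bmatrix}
$$
From homogeneous back to Cartesian coordinates:
$$
W = \begin{bmatrix} W_1 \\ W_2 \\ W_3 \end{bmatrix} =
\begin{bmatrix} W_{h1}/W_{h4} \\ W_{h2}/W_{h4} \\ W_{h3}/W_{h4} \end{bmatrix}
$$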
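The conversion between the two coordinate systems is short enough to sketch directly. The function names `to_homogeneous` and `to_cartesian` are hypothetical, and the scale factor k is assumed to be non-zero.

```python
import numpy as np

def to_homogeneous(w, k=1.0):
    """Map Cartesian [X, Y, Z] to homogeneous [kX, kY, kZ, k] (k must be non-zero)."""
    w = np.asarray(w, dtype=float)
    return np.append(k * w, k)

def to_cartesian(wh):
    """Divide by the last component to recover the Cartesian point."""
    wh = np.asarray(wh, dtype=float)
    return wh[:-1] / wh[-1]

W = np.array([1.0, 2.0, 3.0])
Wh = to_homogeneous(W, k=2.0)
W_back = to_cartesian(Wh)

print(Wh)      # [2. 4. 6. 2.]
print(W_back)  # [1. 2. 3.]
```

Any non-zero k gives a homogeneous vector for the same Cartesian point, so homogeneous coordinates are defined only up to scale.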
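Basic Transformations
Scaling: $(x' = S_x x,\; y' = S_y y,\; z' = S_z z)$
(2D)
$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$
(3D)
$$
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\begin{bmatrix} S_x & 0 & 0 & 0 \\ 0 & S_y & 0 & 0 \\ 0 & 0 & S_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$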
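A small sketch of scaling in homogeneous coordinates, using a hypothetical helper `scaling_matrix` and made-up sample values:

```python
import numpy as np

def scaling_matrix(factors):
    """Homogeneous scaling matrix: the scale factors on the diagonal, then a final 1."""
    factors = np.asarray(factors, dtype=float)
    return np.diag(np.append(factors, 1.0))

# 2D: scale x by 2 and y by 0.5.
p2 = np.array([4.0, 4.0, 1.0])
p2_scaled = scaling_matrix([2.0, 0.5]) @ p2

# 3D: uniform scaling by 3.
p3 = np.array([1.0, 2.0, 3.0, 1.0])
p3_scaled = scaling_matrix([3.0, 3.0, 3.0]) @ p3

print(p2_scaled)  # [8. 2. 1.]
print(p3_scaled)  # [3. 6. 9. 1.]
```

Note that this scales about the origin: every point moves towards or away from (0, 0), and only the origin itself stays fixed.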
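Basic Transformations
Rotation (2D):
- around origin
$$
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
$$
- around an arbitrary point $r$ (not origin)
$$
p' = T_{r}\, R(\theta)\, T_{-r}\, p
$$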
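Rotation about an arbitrary point composes three matrices: translate the centre r to the origin, rotate, then translate back. The sketch below assumes the counter-clockwise convention; the helper names and the test point are illustrative only.

```python
import numpy as np

def translation_2d(tx, ty):
    """Homogeneous 2D translation by (tx, ty)."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation_2d(theta):
    """Homogeneous 2D counter-clockwise rotation about the origin (theta in radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def rotation_about_point(theta, r):
    """Move r to the origin, rotate, then move back."""
    rx, ry = r
    return translation_2d(rx, ry) @ rotation_2d(theta) @ translation_2d(-rx, -ry)

# Rotate the point (2, 1) by 90 degrees about the centre r = (1, 1).
p = np.array([2.0, 1.0, 1.0])
M = rotation_about_point(np.pi / 2, (1.0, 1.0))
p_rot = M @ p

print(np.round(p_rot, 6))  # approximately [1. 2. 1.]
```

The matrices are applied right to left, so the rightmost translation (by -r) acts on the point first. The centre r is left unchanged by the combined transform.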
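Basic Transformations (Cont.)
Rotation (3D):
$R_z$ (around z-axis):
$$
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$
$R_x$ (around x-axis):
$$
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$
$R_y$ (around y-axis):
$$
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$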
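3D Rotation of Points
Rotation around the coordinate axes, counter-clockwise:
$$
R_x(\theta) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}
$$
$$
R_y(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}
$$
$$
R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$
[Figure: point p rotated to p' in the y-z plane]
Slide Credit: Savarese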
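The three axis rotations can be checked numerically with a short NumPy sketch. It assumes the counter-clockwise, right-handed convention stated above and angles in radians; the function names `rot_x`, `rot_y`, and `rot_z` are hypothetical.

```python
import numpy as np

def rot_x(t):
    """Counter-clockwise rotation about the x-axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])

def rot_y(t):
    """Counter-clockwise rotation about the y-axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[  c, 0.0,   s],
                     [0.0, 1.0, 0.0],
                     [ -s, 0.0,   c]])

def rot_z(t):
    """Counter-clockwise rotation about the z-axis."""
    c, s = np.cos(t), np.sin(t)
    return np.array([[  c,  -s, 0.0],
                     [  s,   c, 0.0],
                     [0.0, 0.0, 1.0]])

quarter = np.pi / 2

# A quarter turn about x carries the y-axis onto the z-axis.
p = np.array([0.0, 1.0, 0.0])
p_rot = rot_x(quarter) @ p
print(np.round(p_rot, 6))  # approximately [0. 0. 1.]

# Rotation matrices are orthonormal with determinant +1.
R = rot_z(0.3) @ rot_y(-1.1) @ rot_x(0.7)
print(np.allclose(R.T @ R, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))

# The order of rotations matters: these two products differ.
A = rot_x(quarter) @ rot_z(quarter)
B = rot_z(quarter) @ rot_x(quarter)
print(np.allclose(A, B))  # False
```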
Home Task 1 (Ungraded)
• Download and install the latest release of OpenCV. Build and run your first OpenCV program.
Related Tutorials:
- Installing OpenCV 3 on Ubuntu: http://rodrigoberriel.com/2014/10/installing-opencv-3-0-0-on-ubuntu-14-04/
- Using OpenCV 3 with Eclipse: http://rodrigoberriel.com/2014/10/using-opencv-3-0-0-with-eclipse/
Acknowledgements
Material in these slides has been taken from the following resources:
Some slide material has been taken from the Computer Vision lectures of Dr. Mehmood and Dr. Imtiaz Ali Taj
CSCI 1430: Introduction to Computer Vision by James Tompkin
“Statistical Pattern Recognition: A Review”, A.K. Jain et al., PAMI (22) 2000
Pattern Recognition and Analysis Course, A.K. Jain, MSU
“Pattern Classification” by Duda et al., John Wiley & Sons
“Digital Image Processing”, Rafael C. Gonzalez & Richard E. Woods, Addison-Wesley, 2002
“Machine Vision: Automated Visual Inspection and Robot Vision”, David Vernon, Prentice Hall, 1991
www.eu.aibo.com/
Advances in Human Computer Interaction, Shane Pinder, InTech, Austria, October 2008
Computer Vision: A Modern Approach by Forsyth