Computer Vision - biomisa.org/uploads/2017/04/lect-1.pdf

Post on 09-Mar-2020


Computer Vision

Lecture # 1: Introduction & Fundamentals

Introduction

• Area of research: Analysis of medical images/signals using image/signal processing and machine learning techniques

• Current Research Areas:

– Biomedical Image/Signal Analysis (Retina, Cardiac, Dental, EEG, Breath Sounds etc)

– Biometrics (FP, Dental, Retina, Dorsal hand veins etc)

*www.biomisa.org/usman

*www.biomisa.org

Text Book & References:

• David A. Forsyth and Jean Ponce, Computer Vision: A Modern Approach, 2002 Ed (available from local market)

• Class slides & selected research papers to be distributed by the instructor

• Mubarak Shah, Fundamentals of Computer Vision, 1997 (soft copy available online)

• Linda Shapiro and George Stockman, Computer Vision, 2000 (soft copy available online)

• Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 3rd Edition, 2009 (available from local market)

Course Information

• Course Material

– Lecture slides, assignments (computer/written), solutions to problems, projects, and announcements will be uploaded to the course web page: http://biomisa.org/usman/computervision

Course Contents

Introduction
Camera
Geometry and Transformations
Camera Model and Parameters
Camera Calibrations
Multiview Geometry and Stereopsis
Segmentation
K means algorithm
Mean Shift Algorithm
Background subtraction
Line fitting by RANSAC
Graph cut
Graph Theory Dynamic Programming
Coherent Tensors
Hyperspectral Images
Image Registration
MAC filters
Template Matching
Hausdorff Distance
Texture Analysis
Gabor Filters
Wavelets
Oriented pyramids (Gaussian and Laplacians)
Spot and Bar filters
Laws' Texture energy
Synthesis
Local Binary Patterns
Tracking
KLT
Optical flow
Motion vectors
Kalman Filters
MeanShift
Classification
Markov Models for Computer Vision
Deep Learning

Prerequisites

• Linear algebra, basic calculus, and probability

• Experience with image processing or Matlab will help but is not necessary

CODE OF ETHICS

• All students must come to class on time (attendance will be taken in the first 5 to 10 minutes)

• Students should remain attentive during class and avoid using mobile phones, laptops, or other gadgets

• Obedience to all laws, discipline code, rules and community norms

• Respect peers, faculty and staff through actions and speech

• Students should not sleep during class

• Bring writing material and books

• Class participation is encouraged

Policies

• No extensions in assignment deadlines.

• Quizzes will be unannounced.

• Exams will be closed book.

• Never cheat.

– “Better fail NOW or else will fail somewhere LATER in life”

• Plagiarism will also have strict penalties.

Adapted from What is Plagiarism PowerPoint

http://mciu.org/~spjvweb/plagiarism.ppt

Courtesy Dr. Khawar

Grading Policy

Sessional Exams: 25%

Quizzes (4-6): 8%

Computer and numerical assignments: 7%

Paper + Presentation: 10%

Project: 10%

Final Exam: 40%

What is computer vision?

• Automatic understanding of images and video

– Computing properties of the 3D world from visual data (measurement)

– Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities. (perception and interpretation)

Computer Vision

The goal is to emulate human vision (which is limited to the visual band of the electromagnetic (EM) spectrum), including learning and being able to make inferences and take actions based on visual inputs.

Why Computer Vision?

• An image is worth 1000 words

• Many biological systems rely on vision

• The world is 3D and dynamic

• Cameras and computers are cheap

• …

Overview

• Image Formation and Camera Geometry
– Modeling and Calibration
– Image rectification
• Segmentation
– Impose some order on group of pixels to separate them from each other or infer shape information
• Processing on Single Image
– Linear Filters
– Edge detection
– Texture
• Multiple Images
– Multi-view geometry
– Stereo imaging
– Structure from motion
• Interpretation
– Interpret objects using geometric information
• Recognition
– Recognize objects using probabilistic techniques
• Real World
• Action

What is Computer Vision?

Given one or more images, extract properties of the 3D world:

- Traffic scene

- Number of vehicles

- Type of vehicles

- Location of closest obstacle

- Assessment of congestion

- Location of the scene captured

- …

Vision for perception, interpretation

[Figure: annotated amusement park photo. Labels: sky, water, Ferris wheel, amusement park, Cedar Point, Lake Erie, 12 E, tree, carousel, deck, ride, The Wicked Twister, maxair, umbrellas, pedestrians, bench, people waiting in line, people sitting on ride]

Objects, Activities, Scenes, Locations, Text / writing, Faces, Gestures, Motions, Emotions, …

Related disciplines

[Diagram: Computer vision surrounded by related fields: Cognitive science, Algorithms, Image processing, Artificial intelligence, Graphics, Machine learning]

Computer Vision and Nearby Fields

Derogatory summary of computer vision:

“Machine learning applied to visual data.”


[Diagram labels: Real world; Digital world; Computer Vision; Computer Graphics; Model of the world; Images, videos, sensor data…; Images, videos, interaction; Question answering]

Why vision?

• Images and video are everywhere!

Personal photo albums

Surveillance and security

Movies, news, sports

Medical and scientific images

Slide credit: L. Lazebnik

Optical character recognition (OCR)

Digit recognition, AT&T labs

http://www.research.att.com/~yann/

Technology to convert scanned docs to text

• If you have a scanner, it probably came with OCR software

License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition


Examples: HCI

Try to make human computer interfaces more natural

Gesture recognition

Facial Expression Recognition

Lip reading


Examples: Sign Language/Gesture Recognition

British Sign Language Alphabet


Examples: Robotics

Safety and Security

• Surveillance
• Autonomous robots
• Driver assistance
• Monitoring pools (Poseidon)
• Pedestrian detection [MERL, Viola et al.]

Face detection

• Almost all digital cameras detect faces

• Snapchat face filters

Object recognition (in supermarkets)

How does it work? Think-Pair-Share

http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.html&r=1&p=1&f=G&l=50&d=PG01&S1=(Steven.IN.+AND+Kessel.IN.)&OS=IN/Steven+and+IN/Kessel&RS=(IN/Steven+AND+IN/Kessel)

Vision-based biometrics

“How the Afghan Girl was Identified by Her Iris Patterns”

Read the story (Wikipedia)


Login without a password…


Object recognition (in mobile phones)

e.g., Google Lens

3D from images

Building Rome in a Day: Agarwal et al. 2009

Human shape capture


Star Wars: Rogue One – Peter Cushing / Admiral Tarkin

Special effects: shape capture


Special effects: motion capture

Interactive Games: Kinect

• Object Recognition: http://www.youtube.com/watch?feature=iv&v=fQ59dXOo63o

• Mario: http://www.youtube.com/watch?v=8CTJL5lUjHg

• 3D: http://www.youtube.com/watch?v=7QrnwoO1-8A

• Robot: http://www.youtube.com/watch?v=w8BmgtMKFbY


Sports

Sportvision first down line

Nice explanation on www.howstuffworks.com

http://www.sportvision.com/video.html


Medical imaging

Image guided surgery

Grimson et al., MIT

3D imaging

MRI, CT


AutoCars - Uber bought CMU’s lab

Industrial robots

Vision-guided robots position nut runners on wheels


Vision in space

Vision systems (JPL) used for several tasks

• Panorama stitching

• 3D terrain modeling

• Obstacle detection, position tracking

• For more, read “Computer Vision on Mars” by Matthies et al.

NASA's Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007.


Mobile robots

http://www.robocup.org/

NASA’s Mars Spirit Rover
http://en.wikipedia.org/wiki/Spirit_rover

Saxena et al. 2008
STAIR at Stanford

Augmented Reality and Virtual Reality

MS HoloLens, Oculus, Magic Leap,

ARCore / ARKit


Summary of Applications

Problem Domain | Application | Input Pattern | Output Class
Document Image Analysis | Optical Character Recognition | Document image | Characters/words
Document Classification | Internet search | Text document | Semantic categories
Document Classification | Junk mail filtering | Email | Junk/Non-Junk
Multimedia retrieval | Internet search | Video clip | Video genres
Speech Recognition | Telephone directory assistance | Speech waveform | Spoken words
Natural Language Processing | Information extraction | Sentence | Parts of speech
Biometric Recognition | Personal identification | Face, fingerprint, iris | Authorized users for access control
Medical | Computer aided diagnosis | Microscopic image | Healthy/cancerous cell
Military | Automatic target recognition | Infrared image | Target type
Industrial automation | Fruit sorting | Images taken on conveyor belt | Grade of quality
Bioinformatics | Sequence analysis | DNA sequence | Known types of genes

Jitendra Malik, UC Berkeley

Three ‘R’s of Computer Vision

“[Further progress in] the classic problems of computational vision: reconstruction, recognition, (re)organization [requires us to study the interaction among these processes].”

Recognition, Reconstruction & Reorganization

The Three R’s of Vision

[Diagram nodes: Recognition, Reconstruction, Reorganization]

Each of the 6 directed arcs in this diagram is a useful direction of information flow

Superpixel assemblies as candidates

PASCAL Visual Object Challenge (Everingham et al)

How about the other direction…


Recognition Helps Reorganization

We train classifiers to predict, top-down, the pixels belonging to the object

[Figure labels: Original detection; Search nearby; Regress boxes; Segment; Score]

Actions and Attributes from Wholes and Parts
G. Gkioxari, R. Girshick & J. Malik


We have explored category-specific 3D reconstruction.

Category Specific Object Reconstruction
Kar, Tulsiani, Carreira & Malik

Basis Shape Models

Results


These ideas apply equally well in a video setting


Images

• Image classification: “Is there a dog in the image?”
• Object detection: “Is there a dog and where is it in the image?”

Video

• Action classification: “Is there a person diving in the video?”
• Action detection: “Is there a person diving and where is it in the video?”

Assignment 1: Image Filtering and Hybrid Images

• Implement image filtering to separate high and low frequencies.

• Combine high frequencies and low frequencies from different images to create a scale-dependent image.
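The frequency split described in Assignment 1 can be illustrated on a 1D signal. The sketch below is only an assumption-laden illustration, not the assignment's reference solution: it uses a box filter (a moving average) as the low-pass filter where a Gaussian would be the more usual choice, it works on plain lists rather than images, and the names `box_blur` and `hybrid` are hypothetical.

```python
def box_blur(signal, radius):
    """Low-pass filter: average each sample with its neighbours (edges clamped)."""
    n = len(signal)
    out = []
    for i in range(n):
        # Clamp indices so the window never reads outside the signal.
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def hybrid(signal_a, signal_b, radius=1):
    """Low frequencies of signal_a plus high frequencies of signal_b."""
    low_a = box_blur(signal_a, radius)
    low_b = box_blur(signal_b, radius)
    # High-pass = original minus its low-pass version.
    high_b = [b - lb for b, lb in zip(signal_b, low_b)]
    return [la + hb for la, hb in zip(low_a, high_b)]


smooth = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]      # slowly varying signal
detail = [0.0, 4.0, 0.0, 4.0, 0.0, 4.0]      # rapidly varying signal
mixed = hybrid(smooth, detail, radius=1)
```

For images, the same idea would be applied in 2D, and the filter width would control which scale each source image dominates at.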


Assignment 2: Local Feature Matching

• Implement interest point detector, SIFT-like local feature descriptor, and simple matching algorithm.
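For the "simple matching algorithm" step, one common choice is nearest-neighbour matching with a distance ratio test. The slides do not say which matcher the assignment expects, so the ratio test, the `max_ratio` threshold, and the function name `match_features` below are assumptions made for illustration. Descriptors are plain lists of floats.

```python
import math


def match_features(desc_a, desc_b, max_ratio=0.8):
    """Return (index_a, index_b) pairs that pass the nearest-neighbour ratio test."""
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from descriptor i to every descriptor in the other image.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs a second-best candidate
        (best, j), (second, _) = dists[0], dists[1]
        # Keep the match only if the best candidate is clearly better than the runner-up.
        if second > 0 and best / second < max_ratio:
            matches.append((i, j))
    return matches


image_a = [[0.0, 0.0], [10.0, 10.0]]
image_b = [[0.0, 1.0], [10.0, 9.0], [5.0, 5.0]]
print(match_features(image_a, image_b))   # [(0, 0), (1, 1)]
```

A descriptor that is equally close to two candidates is rejected as ambiguous, which is the point of comparing the best distance with the second best.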


Assignment 3: Scene Recognition with Bag of Words

• Quantize local features into a “vocabulary”, describe images as histograms of “visual words”, train classifiers to recognize scenes based on these histograms.
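The histogram step of Assignment 3 can be sketched as follows. This assumes the vocabulary (a list of cluster centres) has already been built, for example with k-means, and that descriptors are plain lists of floats; the names `nearest_word` and `bag_of_words` are hypothetical, and classifier training is left out.

```python
import math


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda k: math.dist(descriptor, vocabulary[k]))


def bag_of_words(descriptors, vocabulary):
    """Normalized histogram of visual-word counts for one image."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    # Normalize so images with different numbers of features are comparable.
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)


vocabulary = [[0.0, 0.0], [10.0, 10.0]]               # two "visual words"
image_descriptors = [[1.0, 0.0], [9.0, 11.0], [0.0, 2.0], [1.0, 1.0]]
print(bag_of_words(image_descriptors, vocabulary))    # [0.75, 0.25]
```

The resulting fixed-length histogram is what a scene classifier would take as its input vector.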


Assignment 4: Convolutional Neural Nets

• Assignment 3 again, but with state-of-the-art methods.


Computer Vision Publications

• Journals

– IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI)

– International Journal of Computer Vision (IJCV)

– IEEE Trans. on Image Processing

– …

Computer Vision Publications

• Conferences

– International Conference on Computer Vision (ICCV), once every two years

– IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), once a year

– European Conference on Computer Vision (ECCV), once every two years

– …

Today’s Class

• PART II

– Transformation Matrix

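Basic Transformations

Translation: \( (x' = x + x_0,\ y' = y + y_0,\ z' = z + z_0) \)

(2D)

\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & x_0 \\ 0 & 1 & y_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]

(3D)

\[
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & 0 & x_0 \\ 0 & 1 & 0 & y_0 \\ 0 & 0 & 1 & z_0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
\]

Images courtesy of Dr Imtiaz A Taj (MAJU)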
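As a minimal sketch of the 2D translation in homogeneous form, the snippet below builds the 3x3 matrix and applies it to one point. It uses plain Python lists rather than a matrix library, and the helper names `translation_matrix_2d` and `mat_vec` are hypothetical.

```python
def translation_matrix_2d(x0, y0):
    """3x3 homogeneous matrix that shifts a 2D point by (x0, y0)."""
    return [[1.0, 0.0, x0],
            [0.0, 1.0, y0],
            [0.0, 0.0, 1.0]]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


p = [2.0, 3.0, 1.0]                                  # the point (2, 3) with a trailing 1
p_new = mat_vec(translation_matrix_2d(5.0, -1.0), p)
print(p_new)                                         # [7.0, 2.0, 1.0]
```

The trailing 1 is what lets a matrix product express the addition of \(x_0\) and \(y_0\).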

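Cartesian Coordinate System (Euclidean Geometry)

\[
W = \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
\]

Homogeneous Coordinate System (Projective Geometry)

\[
W_h = \begin{bmatrix} kX \\ kY \\ kZ \\ k \end{bmatrix}
\]

\[
W = \begin{bmatrix} W_1 \\ W_2 \\ W_3 \end{bmatrix} =
\begin{bmatrix} W_{h1}/W_{h4} \\ W_{h2}/W_{h4} \\ W_{h3}/W_{h4} \end{bmatrix}
\]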
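The conversion between the two coordinate systems can be sketched in a few lines. The function names `to_homogeneous` and `to_cartesian` are hypothetical, and the code assumes a non-zero scale factor k (a fourth component of zero would correspond to a point at infinity).

```python
def to_homogeneous(point, k=1.0):
    """Cartesian (X, Y, Z) -> homogeneous (kX, kY, kZ, k), with k != 0."""
    if k == 0:
        raise ValueError("k must be non-zero")
    return [k * c for c in point] + [k]


def to_cartesian(point_h):
    """Homogeneous (Wh1, Wh2, Wh3, Wh4) -> Cartesian by dividing by Wh4."""
    *coords, w = point_h
    if w == 0:
        raise ValueError("a point at infinity has no Cartesian equivalent")
    return [c / w for c in coords]


w_h = to_homogeneous([1.0, 2.0, 3.0], k=2.0)
print(w_h)                  # [2.0, 4.0, 6.0, 2.0]
print(to_cartesian(w_h))    # [1.0, 2.0, 3.0]
```

Any non-zero k gives the same Cartesian point, which is why homogeneous coordinates are defined only up to scale.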

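Basic Transformations

Scaling: \( (x' = S_x x,\ y' = S_y y,\ z' = S_z z) \)

(2D)

\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} S_x & 0 & 0 \\ 0 & S_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]

(3D)

\[
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\begin{bmatrix} S_x & 0 & 0 & 0 \\ 0 & S_y & 0 & 0 \\ 0 & 0 & S_z & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
\]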
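A short sketch of the 3D scaling matrix applied to a homogeneous point, again with plain lists; `scaling_matrix_3d` and `mat_vec` are hypothetical helper names.

```python
def scaling_matrix_3d(sx, sy, sz):
    """4x4 homogeneous matrix that scales x, y, z by sx, sy, sz."""
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


scaled = mat_vec(scaling_matrix_3d(2.0, 0.5, 3.0), [4.0, 4.0, 1.0, 1.0])
print(scaled)    # [8.0, 2.0, 3.0, 1.0]
```

Each axis is scaled independently, and the homogeneous component stays 1.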

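Basic Transformations

Rotation (2D):

- around origin

\[
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
\begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]

- around an arbitrary point \(r\) (not origin)

\[
p' = T_{r}\, R\, T_{-r}\, p
\]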
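Rotation around an arbitrary point composes three matrices: translate the point r to the origin, rotate, then translate back. The sketch below assumes a counter-clockwise angle in radians and uses hypothetical helper names (`translation`, `rotation`, `rotation_about`, `mat_mul`, `mat_vec`).

```python
import math


def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_vec(matrix, vector):
    """Multiply a matrix by a column vector given as a flat list."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def translation(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def rotation(theta):
    """Counter-clockwise rotation about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rotation_about(theta, rx, ry):
    """T(r) * R(theta) * T(-r): rotate about the point (rx, ry)."""
    return mat_mul(translation(rx, ry),
                   mat_mul(rotation(theta), translation(-rx, -ry)))


# Rotate the point (2, 1) by 90 degrees about (1, 1); it should land near (1, 2).
p_new = mat_vec(rotation_about(math.pi / 2, 1.0, 1.0), [2.0, 1.0, 1.0])
print([round(v, 6) for v in p_new])
```

The matrices are applied right to left, so the translation by -r acts on the point first.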

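Basic Transformations (Cont.)

Rotation (3D):

around x-axis:

\[
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta & 0 \\ 0 & \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{R_x}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
\]

around y-axis:

\[
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\underbrace{\begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{R_y}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
\]

around z-axis:

\[
\begin{bmatrix} x' \\ y' \\ z' \\ 1 \end{bmatrix} =
\underbrace{\begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{R_z}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
\]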

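3D Rotation of Points

Rotation around the coordinate axes, counter-clockwise:

\[
R_x(\theta) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}
\]

\[
R_y(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}
\]

\[
R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]

[Figure: a point p rotated to p', drawn with the y and z axes]

Slide Credit: Savarese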
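The three axis rotations can be written out directly as 3x3 matrices. This is a small sketch assuming counter-clockwise angles in radians and a right-handed coordinate system; the function names `rot_x`, `rot_y`, `rot_z`, and `mat_vec` are hypothetical.

```python
import math


def rot_x(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (flat list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Rotating the unit y vector by 90 degrees about the x-axis moves it onto the z-axis.
p = [0.0, 1.0, 0.0]
p_new = mat_vec(rot_x(math.pi / 2), p)
print([round(v, 6) for v in p_new])    # [0.0, 0.0, 1.0]
```

Rotations about different axes generally do not commute, so the order in which such matrices are multiplied matters.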

Home Task 1 (Ungraded)

• Download and install the latest release of OpenCV. Build and run your first OpenCV program.

Related Tutorials:

- Installing OpenCV 3 on Ubuntu: http://rodrigoberriel.com/2014/10/installing-opencv-3-0-0-on-ubuntu-14-04/

- Using OpenCV 3 with Eclipse: http://rodrigoberriel.com/2014/10/using-opencv-3-0-0-with-eclipse/


Acknowledgements

Material in these slides has been taken from the following resources:

Some slide material has been taken from Dr. Mehmood and Dr. Imtiaz Ali Taj Computer Vision Lectures

CSCI 1430: Introduction to Computer Vision by James Tompkin

Statistical Pattern Recognition: A Review – A.K. Jain et al., PAMI (22) 2000

Pattern Recognition and Analysis Course – A.K. Jain, MSU

“Pattern Classification” by Duda et al., John Wiley & Sons.

“Digital Image Processing”, Rafael C. Gonzalez & Richard E. Woods, Addison-Wesley, 2002

“Machine Vision: Automated Visual Inspection and Robot Vision”, David Vernon, Prentice Hall, 1991

www.eu.aibo.com/

Advances in Human Computer Interaction, Shane Pinder, InTech, Austria, October 2008

Computer Vision: A Modern Approach by Forsyth
