Applied Computer Vision - A Deep Learning Approach


Page 1: Applied Computer Vision - a Deep Learning Approach

Applied Computer Vision

FOR UNDERGRADS

A Deep Learning Approach

J. Berengueres

Page 2: Applied Computer Vision - a Deep Learning Approach

Applied Computer Vision for Undergrads version 2.2

Editor: Jose Berengueres

Edition: First Edition, August 24th, 2014.

Text Copyright© Jose Berengueres 2014. All Rights Reserved.

i


Page 3: Applied Computer Vision - a Deep Learning Approach

Video, Audio & Artwork Copyright

Artwork appearing in this work is subject to its corresponding original Copyright or Creative Commons License. Except where otherwise noted, a Creative Commons Attribution 3.0 License applies.

Limit of Liability

The editor makes no representations or warranties concerning the accuracy or exhaustivity of the contents and theories presented here, and in particular disclaims any implied warranties of merchantability or fitness for a particular use, including but not limited to educational, industrial and academic applications. Neither the editor nor the authors are liable for any loss of profit or any commercial damages, including but not limited to incidental, consequential or other damages.

Support

This work was supported by:

UAE University

Page 4: Applied Computer Vision - a Deep Learning Approach

1 | Intro

http://www.howwedrive.com/2012/01/23/let-the-robot-drive/

Page 5: Applied Computer Vision - a Deep Learning Approach

How face recognition works

http://vimeo.com/12774628

Andrew Ng

http://www.youtube.com/watch?v=AY4ajbu_G3k

Google self-driving car

https://www.youtube.com/watch?v=YXylqtEQ0tk

4

1 | Intro - Topics

Introduction videos

Page 7: Applied Computer Vision - a Deep Learning Approach

6

Page 8: Applied Computer Vision - a Deep Learning Approach

Projective Geometry

Many books on computer vision start with this topic, which is quite irrelevant. Let me explain: Picasso did not benefit from learning the chemistry of yellow-paint manufacturing, and neither will you. You don't need 3D geometry knowledge; you just need great 2D geometry knowledge. Additionally, his paintings were among the first (since the Middle Ages) to ignore the laws of projective geometry, and they were a big success. His father was a frustrated art teacher, but he made sure that young Picasso would not become like himself. Picasso, like Tiger Woods, was a product of his father.

7

2 | Theoretical Stuff - Projective Geometry

Caring about projective geometry in computer vision is the same mistake as "model of the world" vs. "ground model": you should not care about the world, because you already have a model of it.

Your retina is your world and your model! Do not complicate it by adding an additional layer, a model of a model that you have to update continuously. You don't even know if the world is real. See Gödel. ------

Reality? Your Reality!

Page 9: Applied Computer Vision - a Deep Learning Approach

Pin-hole Design

Pin-hole design is one of the few cheap ways to convert a 3D world into a 2D picture (a simplification). That is why man-made cameras use the same principle. For a fascinating story on how the pinhole eye is used by biological systems, I recommend Climbing Mount Improbable. There are other ways in which nature "sees":

the bat’s ultrasound vision

dolphins’ ultrasound vision

the compound eye of the fly

Polarized light vision

From a computer vision point of view, it should not matter whether your vision device is pinhole-based or not.

Compound Eye. What is the pixel resolution of the compound eye?

8

2 | Theoretical Stuff - The Design of the Eye

Page 10: Applied Computer Vision - a Deep Learning Approach

9

2 | Theoretical Stuff - The Design of the Eye

Page 11: Applied Computer Vision - a Deep Learning Approach

http://www.detectingdesign.com/humaneye.html

10

2 | Theoretical Stuff - The Design of the Eye

Page 12: Applied Computer Vision - a Deep Learning Approach

Hit a Wall

When Maja Rudinac hit the wall of impractical computer vision, she turned to developmental psychologists for strategies to cope with large amounts of image pixels. This is what she found (so you don't have to):

Visual Developmental Psychology

Basics (abridged from Maja Rudinac's PhD thesis, TU Delft)

By the fourth month, babies can already tell whether a character is bad or good; we can tell because we see which one they hug longer.

Infants look longer at new faces or new objects

Independent of where they are born, all babies know the boundaries of objects.

Can predict collisions

Basic additive and subtractive cognition

Can identify members of own group versus non-own group

Spontaneous motor movement is not goal-directed at the onset; the baby explores the degrees of freedom. Goal-directed arm grasping appears at the fourth month.

The ability to engage and disengage attention on targets appears from day 1 in babies.

Smooth visual tracking is present at birth

How baby cognition "works"

The development of actions in babies is goal-directed by three motives. Actions are taken:

1. To discover novelty

2. To discover regularity

3. To discover the potential of their own body

Development of Perception

Perception in babies is driven by two processes:

1. Detection of structure or patterns

11

2 | Theoretical Stuff - Developmental Psychology

Page 13: Applied Computer Vision - a Deep Learning Approach

2. Discarding of irrelevant info and keeping relevant info

Cognitive Robot Shopping List

that can understand the world at least as well (or as poorly) as a baby does, these are the functions that, according to Maja (pronounced Maya), we will need:

A WebCam

Object tracking

Object Discrimination

Attraction to people's faces

Face Recognition

Use of the hand to move objects to scan them from various angles

Shades and 3D

It turns out that shading has a disproportionate influence in helping us figure out 3D info from 2D retina pixels. When researchers at the University of Texas used fake shading in a virtual reality world, participants got headaches, because the faking of the shading was not precise enough to fool the brain. The brain got confused by the imperceptible mismatches... that's why smart people get headaches in 3D cinemas.

12

2 | Theoretical Stuff - Developmental Psychology

Page 14: Applied Computer Vision - a Deep Learning Approach

3 | Number Recognition Workshop

Nizhny Novgorod street via: http://personal.cfw.com/~renders/nizhnymall_photos.html

The purpose of the 2014 number recognition workshop is to learn computer vision basics.

In this chapter we will learn the four basic components of a typical computer vision program:

1. Features

2. Clustering

3. Filtering/Morph ops

4. Validation

We will use this example to learn OpenCV. Finally, we will learn why manual feature making is obsolete because of deep learning.

Let's learn by means of a simple example.

What does it mean to classify numbers?

------

1 2 3 4 5 6 7 8 9 0

Page 15: Applied Computer Vision - a Deep Learning Approach

Intro

This workshop introduces OpenCV functions as the need arises. Let's identify handwritten numbers from 0 to 9. You can get some inspiration from this video:

http://www.youtube.com/watch?v=D_cZBdfw-hQ

In Feb 2011 (just before the tsunami), we programmed HRP-IV to play a game. We used the histogram method to separate Caucasian skin from the background and then counted the number of valleys and mountains between the hull vertices. A more primitive approach is to average the area, but it is not as robust.

Workshop Time

Teams of 4. You have 15 minutes to come up with some algorithm, trick or rule of thumb to classify the numbers.

Note: The students will try to find salient features. Good salient features are robust to:

noise

partial occlusions, and

confusion

Here is a typical list:

# of horizontal segments

# of pixels belonging to horizontal segments

Length of segments

# of closed loops it contains

# of start and end points

Relative orientation of end points

Extracting features from pixels is called feature extraction. Salient features are the ones that are most useful in classifying pixels (by information entropy).
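To make the idea concrete, here is a small sketch (not part of the original workshop) that computes a few illustrative hand-made features from a binarized digit image with plain NumPy; the particular choices (aspect ratio, ink ratio, stroke crossings per row) are assumptions for illustration only.

```python
import numpy as np

def digit_features(img):
    """Illustrative hand-made features for a binarized digit (1 = ink, 0 = background)."""
    ys, xs = np.nonzero(img)
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    aspect = width / height            # wide shapes (a 0) vs. narrow ones (a 1)
    ink = img.sum() / img.size         # fraction of ink pixels
    # stroke crossings per row: count 0 -> 1 transitions; an 8 crosses twice, a 1 once
    crossings = [(np.diff(row) == 1).sum() for row in img.astype(np.int8)]
    return np.array([aspect, ink, float(np.mean(crossings))])
```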

14

3 | Number Recognition Workshop - Number Recognition

Page 16: Applied Computer Vision - a Deep Learning Approach

1 2 3 4 5 6 7 8 9 0

Feature Reduction as a Necessity

Consider the letter 'E'. Its representation as a vector is e = {0, 0, 0, ..., 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ...}, that is, a vector of dimension 11 x 13 = 143, more than 100 components. There is no way for the retinal nerve to have enough bandwidth to send all that information for processing to the brain neurons. What happens in reality is that, already at the retina level, features

15

3 | Number Recognition Workshop - Number Recognition

Page 17: Applied Computer Vision - a Deep Learning Approach

are already being extracted and the information is compressed (edge filters).

Minimum Number of Features

If we find five features that allow us to distinguish between all 10 different number shapes, then the problem becomes a dimension-5 problem, which is much more manageable.

Minimum number of features formula:

N > log2(number of different labels)

For 10 digits this gives N > log2(10) ≈ 3.3, so at least 4 binary features are needed.

Because the problem can be complex, students can start by trying to distinguish just two numbers (the 1 and the 0), manually extracting two features: a vertical and a horizontal feature (feature reduction). Then we can map the numbers into clusters. This helps them understand the need for more features when clusters overlap.

Training

So if we start with webcam photos of figures at 600x400 pixels, how would you design the program? At this stage most students fail to recognize the need for training with lots of examples. In fact, the hard part of this workshop is training (understanding which features work to differentiate a 0 from a 1 and a 1 from a 2...). (See also the Andrew Ng video in Chapter 1.)

16

3 | Number Recognition Workshop - Number Recognition

Page 18: Applied Computer Vision - a Deep Learning Approach

Reflection

Some teams came up with valley/mountain features. Others with the number of lines, or crossings.

Let's run a competition to see which is the winning team! (This is also a great excuse to help them learn OpenCV.)

Cognition Leads to Play?

Once we have a working program with the ability to recognize numbers, we can apply Toyota's Kaizen (see The Brown Book of Design Thinking, Chapter 3). What kind of games or apps can we make? What kind of applications can you imagine? Make a useful app.

17

3 | Number Recognition Workshop - Number Recognition

Page 19: Applied Computer Vision - a Deep Learning Approach

18

3 | Number Recognition Workshop - Number Recognition

Page 20: Applied Computer Vision - a Deep Learning Approach

All computer vision apps today fundamentally work the same as this example.

Now we just need to learn how to:

1. Do tracking

2. Use clustering methods

3. Extract features with OpenCV without reinventing the wheel (next)

If you can make an app that recognizes my handwriting better than my app, I will give you a.... A+

19

3 | Number Recognition Workshop - Number Recognition

Page 21: Applied Computer Vision - a Deep Learning Approach

How to Cluster Features (review of three methods)

By closest neighbor

By centroids (center of mass)

By k-means

20

3 | Number Recognition Workshop - Clustering of Features

Let's have an honest discussion about clustering.

Page 22: Applied Computer Vision - a Deep Learning Approach

By Probability

Given that x is at position x, which label minimizes the probability of error?

Comparison of methods for the same point x:

Centroid --> x belongs to Green

Closest neighbor --> Green

Statistical --> Red

21

3 | Number Recognition Workshop - Clustering of Features

Where does 'x' belong? What is the best "strategy"?

Page 23: Applied Computer Vision - a Deep Learning Approach

http://en.wikipedia.org/wiki/Cluster_analysis

22

3 | Number Recognition Workshop - Clustering of Features

Where does 'x' belong? What is the best "strategy"?

The previous are the main three unsupervised clustering methods.
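As a quick way to see the three methods disagree on the same query point, here is a sketch on made-up 2D feature clouds, assuming scikit-learn is available (any hand-rolled version would do just as well).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# two overlapping clouds of 2D feature points ("green" = 0, "red" = 1)
green = rng.normal([0, 0], 1.0, size=(100, 2))
red = rng.normal([2, 0], 1.0, size=(100, 2))
X = np.vstack([green, red])
y = np.array([0] * 100 + [1] * 100)

x_new = np.array([[1.0, 0.3]])   # the point 'x' we want to label

print("closest neighbor:", KNeighborsClassifier(n_neighbors=1).fit(X, y).predict(x_new))
print("centroid        :", NearestCentroid().fit(X, y).predict(x_new))
# k-means is unsupervised: it only groups the points; the labels come afterwards
print("k-means cluster :", KMeans(n_clusters=2, n_init=10).fit(X).predict(x_new))
```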

Page 24: Applied Computer Vision - a Deep Learning Approach

The Feature-Cluster Conundrum

Now ask the students to make a flowchart of a program that can label handwritten numbers 0 to 9. At this point most students come up with this flowchart.

The purpose of the exercise is to let the students realize the difference between training and labeling. Training is the hard part. Another hard part to grasp is how to separate useful from non-useful features in the dimension reduction phase. Previously we said to use entropy as an indicator of how useful a feature can be. However, if I plug random noise in as a feature, it will score high on entropy. Because the quality of the results depends on the clustering, and the clusters depend on which features we choose, it is not a good idea to decouple feature discarding from the clustering itself. This is called the feature-cluster conundrum.
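The point about entropy can be checked numerically: a coin-flip noise feature has roughly the same entropy as a genuinely informative feature, but essentially zero mutual information with the digit labels. A minimal sketch with synthetic data (mutual_info_score from scikit-learn is just one convenient way to measure the dependence):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=5000)      # digit labels 0..9

useful = labels // 5                         # crude feature: 0 for digits 0-4, 1 for 5-9
noise = rng.integers(0, 2, size=5000)        # coin-flip feature, unrelated to the label

def entropy_bits(x):
    p = np.bincount(x) / len(x)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

print("entropy   useful:", entropy_bits(useful), " noise:", entropy_bits(noise))  # both about 1 bit
print("mut. info useful:", mutual_info_score(labels, useful),
      " noise:", mutual_info_score(labels, noise))                                # noise is about 0
```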

In the next chapter, let's see how feature-finding was traditionally approached in the '80s and '90s (this is now of course obsolete knowledge, but I include it here as a "nostalgic" note).

23

3 | Number Recognition Workshop - Clustering of Features

So, learnt (knowledge) = list of features + cluster boundaries?

Page 25: Applied Computer Vision - a Deep Learning Approach

Morphological Ops

Morph ops differ from linear filters in that they are not linear. Imagine you have the letter E, but the corner pixel has been erased by some noise. Your brain (Gestalt theory) can reconstruct it; in fact, it is designed to reconstruct intersections of lines.

However, a computer is not you. It does not know anything about Gestalt, so how can we reconstruct this missing corner automatically? If we don't, the computer might think this is an underlined F.

You can reconstruct it by use of a so-called 'morph op' called closing.

24

3 | Number Recognition Workshop - Morphological Ops

F!E!

Page 26: Applied Computer Vision - a Deep Learning Approach

Closing

1. Dilate - enlarge black regions by adding black pixels next to pre-existing black pixels, using some kind of rule

http://www.youtube.com/watch?v=xO3ED27rMHs

2. Erode - the reverse process

http://www.youtube.com/watch?v=fmyE7DiaIYQ

before after

Opening

Same as the previous, but with steps 2 and 1 in reverse order

before after

Use cases

Closing is used to connect missing lines and parts.

Opening is used to remove noise that does not belong to the largest object in the scene.
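In OpenCV both operations are a single call to morphologyEx; a minimal sketch (the file name and threshold below are placeholders):

```python
import cv2

img = cv2.imread("letter_E.png", cv2.IMREAD_GRAYSCALE)        # hypothetical input image
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

# closing = dilation followed by erosion: reconnects small gaps and missing corners
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

# opening = erosion followed by dilation: removes small specks that do not belong to the object
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```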

25

3 | Number Recognition Workshop - Morphological Ops

Page 27: Applied Computer Vision - a Deep Learning Approach

Structuring element

In this case the blue cross is the structuring element. It can be any other shape.

For more details: http://homepages.inf.ed.ac.uk/rbf/HIPR2/morops.htm

http://bigwww.epfl.ch/demo/demoteaching.html Demos

Practice

We can use Excel to practice manual closing and opening.

26

3 | Number Recognition Workshop - Morphological Ops

Page 28: Applied Computer Vision - a Deep Learning Approach

L shape Excel exercise

In this exercise, I asked the students to come up with a structuring element to reconstruct the E shape. Most of them propose a 3x3 L-shape. After two closings we can observe the following: we managed to reconstruct the missing corner, but the structuring element will also obliterate any details in the picture smaller than itself. This is one drawback of morphological operations.
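The same exercise can be done in code instead of Excel. A sketch with a made-up 'E' matrix and the 3x3 L-shaped structuring element applied as two closings:

```python
import numpy as np
import cv2

# an 'E' whose lower-left corner pixel was erased by noise (255 = ink)
E = np.array([
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 1],   # the corner at row 4, column 0 is missing
], dtype=np.uint8) * 255

# the 3x3 L-shaped structuring element most students propose
L_kernel = np.array([[1, 0, 0],
                     [1, 0, 0],
                     [1, 1, 1]], dtype=np.uint8)

# two closings: the corner gets filled in, but details smaller than the
# kernel are also smoothed away (the drawback noted above)
repaired = cv2.morphologyEx(E, cv2.MORPH_CLOSE, L_kernel, iterations=2)
print(repaired // 255)
```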

27

3 | Number Recognition Workshop - Morphological Ops

Page 29: Applied Computer Vision - a Deep Learning Approach

Code review - What's wrong with OpenCV thinking

https://github.com/orioli/MAID-ROBOT

https://github.com/orioli/MAID-ROBOT/blob/master/uEyeCameraHIRO/camShiftDemo/camShiftDemo.cpp

What the robot sees.

Results of the code review:

400 lines of code to count 5 fingers.

Not robust to skin color change

Not reusable code or a general-purpose solution

It is too custom --> obsolescence assured

28

3 | Number Recognition Workshop - OpenCV Code Review

This project from 2007 is an example of everything that is wrong with OpenCV thinking. It is not the way forward. The brain does not work like this. It does not scale. So what's next? ------

Page 30: Applied Computer Vision - a Deep Learning Approach

Now the students have all the knowledge to make the software to identify numbers. Ask the students to draft a detailed action plan to classify characters from 0 to 9 from photos.

Here is an example:

1. Get the training dataset

   1. How many do we need? 100 of each number?

      1. Organize folders /1 /2 /3 …

      2. Go to the cafeteria and ask people to give samples of numbers

      3. Digitize them. How? Take a picture with the iPhone

2. Training SW

   1. Find features for the numbers

      1. How many do we need?

         1. Five?

         2. Let the SW choose the useful ones

      2. Edges

      3. Horizontal lines

      4. Closed spaces

      5. Shapes?

3. Cluster the dataset

   1. How good is the clustering? --> Testing

   2. Choose the method with the highest accuracy across testing subsets

   3. The model will tell you which features are useful and which are not for labeling

   4. How do you test it?

   5. Bring other (new) numbers and check

   6. Split the dataset into a training set and a testing set, 50%-50%

   7. Prevent overfitting - divide the test data into chunks of 10 and see the prediction accuracy for each individual chunk (see the sketch after this list)
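Steps 6 and 7 of the plan can be sketched with scikit-learn; the feature matrix and labels below are placeholders for the students' own dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# placeholder data: one row of hand-made features per photographed digit, plus its label
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 10, size=1000)

# step 6: 50%-50% split between training set and testing set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
clf = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)

# step 7: score the test data in chunks of 10 and watch how stable the accuracy is
for i in range(0, len(X_te), 10):
    print(i, clf.score(X_te[i:i + 10], y_te[i:i + 10]))
```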

Now that the students have drafted a plan, let them do something concrete: draw a matrix of 1s and 0s that represents the binary image of the number 7 and ask them how they would extract features.

29

3 | Number Recognition Workshop - Edge Detectors & convolution

Page 31: Applied Computer Vision - a Deep Learning Approach

Feature Extraction

In Excel we can use conditional highlighting to visualize the filtering process. We start with a 7. Ask the students:

How would you extract a feature from the ones and zeroes?
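The same exercise can also be done in a few lines of NumPy/SciPy instead of Excel: convolve the binary 7 with a small horizontal-edge kernel and look at where the responses are strong. The matrix and the kernel below are made up for illustration.

```python
import numpy as np
from scipy.signal import convolve2d

# a crude 7 drawn as a small binary matrix (1 = ink), like the Excel sheet
seven = np.array([
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
])

# horizontal-edge kernel: a row of +1s above, a row of -1s below
horizontal = np.array([[ 1,  1,  1],
                       [ 0,  0,  0],
                       [-1, -1, -1]])

response = convolve2d(seven, horizontal, mode="same")
# the largest-magnitude responses cluster around the horizontal top stroke of the 7;
# the count of such strong responses is one candidate feature
print(response)
```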

30

3 | Number Recognition Workshop - Edge Detectors & convolution

Page 32: Applied Computer Vision - a Deep Learning Approach

Gaussian filtering by convolution

31

3 | Number Recognition Workshop - Edge Detectors & convolution

Page 33: Applied Computer Vision - a Deep Learning Approach

Vertical edge detector by convolution

See also Canny filters and Laplacians:

1. http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/canny_detector/canny_detector.html

2. http://matlabserver.cs.rug.nl/cgi-bin/matweb.exe

3. http://www.youtube.com/watch?v=pIFnFhDsYlk
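For reference, the same filters through OpenCV's Python API; a short sketch with a placeholder file name:

```python
import cv2
import numpy as np

img = cv2.imread("seven.png", cv2.IMREAD_GRAYSCALE)    # hypothetical input image

# vertical edge detector: a simple 3x3 kernel applied by convolution
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=np.float32)
vertical_edges = cv2.filter2D(img, cv2.CV_32F, kernel)

# Gaussian filtering is also a convolution, just with a different kernel
smoothed = cv2.GaussianBlur(img, (5, 5), 0)

# Canny bundles smoothing, gradients and thresholding into one call
edges = cv2.Canny(img, 100, 200)
```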

32

3 | Number Recognition Workshop - Edge Detectors & convolution

Page 34: Applied Computer Vision - a Deep Learning Approach

33

3 | Number Recognition Workshop - Edge Detectors & convolution

At this point we can realize that numbers are traces, so what we need is to classify types of traces, not just detect edges. What kind of edge classifier can we use? Viola-Jones? Convolution? We want to know: the type of edge and its position.

1 2 3 4 5 6 7 8 9 0

Page 35: Applied Computer Vision - a Deep Learning Approach

http://practicalquant.blogspot.ae/2013/10/deep-learning-oral-traditions.html

34

3 | Number Recognition Workshop - Edge Detectors & convolution

4 --> (0, 0.5, 4, 13, 33) --> it is a four!

Convert to 100+ features found by hand.

Example feature: has a cross in the lower-middle region.

Traditional approach

Page 36: Applied Computer Vision - a Deep Learning Approach

35

3 | Number Recognition Workshop - Edge Detectors & convolution

it is a three!

Deep learning approach - layered abstraction:

LEVEL 1: EDGE DETECTOR BANK

LEVEL 2: PRIMITIVE SHAPE DETECTORS

LEVEL 3: SPATIAL RELATIONSHIP DETECTOR

http://cs.brown.edu/courses/cs143/2011/results/proj2/thuhe/

Page 37: Applied Computer Vision - a Deep Learning Approach

Feature Extraction

http://www.youtube.com/watch?v=n1ViNeWhC24

36

3 | Number Recognition Workshop - Features and Sparse Coding

Page 38: Applied Computer Vision - a Deep Learning Approach

At this point, it is probably a good time to ask the students to try to make a little system to classify handwritten numbers. See what error rate they come up with. For the training set, they can ask friends to write numbers. Whatever you do, do not forget to set a deadline, otherwise they will seize the opportunity. (#rookiemistake)

How to do automatic feature extraction?

37

3 | Number Recognition Workshop - Features and Sparse Coding

Manual feature extraction is not the way forward.

Most of Kaggle's competition winners are decided by how lucky they are at finding useful features. ------

D. Efimov

Page 39: Applied Computer Vision - a Deep Learning Approach

4 | Number Recognition Workshop Solution

Number Recognition Workshop Solution

Page 40: Applied Computer Vision - a Deep Learning Approach

Recapitulation

In the previous chapter we saw how to manually make features (aka feature extraction). We also saw that feature extraction is about 50% of the work of winning a Kaggle competition; the other 50% is optimizing the mathematical prediction model (Efimov 2012). We also saw that some geniuses, like Andrew Ng, postulate that manual feature extraction is a waste of time, the kind of nitty-gritty job that should be done by computers. We also saw, in the section on the feature-cluster conundrum, one more reason why feature engineering is part of the problem itself. In the section about sparse coding we saw a mathematical foundation that is a good reverse engineering of how the brain finds good features, because both sparse coding and the brain end up with similar edge detector filters.

DNN online, on-demand

The company Ersatz allows you to try Deep Neural Network models online. They have a demo based on number recognition that we will use now (which is what we have been doing in the last chapter). The handwritten training sets (MNIST, derived from NIST data) are available at http://yann.lecun.com/exdb/mnist/. The author, Yann LeCun, compared different methods on this job... Convolutional neural networks do the job better than metric approaches such as SVM.

39

4 | Number Recognition Workshop Solution - Deep Learning Workshop

Page 41: Applied Computer Vision - a Deep Learning Approach

The MNIST tutorial

This tutorial explains how to use the cloud infrastructure to solve the number recognition problem via Deep Neural Networks.

http://www.ersatzlabs.com/documentation/sharedMNIST/
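If you prefer to run things locally instead of the Ersatz cloud, a small convolutional network in Keras (an assumed substitute, not the tool the tutorial above uses) already does well on MNIST after one epoch:

```python
from tensorflow import keras
from tensorflow.keras import layers

(x_tr, y_tr), (x_te, y_te) = keras.datasets.mnist.load_data()
x_tr = x_tr[..., None].astype("float32") / 255.0
x_te = x_te[..., None].astype("float32") / 255.0

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, 3, activation="relu"),   # level 1: edge-like filters
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),   # level 2: shape-like filters
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),    # level 3: the digit label
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_tr, y_tr, epochs=1, batch_size=128, validation_data=(x_te, y_te))
```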

More background info and videos on the history of ANNs (playlist of 5 videos):

History of neural nets (playlist)

https://www.youtube.com/watch?v=4B-XY8a4RGk

https://gigaom.com/2014/06/11/more-deep-learning-for-the-masses-courtesy-of-ersatz-labs/

40

4 | Number Recognition Workshop Solution - Deep Learning Workshop

Page 42: Applied Computer Vision - a Deep Learning Approach

Additionally, there is an interesting introduction to neural nets at:

http://neuralnetworksanddeeplearning.com/chap1.html

41

4 | Number Recognition Workshop Solution - Deep Learning Workshop

Page 43: Applied Computer Vision - a Deep Learning Approach

5 | Advanced Topics

Advanced Topics

Page 44: Applied Computer Vision - a Deep Learning Approach

Using Vision to Navigate

http://www.youtube.com/watch?v=8c2SFXQ5zHM Sir James explains

http://www.youtube.com/watch?v=oguKCHP7jNQ Navigation

http://www.youtube.com/watch?v=xlaqYDZwoWo#t=43 Suction

Note: Hoover tried to steal his idea; they lost in court with punitive damages. (Top right photo: HEAD Sports.)

43

5 | Advanced Topics - Vision for Navigation

One day my wife was sick, so I had to vacuum the house. Then I realized that bag-based vacuum cleaners do not suck, so I decided to make one that sucks more. It took longer than expected, though. ------

Sir J. Dyson

Page 45: Applied Computer Vision - a Deep Learning Approach

What People Cured of Blindness See

Abridged from "What People Cured of Blindness See" by Patrick House, The New Yorker.

How quickly, if at all, does the brain adapt and vision return after surgery? A simple answer, and a correct one, is that it depends entirely on circumstance. Back in 1993, Oliver Sacks wrote a story in the magazine about Virgil, a man with limited to no vision as a child who had developed cataracts at the age of six. After his cataracts were removed, fifty years later, Virgil had trouble adjusting. (For example, he could not always distinguish the letter "A" from the letter "H" and, when given Molyneux's test, could not tell a square he felt from a square he saw.)*

Since the surgeries, Sinha has followed up with the Prakash children and found that, while they continued to suffer from poor acuity, many higher-order aspects of vision seemed to be improving. Within a week to a few months after surgery, the children could match felt objects to their visual counterparts. They also improved on spatial-navigation tasks requiring mental imagery, which tested their ability to follow a series of up, down, left, and right directions on a visually imagined game board. This finding was particularly important because previous work by Kosslyn and others had found that the congenitally blind have a capacity for mental imagery, but it is limited in some ways and becomes increasingly poor as the task becomes more complex. (In one example, a sighted person will imagine a typewriter a few feet away as larger than the same one imagined a hundred feet away. Among the congenitally blind, however, the imagined typewriter—a composite of experiences of touch and sound alone—is the same size at all distances.)

44

5 | Advanced Topics - Blindness

Page 46: Applied Computer Vision - a Deep Learning Approach

Kosslyn believes that any improvements in mental imagery will require a "catalogue of visual memories" that can then be used to build expectations about the visual world. "When you develop expectations, you can use the fruits of previous experience to help you process what's coming in now," Kosslyn said. "But you need to have had that experience." An example is depth perception: to the sighted, with a lifetime of practice, rules about occlusion (if A occludes B, object A is closer) and foreshortening (objects farther away appear smaller) are continually used to combine incoming light into a rich, three-dimensional world. The absence of these rules can frustrate the newly sighted, whose visual world can be both blurry and two-dimensional—paintings and people are often described as "flat, with dark patches"; a far-away house is "nearby, but requiring the taking of a lot of steps"; streetlights seen through glass are "luminous stains stuck to the window"; sunbeams through tree branches collapse into a single "tree with all the lights in it." (The writer Jorge Luis Borges, who went blind at age fifty-five, described going blind as a process by which "everything near becomes distant." In the newly sighted, without depth perception, the opposite seems true: the distant—tiny houses on the horizon, clouds in the impossibly high sky—suddenly looks nearby.)

45

5 | Advanced Topics - Blindness

Page 47: Applied Computer Vision - a Deep Learning Approach

Ways of tracking

http://www.youtube.com/watch?v=InqV34BcheM

1. By Color histogram (HSV is less dependent on illumination)

2. By Blob (OpenCV Library, not very robust)

3. By Face Detection

4. By Saliency (robust to occlusions)
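Method 1 (color histogram in HSV) is available almost directly in OpenCV as histogram back-projection plus CamShift. A sketch, assuming a webcam on device 0 and a hand-picked initial window around the object:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)                        # assumed webcam on device 0
ok, frame = cap.read()
x, y, w, h = 200, 150, 100, 100                  # hand-picked initial window
roi = frame[y:y + h, x:x + w]

hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
# hue histogram of the object: fairly robust to illumination changes
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
track_window = (x, y, w, h)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    ret, track_window = cv2.CamShift(back, track_window, term)
    pts = np.int32(cv2.boxPoints(ret))
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(30) == 27:                    # Esc quits
        break
cap.release()
```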

Traditional methods

Typical keypoint extraction methods for recognition of objects independent of view:

Harris Affine

MSER

Hessian Affine

Maja's method - Insight

Features extracted:

Use of HSV histogram (robust to illumination changes)

Texture by Gray level co-occurrence matrix

Edge orientation histogram (6 bins)

Mean, skewness and sd for each color channel

Discard all but 25 top features.

Tested on Columbia Object Image Library. Beats previous methods.

46

5 | Advanced Topics - Easy Tracking

We are looking for an algorithm that is invariant to partial occlusions. ------

Maja R.

Page 48: Applied Computer Vision - a Deep Learning Approach

47

5 | Advanced Topics - Easy Tracking

Page 49: Applied Computer Vision - a Deep Learning Approach

For more:

48

5 | Advanced Topics - Easy Tracking

Page 50: Applied Computer Vision - a Deep Learning Approach

Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. See these course notes for a brief introduction to Machine Learning for AI and an introduction to Deep Learning algorithms: www.deeplearning.net/tutorial/

Deep Learning explained (abridged from the original by Pete Warden | @petewarden): http://radar.oreilly.com/2014/07/what-is-deep-learning-and-why-should-you-care.html

Inside an ANN

The functions that are run inside an ANN are controlled by the memory of the neural network: arrays of numbers known as weights that define how the inputs are combined and recombined to produce the results. Dealing with real-world problems like cat detection requires very complex functions, which means these arrays are very large, containing around 60 million numbers in the case of one of the recent computer vision networks. The biggest obstacle to using neural networks has been figuring out how to set all these massive arrays to values that will do a good job transforming the input signals into output predictions.
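To make "arrays of weights" concrete, here is a toy two-layer forward pass in NumPy. The weights are random, so the prediction is meaningless until training finds good values for these arrays, which is exactly the hard part described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# the "memory" of a tiny network: two weight arrays plus biases
W1, b1 = rng.normal(size=(784, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 10)), np.zeros(10)

def predict(x):
    h = np.maximum(x @ W1 + b1, 0.0)    # ReLU, one of the tricks mentioned below
    scores = h @ W2 + b2
    e = np.exp(scores - scores.max())
    return e / e.sum()                  # softmax over the 10 digit classes

x = rng.random(784)                     # a flattened 28x28 image
print(predict(x).argmax())
```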

Renaissance

It had always been difficult to train an ANN. But in 2012 came a breakthrough: a paper sparked a renaissance in ANNs. Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton brought together a whole bunch of different ways of accelerating the learning process, including convolutional networks, clever use of GPUs, and some novel mathematical tricks like ReLU and dropout, and showed that in a few weeks they could

49

5 | Advanced Topics - Deep Learning

Page 51: Applied Computer Vision - a Deep Learning Approach

train a very complex network to a level that outperformed conventional approaches to computer vision.

GPU photo by Pete Warden slides (Jetpack)

Listen to the Webcast at Strata 2013: http://www.oreilly.com/pub/e/3121

http://www.iro.umontreal.ca/~pift6266/H10/intro_diapos.pdf

Deep NNs failed until 2006...

50

5 | Advanced Topics - Deep Learning

Page 52: Applied Computer Vision - a Deep Learning Approach

Automatic speech recognition

The results shown in the table on the Wikipedia page linked below are for automatic speech recognition on the popular TIMIT data set. This is a common data set used for initial evaluations of deep learning architectures. The entire set contains 630 speakers from eight major dialects of American English, with each speaker reading 10 different sentences. Its small size allows many different configurations to be tried effectively with it. The error rates presented are phone error rates (PER).

http://en.wikipedia.org/wiki/Deep_learning#Fundamental_concepts

51

5 | Advanced Topics - Deep Learning

Page 53: Applied Computer Vision - a Deep Learning Approach

Andrew Ng on Deep Learning where AI will learn from untagged data

52

5 | Advanced Topics - Deep Learning

Page 54: Applied Computer Vision - a Deep Learning Approach

https://www.youtube.com/watch?v=W15K9PegQt0#t=221 [39:56]

53

5 | Advanced Topics - Deep Learning

Page 55: Applied Computer Vision - a Deep Learning Approach

To learn more about Andrew Ng on Deep Learning and the future of #AI

- http://new.livestream.com/gigaom/FutureofAI (~1:20:00)

- https://www.youtube.com/watch?v=W15K9PegQt0#t=221

- http://deeplearning.stanford.edu/

Neural Nets talk by Ilya Sutskever: https://vimeo.com/77050653 [25:00 - end]

54

5 | Advanced Topics - Deep Learning

Page 56: Applied Computer Vision - a Deep Learning Approach

6 | Sample Questions

Sample Questions

Page 57: Applied Computer Vision - a Deep Learning Approach

1. OpenCV is a software app that is useful because

1. it is not a software app

2. false

3. true

4. it is useful because it integrates functionality commonly used to perform computer vision

2. Projective geometry is useful because

1. every pinhole based visual system is subject to the laws of projective geometry

2. compound eyes are subject to projective geometry

3. all are false

4. all are true

3. Pinhole eye design is popular in nature

1. true

2. false

4. A feature is

1. a characteristic of an image

2. the output of a filter

3. the output of a non-linear filter (morphological)

4. a feature must be a number

5. a feature cannot be binary

6. a feature can be true or false

5. A feature that is a constant can still be useful for clustering purposes,

1. true

2. false

6. A good feature is a salient feature,

1. true

2. false

3. not always

56

6 | Sample Questions - A

Page 58: Applied Computer Vision - a Deep Learning Approach

7. The absolute minimum number of features to cluster 7 labels is six

1. true

2. false

3. its 8

4. 7

5. all are true

6. depends on the features

7. none

8. Using features instead of dealing with pixels allows us to simplify the problem of labeling an image

1. true

2. false

9. Clustering by closest neighbor is the most effective method to classify two overlapping clouds of points

1. true

2. k-means is in general superior

3. all of the above

4. none

10. Likelihood is the best way to cluster labels

1. true

2. false

3. it is better than k-means

11. A drawback of k-means is that you have to set the number of clusters

1. true

2. false

12. Learning to recognize a pattern is equivalent to,

1. Knowing the cluster boundaries and knowing which features to use

2. is to make a database of possible images and then when one of the images is presented to tell which one is same in the database

3. all

4. none

13. In morphological operations, closing is...

1. used to clean salt and pepper noise which is bigger than the objects of interest

2. used to clean salt and pepper noise which is smaller than the objects of interest

3. opening and dilation by this order

4. dilation and then erosion

14. Opening is

1. a tool to remove white noise from a black-and-white picture

57

6 | Sample Questions - A

Page 59: Applied Computer Vision - a Deep Learning Approach

2. it only works on black-and-white pictures

3. erosion followed by closing

4. erosion followed by dilation

5. none

15. Erosion is

1. a linear operation

2. is reversible

3. all are true

16. Dilation is

1. used to connect disconnected lines

2. to remove vertical lines

3. none

4. all

17. If I want to connect vertical lines that are disconnected by an average pixel length of 5 pixels, what structuring element is appropriate

1. [1,1,1,1,1,1,1,1] (as a vertical column)

2. [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

3. [1,1,1,3,4,3,1,1]

4. [0,0,0,1,0,0]

5. [1,1,1]

6. [1,1]

7. none

18. If I want to connect vertical lines that are disconnected by an average pixel length of 5 pixels, what structuring element is appropriate

1. a 4x4 matrix of ones

2. a 7x1 matrix of ones

3. a 7x7 matrix of ones

4. a 1x7 matrix of ones

5. none

19. If I want to connect vertical lines that are disconnected by an average pixel length of 5 pixels, what structuring element is appropriate

1. a 1x5 matrix of ones

2. a 5x1 matrix of ones

3. a 5x5 matrix of zeroes

4. a 5x1 matrix of zeroes with a 1 in the middle

5. none

20. Ideally, the features for Arabic numeral recognition would

1. correspond to traces,

2. not correspond to traces but numbers

3. both

4. none

58

6 | Sample Questions - A

Page 60: Applied Computer Vision - a Deep Learning Approach

21. The best features for pattern recognition are those that appear infrequently in the dataset because

1. they have a lot of entropy

2. they discriminate numbers better because of their frequency

3. none

4. all

22. Sparse coding is related to how the brain detects edges by means of basis functions

1. true

2. false

3. not always, depends on the variance of the images

59

6 | Sample Questions - A

Page 61: Applied Computer Vision - a Deep Learning Approach

Dr. Jose Berengueres joined UAE University as an Assistant Professor in 2011. He received an MEE from the Polytechnic University of Catalonia in 1999 and a PhD in bio-inspired robotics from the Tokyo Institute of Technology in 2007.

He has authored books on: The Toyota Production System, Design Thinking, Human-Computer Interaction, UX women designers, and Business Model Innovation.

He has given talks and workshops on Design Thinking & Business Models in Germany, Mexico, Dubai, and California.