Applied Computer Vision - A Deep Learning Approach


Page 1: Applied Computer Vision - a Deep Learning Approach

Applied Computer Vision

FOR UNDERGRADS

A Deep Learning Approach

J. Berengueres

Page 2: Applied Computer Vision - a Deep Learning Approach

Applied Computer Vision for Undergrads version 2.2

Editor: Jose Berengueres

Edition: First Edition, August 24th, 2014.

Text Copyright© Jose Berengueres 2014. All Rights Reserved.

i


Page 3: Applied Computer Vision - a Deep Learning Approach

Video, Audio & Artwork Copyright

Artwork appearing in this work is subject to its corresponding original Copyright or Creative Commons License. Except where otherwise noted, a Creative Commons Attribution 3.0 License applies.

Limit of Liability

The editor makes no representations or warranties concerning the accuracy or exhaustivity of the contents and theories presented here, and in particular disclaims any implied warranties of merchantability or fitness for a particular use, including but not limited to educational, industrial and academic applications. Neither the editor nor the authors are liable for any loss of profit or any commercial damages, including but not limited to incidental, consequential or other damages.

Support

This work was supported by:

UAE University

Page 4: Applied Computer Vision - a Deep Learning Approach

1 | Intro

http://www.howwedrive.com/2012/01/23/let-the-robot-drive/

Page 5: Applied Computer Vision - a Deep Learning Approach

How face recognition works

http://vimeo.com/12774628

Andrew Ng

http://www.youtube.com/watch?v=AY4ajbu_G3k

Google self-driving car

https://www.youtube.com/watch?v=YXylqtEQ0tk

4

1 | Intro - Topics

Introduction videos

Page 7: Applied Computer Vision - a Deep Learning Approach

6

Page 8: Applied Computer Vision - a Deep Learning Approach

Projective Geometry

Many books on computer vision start with this topic, which is quite irrelevant. Let me explain: Picasso did not benefit from learning the chemistry of yellow-paint manufacturing, and neither will you. You don't need 3D geometry knowledge; you just need great 2D geometry knowledge. Additionally, his paintings were among the first (since the Middle Ages) to ignore the laws of projective geometry, and they were a big success. His father was a frustrated art teacher, but he made sure that young Picasso would not become like himself. Picasso, like Tiger Woods, was a product of his father.

7

2 | Theoretical Stuff - Projective Geometry

Caring about projective geometry in computer vision is the same mistake as "model of the world" vs. "ground model": you should not care about the world, because you already have a model of it.

Your retina is your world and your model! Do not complicate it by adding an additional layer, a model of a model that you have to update continuously. You don't even know if the world is real. See Gödel. ------

Reality? Your Reality!

Page 9: Applied Computer Vision - a Deep Learning Approach

Pin-hole Design

Pin-hole design is one of the few cheap ways to convert a 3D world into a 2D picture (a simplification). That is why man-made cameras use the same principle. For a fascinating story on how the pinhole eye is used by biological systems, I recommend Climbing Mount Improbable. There are other ways in which nature "sees":

the bat’s ultrasound vision

dolphins’ ultrasound vision

the compound eye of the fly

Polarized light vision

From a computer vision point of view, it should not matter whether your vision device is pinhole-based or not.

Compound Eye. What is the pixel resolution of the compound eye?

8

2 | Theoretical Stuff - The Design of the Eye

Page 10: Applied Computer Vision - a Deep Learning Approach

9

2 | Theoretical Stuff - The Design of the Eye

Page 11: Applied Computer Vision - a Deep Learning Approach

http://www.detectingdesign.com/humaneye.html

10

2 | Theoretical Stuff - The Design of the Eye

Page 12: Applied Computer Vision - a Deep Learning Approach

Hit a Wall

When Maja Rudinac hit the wall of impractical computer vision, she turned to developmental psychologists for strategies to cope with large amounts of image pixels. This is what she found (so you don't have to):

Visual Developmental Psychology

Basics (abridged from Maja Rudinac's PhD thesis, TU Delft)

By the fourth month, babies can already tell whether a character is bad or good; we can tell because we see which one they hug longer.

Infants look longer at new faces or new objects

Independent of where they are born, all babies know the boundaries of objects.

Can predict collisions

Basic additive and subtractive cognition

Can identify members of own group versus non-own group

Spontaneous motor movement is not goal-directed at the onset; the baby explores the degrees of freedom. Goal-directed arm grasping appears at the fourth month.

The ability to engage and disengage attention on targets appears from day 1 in babies.

Smooth visual tracking is present at birth

How baby cognition "works"

The development of actions in babies is goal-directed by three motives. Actions are taken:

1. To discover novelty

2. To discover regularity

3. To discover the potential of their own body

Development of Perception

Perception in babies is driven by two processes:

1. Detection of structure or patterns

11

2 | Theoretical Stuff - Developmental Psychology

Page 13: Applied Computer Vision - a Deep Learning Approach

2. Discarding of irrelevant info and keeping relevant info

Cognitive Robot Shopping List

that can understand the world at least as well (or as poorly) as a baby does, these are the functions that, according to Maja (pronounced Maya), we will need:

A WebCam

Object tracking

Object Discrimination

Attraction to people's faces

Face Recognition

Use of the hand to move objects to scan them from various angles

Shades and 3D

It turns out that shading has a disproportionate influence in helping us figure out 3D info from 2D retina pixels. When researchers at the University of Texas used fake shading in a virtual reality world, participants got headaches, because the faking of the shading was not precise enough to fool the brain. The brain got confused by the imperceptible mismatches... that's why smart people get headaches in 3D cinemas.

12

2 | Theoretical Stuff - Developmental Psychology

Page 14: Applied Computer Vision - a Deep Learning Approach

3 | Number Recognition Workshop

Nizhny Novgorod street via: http://personal.cfw.com/~renders/nizhnymall_photos.html

The purpose of the 2014 number recognition workshop is to learn computer vision basics.

In this chapter we will learn the four basic components of a typical computer vision program:

1. Features

2. Clustering

3. Filtering/Morph ops

4. Validation

We will use this example to learn OpenCV. Finally, we will learn why manual feature making is obsolete because of deep learning.

Let's learn by means of a simple example.

What does it mean to classify numbers?

------

1 2 3 4 5 6 7 8 9 0

Page 15: Applied Computer Vision - a Deep Learning Approach

Intro

This workshop introduces OpenCV functions as the need arises. Let's identify handwritten numbers from 0 to 9. You can get some inspiration from this video:

http://www.youtube.com/watch?v=D_cZBdfw-hQ

In Feb 2011 (just before the tsunami), we programmed HRP-IV to play a game. We used the histogram method to separate Caucasian skin from the background and then counted the number of valleys and mountains between the hull vertices. A more primitive approach is to average the area, but it is not as robust.

Workshop Time

Teams of 4. You have 15 minutes to come up with some algorithm, trick or rule of thumb to classify the numbers.

Note: The students will try to find salient features. Good salient features are robust to:

noise

partial occlusions, and

confusion

Here is a typical list:

# of horizontal segments

# of pixels belonging to horizontal segments

Length of segments

# of closed loops it contains

# of start and end points

Relative orientation of end points

Extracting features from pixels is called feature extraction. Salient features are the ones that are most useful in classifying pixels (by information entropy).
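To make the idea concrete, here is a small sketch (not part of the original workshop) that computes a few illustrative hand-made features from a binarized digit image with plain NumPy; the particular choices (aspect ratio, ink ratio, stroke crossings per row) are assumptions for illustration only.

```python
import numpy as np

def digit_features(img):
    """Illustrative hand-made features for a binarized digit (1 = ink, 0 = background)."""
    ys, xs = np.nonzero(img)
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    aspect = width / height            # wide shapes (a 0) vs. narrow ones (a 1)
    ink = img.sum() / img.size         # fraction of ink pixels
    # stroke crossings per row: count 0 -> 1 transitions; an 8 crosses twice, a 1 once
    crossings = [(np.diff(row) == 1).sum() for row in img.astype(np.int8)]
    return np.array([aspect, ink, float(np.mean(crossings))])
```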

14

3 | Number Recognition Workshop - Number Recognition

Page 16: Applied Computer Vision - a Deep Learning Approach

1 2 3 4 5 6 7 8 9 0

Feature Reduction as a Necessity

Consider the letter 'E'. Its representation as a vector is e = {0, 0, 0, ..., 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, ...}, that is, a vector of dimension 11 x 13 = 143, more than 100 components. There is no way for the retinal nerve to have enough bandwidth to send all that information for processing to the brain neurons. What happens in reality is that, already at the retina level, features

15

3 | Number Recognition Workshop - Number Recognition

Page 17: Applied Computer Vision - a Deep Learning Approach

are already being extracted and the information is compressed (edge filters).

Minimum Number of Features

If we find five features that allow us to distinguish between all 10 different number shapes, then the problem becomes a dimension-5 problem, which is much more manageable.

Minimum number of features formula:

N > log2(number of different labels)

For 10 digits this gives N > log2(10) ≈ 3.3, so at least 4 binary features are needed.

Because the problem can be complex, students can start by trying to distinguish just two numbers (the 1 and the 0), manually extracting two features: a vertical and a horizontal feature (feature reduction). Then we can map the numbers into clusters. This helps them understand the need for more features when clusters overlap.

Training

So if we start with webcam photos of figures at 600x400 pixels, how would you design the program? At this stage most students fail to recognize the need for training with lots of examples. In fact, the hard part of this workshop is training (understanding which features work to differentiate a 0 from a 1 and a 1 from a 2...). (See also the Andrew Ng video in Chapter 1.)

16

3 | Number Recognition Workshop - Number Recognition

Page 18: Applied Computer Vision - a Deep Learning Approach

Reflection

Some teams came up with valley/mountain features. Others with the number of lines, or crossings.

Let's run a competition to see which is the winning team! (This is also a great excuse to help them learn OpenCV.)

Cognition Leads to Play?

Once we have a working program with the ability to recognize numbers, we can apply Toyota's Kaizen (see The Brown Book of Design Thinking, Chapter 3). What kind of games or apps can we make? What kind of applications can you imagine? Make a useful app.

17

3 | Number Recognition Workshop - Number Recognition

Page 19: Applied Computer Vision - a Deep Learning Approach

18

3 | Number Recognition Workshop - Number Recognition

Page 20: Applied Computer Vision - a Deep Learning Approach

All computer vision apps today fundamentally work the same as this example.

Now we just need to learn how to:

1. Do tracking

2. Use clustering methods

3. Extract features with OpenCV without reinventing the wheel (next)

If you can make an app that recognizes my handwriting better than my app, I will give you a.... A+

19

3 | Number Recognition Workshop - Number Recognition

Page 21: Applied Computer Vision - a Deep Learning Approach

How to Cluster Features (review of three methods)

By closest neighbor

By centroids (center of mass)

By k-means

20

3 | Number Recognition Workshop - Clustering of Features

Let's have an honest discussion about clustering.

Page 22: Applied Computer Vision - a Deep Learning Approach

By Probability

Given that x is at position x, which label minimizes the probability of error?

Comparison of methods for the same point x:

Centroid --> x belongs to Green

Closest neighbor --> Green

Statistical --> Red

21

3 | Number Recognition Workshop - Clustering of Features

Where does 'x' belong? What is the best "strategy"?

Page 23: Applied Computer Vision - a Deep Learning Approach

http://en.wikipedia.org/wiki/Cluster_analysis

22

3 | Number Recognition Workshop - Clustering of Features

Where does 'x' belong? What is the best "strategy"?

The previous are the main three unsupervised clustering methods.
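As a quick way to see the three methods disagree on the same query point, here is a sketch on made-up 2D feature clouds, assuming scikit-learn is available (any hand-rolled version would do just as well).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# two overlapping clouds of 2D feature points ("green" = 0, "red" = 1)
green = rng.normal([0, 0], 1.0, size=(100, 2))
red = rng.normal([2, 0], 1.0, size=(100, 2))
X = np.vstack([green, red])
y = np.array([0] * 100 + [1] * 100)

x_new = np.array([[1.0, 0.3]])   # the point 'x' we want to label

print("closest neighbor:", KNeighborsClassifier(n_neighbors=1).fit(X, y).predict(x_new))
print("centroid        :", NearestCentroid().fit(X, y).predict(x_new))
# k-means is unsupervised: it only groups the points; the labels come afterwards
print("k-means cluster :", KMeans(n_clusters=2, n_init=10).fit(X).predict(x_new))
```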

Page 24: Applied Computer Vision - a Deep Learning Approach

The Feature-Cluster Conundrum

Now ask the students to make a flowchart of a program that can label handwritten numbers 0 to 9. At this point most students come up with this flowchart.

The purpose of the exercise is to let the students realize the difference between training and labeling. Training is the hard part. Another hard part to grasp is how to separate useful from non-useful features in the dimension reduction phase. Previously we said to use entropy as an indicator of how useful a feature can be. However, if I plug random noise in as a feature, it will score high on entropy. Because the quality of the results depends on the clustering, and the clusters depend on which features we choose, it is not a good idea to decouple feature discarding from the clustering itself. This is called the feature-cluster conundrum.
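The point about entropy can be checked numerically: a coin-flip noise feature has roughly the same entropy as a genuinely informative feature, but essentially zero mutual information with the digit labels. A minimal sketch with synthetic data (mutual_info_score from scikit-learn is just one convenient way to measure the dependence):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=5000)      # digit labels 0..9

useful = labels // 5                         # crude feature: 0 for digits 0-4, 1 for 5-9
noise = rng.integers(0, 2, size=5000)        # coin-flip feature, unrelated to the label

def entropy_bits(x):
    p = np.bincount(x) / len(x)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

print("entropy   useful:", entropy_bits(useful), " noise:", entropy_bits(noise))  # both about 1 bit
print("mut. info useful:", mutual_info_score(labels, useful),
      " noise:", mutual_info_score(labels, noise))                                # noise is about 0
```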

In the next chapter, let's see how feature-finding was traditionally approached in the '80s and '90s (this is now of course obsolete knowledge, but I include it here as a "nostalgic" note).

23

3 | Number Recognition Workshop - Clustering of Features

So, learnt (knowledge) = list of features + cluster boundaries?

Page 25: Applied Computer Vision - a Deep Learning Approach

Morphological Ops

Morph ops differ from linear filters in that they are not linear. Imagine you have the letter E, but the corner pixel has been erased by some noise. Your brain (Gestalt theory) can reconstruct it; in fact, it is designed to reconstruct intersections of lines.

However, a computer is not you. It does not know anything about Gestalt, so how can we reconstruct this missing corner automatically? If we don't, the computer might think this is an underlined F.

You can reconstruct it by use of a so-called 'morph op' called closing.

24

3 | Number Recognition Workshop - Morphological Ops

F!E!

Page 26: Applied Computer Vision - a Deep Learning Approach

Closing

1. Dilate - enlarge black regions by adding black pixels next to pre-existing black pixels, using some kind of rule

http://www.youtube.com/watch?v=xO3ED27rMHs

2. Erode - the reverse process

http://www.youtube.com/watch?v=fmyE7DiaIYQ

before after

Opening

Same as the previous, but with steps 2 and 1 in reverse order

before after

Use cases

Closing is used to connect missing lines and parts.

Opening is used to remove noise that does not belong to the largest object in the scene.
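In OpenCV both operations are a single call to morphologyEx; a minimal sketch (the file name and threshold below are placeholders):

```python
import cv2

img = cv2.imread("letter_E.png", cv2.IMREAD_GRAYSCALE)        # hypothetical input image
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))

# closing = dilation followed by erosion: reconnects small gaps and missing corners
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)

# opening = erosion followed by dilation: removes small specks that do not belong to the object
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```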

25

3 | Number Recognition Workshop - Morphological Ops

Page 27: Applied Computer Vision - a Deep Learning Approach

Structuring element

In this case the blue cross is the structuring element. It can be any other shape.

For more details: http://homepages.inf.ed.ac.uk/rbf/HIPR2/morops.htm

http://bigwww.epfl.ch/demo/demoteaching.html Demos

Practice

We can use Excel to practice manual closing and opening.

26

3 | Number Recognition Workshop - Morphological Ops

Page 28: Applied Computer Vision - a Deep Learning Approach

L shape Excel exercise

In this exercise, I asked the students to come up with a structuring element to reconstruct the E shape. Most of them propose a 3x3 L-shape. After two closings we can observe the following: we managed to reconstruct the missing corner, but the structuring element will also obliterate any details in the picture smaller than itself. This is one drawback of morphological operations.
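The same exercise can be done in code instead of Excel. A sketch with a made-up 'E' matrix and the 3x3 L-shaped structuring element applied as two closings:

```python
import numpy as np
import cv2

# an 'E' whose lower-left corner pixel was erased by noise (255 = ink)
E = np.array([
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 1],   # the corner at row 4, column 0 is missing
], dtype=np.uint8) * 255

# the 3x3 L-shaped structuring element most students propose
L_kernel = np.array([[1, 0, 0],
                     [1, 0, 0],
                     [1, 1, 1]], dtype=np.uint8)

# two closings: the corner gets filled in, but details smaller than the
# kernel are also smoothed away (the drawback noted above)
repaired = cv2.morphologyEx(E, cv2.MORPH_CLOSE, L_kernel, iterations=2)
print(repaired // 255)
```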

27

3 | Number Recognition Workshop - Morphological Ops

Page 29: Applied Computer Vision - a Deep Learning Approach

Code review - What's wrong with OpenCV thinking

https://github.com/orioli/MAID-ROBOT

https://github.com/orioli/MAID-ROBOT/blob/master/uEyeCameraHIRO/camShiftDemo/camShiftDemo.cpp

What the robot sees.

Results of the code review:

400 lines of code to count 5 fingers.

Not robust to skin color change

Not reusable code or a general-purpose solution

It is too custom --> obsolescence assured

28

3 | Number Recognition Workshop - OpenCV Code Review

This project from 2007 is an example of everything that is wrong with OpenCV thinking. It is not the way forward. The brain does not work like this. It does not scale. So what's next? ------

Page 30: Applied Computer Vision - a Deep Learning Approach

Now the students have all the knowledge to make the software to identify numbers. Ask the students to draft a detailed action plan to classify characters from 0 to 9 from photos.

Here is an example:

1. Get the training dataset

   1. How many do we need? 100 of each number?

      1. Organize folders /1 /2 /3 …

      2. Go to the cafeteria and ask people to give samples of numbers

      3. Digitize them. How? Take a picture with the iPhone

2. Training SW

   1. Find features for the numbers

      1. How many do we need?

         1. Five?

         2. Let the SW choose the useful ones

      2. Edges

      3. Horizontal lines

      4. Closed spaces

      5. Shapes?

3. Cluster the dataset

   1. How good is the clustering? --> Testing

   2. Choose the method with the highest accuracy across testing subsets

   3. The model will tell you which features are useful and which are not for labeling

   4. How do you test it?

   5. Bring other (new) numbers and check

   6. Split the dataset into a training set and a testing set, 50%-50%

   7. Prevent overfitting - divide the test data into chunks of 10 and see the prediction accuracy for each individual chunk (see the sketch after this list)
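Steps 6 and 7 of the plan can be sketched with scikit-learn; the feature matrix and labels below are placeholders for the students' own dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# placeholder data: one row of hand-made features per photographed digit, plus its label
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 10, size=1000)

# step 6: 50%-50% split between training set and testing set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
clf = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)

# step 7: score the test data in chunks of 10 and watch how stable the accuracy is
for i in range(0, len(X_te), 10):
    print(i, clf.score(X_te[i:i + 10], y_te[i:i + 10]))
```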

Now that the students have drafted a plan, let them do something concrete: draw a matrix of 1s and 0s that represents the binary image of the number 7 and ask them how they would extract features.

29

3 | Number Recognition Workshop - Edge Detectors & convolution

Page 31: Applied Computer Vision - a Deep Learning Approach

Feature Extraction

In Excel we can use conditional highlighting to visualize the filtering process. We start with a 7. Ask the students:

How would you extract a feature from the ones and zeroes?
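The same exercise can also be done in a few lines of NumPy/SciPy instead of Excel: convolve the binary 7 with a small horizontal-edge kernel and look at where the responses are strong. The matrix and the kernel below are made up for illustration.

```python
import numpy as np
from scipy.signal import convolve2d

# a crude 7 drawn as a small binary matrix (1 = ink), like the Excel sheet
seven = np.array([
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
])

# horizontal-edge kernel: a row of +1s above, a row of -1s below
horizontal = np.array([[ 1,  1,  1],
                       [ 0,  0,  0],
                       [-1, -1, -1]])

response = convolve2d(seven, horizontal, mode="same")
# the largest-magnitude responses cluster around the horizontal top stroke of the 7;
# the count of such strong responses is one candidate feature
print(response)
```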

30

3 | Number Recognition Workshop - Edge Detectors & convolution

Page 32: Applied Computer Vision - a Deep Learning Approach

Gaussian filtering by convolution

31

3 | Number Recognition Workshop - Edge Detectors & convolution

Page 33: Applied Computer Vision - a Deep Learning Approach

Vertical edge detector by convolution

See also Canny filters and Laplacians:

1. http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/canny_detector/canny_detector.html

2. http://matlabserver.cs.rug.nl/cgi-bin/matweb.exe

3. http://www.youtube.com/watch?v=pIFnFhDsYlk
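For reference, the same filters through OpenCV's Python API; a short sketch with a placeholder file name:

```python
import cv2
import numpy as np

img = cv2.imread("seven.png", cv2.IMREAD_GRAYSCALE)    # hypothetical input image

# vertical edge detector: a simple 3x3 kernel applied by convolution
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=np.float32)
vertical_edges = cv2.filter2D(img, cv2.CV_32F, kernel)

# Gaussian filtering is also a convolution, just with a different kernel
smoothed = cv2.GaussianBlur(img, (5, 5), 0)

# Canny bundles smoothing, gradients and thresholding into one call
edges = cv2.Canny(img, 100, 200)
```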

32

3 | Number Recognition Workshop - Edge Detectors & convolution

Page 34: Applied Computer Vision - a Deep Learning Approach

33

3 | Number Recognition Workshop - Edge Detectors & convolution

At this point we can realize that numbers are traces, so what we need is to classify types of traces, not just detect edges. What kind of edge classifier can we use? Viola-Jones? Convolution? We want to know: the type of edge and its position.

1 2 3 4 5 6 7 8 9 0

Page 35: Applied Computer Vision - a Deep Learning Approach

http://practicalquant.blogspot.ae/2013/10/deep-learning-oral-traditions.html

34

3 | Number Recognition Workshop - Edge Detectors & convolution

4 --> (0, 0.5, 4, 13, 33) --> it is a four!

Convert to 100+ features found by hand.

Example feature: has a cross in the lower-middle region.

Traditional approach

Page 36: Applied Computer Vision - a Deep Learning Approach

35

3 | Number Recognition Workshop - Edge Detectors & convolution

it is a three!

Deep learning approach - layered abstraction:

LEVEL 1: EDGE DETECTOR BANK

LEVEL 2: PRIMITIVE SHAPE DETECTORS

LEVEL 3: SPATIAL RELATIONSHIP DETECTOR

http://cs.brown.edu/courses/cs143/2011/results/proj2/thuhe/

Page 37: Applied Computer Vision - a Deep Learning Approach

Feature Extraction

http://www.youtube.com/watch?v=n1ViNeWhC24

36

3 | Number Recognition Workshop - Features and Sparse Coding

Page 38: Applied Computer Vision - a Deep Learning Approach

At this point, it is probably a good time to ask the students to try to make a little system to classify handwritten numbers. See what error rate they come up with. For the training set, they can ask friends to write numbers. Whatever you do, do not forget to set a deadline, otherwise they will seize the opportunity. (#rookiemistake)

How to do automatic feature extraction?

37

3 | Number Recognition Workshop - Features and Sparse Coding

Manual feature extraction is not the way forward.

Most of Kaggle's competition winners are decided by how lucky they are at finding useful features. ------

D. Efimov

Page 39: Applied Computer Vision - a Deep Learning Approach

4 | Number Recognition Workshop Solution

Number Recognition Workshop Solution

Page 40: Applied Computer Vision - a Deep Learning Approach

Recapitulation

In the previous chapter we saw how to manually make features (aka feature extraction). We also saw that feature extraction is about 50% of the work of winning a Kaggle competition; the other 50% is optimizing the mathematical prediction model (Efimov 2012). We also saw that some geniuses, like Andrew Ng, postulate that manual feature extraction is a waste of time, the kind of nitty-gritty job that should be done by computers. We also saw, in the section on the feature-cluster conundrum, one more reason why feature engineering is part of the problem itself. In the section about sparse coding we saw a mathematical foundation that is a good reverse engineering of how the brain finds good features, because both sparse coding and the brain end up with similar edge detector filters.

DNN online, on-demand

The company Ersatz allows you to try Deep Neural Network models online. They have a demo based on number recognition that we will use now (which is what we have been doing in the last chapter). The handwritten training sets (MNIST, derived from NIST data) are available at http://yann.lecun.com/exdb/mnist/. The author, Yann LeCun, compared different methods on this job... Convolutional neural networks do the job better than metric approaches such as SVM.

39

4 | Number Recognition Workshop Solution - Deep Learning Workshop

Page 41: Applied Computer Vision - a Deep Learning Approach

The MNIST tutorial

This tutorial explains how to use the cloud infrastructure to solve the number recognition problem via Deep Neural Networks.

http://www.ersatzlabs.com/documentation/sharedMNIST/
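If you prefer to run things locally instead of the Ersatz cloud, a small convolutional network in Keras (an assumed substitute, not the tool the tutorial above uses) already does well on MNIST after one epoch:

```python
from tensorflow import keras
from tensorflow.keras import layers

(x_tr, y_tr), (x_te, y_te) = keras.datasets.mnist.load_data()
x_tr = x_tr[..., None].astype("float32") / 255.0
x_te = x_te[..., None].astype("float32") / 255.0

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, 3, activation="relu"),   # level 1: edge-like filters
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),   # level 2: shape-like filters
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),    # level 3: the digit label
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_tr, y_tr, epochs=1, batch_size=128, validation_data=(x_te, y_te))
```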

More background info and videos on the history of ANNs (playlist of 5 videos):

History of neural nets (playlist)

https://www.youtube.com/watch?v=4B-XY8a4RGk

https://gigaom.com/2014/06/11/more-deep-learning-for-the-masses-courtesy-of-ersatz-labs/

40

4 | Number Recognition Workshop Solution - Deep Learning Workshop

Page 42: Applied Computer Vision - a Deep Learning Approach

Additionally, there is an interesting introduction to neural nets at:

http://neuralnetworksanddeeplearning.com/chap1.html

41

4 | Number Recognition Workshop Solution - Deep Learning Workshop

Page 43: Applied Computer Vision - a Deep Learning Approach

5 | Advanced Topics

Advanced Topics

Page 44: Applied Computer Vision - a Deep Learning Approach

Using Vision to Navigate

http://www.youtube.com/watch?v=8c2SFXQ5zHM Sir James explains

http://www.youtube.com/watch?v=oguKCHP7jNQ Navigation

http://www.youtube.com/watch?v=xlaqYDZwoWo#t=43 Suction

Note: Hoover tried to steal his idea; they lost in court with punitive damages. (Top right photo: HEAD Sports.)

43

5 | Advanced Topics - Vision for Navigation

One day my wife was sick, so I had to vacuum the house. Then I realized that bag-based vacuum cleaners do not suck, so I decided to make one that sucks more. It took longer than expected, though. ------

Sir J. Dyson

Page 45: Applied Computer Vision - a Deep Learning Approach

What People Cured of Blindness See

Abridged from "What People Cured of Blindness See" by Patrick House, The New Yorker.

How quickly, if at all, does the brain adapt and vision return after surgery? A simple answer, and a correct one, is that it depends entirely on circumstance. Back in 1993, Oliver Sacks wrote a story in the magazine about Virgil, a man with limited to no vision as a child who had developed cataracts at the age of six. After his cataracts were removed, fifty years later, Virgil had trouble adjusting. (For example, he could not always distinguish the letter "A" from the letter "H" and, when given Molyneux's test, could not tell a square he felt from a square he saw.)*

Since the surgeries, Sinha has followed up with the Prakash children and found that, while they continued to suffer from poor acuity, many higher-order aspects of vision seemed to be improving. Within a week to a few months after surgery, the children could match felt objects to their visual counterparts. They also improved on spatial-navigation tasks requiring mental imagery, which tested their ability to follow a series of up, down, left, and right directions on a visually imagined game board. This finding was particularly important because previous work by Kosslyn and others had found that the congenitally blind have a capacity for mental imagery, but it is limited in some ways and becomes increasingly poor as the task becomes more complex. (In one example, a sighted person will imagine a typewriter a few feet away as larger than the same one imagined a hundred feet away. Among the congenitally blind, however, the imagined typewriter—a composite of experiences of touch and sound alone—is the same size at all distances.)

44

5 | Advanced Topics - Blindness

Page 46: Applied Computer Vision - a Deep Learning Approach

Kosslyn believes that any improvements in mental imagery will require a "catalogue of visual memories" that can then be used to build expectations about the visual world. "When you develop expectations, you can use the fruits of previous experience to help you process what's coming in now," Kosslyn said. "But you need to have had that experience." An example is depth perception: to the sighted, with a lifetime of practice, rules about occlusion (if A occludes B, object A is closer) and foreshortening (objects farther away appear smaller) are continually used to combine incoming light into a rich, three-dimensional world. The absence of these rules can frustrate the newly sighted, whose visual world can be both blurry and two-dimensional—paintings and people are often described as "flat, with dark patches"; a far-away house is "nearby, but requiring the taking of a lot of steps"; streetlights seen through glass are "luminous stains stuck to the window"; sunbeams through tree branches collapse into a single "tree with all the lights in it." (The writer Jorge Luis Borges, who went blind at age fifty-five, described going blind as a process by which "everything near becomes distant." In the newly sighted, without depth perception, the opposite seems true: the distant—tiny houses on the horizon, clouds in the impossibly high sky—suddenly looks nearby.)

45

5 | Advanced Topics - Blindness

Page 47: Applied Computer Vision - a Deep Learning Approach

Ways of tracking

http://www.youtube.com/watch?v=InqV34BcheM

1. By Color histogram (HSV is less dependent on illumination)

2. By Blob (OpenCV Library, not very robust)

3. By Face Detection

4. By Saliency (robust to occlusions)
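Method 1 (color histogram in HSV) is available almost directly in OpenCV as histogram back-projection plus CamShift. A sketch, assuming a webcam on device 0 and a hand-picked initial window around the object:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)                        # assumed webcam on device 0
ok, frame = cap.read()
x, y, w, h = 200, 150, 100, 100                  # hand-picked initial window
roi = frame[y:y + h, x:x + w]

hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
# hue histogram of the object: fairly robust to illumination changes
roi_hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
track_window = (x, y, w, h)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    ret, track_window = cv2.CamShift(back, track_window, term)
    pts = np.int32(cv2.boxPoints(ret))
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(30) == 27:                    # Esc quits
        break
cap.release()
```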

Traditional methods

Typical keypoint extraction methods for recognition of objects independent of view:

Harris Affine

MSER

Hessian Affine

Maja's method - Insight

Features extracted:

Use of HSV histogram (robust to illumination changes)

Texture by Gray level co-occurrence matrix

Edge orientation histogram (6 bins)

Mean, skewness and sd for each color channel

Discard all but 25 top features.

Tested on Columbia Object Image Library. Beats previous methods.

46

5 | Advanced Topics - Easy Tracking

We are looking for an algorithm that is invariant to partial occlusions. ------

Maja R.

Page 48: Applied Computer Vision - a Deep Learning Approach

47

5 | Advanced Topics - Easy Tracking

Page 49: Applied Computer Vision - a Deep Learning Approach

For more:

48

5 | Advanced Topics - Easy Tracking

Page 50: Applied Computer Vision - a Deep Learning Approach

Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. See these course notes for a brief introduction to Machine Learning for AI and an introduction to Deep Learning algorithms: www.deeplearning.net/tutorial/

Deep Learning explained (abridged from the original by Pete Warden | @petewarden): http://radar.oreilly.com/2014/07/what-is-deep-learning-and-why-should-you-care.html

Inside an ANN

The functions that are run inside an ANN are controlled by the memory of the neural network: arrays of numbers known as weights that define how the inputs are combined and recombined to produce the results. Dealing with real-world problems like cat detection requires very complex functions, which means these arrays are very large, containing around 60 million numbers in the case of one of the recent computer vision networks. The biggest obstacle to using neural networks has been figuring out how to set all these massive arrays to values that will do a good job transforming the input signals into output predictions.
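To make "arrays of weights" concrete, here is a toy two-layer forward pass in NumPy. The weights are random, so the prediction is meaningless until training finds good values for these arrays, which is exactly the hard part described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# the "memory" of a tiny network: two weight arrays plus biases
W1, b1 = rng.normal(size=(784, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 10)), np.zeros(10)

def predict(x):
    h = np.maximum(x @ W1 + b1, 0.0)    # ReLU, one of the tricks mentioned below
    scores = h @ W2 + b2
    e = np.exp(scores - scores.max())
    return e / e.sum()                  # softmax over the 10 digit classes

x = rng.random(784)                     # a flattened 28x28 image
print(predict(x).argmax())
```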

Renaissance

It had always been difficult to train an ANN. But in 2012 came a breakthrough: a paper sparked a renaissance in ANNs. Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton brought together a whole bunch of different ways of accelerating the learning process, including convolutional networks, clever use of GPUs, and some novel mathematical tricks like ReLU and dropout, and showed that in a few weeks they could

49

5 | Advanced Topics - Deep Learning

Page 51: Applied Computer Vision - a Deep Learning Approach

train a very complex network to a level that outperformed conventional approaches to computer vision.

GPU photo by Pete Warden slides (Jetpack)

Listen to the Webcast at Strata 2013: http://www.oreilly.com/pub/e/3121

http://www.iro.umontreal.ca/~pift6266/H10/intro_diapos.pdf

Deep NNs failed until 2006...

50

5 | Advanced Topics - Deep Learning

Page 52: Applied Computer Vision - a Deep Learning Approach

Automatic speech recognition

The results shown in the table on the Wikipedia page linked below are for automatic speech recognition on the popular TIMIT data set. This is a common data set used for initial evaluations of deep learning architectures. The entire set contains 630 speakers from eight major dialects of American English, with each speaker reading 10 different sentences. Its small size allows many different configurations to be tried effectively with it. The error rates presented are phone error rates (PER).

http://en.wikipedia.org/wiki/Deep_learning#Fundamental_concepts

51

5 | Advanced Topics - Deep Learning

Page 53: Applied Computer Vision - a Deep Learning Approach

Andrew Ng on Deep Learning where AI will learn from untagged data

52

5 | Advanced Topics - Deep Learning

Page 54: Applied Computer Vision - a Deep Learning Approach

https://www.youtube.com/watch?v=W15K9PegQt0#t=221 [39:56]

53

5 | Advanced Topics - Deep Learning

Page 55: Applied Computer Vision - a Deep Learning Approach

To learn more about Andrew Ng on Deep Learning and the future of #AI

- http://new.livestream.com/gigaom/FutureofAI (~1:20:00)

- https://www.youtube.com/watch?v=W15K9PegQt0#t=221

- http://deeplearning.stanford.edu/

Neural Nets talk by Ilya Sutskever: https://vimeo.com/77050653 [25:00 - end]

54

5 | Advanced Topics - Deep Learning

Page 56: Applied Computer Vision - a Deep Learning Approach

6 | Sample Questions

Sample Questions

Page 57: Applied Computer Vision - a Deep Learning Approach

1. OpenCV is a software app that is useful because

1. it is not a software app

2. false

3. true

4. it is useful because it integrates functionality commonly used to perform computer vision

2. Projective geometry is useful because

1. every pinhole based visual system is subject to the laws of projective geometry

2. compound eyes are subject to projective geometry

3. all are false

4. all are true

3. Pinhole eye design is popular in nature

1. true

2. false

4. A feature is

1. a characteristic of an image

2. the output of a filter

3. the output of a non-linear filter (morphological)

4. a feature must be a number

5. a feature cannot be binary

6. a feature can be true or false

5. A feature that is a constant can still be useful for clustering purposes,

1. true

2. false

6. A good feature is a salient feature,

1. true

2. false

3. not always

56

6 | Sample Questions - A

Page 58: Applied Computer Vision - a Deep Learning Approach

7. The absolute minimum number of features to cluster 7 labels is six

1. true

2. false

3. its 8

4. 7

5. all are true

6. depends on the features

7. none

8. Using features instead of dealing with pixels allows us to simplify the problem of labeling an image

1. true

2. false

9. Clustering by closest neighbor is the most effective method to classify two overlapping clouds of points

1. true

2. k-means is in general superior

3. all of the above

4. none

10. Likelihood is the best way to cluster labels

1. true

2. false

3. it is better than k-means

11. A drawback of k-means is that you have to set the number of clusters

1. true

2. false

12. Learning to recognize a pattern is equivalent to,

1. Knowing the cluster boundaries and knowing which features to use

2. is to make a database of possible images and then when one of the images is presented to tell which one is same in the database

3. all

4. none

13. In morphological operations, closing is...

1. used to clean salt and pepper noise which is bigger than the objects of interest

2. used to clean salt and pepper noise which is smaller than the objects of interest

3. opening and dilation by this order

4. dilation and then erosion

14. Opening is

1. a tool to remove white noise from a black-and-white picture

57

6 | Sample Questions - A

Page 59: Applied Computer Vision - a Deep Learning Approach

2. it only works on black-and-white pictures

3. erosion followed by closing

4. erosion followed by dilation

5. none

15. Erosion is

1. a linear operation

2. is reversible

3. all are true

16. Dilation is

1. used to connect disconnected lines

2. to remove vertical lines

3. none

4. all

17. If I want to connect vertical lines that are disconnected by an average pixel length of 5 pixels, what structuring element is appropriate

1. [1,1,1,1,1,1,1,1] (as a vertical column)

2. [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]

3. [1,1,1,3,4,3,1,1]

4. [0,0,0,1,0,0]

5. [1,1,1]

6. [1,1]

7. none

18. If I want to connect vertical lines that are disconnected by an average pixel length of 5 pixels, what structuring element is appropriate

1. a 4x4 matrix of ones

2. a 7x1 matrix of ones

3. a 7x7 matrix of ones

4. a 1x7 matrix of ones

5. none

19. If I want to connect vertical lines that are disconnected by an average pixel length of 5 pixels, what structuring element is appropriate

1. a 1x5 matrix of ones

2. a 5x1 matrix of ones

3. a 5x5 matrix of zeroes

4. a 5x1 matrix of zeroes with a 1 in the middle

5. none

20. Ideally, the features for Arabic numeral recognition would

1. correspond to traces,

2. not correspond to traces but numbers

3. both

4. none

58

6 | Sample Questions - A

Page 60: Applied Computer Vision - a Deep Learning Approach

21. The best features for pattern recognition are those that appear infrequently in the dataset because

1. they have a lot of entropy

2. they discriminate numbers better because of their frequency

3. none

4. all

22. Sparse coding is related to how the brain detects edges by means of basis functions

1. true

2. false

3. not always, depends on the variance of the images

59

6 | Sample Questions - A

Page 61: Applied Computer Vision - a Deep Learning Approach

Dr. Jose Berengueres joined UAE University as an Assistant Professor in 2011. He received an MEE from the Polytechnic University of Catalonia in 1999 and a PhD in bio-inspired robotics from the Tokyo Institute of Technology in 2007.

He has authored books on: The Toyota Production System, Design Thinking, Human-Computer Interaction, UX women designers, and Business Model Innovation.

He has given talks and workshops on Design Thinking & Business Models in Germany, Mexico, Dubai, and California.