Applied Computer Vision
FOR UNDERGRADS
A Deep Learning Approach
J. Berengueres
Applied Computer Vision for Undergrads version 2.2
Editor: Jose Berengueres
Edition: First Edition. August 24th, 2014.
Text Copyright © Jose Berengueres 2014. All Rights Reserved.
Video, Audio & Artwork Copyright
Artwork appearing in this work is subject to its corresponding original Copyright or Creative Commons License. Except where otherwise noted, a Creative Commons Attribution 3.0 License applies.
Limit of Liability
The editor makes no representations or warranties concerning the accuracy or exhaustiveness of the contents and theories hereby presented and particularly disclaims any implied warranties regarding merchantability or fitness for a particular use, including but not limited to educational, industrial and academic application. Neither the editor nor the authors are liable for any loss of profit or any commercial damages, including but not limited to incidental, consequential or other damages.
Support
This work was supported by:
UAE University
1 | Intro
http://www.howwedrive.com/2012/01/23/let-the-robot-drive/
How face recognition works
http://vimeo.com/12774628
Andrew Ng
http://www.youtube.com/watch?v=AY4ajbu_G3k
Google Self-Driving Car
https://www.youtube.com/watch?v=YXylqtEQ0tk
1 | Intro - Topics
Introduction videos
2 | Theoretical Stuff
Nizhny Novgorod street via: http://personal.cfw.com/~renders/nizhnymall_photos.html
OpenCV was developed by the Intel Russia research center in Nizhny Novgorod
How to program a PC so it learns how to see?
What does it mean to “see” something?
1. Projective geometry
2. The Eye Design
3. Saliency Sense in babies
Optical illusions are proof of information discarding at retina level (see also deep learning)
What does it mean to see?
Projective Geometry
Many books on computer vision start with this topic, which is quite irrelevant. Let me explain: Picasso did not benefit from learning the chemistry of yellow paint manufacturing. Neither will you. You don’t need 3D geometry knowledge, you just need great 2D geometry knowledge. Additionally, his paintings were among the first (since the Middle Ages) to ignore the laws of projective geometry, and they were a big success. His father was a frustrated art teacher, but he made sure that young Picasso would not become like himself. Picasso, like Tiger Woods, was a product of his father.
2 | Theoretical Stuff - Projective Geometry
Caring about projective geometry in computer vision is the same mistake as “model of the world” vs. “ground model”: you should not care about the world because you already have one model of it:
Your retina is your world and your model! Do not complicate it by adding an additional layer or model of a model that you have to update continuously. You don’t even know if the world is real. See Gödel.------
Reality? Your Reality!
Pin-hole Design
Pin-hole design is one of the few cheap ways to convert a 3D world into a 2D picture (simplification). That is why man-made cameras use the same principle. For a fascinating story on how the pinhole eye is used by biological systems, I recommend Climbing Mount Improbable. There are other ways in which nature “sees”:
the bat’s ultrasound vision
dolphins’ ultrasound vision
the compound eye of the fly
polarized light vision
From a computer vision point of view, it should not matter whether your vision device is pinhole based or not.
Compound Eye. What is the pixel resolution of the compound eye?
2 | Theoretical Stuff - The Design of the Eye
http://www.detectingdesign.com/humaneye.html
Hit a Wall
When Maja Rudinac hit the wall of impractical computer vision, she turned to developmental psychologists for strategies to cope with large amounts of image pixels. This is what she found (so you don’t have to):
Visual Developmental Psychology Basics
(Abridged from Maja Rudinac’s PhD thesis, TU Delft)
By the 4th month, babies can already tell whether a character is good or bad; we know this because we can see whom they hug longer.
Infants look longer at new faces or new objects.
Independent of where they are born, all babies know the boundaries of objects.
They can predict collisions.
They have basic additive and subtractive cognition.
They can identify members of their own group versus other groups.
Spontaneous motor movement is not goal directed at the onset. The baby explores the degrees of freedom.
Goal-directed arm-grasp appears at the 4th month.
The ability to engage and disengage attention on targets appears from day 1.
Smooth visual tracking is present at birth.
How baby cognition “works”
The development of babies’ actions is goal directed by three motives. Actions are either:
1. To discover novelty
2. To discover regularity
3. To discover the potential of their own body
Development of Perception
Perception in babies is driven by two processes:
1. Detection of structure or patterns
2. Discarding of irrelevant info and keeping relevant info
Cognitive Robot Shopping List
So if we want to make a minimum viable product (MVP) that can understand the world at least as well (or as poorly) as a baby does, these are the functions that, according to Mrs. Maja (pronounced Maya), we will need:
A WebCam
Object Tracking
Object Discrimination
Attraction to people’s faces
Face Recognition
Use of the hand to move objects to scan them from various angles
Shades and 3D
It turns out that shades have a disproportionate influence in helping us figure out 3D information from 2D retina pixels. When researchers at the Univ. of Texas used fake shades in a virtual reality world, participants got headaches because the faking of the shades was not precise enough to fool the brain. The brain got confused by the imperceptible mismatches... that’s why smart people get headaches in 3D cinemas.
2 | Theoretical Stuff - Developmental Psychology
3 | Number Recognition Workshop
2014. The purpose of the number recognition workshop is to learn computer vision basics.
In this chapter we will learn the four basic components of a typical computer vision program:
1. Features
2. Clustering
3. Filtering/Morph ops
4. Validation
We will use the example to learn OpenCV. Finally, we will learn why manual feature making has become obsolete because of deep learning.
Let’s learn by means of a simple example
What does it mean to classify numbers?
1 2 3 4 5 6 7 8 9 0
Intro
This workshop introduces OpenCV functions as the need for them arises. Let’s identify written numbers from 0 to 9. You can get some inspiration from this video:
http://www.youtube.com/watch?v=D_cZBdfw-hQ
In Feb 2011 (just before the tsunami), we programmed HRP-IV to play a game. We used the histogram method to separate Caucasian skin from the background and then we counted the number of valleys and mountains between the hull vertices. A more primitive approach is to average the area, but it is not as robust.
Workshop Time
Teams of 4. You have 15 minutes to come up with some algorithm, trick or rule of thumb to classify the numbers.
Note: The students will try to find saliency features. Good saliency features are robust to:
noise,
partial occlusions and
confusion
Here is a typical list:
# of horizontal segments
# of pixels belonging to horizontal segments
Length of segments
# of closed loops
# of start and end points
Relative orientation of end points
Extracting features from pixels is called feature extraction. Salient features are the ones that are most useful in classifying pixels (by Information Entropy).
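To make the list above concrete, here is a minimal sketch of how the first three features (number of horizontal segments, pixels in horizontal segments, segment length) could be computed. The 5x5 image of a 7, the function names and the minimum segment length of 3 are assumptions made for illustration only; they are not taken from the workshop.

```python
# Hypothetical 5x5 binary image of the digit 7 (1 = ink, 0 = background).
SEVEN = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]


def horizontal_runs(image, min_length=3):
    """Return the lengths of all horizontal ink runs at least min_length long."""
    runs = []
    for row in image:
        length = 0
        for pixel in row + [0]:  # the trailing 0 closes a run that touches the border
            if pixel:
                length += 1
            else:
                if length >= min_length:
                    runs.append(length)
                length = 0
    return runs


def extract_features(image):
    """Reduce the whole pixel matrix to three numbers."""
    runs = horizontal_runs(image)
    return {
        "n_horizontal_segments": len(runs),
        "n_horizontal_pixels": sum(runs),
        "longest_segment": max(runs, default=0),
    }


features = extract_features(SEVEN)
print(features)
```

The 25 pixels are reduced to three numbers. Whether these three are salient enough to separate a 7 from a 1 is exactly what the students have to find out.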
3 | Number Recognition Workshop - Number Recognition
1 2 3 4 5 6 7 8 9 0
Feature Reduction as a Necessity
Consider the letter ‘E’. Its representation as a vector is e = {0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0, 0,0,1,1,1,1,1,1,1,0,0, 0,0,1,0,0,0,0,0,0 ...}, that is, a vector of dimension 11 x 13 = 143 (more than 100). There is no way for the retinal nerve to have enough bandwidth to send all that information to the brain neurons for processing. What happens in reality is that features are already being extracted at retina level and the information is compressed (edge filters).
Minimum Number of Features
If we find five features that allow us to distinguish between all 10 different number shapes, then the problem becomes a dimension 5 problem, which is much more manageable.
Minimum number of features formula
N ≥ log2(number of different labels), for features that take only two values
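As a quick check of the formula, the sketch below computes the smallest number of features needed, under the assumption (not stated in the text) that every feature is binary, so N features can tell apart at most 2^N labels. The function name is made up for this example.

```python
def min_binary_features(n_labels):
    """Smallest N such that 2**N >= n_labels, i.e. N >= log2(n_labels).

    Integer arithmetic is used instead of math.log2 to avoid rounding issues.
    """
    return (n_labels - 1).bit_length()


print(min_binary_features(10))  # the ten digits 0-9
print(min_binary_features(2))   # only the 0 and the 1
```

This is only a lower bound: it is reached when every feature splits the remaining labels exactly in half, which hand-made features rarely do.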
Because the problem can be complex, students can start by trying to distinguish just two numbers (the 1 and the 0), manually extracting two features: vertical and horizontal features (feature reduction). Then, we can map the numbers into clusters. This helps them understand the need for more features when clusters overlap.
Training
So if we start with webcam photos of figures at 600x400 pixels, how would you design the program? At this stage most students fail to recognize the need for training with lots of examples. In fact, the hard part of this workshop is training (understanding which features work to differentiate a 0 from a 1, a 1 from a 2...). (See also the Andrew Ng video in Chapter 1.)
Reflection
Some teams came up with valley-mountain features. Others came up with the number of lines or crossings.
Let’s make a competition to see which is the winning team! (This is also a great excuse to help them learn OpenCV.)
Cognition Leads to Play?
Once we have a working program with the ability to recognize numbers, we can apply Toyota’s Kaizen (see The Brown Book of Design Thinking, Chapter 3). What kind of games or apps can we make? What kind of applications can you imagine? Make a useful app.
All computer vision apps today fundamentally work the same as this example
Now we just need to learn how to:
1. Do tracking
2. Use clustering methods
3. Extract features with OpenCV without reinventing the wheel (next)
If you can make an app that recognizes my handwriting better than my app, I will give you an... A+
How to Cluster Features (review of three methods)
By Closest Neighbor
By Centroids (Center of mass)
k-means
3 | Number Recognition Workshop - Clustering of Features
Let’s have an honest discussion about clustering
By Probability
Given that x is at a given position, which label minimizes the probability of error?
Comparison of methods
Centroid --> x belongs to Green
Neighbor --> Green
Statistic --> Red
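The disagreement between the three rules can be reproduced with a small sketch. Everything here is an assumption for illustration: the points are one-dimensional and invented, the function names are made up, and the "by probability" rule is modelled as a Gaussian fit per cluster with equal priors, which the text does not specify. k-means is not shown because it finds the clusters themselves rather than labeling a new point.

```python
import math
from statistics import fmean, pvariance

# Hypothetical 1-D feature values: Green is a tight cluster, Red is spread out.
CLUSTERS = {
    "Green": [-0.2, -0.1, 0.0, 0.1, 0.2],
    "Red": [2.5, 4.0, 6.0, 8.0, 9.5],
}


def by_nearest_neighbor(x, clusters):
    """Label of the single closest training point."""
    return min(clusters, key=lambda label: min(abs(x - p) for p in clusters[label]))


def by_centroid(x, clusters):
    """Label of the cluster whose center of mass is closest."""
    return min(clusters, key=lambda label: abs(x - fmean(clusters[label])))


def gaussian_log_likelihood(x, points):
    """Log density of x under a Gaussian fitted to the points."""
    mean, var = fmean(points), pvariance(points)
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def by_probability(x, clusters):
    """Label under which x is most likely (equal priors assumed)."""
    return max(clusters, key=lambda label: gaussian_log_likelihood(x, clusters[label]))


x = 1.2
for rule in (by_centroid, by_nearest_neighbor, by_probability):
    print(rule.__name__, "-->", rule(x, CLUSTERS))
```

With these invented points, x is closer to Green in plain distance, yet it lies many standard deviations away from the tight Green cluster and well within the spread of Red, so the probabilistic rule picks Red.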
Where does ‘x’ belong?
What is the best “strategy”?
http://en.wikipedia.org/wiki/Cluster_analysis
The previous are the three main unsupervised clustering methods
The Feature - Cluster Conundrum
Now ask the students to make a flowchart of a program that can label handwritten numbers 0 to 9. At this point most students come up with this flowchart.
The purpose of the exercise is to let the students realize the difference between training and labeling. Training is the hard part. Another hard part is how to separate useful from useless features in the dimension reduction phase. Previously we said to use entropy as an indicator of how useful a feature can be. However, if I plug in random noise as a feature, it will score high on entropy. Because the quality of the results depends on the clustering, and the clusters depend on which features we choose, it is not a good idea to decouple feature discarding from the clustering itself. This is called the feature-cluster conundrum.
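The point about noise can be checked numerically. In this sketch the labels and the two features are invented, and mutual information with the label is brought in only as one possible way to show the gap; the text itself only mentions entropy.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a list of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def mutual_information(feature, labels):
    """I(F; L) = H(F) + H(L) - H(F, L): what the feature tells us about the label."""
    return entropy(feature) + entropy(labels) - entropy(list(zip(feature, labels)))


# Hypothetical data: eight samples, two classes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
useful = [0, 0, 0, 0, 1, 1, 1, 1]  # a feature that follows the label
noise = [0, 1, 0, 1, 0, 1, 0, 1]   # a noise-like feature, unrelated to the label

for name, feature in (("useful", useful), ("noise", noise)):
    print(name, entropy(feature), mutual_information(feature, labels))
```

Both features have the same entropy, so entropy alone cannot rank them. Only a measure that looks at the labels (or at the resulting clusters) separates the useful feature from the noise, which is the coupling the conundrum describes.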
In the next chapter, let’s find out how feature finding was traditionally approached in the 80s and 90s (this is now of course obsolete knowledge, but I will include it here as a “nostalgic” note).
So, learnt (knowledge) = List of Features + Cluster boundaries?
Morphological Ops
Morph ops differ from linear filters in that they are not linear. Imagine you have the letter E but the corner pixel has been erased by some noise. Your brain (Gestalt Theory) can reconstruct it; in fact, it is designed to reconstruct intersections of lines.
However, a computer is not you. It does not know anything about Gestalt, so how can we reconstruct this missing corner automatically? If we don’t, the computer might think this is an underlined F.
You can reconstruct it by using a so-called ‘morph op’ called closing.
3 | Number Recognition Workshop - Morphological Ops
F! E!
Closing
1. Dilate - enlarge black regions by adding black pixels next to pre-existing black pixels according to some rule
http://www.youtube.com/watch?v=xO3ED27rMHs
2. Erode - the reverse process
http://www.youtube.com/watch?v=fmyE7DiaIYQ
before after
Opening
Same as the previous, but in reverse order: step 2, then step 1
before after
Use cases
Closing is used to connect missing lines or parts.
Opening is used to remove noise that does not belong to the largest object in the scene.
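The two operations can be sketched in a few lines of plain Python. The conventions here are assumptions made for the example: 1 stands for a black (ink) pixel, pixels outside the image count as background, and the structuring element is a hypothetical 1x3 horizontal bar given as (row, column) offsets. A real project would normally use a library routine instead.

```python
HORIZONTAL = [(0, -1), (0, 0), (0, 1)]  # hypothetical 1x3 structuring element


def dilate(image, element):
    """Stamp the structuring element on every ink pixel."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c]:
                for dr, dc in element:
                    if 0 <= r + dr < rows and 0 <= c + dc < cols:
                        out[r + dr][c + dc] = 1
    return out


def erode(image, element):
    """Keep a pixel only if the whole structuring element fits inside the ink."""
    rows, cols = len(image), len(image[0])
    return [
        [
            int(all(
                0 <= r + dr < rows and 0 <= c + dc < cols and image[r + dr][c + dc]
                for dr, dc in element
            ))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def closing(image, element):
    return erode(dilate(image, element), element)


def opening(image, element):
    return dilate(erode(image, element), element)


BROKEN = [[0] * 7, [0, 1, 1, 0, 1, 1, 0], [0] * 7]               # line with a gap
NOISY = [[0, 0, 0, 0, 0, 1, 0], [0] * 7, [0, 1, 1, 1, 1, 1, 0]]  # line plus a speck

print(closing(BROKEN, HORIZONTAL)[1])  # the gap is filled
print(opening(NOISY, HORIZONTAL))      # the speck is removed, the line survives
```

The choice of a horizontal bar matters: with these images a cross-shaped element would not fill the gap, because closing only fills holes that the element cannot fit into.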
Structuring element
In this case the blue cross is the structuring element. It can be any other shape.
For more details: http://homepages.inf.ed.ac.uk/rbf/HIPR2/morops.htm
Demos: http://bigwww.epfl.ch/demo/demoteaching.html
Practice
We can use Excel to practice manual closing and opening.
L shape Excel exercise
In this exercise, I asked the students to come up with a structuring element to reconstruct the E shape. Most of them proposed the 3x3 L-shape. After two closings we can see the following: we managed to reconstruct the missing corner, but the structuring element obliterates details in the picture that are smaller than itself. This is one drawback of morphological operations.
Code review - What’s wrong with OpenCV thinking
https://github.com/orioli/MAID-ROBOT
https://github.com/orioli/MAID-ROBOT/blob/master/uEyeCameraHIRO/camShiftDemo/camShiftDemo.cpp
What the robot sees.
Results of the code review:
400 lines of code to count 5 fingers
Not robust to skin color change
Not reusable code or a general purpose solution
It is too custom --> obsolescence assured
3 | Number Recognition Workshop - OpenCV Code Review
This project from 2007 is an example of everything that is wrong with OpenCV thinking. It is not the way forward. The brain does not work like this. It does not scale. So what’s next?------
Now students have all the knowledge needed to make the software to identify numbers. Ask the students to draft a detailed action plan to classify characters from 0 to 9 from photos.
Here is an example:
1. Get the training dataset
  1. How many do we need? 100 of each number?
    1. Organize folders /1 /2 /3 …
    2. Go to the cafeteria and ask people to give a sample of each number
    3. Digitize it. How? Take a picture with the iPhone
2. Training SW
  1. Find features for numbers
    1. How many do we need?
      1. Five?
      2. Let the SW choose the useful ones
    2. Edges
    3. Horizontal lines
    4. Closed spaces
    5. Shapes?
3. Cluster dataset
  1. How good is the clustering? --> Testing
  2. Choose the method with the highest accuracy across testing subsets
  3. The model will tell you which features are useful and which are not at labeling
  4. How do you test it?
  5. Bring other (new) numbers and check
  6. Split the dataset into a training set and a testing set, 50% - 50%
  7. Prevent overfitting - divide the test data into chunks of 10 and see the prediction accuracy for each individual chunk
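The last two steps of the plan (the 50% - 50% split and the per-chunk accuracy) can be sketched as follows. The dataset and the classifier are stand-ins invented for the example, and "chunks of 10" is read here as chunks of ten samples each, which is one possible interpretation.

```python
import random


def split_half(samples, seed=0):
    """Shuffle and split the dataset 50% - 50% into training and testing sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    middle = len(shuffled) // 2
    return shuffled[:middle], shuffled[middle:]


def chunk_accuracies(test_set, predict, chunk_size=10):
    """Prediction accuracy for each consecutive chunk of the test set."""
    accuracies = []
    for start in range(0, len(test_set), chunk_size):
        chunk = test_set[start:start + chunk_size]
        correct = sum(1 for features, label in chunk if predict(features) == label)
        accuracies.append(correct / len(chunk))
    return accuracies


# Hypothetical dataset: 10 samples per digit, one fake feature per sample.
samples = [((digit,), digit) for digit in range(10) for _ in range(10)]


def toy_predict(features):
    """Stand-in for a trained model: always mistakes a 7 for a 1."""
    return 1 if features[0] == 7 else features[0]


train, test = split_half(samples)
accuracies = chunk_accuracies(test, toy_predict)
print(accuracies)
```

If the accuracy differs a lot from chunk to chunk, the overall figure is not to be trusted; a model that does well on the training half and poorly on the testing half has overfitted.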
Now that the students have drafted a plan, let them do something concrete. Draw a matrix of ones and zeroes that represents the binary image of the number 7 and ask them how they would extract features.
3 | Number Recognition Workshop - Edge Detectors & convolution
Feature Extraction
In Excel we can use conditional highlighting to visualize the filtering process. We start with a 7. Ask the students:
How would you extract a feature from the ones and zeroes?
Gaussian filtering by convolution
Vertical edge detector by convolution
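What the Excel sheets do by hand can be sketched in code. The kernels below are common textbook choices (a Prewitt-style vertical edge kernel and a 3x3 Gaussian approximation) picked for this example, not necessarily the ones used in the slides, and the tiny step image stands in for the matrix of the 7.

```python
def convolve2d(image, kernel):
    """'Valid' 2-D convolution: flip the kernel, slide it, sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for r in range(len(image) - kh + 1):
        out_row = []
        for c in range(len(image[0]) - kw + 1):
            out_row.append(sum(
                flipped[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw)
            ))
        out.append(out_row)
    return out


# Assumed kernels, chosen for illustration.
VERTICAL_EDGE = [[1, 0, -1]] * 3
GAUSSIAN = [[1 / 16, 2 / 16, 1 / 16],
            [2 / 16, 4 / 16, 2 / 16],
            [1 / 16, 2 / 16, 1 / 16]]

# Hypothetical image: background on the left, ink on the right.
STEP = [[0, 0, 1, 1, 1] for _ in range(4)]

print(convolve2d(STEP, VERTICAL_EDGE))  # large values where 0 turns into 1
print(convolve2d(STEP, GAUSSIAN))       # the sharp step becomes a soft ramp
```

The same function with a different kernel gives a different feature, which is why an edge detector bank is just a list of kernels.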
See also Canny Filters and Laplacians
1. http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/canny_detector/canny_detector.html
2. http://matlabserver.cs.rug.nl/cgi-bin/matweb.exe
3. http://www.youtube.com/watch?v=pIFnFhDsYlk
At this point we can realize that numbers are traces. So what we need is to classify types of traces, not the detection of the edges? What kind of edge classifier can we use? Viola-Jones? Convolution? We want to know: type of edge, position
1 2 3 4 5 6 7 8 9 0
http://practicalquant.blogspot.ae/2013/10/deep-learning-oral-traditions.html
Traditional approach
4 --> (0,0.5,4,13,33) --> it is a four!
Convert to 100+ features found by hand
Example feature: has a cross in the lower middle
Deep learning approach
Layer - abstraction
LEVEL 3 SPATIAL RELATIONSHIP DETECTOR --> it is a three!
LEVEL 2 PRIMITIVE SHAPE DETECTORS
LEVEL 1 EDGE DETECTOR BANK
http://cs.brown.edu/courses/cs143/2011/results/proj2/thuhe/
Feature Extraction
http://www.youtube.com/watch?v=n1ViNeWhC24
3 | Number Recognition Workshop - Features and Sparse Coding
At this point, it is probably a good time to ask the students to try to make a little system to classify handwritten numbers. See what error rate they come up with. For the training set, they can ask friends to write numbers. Whatever you do, do not forget to set a deadline, otherwise they will seize the opportunity. ( #rookiemistake )
How to do automatic feature extraction?
Manual Feature Extraction is not the way forward
Most of Kaggle’s competition winners are decided by how lucky they are at finding useful features------
D. Efimov
4 | Number Recognition Workshop Solution
Number Recognition Workshop Solution
Recapitulation
In the previous chapter we saw how to manually make features (aka feature extraction). We also saw that feature extraction is about 50% of the work to win a Kaggle competition. The other 50% is optimizing the mathematical prediction model (Efimov 2012). We also saw that some geniuses, like Andrew Ng, postulate that manual feature extraction is a waste of time and that this is the kind of nitty-gritty job that should be done by computers. We also saw, in the section on the feature-cluster conundrum, one more reason why feature engineering is part of the problem itself. In the section about Sparse Coding we saw a mathematical foundation that is a good reverse engineering of how the brain finds good features, because both sparse coding and the brain end up with similar edge detector filters.
DNN online, on-demand
The company Ersatz allows you to try Deep Neural Network models online. They have a demo based on number recognition, which is what we have been doing in the last chapter, that we will use now. The handwritten training sets (MNIST, derived from NIST data) are available at http://yann.lecun.com/exdb/mnist/. The author, Yann LeCun, compared different methods to do the job... Convolutional neural networks do the job better than metric approaches such as SVM.
4 | Number Recognition Workshop Solution - Deep Learning Workshop
The MNIST tutorial
This tutorial explains how to use the cloud infrastructure to solve the number recognition problem via Deep Neural Networks.
http://www.ersatzlabs.com/documentation/sharedMNIST/
More background info, current developments and videos on the history of ANNs (playlist of 5 videos):
History of neural nets (playlist)
https://www.youtube.com/watch?v=4B-XY8a4RGk
https://gigaom.com/2014/06/11/more-deep-learning-for-the-masses-courtesy-of-ersatz-labs/
Additionally, there is an interesting introduction to neural nets (Michael Nielsen’s online book) at:
http://neuralnetworksanddeeplearning.com/chap1.html
5 | Advanced Topics
Advanced Topics
Using Vision to Navigate
http://www.youtube.com/watch?v=8c2SFXQ5zHM Sir James explains
http://www.youtube.com/watch?v=oguKCHP7jNQ Navigation
http://www.youtube.com/watch?v=xlaqYDZwoWo#t=43 Suction
Note: Hoover tried to steal his idea. They lost in court, with punitive damages. Top right photo: HEAD SPORTS.
5 | Advanced Topics - Vision for Navigation
One day my wife was sick so I had to vacuum the house. Then I realized that bag-based vacuum cleaners do not suck, so I decided to make one that sucks more. It took longer than expected, though.------
Sir J. Dyson
What People Cured of Blindness See
Abridged from “What People Cured of Blindness See” by Patrick House, The New Yorker
How quickly, if at all, does the brain adapt and vision return after surgery? A simple answer, and a correct one, is that it depends entirely on circumstance. Back in 1993, Oliver Sacks wrote a story in the magazine about Virgil, a man with limited to no vision as a child who had developed cataracts at the age of six. After his cataracts were removed, fifty years later, Virgil had trouble adjusting. (For example, he could not always distinguish the letter “A” from the letter “H” and, when given Molyneux’s test, could not tell a square he felt from a square he saw.)
Since the surgeries, Sinha has followed up with the Prakash children and found that, while they continued to suffer from poor acuity, many higher-order aspects of vision seemed to be improving. Within a week to a few months after surgery, the children could match felt objects to their visual counterparts. They also improved on spatial-navigation tasks requiring mental imagery, which tested their ability to follow a series of up, down, left, and right directions on a visually imagined game board. This finding was particularly important because previous work by Kosslyn and others had found that the congenitally blind have a capacity for mental imagery, but it is limited in some ways and becomes increasingly poor as the task becomes more complex. (In one example, a sighted person will imagine a typewriter a few feet away as larger than the same one imagined a hundred feet away. Among the congenitally blind, however, the imagined typewriter—a composite of experiences of touch and sound alone—is the same size at all distances.)
5 | Advanced Topics - Blindness
Kosslyn believes that any improvements in mental imagery will require a “catalogue of visual memories” that can then be used to build expectations about the visual world. “When you develop expectations, you can use the fruits of previous experience to help you process what’s coming in now,” Kosslyn said. “But you need to have had that experience.” An example is depth perception: to the sighted, with a lifetime of practice, rules about occlusion (if A occludes B, object A is closer) and foreshortening (objects farther away appear smaller) are continually used to combine incoming light into a rich, three-dimensional world. The absence of these rules can frustrate the newly sighted, whose visual world can be both blurry and two-dimensional—paintings and people are often
described as “flat, with dark patches”; a far-away house is “nearby, but requiring the taking of a lot of steps”; streetlights seen through glass are “luminous stains stuck to the window”; sunbeams through tree branches collapse into a single “tree with all the lights in it.” (The writer Jorge Luis Borges, who went blind at age fifty-five, described going blind as a process by which “everything near becomes distant.” In the newly sighted, without depth perception, the opposite seems true: the distant, such as tiny houses on the horizon or clouds in the impossibly high sky, suddenly looks nearby.)
Ways of tracking
http://www.youtube.com/watch?v=InqV34BcheM
1. By Color histogram (HSV is less dependent on illumination)
2. By Blob (OpenCV Library, not very robust)
3. By Face Detection
4. By Saliency (robust to occlusions)
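The remark that HSV is less dependent on illumination can be illustrated with a small sketch of the first method. The pixel values, the number of bins and the use of histogram intersection as the similarity score are assumptions made for this example.

```python
import colorsys


def hue_histogram(pixels, bins=6):
    """Normalized hue histogram of a list of (r, g, b) pixels in the 0..255 range."""
    counts = [0] * bins
    for r, g, b in pixels:
        hue, _saturation, _value = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        counts[min(int(hue * bins), bins - 1)] += 1
    return [count / len(pixels) for count in counts]


def histogram_intersection(h1, h2):
    """Similarity between two normalized histograms: 1.0 = identical, 0.0 = disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical patches: the same red object in bright and in dim light, and a green one.
red_bright = [(200, 30, 30)] * 4
red_dim = [(100, 15, 15)] * 4
green = [(30, 200, 30)] * 4

target = hue_histogram(red_bright)
print(histogram_intersection(target, hue_histogram(red_dim)))  # same hue, darker
print(histogram_intersection(target, hue_histogram(green)))    # different hue
```

Dimming the light changes the V (value) channel but leaves the hue untouched, so the dim red patch still matches the target while the green patch does not. A tracker would compare the target histogram against candidate windows in each frame.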
Traditional methods
Typical keypoint extraction for recognition of objects independent of view:
Harris Affine
MSER
Hessian Affine
Maja’s method
Insight
Features extracted:
Use of HSV histogram (robust to illumination changes)
Texture by Gray level co-occurrence matrix
Edge orientation histogram (6 bins)
Mean, skewness and sd for each color channel
Discard all but 25 top features.
Tested on Columbia Object Image Library. Beats previous methods.
5 | Advanced Topics - Easy Tracking
We are looking for an algorithm that is invariant to partial occlusions------
Maja R.
For more:
Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. See these course notes for a brief introduction to Machine Learning for AI and an introduction to Deep Learning algorithms. www.deeplearning.net/tutorial/
Deep Learning explained
(Abridged from the original by Pete Warden | @petewarden)
http://radar.oreilly.com/2014/07/what-is-deep-learning-and-why-should-you-care.html
Inside an ANN
The functions that are run inside an ANN are controlled by the memory of the neural network, arrays of numbers known as weights that define how the inputs are combined and recombined to produce the results. Dealing with real-world problems like cat detection requires very complex functions, which means these arrays are very large, containing around 60 million numbers (about 240 MBytes if stored as 32-bit floats) in the case of one of the recent computer vision networks. The biggest obstacle to using neural networks has been figuring out how to set all these massive arrays to values that will do a good job transforming the input signals into output predictions.
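A toy network makes the role of the weights visible. The sketch below uses hand-picked, hypothetical weights (9 numbers instead of tens of millions) and the ReLU function mentioned in the next section; it is an illustration of the idea, not the network discussed in the article.

```python
def relu(values):
    """Rectified linear unit: negative values become zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: every output is a weighted sum of all inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# Hypothetical hand-set weights. With these values the network computes |x1 - x2|.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
B1 = [0.0, 0.0]
W2 = [[1.0, 1.0]]
B2 = [0.0]


def forward(inputs):
    """Combine the inputs with the first weight array, then recombine with the second."""
    hidden = relu(dense(inputs, W1, B1))
    return dense(hidden, W2, B2)


print(forward([3.0, 1.0]))
print(forward([1.0, 3.0]))
```

Here the weights were chosen by hand. Training means finding such values automatically from examples, which is the obstacle described above when there are millions of them.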
Renaissance
It has always been difficult to train an ANN. But in 2012, a breakthrough paper sparked a renaissance in ANNs. Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton brought together a whole bunch of different ways of accelerating the learning process, including convolutional networks, clever use of GPUs, and some novel mathematical tricks like ReLU and dropout, and showed that in a few weeks they could train a very complex network to a level that outperformed conventional approaches to computer vision.
GPU photo by Pete Warden slides (Jetpack)
Listen to the Webcast at Strata 2013
http://www.oreilly.com/pub/e/3121
http://www.iro.umontreal.ca/~pift6266/H10/intro_diapos.pdf
Deep NN failed until 2006....
5 | Advanced Topics - Deep Learning
Automatic speech recognition
The results shown in the table below are for automatic speech recognition on the popular TIMIT data set. This is a common data set used for initial evaluations of deep learning architectures. The entire set contains 630 speakers from eight major dialects of American English, with each speaker reading 10 different sentences. Its small size allows many different configurations to be tried effectively with it. The error rates presented are phone error rates (PER).
http://en.wikipedia.org/wiki/Deep_learning#Fundamental_concepts
Andrew Ng on Deep Learning where AI will learn from untagged data
https://www.youtube.com/watch?v=W15K9PegQt0#t=221 [39:56]
To learn more about Andrew Ng on Deep Learning and the future of #AI
- http://new.livestream.com/gigaom/FutureofAI (~1:20:00)
- https://www.youtube.com/watch?v=W15K9PegQt0#t=221
- http://deeplearning.stanford.edu/
Neural Nets talk by Ilya Sutskever
https://vimeo.com/77050653 [25:00 - end]
6 | Sample Questions
Sample Questions
1. OpenCV is a software app that is useful because
1. it is not a software app
2. false
3. true
4. it is useful because it integrates functionality commonly used to perform computer vision
2. Projective geometry is useful because
1. every pinhole based visual system is subject to the laws of projective geometry
2. compound eyes are subject to projective geometry
3. all are false
4. all are true
3. Pinhole eye design is popular in nature
1. true
2. false
4. A feature is
1. a characteristic of an image
2. the output of a filter
3. the output of non linear filter (morphological)
4. a feature must be a number
5. a feature cannot be binary
6. a feature can be true or false
5. A feature that is a constant can still be useful for clustering purposes,
1. true
2. false
6. A good feature is a salient feature,
1. true
2. false
3. not always
6 | Sample Questions - A
7. The absolute minimum number of features to cluster 7 labels is six
1. true
2. false
3. it’s 8
4. 7
5. all are true
6. depends on the features
7. none
8. Use of features instead of dealing with pixels allows us to simplify the problem of labeling an image
1. true
2. false
9. Clustering by closest neighbor is the most effective method to classify two overlapping clouds of points
1. true
2. k-means is in general superior
3. all of the above
4. none
10. Likelihood is the best way to cluster labels
1. true
2. false
3. it is better than k-means
11. A drawback of k-means is that you have to set the number of clusters
1. true
2. false
12. Learning to recognize a pattern is equivalent to,
1. Knowing the cluster boundaries and knowing which features to use
2. is to make a database of possible images and then when one of the images is presented to tell which one is same in the database
3. all
4. none
13. In morphological operations, closing is...
1. used to clean salt and pepper noise which is bigger than the objects of interest
2. used to clean salt and pepper noise which is smaller than the objects of interest
3. opening and dilation by this order
4. dilation and then erosion
14. Opening is
1. a tool to remove white noise from a black and white picture
2. it only works on black and white pictures
3. erosion followed by closing
4. erosion followed by dilation
5. none
15. Erosion is
1. a linear operation
2. is reversible
3. all are true
16. Dilation is
1. used to connect disconnected lines
2. to remove vertical lines
3. none
4. all
17. If I want to connect vertical lines that are disconnected by an average pixel length of 5 pixels, what structuring element is appropriate
1. [1,1,1,1,1,1,1,1] (as a vertical column)
2. [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
3. [1,1,1,3,4,3,1,1]
4. [0,0,0,1,0,0]
5. [1,1,1]
6. [1,1]
7. none
18. If I want to connect vertical lines that are disconnected by an average pixel length of 5 pixels, what structuring element is appropriate
1. a 4x4 matrix of ones
2. a 7x1 matrix of ones
3. a 7x7 matrix of ones
4. a 1x7 matrix of ones
5. none
19. If I want to connect vertical lines that are disconnected by an average pixel length of 5 pixels, what structuring element is appropriate
1. a 1x5 matrix of ones
2. a 5x1 matrix of ones
3. a 5x5 matrix of zeroes
4. a 5x1 matrix of zeroes with a 1 in the middle
5. none
20. Ideally, the features for Arabic numeral recognition would
1. correspond to traces
2. not correspond to traces but numbers
3. both
4. none
21. The best features for pattern recognition are those that appear infrequently in the dataset because
1. they have a lot of entropy
2. they discriminate numbers better because of their frequency
3. none
4. all
22. Sparse coding is related to how the brain detects edges by means of basis functions
1. true
2. false
3. not always, depends on the variance of the images
Dr. Jose Berengueres joined UAE University as Assistant Professor in 2011. He received MEE from Polytechnic University of Catalonia in 1999 and a PhD in bio-inspired robotics from Tokyo Institute of Technology in 2007.
He has authored books on:
The Toyota Production System
Design Thinking
Human Computer Interaction
UX women designers
Business Models Innovation
He has given talks and workshops on Design Thinking & Business Models in Germany, Mexico, Dubai, and California.