![Page 1: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/1.jpg)
Discriminative & Deep Learning for
Big Data
Andrew Zisserman
Visual Geometry Group
University of Oxford
http://www.robots.ox.ac.uk/~vgg
AIMS-CDT Michaelmas 2019
![Page 2: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/2.jpg)
Overview
Two aspects:
1. Learning from Big Data:
• Supervised and discriminative learning: provides training data
for large capacity models
• e.g. for data hungry deep neural networks
2. Searching and modelling Big Data:
• Need efficient (fast and low memory) methods to address this
![Page 3: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/3.jpg)
Course Outline
Mon: Discriminative Learning 1 [AZ]
• Introduction, NN, linear classifiers; regression, SVMs, loss functions
Tue: Discriminative Learning 2 and searching Big Data [AZ,RA]
• Multiple classes, large scale retrieval; ANN, LSH, PQ
• Practical 1: Image classification
Wed: Deep Learning 1 [AV]
• Neural networks, CNNs, back-prop, applications
Thu: Deep learning 2 and Large Scale Learning [AV]
• Architectures, visualization, applications; large scale learning
• Practical 2: CNNs
Mon 9th: Seminar by Alyosha Efros
• Applications of large scale matching in Computer Vision
![Page 4: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/4.jpg)
Recommended book: general introduction
• Pattern Recognition and
Machine Learning
Christopher Bishop, Springer, 2006.
• Excellent on classification and
regression
![Page 5: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/5.jpg)
Textbooks 2: also general
• Elements of Statistical
Learning
Hastie, Tibshirani, Friedman,
Springer, 2009, second edition
• Good explanation of algorithms
• pdf available online
![Page 6: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/6.jpg)
Textbooks 3: specialized for SVMs and kernels
• Learning with Kernels
Bernhard Schölkopf and Alexander J. Smola
MIT Press, 2002.
![Page 7: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/7.jpg)
Textbooks 4: specialized for deep learning
• Deep Learning
Goodfellow, Bengio, Courville
MIT Press, 2016
• html version available online
• http://www.deeplearningbook.org
![Page 8: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/8.jpg)
Introduction: What is Machine Learning?
Algorithms that can improve their performance using
training data
• Typically the algorithm has a (large) number of
parameters whose values are learnt from the data
• Can be applied in situations where it is very challenging
(= impossible) to define rules by hand, e.g.:
• Face recognition
• Speech recognition
• Stock prediction
![Page 9: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/9.jpg)
Example 1: hand-written digit recognition
Images are 28 x 28 pixels
![Page 10: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/10.jpg)
How to proceed …
As a supervised classification problem
Start with training data, e.g. 6000 examples of each digit
• Can achieve testing error of 0.4%
• One of the first commercial and widely used ML systems (for zip codes & checks)
![Page 11: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/11.jpg)
Example 2: Face detection
• Again, a supervised classification problem
• Need to classify an image window into three classes:
• non-face
• frontal-face
• profile-face
![Page 12: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/12.jpg)
Classifier is learnt from labelled data
Training data for frontal faces
• 5000 faces
All near frontal
Age, race, gender, lighting
• 10^8 non-faces
• faces are normalized
scale, translation
![Page 13: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/13.jpg)
Example 3: Spam detection
• This is a classification problem
• Task is to classify email into spam/non-spam
• Data x_i is a vector of word counts, e.g. of “viagra”, “outperform”, “you may be surprized to be contacted” …
• Requires a learning system as “enemy” keeps innovating
![Page 14: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/14.jpg)
Example 4: Stock price prediction
• Task is to predict stock price at future date
• This is a regression task, as the output is continuous
![Page 15: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/15.jpg)
Example 5: Computational biology
Protein Structure and Disulfide Bridges
Protein: 1IMT
AVITGACERDLQCG
KGTCCAVSLWIKSV
RVCTPVGTSGEDCH
PASHKIPFSGQRMH
HTCPCAPNLACVQT
SPKKFKCLSK
Regression task: given sequence predict 3D structure
![Page 16: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/16.jpg)
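Example 6: Colourization
Regression task: predict pixel colour from a monochrome input
[Figure: a colour image is split into a greyscale image (the L channel, used as input) and its ab colour channels (the target); the colour image itself provides a “free” supervisory signal]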
![Page 17: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/17.jpg)
Colourization examples
Train network to predict pixel colour from a monochrome input
Colorful Image Colorization, Zhang et al., ECCV 2016
![Page 18: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/18.jpg)
Example 7: Machine translation
Use of aligned text, e.g. Google translate
x: What is the anticipated cost of collecting fees under the new proposal?
y: En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?
[Figure: phrase alignment of x, segmented as “What is | the anticipated | cost of | collecting fees | under the | new proposal | ?”, with the tokenized y, “En vertu de les nouvelles propositions, quel est le coût prévu de perception de les droits?”]
![Page 19: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/19.jpg)
What is the anticipated cost of collecting fees under the new proposal?
Google Translate (2019)
![Page 20: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/20.jpg)
Example 8: Machine transcription – sequences
Use of aligned speech and text
• Automated speech recognition
Use of aligned images and text
• Text spotting and recognition
Use of aligned face video and text
• Automated lip reading
![Page 21: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/21.jpg)
![Page 22: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/22.jpg)
![Page 23: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/23.jpg)
Lip reading sentences
Chung, Senior, Vinyals, Zisserman, 2017
![Page 24: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/24.jpg)
Speech and natural language
https://www.skype.com/en/features/skype-translator/
Google Translate App
• Translate between 103 languages by typing
• Offline: Translate 52 languages when you have no Internet
• Instant camera translation: Use your camera to translate
text instantly in 30 languages
• Camera Mode: Take pictures of text for higher-quality
translations in 37 languages
• Conversation Mode: Two-way instant speech translation in
32 languages
https://play.google.com/store/apps/details?id=com.google.android.apps.translate&hl=en
See also: The Great AI Awakening (New York Times Magazine)
Slide credit: Lana Lazebnik
![Page 25: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/25.jpg)
Lecture outline
• Classification
• Regression
• Overfitting
• Regularization and Loss functions
• Part II
• Linear classifiers
• Support Vector Machine (SVM)
• Logistic regression (LR)
![Page 26: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/26.jpg)
Supervised Learning: Overview
Learning machine
![Page 27: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/27.jpg)
Classification
• Suppose we are given a training set of N observations $(x_1, y_1), \ldots, (x_N, y_N)$, where $x_i$ is an input and $y_i$ its class label
• Classification problem is to estimate f(x) from this data such that $f(x_i) = y_i$
![Page 28: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/28.jpg)
K Nearest Neighbour (K-NN) Classifier
Algorithm
• For each test point, x, to be classified, find the K nearest
samples in the training data
• Classify the point, x, according to the majority vote of their
class labels
e.g. K = 3
• applicable to
multi-class case
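The two steps of the K-NN algorithm can be sketched in a few lines of NumPy. This is a minimal illustration, not the lecture's implementation: the function name `knn_classify`, the toy data and the choice of Euclidean distance are all assumptions made for the example.

```python
import numpy as np

def knn_classify(X_train, y_train, x, k=3):
    """Classify a single point x by majority vote among its k nearest training samples."""
    # Distance from x to every training sample (Euclidean distance is an assumed choice)
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest training samples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their class labels (ties go to the smallest label)
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy training set: two well separated clusters with labels 0 and 1 (hypothetical data)
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_classify(X_train, y_train, np.array([0.15, 0.1]), k=3)
pred_b = knn_classify(X_train, y_train, np.array([0.95, 1.0]), k=3)
print(pred_a, pred_b)
```

Nothing is learnt ahead of time here: all the work happens at test time, when the distances to every training sample are computed.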
![Page 29: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/29.jpg)
K = 1
Voronoi diagram:
• partitions the space into regions
• boundaries are equal distance
from training points
Classification boundary:
• non-linear
![Page 30: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/30.jpg)
A sampling assumption: training and test data
[Figure: scatter plots of the training data and the testing data]
• Assume that the training examples are drawn independently from the set of all possible examples.
• This makes it very unlikely that a strong regularity in the training data will be absent in the test data.
• Measure classification error as the “risk”, the average of a loss function over the N data points: $R = \frac{1}{N} \sum_{i=1}^{N} [f(x_i) \neq y_i]$, where the loss $[f(x_i) \neq y_i]$ is 1 for a misclassified point and 0 otherwise
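As a small illustration, the classification error under a 0-1 loss is the fraction of points whose predicted label differs from the true label. The helper name `classification_error` and the example labels below are hypothetical.

```python
import numpy as np

def classification_error(y_pred, y_true):
    """Average 0-1 loss: fraction of points whose prediction differs from the true label."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    return float(np.mean(y_pred != y_true))

# Four points, two of them misclassified (made-up labels)
err = classification_error([1, -1, 1, 1], [1, 1, 1, -1])
print(err)
```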
![Page 31: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/31.jpg)
K = 1
[Figure: K-NN classification of the training data (error = 0.0) and the testing data (error = 0.15)]
![Page 32: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/32.jpg)
K = 3
[Figure: K-NN classification of the training data (error = 0.0760) and the testing data (error = 0.1340)]
![Page 33: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/33.jpg)
Generalization
• The real aim of supervised learning is to do well on test data that is not known during learning
• Choosing the values for the parameters that minimize the loss function on the training data is not necessarily the best policy
• We want the learning machine to model the true regularities in the data and to ignore the noise in the data.
![Page 34: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/34.jpg)
K = 1
[Figure: K-NN classification of the training data (error = 0.0) and the testing data (error = 0.15)]
![Page 35: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/35.jpg)
K = 3
[Figure: K-NN classification of the training data (error = 0.0760) and the testing data (error = 0.1340)]
![Page 36: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/36.jpg)
K = 7
[Figure: K-NN classification of the training data (error = 0.1320) and the testing data (error = 0.1110)]
![Page 37: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/37.jpg)
K = 21
[Figure: K-NN classification of the training data (error = 0.1120) and the testing data (error = 0.0920)]
![Page 38: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/38.jpg)
Properties and training
As K increases:
• Classification boundary becomes smoother
• Training error can increase
Choose (learn) K on a validation set
• Split training data into training and validation
• Hold out validation data and measure error on this
• Validation set acts as a proxy for the test set
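Choosing K on a held-out validation set might look like the sketch below. The synthetic two-class data, the 300/100 split, the candidate values of K and the helper `knn_predict` are all assumptions made for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Majority-vote K-NN for labels in {0, 1}; k is assumed odd so there are no ties."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train)
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]
    return (votes.mean(axis=1) > 0.5).astype(int)

rng = np.random.default_rng(0)
n = 200
# Two overlapping Gaussian classes (synthetic data, assumed for the example)
X = np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal(1.5, 1.0, (n, 2))])
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Shuffle, then split the training data into training and validation parts
perm = rng.permutation(2 * n)
X, y = X[perm], y[perm]
X_tr, y_tr = X[:300], y[:300]
X_val, y_val = X[300:], y[300:]

# Measure the error on the held-out validation data for each candidate K
candidates = [1, 3, 7, 21]
val_errors = {k: float(np.mean(knn_predict(X_tr, y_tr, X_val, k) != y_val))
              for k in candidates}
best_k = min(val_errors, key=val_errors.get)

# With K = 1 every training point is its own nearest neighbour
train_error_k1 = float(np.mean(knn_predict(X_tr, y_tr, X_tr, 1) != y_tr))
print(val_errors, best_k, train_error_k1)
```

The zero training error at K = 1 says nothing about generalization, which is why K is selected on the validation error instead.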
![Page 39: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/39.jpg)
Example: hand written digit recognition
• MNIST data set
• Distance = raw pixel distance between images
• 60K training examples
• 10K testing examples
• K-NN gives 5% classification error
![Page 40: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/40.jpg)
Summary
Advantages:
• K-NN is a simple but effective classification procedure
• Applies to multi-class classification
• Decision surfaces are non-linear
• Quality of predictions automatically improves with more training
data
• Only a single parameter, K; easily tuned by cross-validation
• Use as a baseline/debugging classifier
![Page 41: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/41.jpg)
Summary
Disadvantages:
• What does nearest mean? Need to specify or learn a distance
metric.
• Computational cost: must store and search through the entire training set at test time. Can alleviate this problem by thinning the training set and by using efficient search methods such as approximate NN
![Page 42: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/42.jpg)
Lecture outline
• Classification
• Regression
• Overfitting
• Regularization and Loss functions
• Part II
• Linear classifiers
• Support Vector Machine (SVM)
• Logistic regression (LR)
![Page 43: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/43.jpg)
Regression
• Suppose we are given a training set of N observations $(x_1, y_1), \ldots, (x_N, y_N)$, where $x_i$ is an input and $y_i$ is a real-valued target
• Regression problem is to estimate y(x) from this data
![Page 44: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/44.jpg)
K-NN Regression
Algorithm
• For each test point, x, find the K nearest samples x_i in the training data and their values y_i
• Output is mean of their values, $f(x) = \frac{1}{K} \sum_{i} y_i$, where the sum is over the K nearest samples
• Again, need to choose (learn) K
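A minimal sketch of K-NN regression for scalar inputs follows. The function name `knn_regress` and the toy data are assumptions for illustration.

```python
import numpy as np

def knn_regress(x_train, y_train, x, k=3):
    """Predict the value at x as the mean of the values of its k nearest training samples."""
    # Distance from x to every (scalar) training input
    dists = np.abs(x_train - x)
    # Indices of the k nearest samples
    nearest = np.argsort(dists)[:k]
    # Output is the mean of their values
    return float(np.mean(y_train[nearest]))

# Toy data: y = x^2 sampled at integer points (hypothetical)
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.0, 1.0, 4.0, 9.0, 16.0])

# The three nearest inputs to 2.1 are 2, 3 and 1, so the output is (4 + 9 + 1) / 3
pred = knn_regress(x_train, y_train, 2.1, k=3)
print(pred)
```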
![Page 45: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/45.jpg)
Example:
Real time head pose regression
Fanelli et al., DAGM 2011
![Page 46: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/46.jpg)
![Page 47: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/47.jpg)
Image Regression Examples
IM2GPS
When was that made?
Age estimation
Slide credit: Lana Lazebnik
![Page 48: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/48.jpg)
Regression example: polynomial curve fitting
• The green curve is the true function (which is not a polynomial)
• The data points are uniform in x but have noise in y.
• We will use a loss function that measures the squared error in the prediction of y(x) from x. The loss for the red polynomial is the sum of the squared vertical errors.
[Figure, from Bishop: noisy target values, the true function (green) and a polynomial regression fit (red)]
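Fitting a polynomial by minimizing the sum of squared vertical errors can be sketched as below. The true function sin(2πx), the noise level, the number of points and the helper names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
# Data points uniform in x, with noise in the target values (true function assumed)
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, N)

def fit_polynomial(x, t, M):
    """Least-squares coefficients w_0..w_M of an order-M polynomial."""
    Phi = np.vander(x, M + 1, increasing=True)   # columns 1, x, x^2, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def predict(w, x):
    """Evaluate the polynomial with coefficients w at the points x."""
    return np.vander(x, len(w), increasing=True) @ w

def sum_squared_error(w, x, t):
    """Loss: sum of the squared vertical errors."""
    return float(np.sum((predict(w, x) - t) ** 2))

w3 = fit_polynomial(x, t, 3)
loss3 = sum_squared_error(w3, x, t)
loss1 = sum_squared_error(fit_polynomial(x, t, 1), x, t)
print(loss1, loss3)
```

A higher-order polynomial can never have a larger training loss than a lower-order one, since the lower-order model is a special case of it.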
![Page 49: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/49.jpg)
Some fits to the data: which is best?
[Figure, from Bishop: four polynomial fits to the data; one of the fits is labelled “over fitting”]
![Page 50: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/50.jpg)
Over-fitting
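Root-Mean-Square (RMS) Error: $E_{RMS} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (f(x_i) - y_i)^2}$, where $f(x_i)$ is the prediction and $y_i$ the target value
• test data: a different sample from the same true function
• training error goes to zero, but test error increases at large M (the polynomial order)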
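The comparison of training and test RMS error can be reproduced in outline as follows. The true function sin(2πx), the noise level and the sample sizes (10 training points, 100 test points) are assumptions, so the numbers will not match the lecture's plot.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    """Draw n noisy samples of an assumed true function, sin(2*pi*x)."""
    x = np.linspace(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)

x_train, t_train = sample(10)
x_test, t_test = sample(100)   # a different sample from the same true function

def rms_error(w, x, t):
    """Root-mean-square error of the polynomial with coefficients w."""
    pred = np.vander(x, len(w), increasing=True) @ w
    return float(np.sqrt(np.mean((pred - t) ** 2)))

train_rms, test_rms = [], []
for M in range(10):
    # Least-squares fit of an order-M polynomial to the training data
    Phi = np.vander(x_train, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, t_train, rcond=None)
    train_rms.append(rms_error(w, x_train, t_train))
    test_rms.append(rms_error(w, x_test, t_test))

for M, (tr, te) in enumerate(zip(train_rms, test_rms)):
    print(M, round(tr, 4), round(te, 4))
```

At M = 9 the polynomial has as many coefficients as there are training points, so it interpolates them and the training error is numerically zero.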
![Page 51: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/51.jpg)
Trading off goodness of fit against model complexity
• If the model has as many degrees of freedom as the data, it
can fit the training data perfectly
• But the objective in ML is generalization
• Can expect a model to generalize well if it explains the training
data surprisingly well given the complexity of the model.
![Page 52: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/52.jpg)
How to prevent over fitting? I
• Add more data than the model “complexity”
• For 9th order polynomial:
[Figure: 9th order polynomial fits to data sets of increasing size]
![Page 53: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/53.jpg)
Polynomial Coefficients
![Page 54: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/54.jpg)
How to prevent over fitting? II
• Regularization: penalize large coefficient values
$E(\mathbf{w}) = \underbrace{\sum_{i=1}^{N} (f(x_i, \mathbf{w}) - y_i)^2}_{\text{loss function}} + \underbrace{\lambda \|\mathbf{w}\|^2}_{\text{regularization}}$
• Known as “ridge” regression
• In practice use validation data to choose lambda (not test data)
• Convex cost function, closed form solution
• This is “weight decay” in deep learning
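The closed form solution of ridge regression can be sketched as below, assuming the cost is the sum of squared errors plus lambda times the squared norm of the weights (the exact scaling of the two terms is an assumption). The data, the polynomial features and the name `ridge_fit` are also illustrative, and for simplicity the constant term w_0 is penalized as well.

```python
import numpy as np

def ridge_fit(Phi, t, lam):
    """Minimize ||Phi w - t||^2 + lam * ||w||^2 via w = (Phi^T Phi + lam I)^{-1} Phi^T t."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ t)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 10)   # assumed true function plus noise

# 9th order polynomial features: 1, x, ..., x^9
Phi = np.vander(x, 10, increasing=True)

w_small = ridge_fit(Phi, t, 1e-6)   # almost unregularized: large coefficients
w_large = ridge_fit(Phi, t, 1.0)    # strongly regularized: coefficients shrink
print(np.linalg.norm(w_small), np.linalg.norm(w_large))
```

Increasing lambda shrinks the norm of the coefficient vector, which is the penalty on large coefficient values in action.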
![Page 55: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/55.jpg)
Polynomial Coefficients
![Page 56: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/56.jpg)
Summary Point: How to set parameters?
Use a validation set
Divide the total dataset into three subsets:
• Training data is used for learning the parameters of the model.
• Validation data is not used for learning, but is used to determine the hyper-parameters, e.g. for deciding the type of model and the amount of regularization.
• Test data is used to get a final, unbiased estimate of how well the learning machine works. Expect this estimate to be worse than on the validation data.
Can re-divide the total dataset to get another unbiased estimate of the true error rate.
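One way to divide a dataset into the three subsets is to shuffle the indices once and cut them into consecutive blocks. The 60/20/20 proportions and the function name `three_way_split` are assumptions for illustration.

```python
import numpy as np

def three_way_split(n, frac_train=0.6, frac_val=0.2, seed=0):
    """Return disjoint index arrays for the training, validation and test subsets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = round(frac_train * n)
    n_val = round(frac_val * n)
    train_idx = perm[:n_train]                 # for learning the model parameters
    val_idx = perm[n_train:n_train + n_val]    # for choosing hyper-parameters
    test_idx = perm[n_train + n_val:]          # for the final estimate only
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = three_way_split(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

Calling the function with a different `seed` re-divides the dataset, which gives another estimate of the error rate.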
![Page 57: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/57.jpg)
• Again, need to control the complexity of the (discriminant)
function
![Page 58: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/58.jpg)
Lecture outline
• Classification
• Regression
• Overfitting
• Regularization and Loss functions
• Part II
• Linear classifiers
• Support Vector Machine (SVM)
• Logistic regression (LR)
![Page 59: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/59.jpg)
What comes next?
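• Learning by optimizing a cost function: $\min_{\mathbf{w}} \underbrace{\sum_{i=1}^{N} \ell(f(x_i, \mathbf{w}), y_i)}_{\text{loss function}} + \underbrace{\lambda R(\mathbf{w})}_{\text{regularization}}$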
• In general
• choose loss function for: classification, regression, ranking, clustering …
• choose regularization function
![Page 60: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/60.jpg)
The “Lasso” or L1 norm regularization
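$\min_{\mathbf{w}} \underbrace{\sum_{i=1}^{N} (f(x_i, \mathbf{w}) - y_i)^2}_{\text{loss function}} + \underbrace{\lambda \|\mathbf{w}\|_1}_{\text{regularization}}$
• This is a quadratic optimization problem
• There is a unique solution
• p-Norm definition: $\|\mathbf{w}\|_p = \left( \sum_{j=1}^{d} |w_j|^p \right)^{1/p}$
• LASSO = Least Absolute Shrinkage and Selection Operator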
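The sketch below solves an L1-regularized least squares problem, assuming a linear model and the cost ||Xw - y||^2 + lambda ||w||_1. The solver is iterative soft-thresholding (ISTA), a technique swapped in for the example and not one the lecture names. The synthetic data, the value of lambda and the function names are likewise assumptions.

```python
import numpy as np

def soft_threshold(v, thresh):
    """Shrink each entry of v towards zero by thresh; entries within thresh become exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize ||X w - y||^2 + lam * ||w||_1 by iterative soft-thresholding (ISTA)."""
    # Lipschitz constant of the gradient of the squared-error term
    L = 2.0 * np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y)                   # gradient of the loss term
        w = soft_threshold(w - grad / L, lam / L)        # proximal step for the L1 term
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
w_true = np.zeros(10)
w_true[[0, 3]] = [2.0, -1.5]          # only two of the ten features matter
y = X @ w_true + 0.01 * rng.normal(size=50)

w_lasso = lasso_ista(X, y, lam=5.0)
# Ridge solution with the same lambda, for comparison
w_ridge = np.linalg.solve(X.T @ X + 5.0 * np.eye(10), X.T @ y)

n_zero_lasso = int(np.sum(w_lasso == 0.0))
n_zero_ridge = int(np.sum(w_ridge == 0.0))
print(n_zero_lasso, n_zero_ridge)
```

The soft-thresholding step sets small weights exactly to zero, whereas the ridge solution shrinks weights without zeroing them. For lambda of at least 2 max|X^T y| the lasso solution under this scaling is the all-zero vector.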
![Page 61: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/61.jpg)
Sparsity property of the Lasso
• contour plots for d = 2
[Figure: contours of the loss and of the regularizer; panels: ridge regression, lasso]
• Minimum where contours of loss and regularizer are tangent
• For the lasso case, minima occur at “corners”
• Consequently one of the weights is zero
• In high dimensions many weights can be zero
![Page 62: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/62.jpg)
![Page 63: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/63.jpg)
Lasso in action
[Figure: coefficient values plotted against the regularization parameter lambda (percent of lambdaMax); panels: Ridge Regression, L1-Regularized Least Squares]
![Page 64: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/64.jpg)
Sparse weight vectors
• Weights being zero is a method of “feature selection”: zeroing out the unimportant features
• The SVM classifier also has this property (sparse alpha in the dual representation)
• Ridge regression does not
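The contrast between the two regularizers can be seen on a toy problem. The sketch below is not from the lecture: it assumes a cost of the form 0.5 * squared loss + penalty (the factor 0.5 only rescales lambda), uses coordinate descent as the solver, and the data `X`, `y` and the value `lam = 1.0` are made up for illustration.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; return exactly 0.0 if |rho| <= lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def coordinate_descent(X, y, lam, penalty, sweeps=100):
    """Minimise 0.5 * sum_i (y_i - w.x_i)^2 + penalty, one weight at a time.

    penalty="lasso" adds lam * sum_j |w_j|;
    penalty="ridge" adds 0.5 * lam * sum_j w_j^2.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            z = sum(X[i][j] ** 2 for i in range(n))
            if penalty == "lasso":
                w[j] = soft_threshold(rho, lam) / z
            else:
                w[j] = rho / (z + lam)
    return w


# Hypothetical toy data: three mutually orthogonal features, where the
# target depends strongly on feature 0 and only weakly on features 1 and 2.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.05, 1.95, -1.85, -2.15]

lasso_w = coordinate_descent(X, y, lam=1.0, penalty="lasso")
ridge_w = coordinate_descent(X, y, lam=1.0, penalty="ridge")
print("lasso:", lasso_w)
print("ridge:", ridge_w)
```

On this data the lasso weights for the two weak features are exactly zero, while ridge shrinks every weight but leaves all of them non-zero.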
![Page 65: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/65.jpg)
More loss functions for regression
![Page 66: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/66.jpg)
Part II Outline
• Linear classifiers
• Linear separability
• Support Vector Machine (SVM) classifier
• Wide margin
• Cost function
• Hard and soft margins
• Optimization
• Logistic Regression classifier
• Cost function
![Page 67: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/67.jpg)
Linear Classifiers and the SVM
![Page 68: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/68.jpg)
Binary Classification
![Page 69: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/69.jpg)
Linear separability
[Figure panels: linearly separable; not linearly separable]
![Page 70: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/70.jpg)
Linear classifiers
[Figure: 2D feature space with axes X1 and X2]
A linear classifier has the form $f(x) = w^\top x + b$
• in 2D the discriminant is a line
• $w$ is the normal to the line, and $b$ the bias
• $w$ is known as the weight vector
![Page 71: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/71.jpg)
Linear classifiers
A linear classifier has the form $f(x) = w^\top x + b$
• in 3D the discriminant is a plane, and in nD it is a hyperplane
For a K-NN classifier it was necessary to “carry” the training data
For a linear classifier, the training data is used to learn w and then discarded
Only w (with the bias b) is needed for classifying new data
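A minimal sketch of classification with a learned linear classifier, to make the point that no training data is needed at test time. The weight vector `w`, bias `b` and test points are hypothetical values, and assigning the boundary case (score exactly 0) to the positive class is an assumption.

```python
def linear_classify(w, b, x):
    """Return +1 if w.x + b >= 0, otherwise -1."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical parameters, standing in for values learned from training data.
w = [1.0, -1.0]
b = 0.5

label_a = linear_classify(w, b, [2.0, 0.0])  # score = 2.0 + 0.5 = 2.5
label_b = linear_classify(w, b, [0.0, 2.0])  # score = -2.0 + 0.5 = -1.5
print(label_a, label_b)
```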
![Page 72: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/72.jpg)
What is the best w?
• maximum margin solution: most stable under perturbations of the inputs
![Page 73: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/73.jpg)
Interlude: Why should we care about linear classifiers?
Example: bicycle image classification
![Page 74: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/74.jpg)
Pre-deep learning – non-linear classifiers
[Diagram labels: input, Feature Extractor, representation, interpretation]
![Page 75: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/75.jpg)
Deep learning – train network for linear classification
[Diagram labels: input, CNN Feature Extractor, representation, interpretation]
• Optimize network with loss function for a linear classifier
• Learns to produce feature vectors x that are linearly separable
![Page 76: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/76.jpg)
How to find the maximum margin?
linearly separable data
![Page 77: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/77.jpg)
Support Vector Machine
[Figure: linearly separable data; decision boundary $w^\top x + b = 0$ with normal $w$; the support vectors are marked]
![Page 78: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/78.jpg)
How to find the maximum margin?
linearly separable data
![Page 79: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/79.jpg)
SVM – sketch derivation
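The derivation on this slide did not survive extraction. A standard sketch is given here, assuming the normalization in which the support vectors lie on $w^\top x + b = \pm 1$; the names $x_+$ and $x_-$ for a support vector on each side are illustrative.

```latex
% Scaling (w, b) by a constant c > 0 leaves the boundary w^T x + b = 0 unchanged,
% so the scale can be fixed by requiring, for the support vectors x_+ and x_-:
\begin{align*}
w^\top x_+ + b &= +1, \\
w^\top x_- + b &= -1.
\end{align*}
% Subtracting the two equations and projecting onto the unit normal w / \|w\|:
\[
\text{margin} = \frac{w^\top (x_+ - x_-)}{\|w\|} = \frac{2}{\|w\|}.
\]
```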
![Page 80: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/80.jpg)
Support Vector Machine
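[Figure: linearly separable data; decision boundary $w^\top x + b = 0$ with normal $w$, and the planes $w^\top x + b = 1$ and $w^\top x + b = -1$ passing through the support vectors]
Margin = $2 / \|w\|$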
![Page 81: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/81.jpg)
SVM – Optimization
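The optimization problem on this slide was an image. In its usual form, assuming labels $y_i \in \{-1, +1\}$ and $N$ training points $x_i$, maximizing the margin reads:

```latex
% Maximize the margin 2 / \|w\| with every training point on the correct side:
\[
\max_{w, b} \; \frac{2}{\|w\|}
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1, \qquad i = 1, \dots, N.
\]
% Equivalently, a quadratic objective with linear constraints:
\[
\min_{w, b} \; \|w\|^2
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1, \qquad i = 1, \dots, N.
\]
```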
![Page 82: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/82.jpg)
Linear separability again: What is the best w?
• the points can be linearly separated but there is a very narrow margin
• but possibly the large margin solution is better, even though one constraint is violated
In general there is a trade-off between the margin and the number of mistakes on the training data
![Page 83: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/83.jpg)
Margin violations and misclassifications
[Figure: decision boundary $w^\top x + b = 0$ with margin planes $w^\top x + b = 1$ and $w^\top x + b = -1$; the support vectors, a margin violation and a misclassified point are marked]
Penalize points according to their distance from the margin
![Page 84: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/84.jpg)
Loss functions
• SVM uses “hinge” loss: $\max\left(0, 1 - y_i f(x_i)\right)$
• in contrast to the 0-1 loss
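The two losses can be compared as functions of the signed score $y_i f(x_i)$. This is an illustrative sketch: the function names are made up, and counting a score of exactly 0 as correct in the 0-1 loss is an assumption.

```python
def hinge_loss(margin):
    """Hinge loss max(0, 1 - y*f(x)), where margin = y*f(x)."""
    return max(0.0, 1.0 - margin)


def zero_one_loss(margin):
    """0-1 loss: 1 for a misclassified point (y*f(x) < 0), else 0."""
    return 1.0 if margin < 0 else 0.0


# Three cases: outside the margin, a margin violation, a misclassification.
for m in (2.0, 0.5, -1.0):
    print(f"y*f(x) = {m:+.1f}  hinge = {hinge_loss(m):.1f}  0-1 = {zero_one_loss(m):.1f}")
```

A correctly classified point inside the margin (here 0.5) has zero 0-1 loss but a positive hinge loss, which is what penalizes margin violations.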
![Page 85: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/85.jpg)
“Soft” margin problem
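$\min_{w \in \mathbb{R}^d} \; C \sum_{i=1}^{N} \max\left(0, 1 - y_i f(x_i)\right) + \|w\|^2$
(first term: loss function; second term: regularization)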
![Page 86: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/86.jpg)
Loss function
[Figure: decision boundary $w^\top x + b = 0$ with normal $w$ and the support vectors marked, shown with the loss function]
![Page 87: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/87.jpg)
[Figure: scatter plot of feature x against feature y. File: az-margin.mat, # of points K = 27]
• data is linearly separable
• but only with a narrow margin
![Page 88: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/88.jpg)
C = Infinity hard margin
STPRTool Franc & Hlavac
![Page 89: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/89.jpg)
C = 10 soft margin
![Page 90: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/90.jpg)
Optimization
• Does this cost function have a unique solution?
• Does the solution depend on the starting point of an iterative optimization algorithm (such as gradient descent)?
[Figure: a non-convex curve with a local minimum and a global minimum]
If the cost function is convex, then a locally optimal point is globally optimal (provided the optimization is over a convex set, which it is in our case)
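The starting-point question can be tried numerically. The sketch below is an illustration rather than the lecture's method: it assumes the soft-margin cost C * (sum of hinge losses) + ||w||^2, minimizes it with a plain constant-step subgradient method, and uses made-up data, step size and starting points.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def svm_cost(w, b, X, y, C):
    """Soft-margin cost: C * sum of hinge losses + squared norm of w."""
    hinge = sum(max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y))
    return C * hinge + dot(w, w)


def train_svm(X, y, C, w0, b0, lr=0.01, steps=1000):
    """Constant-step subgradient descent on the soft-margin cost."""
    w, b = list(w0), b0
    for _ in range(steps):
        grad_w = [2.0 * wj for wj in w]  # gradient of ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # point violates the margin
                for j in range(len(w)):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical linearly separable data with labels +1 / -1.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]

# Two different starting points for the same convex cost.
w_a, b_a = train_svm(X, y, C=1.0, w0=[0.0, 0.0], b0=0.0)
w_b, b_b = train_svm(X, y, C=1.0, w0=[-1.0, -1.0], b0=0.0)
cost_a = svm_cost(w_a, b_a, X, y, C=1.0)
cost_b = svm_cost(w_b, b_b, X, y, C=1.0)
print("start A:", w_a, b_a, cost_a)
print("start B:", w_b, b_b, cost_b)
```

Because the cost is convex, both runs end near the same cost value. A constant step size leaves a small oscillation around the minimum, so the two results agree only approximately; a decaying step size would tighten this.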
![Page 91: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/91.jpg)
Convex functions
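The definition on this slide was an image. The standard definition, with $D$ a convex domain and $\theta$ an interpolation weight (both symbols chosen here for illustration), is:

```latex
% f is convex on D if every chord lies on or above the graph of f:
\[
f\big(\theta x + (1 - \theta) y\big) \le \theta f(x) + (1 - \theta) f(y)
\qquad \text{for all } x, y \in D \text{ and } \theta \in [0, 1].
\]
```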
![Page 92: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/92.jpg)
Convex function examples
[Figure panels: convex; not convex]
A non-negative sum of convex functions is convex
![Page 93: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/93.jpg)
[Figure: two convex functions added together (+) give a convex function]
![Page 94: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/94.jpg)
Summary: Learning by optimizing cost functions
Each cost function is the sum of a loss function and a regularization term:
• “Ridge” regression (seen earlier): squared loss, squared regularizer
• Lasso regression: squared loss, lasso regularizer
• SVM: hinge loss, squared regularizer
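The equations for these three cost functions were images and are not in the transcript. Their usual forms are written out below as an assumption, with $f(x) = w^\top x + b$, $N$ training points, and the trade-off parameter placed on the regularizer ($\lambda$) for the regression models and on the loss ($C$) for the SVM; the slides may use different constants.

```latex
\begin{align*}
\text{ridge:} \quad & \min_{w} \; \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2 + \lambda \|w\|_2^2 \\
\text{lasso:} \quad & \min_{w} \; \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2 + \lambda \|w\|_1 \\
\text{SVM:} \quad & \min_{w} \; C \sum_{i=1}^{N} \max\left( 0, 1 - y_i f(x_i) \right) + \|w\|_2^2
\end{align*}
```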
![Page 95: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/95.jpg)
Logistic Regression Classifier
![Page 96: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/96.jpg)
Overview
![Page 97: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/97.jpg)
The logistic function or sigmoid function
[Figure: the sigmoid plotted against z for z from -20 to 20, with values between 0 and 1]
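The formula on this slide was an image; the standard logistic function is $\sigma(z) = 1 / (1 + e^{-z})$. A small sketch follows, where the two-branch form is an implementation choice made here to avoid overflow for large negative z.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), computed without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so exp(z) is at most 1 and cannot overflow
    return e / (1.0 + e)


# Same range as the plot: values run from near 0 to near 1, with 0.5 at z = 0.
for z in (-20, -5, 0, 5, 20):
    print(f"z = {z:+3d}  sigmoid(z) = {sigmoid(z):.6f}")
```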
![Page 98: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/98.jpg)
Margin property
A sigmoid favours a larger margin cf. a step classifier
[Figure: two plots with horizontal axis from -20 to 20 and vertical axis from 0 to 1]
Though, need to control the gradient. How?
![Page 99: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/99.jpg)
Learning
[Figure: sigmoid plot with horizontal axis from -20 to 20, and the levels 0, 0.5 and 1 marked]
![Page 100: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/100.jpg)
Maximum Likelihood Estimation
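The equations on this slide are missing from the transcript. A common way to write the likelihood, assuming labels $y_i \in \{-1, +1\}$, $f(x) = w^\top x + b$ and the sigmoid $\sigma$, is sketched below; the slides may use $y_i \in \{0, 1\}$ instead, which gives an equivalent loss.

```latex
\begin{align*}
% Class posterior for either label, using 1 - \sigma(z) = \sigma(-z):
P(y_i \mid x_i) &= \sigma\big( y_i f(x_i) \big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}, \\
% Negative log-likelihood of N independent training points:
-\log \prod_{i=1}^{N} P(y_i \mid x_i) &= \sum_{i=1}^{N} \log\left( 1 + e^{-y_i f(x_i)} \right).
\end{align*}
```

Maximizing the likelihood is therefore the same as minimizing a sum of per-point losses $\log\left(1 + e^{-y_i f(x_i)}\right)$.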
![Page 101: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/101.jpg)
Logistic Regression Loss function
![Page 102: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/102.jpg)
Comparison of SVM and LR cost functions
Note:
• both approximate 0-1 loss
• very similar asymptotic behaviour
• main difference is smoothness of LR, and non-zero outside SVM margin
• SVM gives sparse solution for $\alpha_i$
[Figure: SVM and LR losses plotted against $y_i f(x_i)$]
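The points in this comparison can be checked numerically. The sketch assumes the hinge loss max(0, 1 - y f(x)) for the SVM and the logistic loss log(1 + exp(-y f(x))) for LR, with labels in {-1, +1}; the function names are illustrative.

```python
import math


def hinge_loss(margin):
    """SVM loss as a function of margin = y*f(x)."""
    return max(0.0, 1.0 - margin)


def logistic_loss(margin):
    """LR loss log(1 + exp(-margin)), arranged to avoid overflow."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Well outside the margin, at the boundary, and badly misclassified.
for m in (2.0, 0.0, -10.0):
    print(f"y*f(x) = {m:+5.1f}  hinge = {hinge_loss(m):7.4f}  logistic = {logistic_loss(m):7.4f}")
```

Outside the margin the hinge loss is exactly zero while the logistic loss is small but positive; for badly misclassified points both grow linearly in the score.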
![Page 103: Discriminative & Deep Learning for Big Dataaz/lectures/aims-big_data/discrim_learning1.pdf · Colorful Image Colorization, Zhang et al., ECCV 2016. Example 7: Machine translation](https://reader030.vdocuments.mx/reader030/viewer/2022040306/5ec5aa81bd278d405c141dcc/html5/thumbnails/103.jpg)
Next lecture
• Beyond Binary Classification
• Multi-class and Multi-label
• Using binary classifiers
• Big Data
• retrieval and ranking
• precision-recall curves
• Nearest Neighbours (NN)
• Approximate NN, Locality Sensitive Hashing, Product Quantization
• Intro to practical