![Page 1: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/1.jpg)
Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019
Lecture 3: Loss Functions and Optimization
Administrative: Assignment 1
Released last week, due Wed 4/17 at 11:59pm
Reminder: Please double check your downloaded file is spring1819_assignment1.zip, not spring1718! As announced in piazza note @79, there was a legacy link being fixed just as the assignment was being released.
Administrative: Project proposal
Due Wed 4/24
Administrative: Piazza
Please check and read pinned piazza posts. There will be announcements soon re: alternative midterm requests, Google Cloud credits, etc.
Recall from last time: Challenges of recognition
Figure panels: Viewpoint, Illumination, Deformation, Occlusion, Clutter, Intraclass Variation
Image credits: one image by Umberto Salvagnin (CC-BY 2.0), one image by jonsson (CC-BY 2.0); the remaining images are CC0 1.0 public domain
Recall from last time: data-driven approach, kNN
Figure panels: 1-NN classifier, 5-NN classifier
Data splits: train / test; train / validation / test
Recall from last time: Linear Classifier
f(x,W) = Wx + b
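As a reminder of what this score function computes, here is a minimal NumPy sketch. The sizes (4 input values, 3 classes) and the random inputs are hypothetical and chosen only for illustration; `linear_scores` is a name introduced here, not one from the lecture.

```python
import numpy as np

def linear_scores(x, W, b):
    """Class scores f(x, W) = Wx + b for a single flattened input x."""
    return W @ x + b

# Hypothetical toy sizes: 4 input values, 3 classes.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)        # flattened "image"
W = rng.standard_normal((3, 4))   # one row of weights per class
b = np.zeros(3)                   # one bias per class

scores = linear_scores(x, W, b)   # one score per class
```

Each row of `W` produces the score for one class, so the output has one entry per class.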
TODO:
1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).
Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain
Suppose: 3 training examples, 3 classes. With some W the scores are:

| class | cat image | car image | frog image |
| --- | --- | --- | --- |
| cat | 3.2 | 1.3 | 2.2 |
| car | 5.1 | 4.9 | 2.5 |
| frog | -1.7 | 2.0 | -3.1 |
A loss function tells how good our current classifier is
Given a dataset of examples $\{(x_i, y_i)\}_{i=1}^{N}$
where $x_i$ is an image and $y_i$ is an (integer) label
Loss over the dataset is an average of the loss over examples:
$$L = \frac{1}{N} \sum_{i} L_i(f(x_i, W), y_i)$$
Multiclass SVM loss:
Given an example $(x_i, y_i)$ where $x_i$ is the image and where $y_i$ is the (integer) label,
and using the shorthand for the scores vector: $s = f(x_i, W)$
the SVM loss has the form:
$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$$
“Hinge loss”
For the cat image (correct class: cat):
= max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1)
= max(0, 2.9) + max(0, -3.9)
= 2.9 + 0
= 2.9
Losses: 2.9
For the car image (correct class: car):
= max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
= max(0, -2.6) + max(0, -1.9)
= 0 + 0
= 0
Losses: 2.9, 0
For the frog image (correct class: frog):
= max(0, 2.2 - (-3.1) + 1) + max(0, 2.5 - (-3.1) + 1)
= max(0, 6.3) + max(0, 6.6)
= 6.3 + 6.6
= 12.9
Losses: 2.9, 0, 12.9
Loss over full dataset is average:
$$L = \frac{1}{N} \sum_{i=1}^{N} L_i$$
Losses: 2.9, 0, 12.9
L = (2.9 + 0 + 12.9)/3 = 5.27
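The hand computation can be checked with a short NumPy sketch. It uses the scores from the running example, stored with one row per training example (cat, car, frog image) and one column per class (cat, car, frog). The function name `svm_losses` and that array layout are choices made for this illustration.

```python
import numpy as np

# One row per example (cat, car, frog image); one column per class (cat, car, frog).
scores = np.array([
    [3.2, 5.1, -1.7],
    [1.3, 4.9, 2.0],
    [2.2, 2.5, -3.1],
])
labels = np.array([0, 1, 2])  # index of the correct class for each row

def svm_losses(scores, labels, margin=1.0):
    """Per-example multiclass SVM (hinge) loss, summed over the incorrect classes."""
    n = scores.shape[0]
    correct = scores[np.arange(n), labels][:, None]       # s_{y_i} as a column
    margins = np.maximum(0.0, scores - correct + margin)  # max(0, s_j - s_{y_i} + 1)
    margins[np.arange(n), labels] = 0.0                   # exclude the j = y_i term
    return margins.sum(axis=1)

losses = svm_losses(scores, labels)
mean_loss = losses.mean()
print(np.round(losses, 2), round(float(mean_loss), 2))
```

The per-example losses come out as 2.9, 0 and 12.9, and their mean rounds to 5.27, matching the values worked out by hand.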
Q: What happens to loss if car scores change a bit?
Q2: What is the min/max possible loss?
Q3: At initialization W is small so all s ≈ 0. What is the loss?
Q4: What if the sum was over all classes? (including j = y_i)
Q5: What if we used mean instead of sum?
Q6: What if we used
Multiclass SVM Loss: Example code
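A minimal sketch of what per-example code for this loss could look like in NumPy. It is an illustration and not necessarily the code shown on the slide. It assumes `W` holds one row of weights per class and that there is no separate bias term; the function name `svm_loss_single` is introduced here.

```python
import numpy as np

def svm_loss_single(x, y, W):
    """Multiclass SVM loss for one example x with integer label y."""
    scores = W.dot(x)                                 # one score per class
    margins = np.maximum(0, scores - scores[y] + 1)   # hinge on every class
    margins[y] = 0                                    # drop the correct-class term
    return np.sum(margins)

# Sanity check with hypothetical sizes: all-zero weights give all-zero scores,
# so each of the 2 incorrect classes contributes exactly the margin of 1.
W = np.zeros((3, 4))
x = np.ones(4)
loss = svm_loss_single(x, 0, W)
```

With three classes and all scores equal to zero, the loss is 2, that is, the number of classes minus one. This makes a convenient check on the first iteration of training.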
E.g. Suppose that we found a W such that L = 0. Is this W unique?
No! 2W also has L = 0!
Before (car image):
= max(0, 1.3 - 4.9 + 1) + max(0, 2.0 - 4.9 + 1)
= max(0, -2.6) + max(0, -1.9)
= 0 + 0
= 0
With W twice as large:
= max(0, 2.6 - 9.8 + 1) + max(0, 4.0 - 9.8 + 1)
= max(0, -6.2) + max(0, -4.8)
= 0 + 0
= 0
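The same comparison can be run in a few lines of NumPy. The sketch assumes the scores are a purely linear function of W (no bias term), so doubling W doubles every score; `svm_loss_from_scores` is a helper name introduced for this illustration.

```python
import numpy as np

def svm_loss_from_scores(scores, y, margin=1.0):
    """Multiclass SVM loss computed directly from a vector of class scores."""
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0  # the correct class does not contribute
    return margins.sum()

car_scores = np.array([1.3, 4.9, 2.0])  # (cat, car, frog) scores for the car image
loss_w = svm_loss_from_scores(car_scores, y=1)        # scores under W
loss_2w = svm_loss_from_scores(2 * car_scores, y=1)   # scores under 2W (assumes no bias)
```

Once every margin is satisfied, scaling the scores up only widens the gaps, so the loss stays at zero for both W and 2W.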
How do we choose between W and 2W?
Regularization
Data loss: Model predictions should match training data
Regularization: Prevent the model from doing too well on training data
$$L(W) = \frac{1}{N} \sum_{i=1}^{N} L_i(f(x_i, W), y_i) + \lambda R(W)$$
$\lambda$ = regularization strength (hyperparameter)
Simple examples
- L2 regularization: $R(W) = \sum_k \sum_l W_{k,l}^2$
- L1 regularization: $R(W) = \sum_k \sum_l |W_{k,l}|$
- Elastic net (L1 + L2): $R(W) = \sum_k \sum_l \beta W_{k,l}^2 + |W_{k,l}|$
More complex:
- Dropout
- Batch normalization
- Stochastic depth, fractional pooling, etc
Why regularize?
- Express preferences over weights
- Make the model simple so it works on test data
- Improve optimization by adding curvature
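A minimal NumPy sketch of the three simple penalties and of a full loss that adds a weighted penalty to the average data loss. The function names, the example matrix `W`, and the values chosen for the regularization strength `lam` and the elastic-net weight `beta` are all hypothetical.

```python
import numpy as np

def l2_penalty(W):
    """Sum of squared weights."""
    return np.sum(W ** 2)

def l1_penalty(W):
    """Sum of absolute weights."""
    return np.sum(np.abs(W))

def elastic_net_penalty(W, beta):
    """Weighted L2 term plus L1 term."""
    return np.sum(beta * W ** 2 + np.abs(W))

def full_loss(data_losses, W, lam, penalty=l2_penalty):
    """Average data loss plus regularization strength times the penalty."""
    return np.mean(data_losses) + lam * penalty(W)

W = np.array([[1.0, -2.0], [0.5, 0.0]])   # hypothetical weights
data_losses = np.array([2.9, 0.0, 12.9])  # per-example losses from the running example
total = full_loss(data_losses, W, lam=0.1)
```

A larger `lam` makes the penalty dominate and pulls the weights toward zero; `lam = 0` recovers the plain data loss.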
Regularization: Expressing Preferences
L2 Regularization
L2 regularization likes to “spread out” the weights
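To make the "spread out" preference concrete, here is a small sketch with illustrative values chosen for this example: two weight vectors that give the same score on the same input, one concentrating all weight on a single entry and one spreading it evenly.

```python
import numpy as np

x = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])       # all weight on one input
w2 = np.array([0.25, 0.25, 0.25, 0.25])   # same total weight, spread evenly

score1, score2 = w1 @ x, w2 @ x                        # identical scores
l2_w1, l2_w2 = np.sum(w1 ** 2), np.sum(w2 ** 2)        # L2 penalties differ
l1_w1, l1_w2 = np.sum(np.abs(w1)), np.sum(np.abs(w2))  # L1 penalties are equal
```

Both vectors produce a score of 1, so the data loss cannot tell them apart. The L2 penalty is 1 for `w1` and 0.25 for `w2`, so L2 regularization prefers the spread-out vector, while the L1 penalty is 1 for both in this particular example.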
Regularization: Prefer Simpler Models
Figure: data points plotted as y against x, with two fitted curves labeled f1 and f2
Regularization pushes against fitting the data too well so we don’t fit noise in the data
Softmax Classifier (Multinomial Logistic Regression)
Scores for the cat image: cat 3.2, car 5.1, frog -1.7
Want to interpret raw classifier scores as probabilities
Softmax Function: $P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}$, where $s = f(x_i; W)$

| class | unnormalized log-probabilities / logits | exp: unnormalized probabilities | normalize: probabilities |
| --- | --- | --- | --- |
| cat | 3.2 | 24.5 | 0.13 |
| car | 5.1 | 164.0 | 0.87 |
| frog | -1.7 | 0.18 | 0.00 |

Probabilities must be >= 0 (ensured by exp)
Probabilities must sum to 1 (ensured by normalize)
$L_i = -\log P(Y = y_i \mid X = x_i) = -\log(0.13) = 2.04$
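The exp and normalize steps can be reproduced with a few lines of NumPy. This is a sketch for the single cat image from the running example, with class order (cat, car, frog) and the natural logarithm; the variable names are choices made here.

```python
import numpy as np

scores = np.array([3.2, 5.1, -1.7])        # logits for (cat, car, frog)
correct_class = 0                          # the image is a cat

unnormalized = np.exp(scores)              # all entries are now positive
probs = unnormalized / unnormalized.sum()  # entries now sum to 1
loss = -np.log(probs[correct_class])       # negative log-probability of the correct class

print(np.round(unnormalized, 2), np.round(probs, 2), round(float(loss), 2))
```

Rounded to two decimals, the probabilities are 0.13, 0.87 and 0.00, and the loss is 2.04.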
Maximum Likelihood Estimation: Choose weights to maximize the likelihood of the observed data (See CS 229 for details)
Compare the predicted probabilities (0.13, 0.87, 0.00) with the correct probs (1.00, 0.00, 0.00):
Kullback–Leibler divergence: $D_{KL}(P \,\|\, Q) = \sum_y P(y) \log \frac{P(y)}{Q(y)}$
Cross Entropy: $H(P, Q) = H(P) + D_{KL}(P \,\|\, Q)$
Maximize probability of correct class. Putting it all together:
$$L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$$
![Page 53: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/53.jpg)
![Page 54: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/54.jpg)
Q: What is the min/max possible loss L_i?
A: min 0, max infinity
![Page 55: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/55.jpg)
![Page 56: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/56.jpg)
Q2: At initialization all s will be approximately equal; what is the loss?
A: log(C), e.g. log(10) ≈ 2.3
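
Both answers about the softmax loss (its range, and its value when all scores are equal) are easy to check numerically. A small sketch, assuming a hypothetical helper `softmax_loss` and C = 10 classes; the specific score vectors are made up for illustration.

```python
import numpy as np

def softmax_loss(scores, correct_class):
    """Softmax (cross-entropy) loss for one example, computed in log space."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[correct_class]

# Q2: equal scores give uniform probabilities 1/C, so the loss is log(C).
num_classes = 10
loss_at_init = softmax_loss(np.zeros(num_classes), correct_class=3)

# Q1: the loss approaches 0 when the correct class dominates,
# and grows without bound when a wrong class dominates.
confident_right = softmax_loss([100.0, 0.0, 0.0], correct_class=0)
confident_wrong = softmax_loss([0.0, 100.0, 0.0], correct_class=0)

print(loss_at_init, np.log(num_classes), confident_right, confident_wrong)
```

Seeing a first-iteration loss far from log(C) is a common sign of a bug in the loss code.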
![Page 57: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/57.jpg)
Softmax vs. SVM
![Page 58: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/58.jpg)
![Page 59: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/59.jpg)
Softmax vs. SVM
Assume scores [10, -2, 3], [10, 9, 9], [10, -100, -100] and $y_i = 0$.
Q: Suppose I take a datapoint and jiggle it a bit (changing its score slightly). What happens to the loss in both cases?
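
One way to explore the question is to compute both losses before and after a small nudge. The sketch below assumes the first score belongs to the correct class, uses a margin of 1 for the SVM loss, and picks a hypothetical jiggle of 0.1 on the correct-class score; none of these choices is prescribed by the slide.

```python
import numpy as np

def svm_loss(scores, y, margin=1.0):
    """Multiclass SVM (hinge) loss for one example."""
    scores = np.asarray(scores, dtype=float)
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0                      # the correct class is not penalized
    return margins.sum()

def softmax_loss(scores, y):
    """Softmax (cross-entropy) loss for one example."""
    scores = np.asarray(scores, dtype=float)
    shifted = scores - scores.max()
    return -(shifted[y] - np.log(np.exp(shifted).sum()))

examples = [[10, -2, 3], [10, 9, 9], [10, -100, -100]]
y = 0            # assumed index of the correct class
jiggle = 0.1     # hypothetical perturbation of the correct-class score

for s in examples:
    nudged = list(s)
    nudged[y] += jiggle
    print(s,
          "SVM:", svm_loss(s, y), "->", svm_loss(nudged, y),
          "Softmax:", softmax_loss(s, y), "->", softmax_loss(nudged, y))
```

Under these assumptions the SVM loss is zero before and after the nudge for all three examples, while the softmax loss for [10, 9, 9] keeps decreasing as the correct score grows.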
![Page 60: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/60.jpg)
![Page 61: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/61.jpg)
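Recap
- We have some dataset of (x, y)
- We have a score function, e.g. $s = f(x; W) = Wx$
- We have a loss function:
  - Softmax: $L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$
  - SVM: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$
  - Full loss: $L = \frac{1}{N} \sum_{i=1}^{N} L_i + R(W)$

How do we find the best W?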
![Page 62: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/62.jpg)
Optimization
![Page 63: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/63.jpg)
This image is CC0 1.0 public domain
![Page 64: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/64.jpg)
Walking man image is CC0 1.0 public domain
![Page 65: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/65.jpg)
Strategy #1: A first very bad idea solution: Random search
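
Random search simply samples many weight matrices and keeps the one with the lowest loss. The sketch below is an illustration on made-up toy data, not the course's CIFAR-10 setup; `softmax_loss_batch`, `random_search`, the data shapes, and the `scale` of the random weights are all assumptions.

```python
import numpy as np

def softmax_loss_batch(W, X, y):
    """Mean softmax loss of the linear scores X @ W over a dataset."""
    scores = X @ W
    scores = scores - scores.max(axis=1, keepdims=True)
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

def random_search(X, y, num_classes, num_tries=200, scale=1e-3, seed=0):
    """Try random weight matrices and remember the best one seen so far."""
    rng = np.random.default_rng(seed)
    best_W, best_loss = None, np.inf
    for _ in range(num_tries):
        W = rng.standard_normal((X.shape[1], num_classes)) * scale
        loss = softmax_loss_batch(W, X, y)
        if loss < best_loss:
            best_W, best_loss = W, loss
    return best_W, best_loss

# Toy stand-in data: 100 examples, 5 features, 3 classes.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((100, 5))
y = data_rng.integers(0, 3, size=100)

best_W, best_loss = random_search(X, y, num_classes=3)
print(best_loss)
```

Each try ignores everything learned from previous tries, which is why the slide calls this a very bad idea.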
![Page 66: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/66.jpg)
Let's see how well this works on the test set...
15.5% accuracy! Not bad! (SOTA is ~95%)
![Page 67: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/67.jpg)
Strategy #2: Follow the slope
![Page 68: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/68.jpg)
Strategy #2: Follow the slope
In 1-dimension, the derivative of a function: $\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$
In multiple dimensions, the gradient is the vector of partial derivatives along each dimension.
The slope in any direction is the dot product of the direction with the gradient.
The direction of steepest descent is the negative gradient.
![Page 69: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/69.jpg)
current W:
[0.34,-1.11,0.78,0.12,0.55,2.81,-3.1,-1.5,0.33,…] loss 1.25347
gradient dW:
[?,?,?,?,?,?,?,?,?,…]
![Page 70: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/70.jpg)
![Page 71: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/71.jpg)
current W:
[0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] loss 1.25347
W + h (first dim):
[0.34 + 0.0001, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] loss 1.25322
gradient dW:
[-2.5, ?, ?, ?, ?, ?, ?, ?, ?, …]
(1.25322 - 1.25347) / 0.0001 = -2.5
![Page 72: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/72.jpg)
![Page 73: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/73.jpg)
current W:
[0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] loss 1.25347
W + h (second dim):
[0.34, -1.11 + 0.0001, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] loss 1.25353
gradient dW:
[-2.5, 0.6, ?, ?, ?, ?, ?, ?, ?, …]
(1.25353 - 1.25347) / 0.0001 = 0.6
![Page 74: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/74.jpg)
![Page 75: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/75.jpg)
current W:
[0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] loss 1.25347
W + h (third dim):
[0.34, -1.11, 0.78 + 0.0001, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] loss 1.25347
gradient dW:
[-2.5, 0.6, 0, ?, ?, ?, ?, ?, ?, …]
(1.25347 - 1.25347) / 0.0001 = 0
![Page 76: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/76.jpg)
Numeric Gradient
- Slow! Need to loop over all dimensions
- Approximate
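
The dimension-by-dimension procedure walked through on the preceding slides can be written as a short loop. This is a sketch under stated assumptions: `numerical_gradient` is an illustrative name, and the toy loss `sum(W**2)` stands in for the real classifier loss so that the true gradient (2W) is known.

```python
import numpy as np

def numerical_gradient(f, W, h=1e-4):
    """Approximate the gradient of f at W with one-sided finite differences."""
    W = np.array(W, dtype=float)          # work on a copy
    grad = np.zeros_like(W)
    base_loss = f(W)
    for idx in np.ndindex(W.shape):       # one extra loss evaluation per dimension
        old_value = W[idx]
        W[idx] = old_value + h            # nudge a single dimension
        grad[idx] = (f(W) - base_loss) / h
        W[idx] = old_value                # restore before moving on
    return grad

# Toy loss with a known gradient of 2 * W.
toy_loss = lambda W: np.sum(W ** 2)
W = np.array([0.34, -1.11, 0.78])
grad = numerical_gradient(toy_loss, W)
print(grad)

# The first entry on the slides follows the same formula:
first_entry = (1.25322 - 1.25347) / 0.0001   # about -2.5
```

The loop needs one loss evaluation per weight, which is why this approach is slow for models with many parameters, and the result is only accurate up to terms of order h.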
![Page 77: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/77.jpg)
![Page 78: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/78.jpg)
This is silly. The loss is just a function of W; we want the gradient $\nabla_W L$.
Use calculus to compute an analytic gradient
Images are in the public domain
![Page 79: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/79.jpg)
current W:
[0.34, -1.11, 0.78, 0.12, 0.55, 2.81, -3.1, -1.5, 0.33, …] loss 1.25347
gradient dW:
[-2.5, 0.6, 0, 0.2, 0.7, -0.5, 1.1, 1.3, -2.1, …]
dW = ... (some function of data and W)
![Page 80: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/80.jpg)
In summary:
- Numerical gradient: approximate, slow, easy to write
- Analytic gradient: exact, fast, error-prone

=> In practice: Always use analytic gradient, but check implementation with numerical gradient. This is called a gradient check.
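
A gradient check compares the two gradients entry by entry. The sketch below does this for the softmax loss of a single example with linear scores `W @ x`; the function names, the random toy data, and the use of a centered difference (a common, more accurate variant of the one-sided difference) are assumptions made for illustration.

```python
import numpy as np

def softmax_loss(W, x, y):
    """Softmax loss of one example with linear scores W @ x."""
    scores = W @ x
    shifted = scores - scores.max()
    return -(shifted[y] - np.log(np.exp(shifted).sum()))

def softmax_grad(W, x, y):
    """Analytic gradient: (probabilities - one_hot(y)) outer x."""
    scores = W @ x
    exps = np.exp(scores - scores.max())
    probs = exps / exps.sum()
    probs[y] -= 1.0
    return np.outer(probs, x)

def numerical_grad(f, W, h=1e-5):
    """Centered finite differences, one dimension at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += h
        W_minus[idx] -= h
        grad[idx] = (f(W_plus) - f(W_minus)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # 3 classes, 4 features (toy sizes)
x = rng.standard_normal(4)
y = 1

analytic = softmax_grad(W, x, y)
numeric = numerical_grad(lambda V: softmax_loss(V, x, y), W)
max_abs_diff = np.max(np.abs(analytic - numeric))
print(max_abs_diff)
```

A tiny difference suggests the analytic gradient is implemented correctly; a large one usually points to a bug in the derivation or the code.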
![Page 81: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/81.jpg)
Gradient Descent
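
Gradient descent repeatedly steps along the negative gradient. A minimal sketch, assuming a fixed step size and a fixed number of steps; `gradient_descent` and the toy quadratic loss are illustrative choices, not the classifier loss from the lecture.

```python
import numpy as np

def gradient_descent(grad_fn, w0, step_size=0.1, num_steps=100):
    """Vanilla gradient descent: move against the gradient at every step."""
    w = np.array(w0, dtype=float)
    for _ in range(num_steps):
        w -= step_size * grad_fn(w)
    return w

# Toy loss L(w) = ||w - target||^2, whose gradient is 2 * (w - target).
target = np.array([1.0, -2.0])
w_final = gradient_descent(lambda w: 2.0 * (w - target), w0=[0.0, 0.0])
print(w_final)
```

The step size (learning rate) is a hyperparameter: too small and progress is slow, too large and the iterates can diverge.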
![Page 82: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/82.jpg)
[Figure: original W and the negative gradient direction, plotted over axes W_1 and W_2]
![Page 83: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/83.jpg)
![Page 84: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/84.jpg)
Stochastic Gradient Descent (SGD)
Full sum (over all N training examples in the loss) is expensive when N is large!
Approximate the sum using a minibatch of examples; minibatch sizes of 32 / 64 / 128 are common
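
In code, the only change from full-batch gradient descent is that each step sees a random minibatch. The sketch below swaps in a least-squares toy problem (not the softmax or SVM loss) so that the correct answer is known; `sgd`, `least_squares_grad`, and the data are hypothetical.

```python
import numpy as np

def least_squares_grad(w, X_batch, y_batch):
    """Gradient of the mean squared error on one minibatch."""
    residual = X_batch @ w - y_batch
    return 2.0 * X_batch.T @ residual / len(y_batch)

def sgd(grad_fn, w0, X, y, step_size=0.1, batch_size=32, num_steps=500, seed=0):
    """Minibatch SGD: estimate the gradient from a random subset at each step."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(num_steps):
        batch = rng.choice(len(X), size=batch_size, replace=False)
        w -= step_size * grad_fn(w, X[batch], y[batch])
    return w

# Noise-free toy regression problem with known weights.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((1000, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true

w_sgd = sgd(least_squares_grad, np.zeros(3), X, y)
print(w_sgd)
```

Each step costs a batch of 32 examples instead of all 1000, at the price of a noisier gradient estimate.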
![Page 85: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/85.jpg)
Interactive Web Demo
http://vision.stanford.edu/teaching/cs231n-demos/linear-classify/
![Page 86: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/86.jpg)
![Page 87: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/87.jpg)
Aside: Image Features
f(x) = Wx
Class scores
Feature Representation
![Page 88: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/88.jpg)
![Page 89: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/89.jpg)
Image Features: Motivation
Original space (x, y): Cannot separate red and blue points with linear classifier
f(x, y) = (r(x, y), θ(x, y))
Transformed space (r, θ): After applying feature transform, points can be separated by linear classifier
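
The polar-coordinate transform can be tried on synthetic data. In this sketch the two classes are an inner disk and an outer ring, a made-up dataset chosen so that no line separates them in (x, y) but a threshold on r does; `to_polar` and the threshold 1.5 are illustrative.

```python
import numpy as np

def to_polar(points):
    """Feature transform f(x, y) = (r, theta)."""
    x, y = points[:, 0], points[:, 1]
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    return np.stack([r, theta], axis=1)

rng = np.random.default_rng(0)
angles = rng.uniform(-np.pi, np.pi, size=200)
# Class 0 lies inside radius 1, class 1 lies in a ring between radius 2 and 3.
radii = np.concatenate([rng.uniform(0.0, 1.0, 100), rng.uniform(2.0, 3.0, 100)])
labels = np.concatenate([np.zeros(100, dtype=int), np.ones(100, dtype=int)])
points = np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)

features = to_polar(points)
# In (r, theta) space a single linear threshold on r separates the classes.
predicted = (features[:, 0] > 1.5).astype(int)
print((predicted == labels).mean())
```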
![Page 90: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/90.jpg)
Example: Color Histogram
Each pixel adds +1 to the histogram bin for its color
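
A color histogram feature can be sketched by quantizing each pixel's color and counting. The version below bins RGB values into a small grid; the slide does not specify a color space or bin count, so `color_histogram`, the 4 bins per channel, and the tiny test image are assumptions.

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Count pixels per quantized RGB color. image: H x W x 3, values 0..255."""
    pixels = image.reshape(-1, 3).astype(int)
    quantized = pixels * bins_per_channel // 256        # per-channel bin index
    flat_bin = (quantized[:, 0] * bins_per_channel
                + quantized[:, 1]) * bins_per_channel + quantized[:, 2]
    # Every pixel contributes +1 to exactly one bin.
    return np.bincount(flat_bin, minlength=bins_per_channel ** 3)

# Toy 2x2 image: three black pixels and one pure red pixel.
image = np.zeros((2, 2, 3), dtype=np.uint8)
image[0, 0] = [255, 0, 0]
hist = color_histogram(image)
print(hist.nonzero()[0], hist[hist > 0])
```

The resulting vector discards where colors occur in the image and keeps only how often they occur.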
![Page 91: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/91.jpg)
Example: Histogram of Oriented Gradients (HoG)
Divide image into 8x8 pixel regions. Within each region quantize edge direction into 9 bins.
Example: 320x240 image gets divided into 40x30 bins; in each bin there are 9 numbers so feature vector has 30*40*9 = 10,800 numbers
Lowe, “Object recognition from local scale-invariant features”, ICCV 1999
Dalal and Triggs, "Histograms of oriented gradients for human detection," CVPR 2005
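
The feature-size arithmetic can be checked with a stripped-down HoG. This sketch is deliberately simplified and is not the Dalal and Triggs pipeline: it omits block normalization and bin interpolation, and `simple_hog` is a hypothetical name. It only shows the "8x8 regions, 9 orientation bins" structure on a random grayscale image.

```python
import numpy as np

def simple_hog(gray, cell=8, num_bins=9):
    """Per-cell histogram of gradient orientations, weighted by gradient magnitude."""
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)                       # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned edge direction
    bin_index = np.minimum((orientation / (180.0 / num_bins)).astype(int),
                           num_bins - 1)
    rows, cols = gray.shape[0] // cell, gray.shape[1] // cell
    hist = np.zeros((rows, cols, num_bins))
    for r in range(rows):
        for c in range(cols):
            region = (slice(r * cell, (r + 1) * cell),
                      slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bin_index[region].ravel(),
                                     weights=magnitude[region].ravel(),
                                     minlength=num_bins)
    return hist

# A 320x240 image is stored as 240 rows by 320 columns.
image = np.random.default_rng(0).random((240, 320))
features = simple_hog(image).ravel()
print(features.shape)
```

The flattened vector has 30 * 40 * 9 = 10,800 entries, matching the example above.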
![Page 92: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/92.jpg)
Example: Bag of Words
Step 1: Build codebook
- Extract random patches
- Cluster patches to form “codebook” of “visual words”
Step 2: Encode images
Fei-Fei and Perona, “A bayesian hierarchical model for learning natural scene categories”, CVPR 2005
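
The two steps can be sketched with raw pixel patches and a few rounds of k-means. Everything here is an illustrative assumption rather than the method of the cited paper: the patch size, the number of visual words, the plain k-means loop, the random toy images, and all function names.

```python
import numpy as np

def extract_patches(image, rng, patch=4, num_patches=50):
    """Sample random square patches and flatten each into a vector."""
    height, width = image.shape
    rows = rng.integers(0, height - patch + 1, size=num_patches)
    cols = rng.integers(0, width - patch + 1, size=num_patches)
    return np.stack([image[r:r + patch, c:c + patch].ravel()
                     for r, c in zip(rows, cols)])

def nearest_word(patches, codebook):
    """Index of the closest codebook entry for every patch."""
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def build_codebook(patches, num_words=8, num_iters=10, seed=0):
    """Step 1: cluster patches with plain k-means to get visual words."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(patches), size=num_words, replace=False)
    codebook = patches[start].copy()
    for _ in range(num_iters):
        assignment = nearest_word(patches, codebook)
        for k in range(num_words):
            members = patches[assignment == k]
            if len(members) > 0:                     # keep empty clusters unchanged
                codebook[k] = members.mean(axis=0)
    return codebook

def encode(image, codebook, rng):
    """Step 2: histogram of visual-word occurrences for one image."""
    words = nearest_word(extract_patches(image, rng), codebook)
    return np.bincount(words, minlength=len(codebook))

rng = np.random.default_rng(0)
train_images = [rng.random((32, 32)) for _ in range(5)]   # toy grayscale images
all_patches = np.concatenate([extract_patches(img, rng) for img in train_images])
codebook = build_codebook(all_patches)
bow_feature = encode(train_images[0], codebook, rng)
print(bow_feature)
```

The final feature has one count per visual word, independent of the image size.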
![Page 93: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/93.jpg)
Aside: Image Features
![Page 94: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/94.jpg)
Image features vs ConvNets
Image features: Feature Extraction → f → 10 numbers giving scores for classes (training)
ConvNets: 10 numbers giving scores for classes (training)
Krizhevsky, Sutskever, and Hinton, “Imagenet classification with deep convolutional neural networks”, NIPS 2012.
Figure copyright Krizhevsky, Sutskever, and Hinton, 2012. Reproduced with permission.
![Page 95: Lecture 3: Loss Functions and Optimizationvision.stanford.edu/.../2019/cs231n_2019_lecture03.pdf · Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 9, 2019 16 cat frog](https://reader036.vdocuments.mx/reader036/viewer/2022062317/5ed82b8e0fa3e705ec0df73d/html5/thumbnails/95.jpg)
Next time:
Introduction to neural networks
Backpropagation