Least Squares Optimization and Gradient Descent Algorithm · 2019. 11. 21.

TRANSCRIPT
Example
■ Single Variable Linear Regression: estimate ŷ_i = θ_0 + θ_1 x_i

  x: 20  40  10  20  60  65
  y: 12  20  25  35  40  50
ESTIMATING PARAMETERS: LEAST SQUARES METHOD
SCATTER PLOT
Plot all (X_i, Y_i) pairs, and plot your learned model.
[Scatter plot: Y vs. X, both axes 0 to 60] [WF]
QUESTION
How would you draw a line through the points? How do you determine which line “fits the best”?
[Scatter plot with a candidate line] [WF]
[Candidate line: slope changed, intercept unchanged] [WF]
[Candidate line: slope unchanged, intercept changed] [WF]
[Candidate line: slope changed, intercept changed] [WF]
LEAST SQUARES
Best fit: the difference between the true (observed) Y-values and the estimated Y-values is minimized:
• Positive errors offset negative errors …
• … so square the error!

Least squares minimizes the sum of the squared errors:

∑_{i=1}^{n} (y_i − ŷ_i)² = ∑_{i=1}^{n} ε_i²

[WF]
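As a quick sketch of this objective (the data points and parameter values below are invented for illustration), the squared-error sum for one candidate line can be computed directly:

```python
# Sum of squared errors for a candidate line y_hat = theta0 + theta1 * x.
# Data and parameters here are made up for illustration.
def sse(theta0, theta1, xs, ys):
    return sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [0.2, 0.31, 0.45, 0.75]
ys = [0.44, 0.123, 0.75, 0.39]
print(sse(0.1, 0.1, xs, ys))  # total squared error for theta = (0.1, 0.1)
```

Different candidate lines give different totals; least squares picks the line with the smallest one.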
LEAST SQUARES, GRAPHICALLY
[Scatter plot with fitted line ŷ_i = θ_0 + θ_1 x_i and vertical residuals ε_1, ε_2, ε_3, ε_4 drawn from each point to the line] [WF]

Each observation satisfies the model plus an error, e.g. y_2 = θ_0 + θ_1 x_2 + ε_2.

LS minimizes ∑_{i=1}^{n} ε_i² = ε_1² + ε_2² + … + ε_n²
Example
■ Single Variable Linear Regression: estimate ŷ_i = θ_0 + θ_1 x_i

  x (Area, sq. ft.):    1600  1400  2100  …  2400
  y (Price, in 1000$):   220   180   350  …   500
Multivariate Regression
■ Multiple Linear Regression

  y  = Price (in 1000$):   220   180   350  …   500
  x1 = Area (sq. ft.):    1600  1400  2100  …  2400
  x2 = # Bathrooms:        2.5   1.5   3.5  …     4
  x3 = # Bedrooms:           3     3     4  …     5

y_i = θ_0 + θ_1 x_{i1} + θ_2 x_{i2} + … + θ_m x_{im}
Multivariate Regression
■ Multiple Linear Regression

For a single observation i, e.g. the row with x_{i1} = 1400 (area), x_{i2} = 1.5 (# bathrooms), x_{i3} = 3 (# bedrooms), the model predicts its label y_i:

y_i = θ_0 + θ_1 x_{i1} + θ_2 x_{i2} + … + θ_m x_{im}
Multivariate Regression
■ Multiple Linear Regression, with a constant feature x0 = 1 added to carry the intercept:

  y  = Price (in 1000$):   220   180   350  …   500
  x0 = constant:             1     1     1  …     1
  x1 = Area (sq. ft.):    1600  1400  2100  …  2400
  x2 = # Bathrooms:        2.5   1.5   3.5  …     4
  x3 = # Bedrooms:           3     3     4  …     5

y_i = θ_0 x_{i0} + θ_1 x_{i1} + θ_2 x_{i2} + … + θ_m x_{im}
Multivariate Regression Model
■ Model:

feature 1 = x0 …. (constant, 1)
feature 2 = x1 …. (area, sq. ft.)
feature 3 = x2 …. (# of bedrooms)
feature 4 = x3 …. (# of bathrooms)
….
feature m+1 = xm

y_i = θ_0 x_{i0} + θ_1 x_{i1} + θ_2 x_{i2} + … + θ_m x_{im} = ∑_{j=0}^{m} θ_j x_{ij}
One Observation Model
■ Matrix Notation. For observation i:

y_i = ∑_{j=0}^{m} θ_j x_{ij}

y_i = [ x_{i0}  x_{i1}  x_{i2}  …  x_{im} ] · (θ_0, θ_1, θ_2, …, θ_m)ᵀ = x_iᵀ θ
All Observation Model
■ Matrix Notation. For all observations:

[ y_1 ]   [ x_10  x_11  x_12  …  x_1m ] [ θ_0 ]
[ y_2 ] = [ x_20  x_21  x_22  …  x_2m ] [ θ_1 ]
[ y_3 ]   [ x_30  x_31  x_32  …  x_3m ] [ θ_2 ]
[  ⋮  ]   [  ⋮     ⋮     ⋮         ⋮  ] [  ⋮  ]
[ y_n ]   [ x_n0  x_n1  x_n2  …  x_nm ] [ θ_m ]

Y = Xθ
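A small numpy sketch of this matrix form (the numbers are hypothetical; the x0 column of ones carries the intercept):

```python
import numpy as np

# Hypothetical design matrix: first column is x0 = 1 (intercept),
# remaining columns are features for n = 3 observations.
X = np.array([[1.0, 1600, 2.5, 3],
              [1.0, 1400, 1.5, 3],
              [1.0, 2100, 3.5, 4]])
theta = np.array([10.0, 0.1, 5.0, 2.0])  # made-up parameter values

Y = X @ theta  # all n predictions at once: Y = X theta
print(Y)
```

One matrix-vector product replaces n separate dot products x_iᵀθ.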
LEAST SQUARES OPTIMIZATION
Rewrite inputs. Each row of X is a feature vector paired with a label for a single input (n labeled inputs, m features):

X =
[ (x⁽¹⁾)ᵀ ]
[ (x⁽²⁾)ᵀ ]
[    ⋮    ]
[ (x⁽ⁿ⁾)ᵀ ]  ∈ ℝ^{n×m}

y =
[ y⁽¹⁾ ]
[ y⁽²⁾ ]
[  ⋮   ]
[ y⁽ⁿ⁾ ]  ∈ ℝⁿ

*Recall ‖z‖₂² = zᵀz = ∑_i z_i²
Rewrite optimization problem:

minimize_θ  ∑_{i=1}^{n} (y_i − ŷ_i)² = ∑_{i=1}^{n} ε_i² = ‖Xθ − y‖₂²
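A quick numerical sketch of that equivalence, using invented random data: the squared ℓ2 norm of the residual vector is exactly the sum of squared per-example errors.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 invented inputs, 3 features
y = rng.normal(size=5)
theta = rng.normal(size=3)

residual = X @ theta - y
print(np.sum(residual ** 2))          # sum of squared errors
print(np.linalg.norm(residual) ** 2)  # ||X theta - y||_2^2, the same value
```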
ERROR FUNCTION
[Plot: the error ∑_{i=1}^{n} (y_i − ŷ_i)² = ∑_{i=1}^{n} ε_i² as a function of θ]
GRADIENTS
Minimizing a multivariate function involves finding a point where the gradient is zero.

Points where the gradient is zero are local minima.
• If the function is convex, such a point is also a global minimum.

Let’s solve the least squares problem! We’ll use the multivariate generalizations of some concepts from MATH141/142 …
• Chain rule: ∇_θ g(f(θ)) = (Df(θ))ᵀ ∇g(f(θ))
• Gradient of squared ℓ2 norm: ∇_z ‖z‖₂² = 2z
LEAST SQUARES
Recall the least squares optimization problem:

minimize_θ  ‖Xθ − y‖₂²

What is the gradient of the optimization objective?

Applying the chain rule and the gradient of the squared norm:

∇_θ ‖Xθ − y‖₂² = 2Xᵀ(Xθ − y)
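A sanity-check sketch of that gradient formula on invented data: compare one coordinate of the analytic gradient against a central finite difference.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))  # invented data
y = rng.normal(size=6)
theta = rng.normal(size=3)

grad = 2 * X.T @ (X @ theta - y)  # analytic gradient 2 X^T (X theta - y)

# Central finite-difference check of the first coordinate
eps = 1e-6
f = lambda t: np.sum((X @ t - y) ** 2)
e0 = np.zeros(3)
e0[0] = eps
approx = (f(theta + e0) - f(theta - e0)) / (2 * eps)
print(grad[0], approx)  # the two values should agree closely
```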
LEAST SQUARES
Recall: points where the gradient equals zero are minima.

So where do we go from here? Set the gradient to zero and solve for the model parameters θ.
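One way to finish the step the slide gestures at: setting the gradient from the previous page to zero yields the closed-form (normal-equation) solution, assuming XᵀX is invertible:

```latex
\nabla_\theta \|X\theta - y\|_2^2 = 2X^T(X\theta - y) = 0
\;\Longrightarrow\; X^T X \theta = X^T y
\;\Longrightarrow\; \theta = (X^T X)^{-1} X^T y
```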
LINEAR REGRESSION AS OPTIMIZATION PROBLEM
Let’s consider linear regression that minimizes the sum of squared error, i.e., least squares …
1. Hypothesis function: linear hypothesis function h_θ(x) = θᵀx
2. Loss function: squared error loss ℓ(h_θ(x), y) = (h_θ(x) − y)²
3. Optimization problem:

min_θ ∑_{i=1}^{n} (θᵀx⁽ⁱ⁾ − y⁽ⁱ⁾)²
GRADIENT DESCENT
We used the gradient as a condition for optimality. It also gives the local direction of steepest increase for a function:
[Surface plot of a function with gradient directions; image from Zico Kolter]

Intuitive idea: take small steps against the gradient.
If there is no increase, the gradient is zero = local minimum!
GRADIENT DESCENT
Algorithm for any* hypothesis function h_θ, loss function ℓ, step size α:

Initialize the parameter vector:
• θ := 0

Repeat until satisfied (e.g., exact or approximate convergence):
• Compute gradient: g ← ∑_{i=1}^{n} ∇_θ ℓ(h_θ(x⁽ⁱ⁾), y⁽ⁱ⁾)
• Update parameters: θ := θ − αg

*must be reasonably well behaved
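A minimal Python sketch of this loop, assuming a squared-error loss and linear hypothesis (the toy data, step size, and iteration count below are invented):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, steps=1000):
    """Generic loop: theta := theta - alpha * g, with g the summed gradient.
    Assumes squared-error loss and linear hypothesis h_theta(x) = theta^T x."""
    theta = np.zeros(X.shape[1])       # initialize theta := 0
    for _ in range(steps):
        g = 2 * X.T @ (X @ theta - y)  # summed gradient of the loss
        theta = theta - alpha * g      # step against the gradient
    return theta

# Invented toy data generated by y = 1 + 2x, with an intercept column of ones
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(gradient_descent(X, y))  # approaches [1, 2]
```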
GRADIENT DESCENT
[1-D illustration: where ∂f(θ)/∂θ > 0 the update moves θ left; where ∂f(θ)/∂θ < 0 it moves θ right]

θ := θ − α ∂f(θ)/∂θ
EXAMPLE
Function: f(x, y) = x² + 2y²
Gradient: ∇f(x, y) = (2x, 4y)

Let’s take a gradient step from (−2, +1):

∇f(−2, 1) = (−4, 4)

Step against the gradient, scaled by the step size (e.g., with α = 0.01 the step is (0.04, −0.04)). Repeat until no movement.
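The example above can be run directly; a sketch with an arbitrary illustrative step size of 0.1:

```python
# Gradient descent on f(x, y) = x^2 + 2y^2, starting from (-2, 1).
# The step size 0.1 and iteration count are arbitrary illustrative choices.
def grad_f(x, y):
    return (2 * x, 4 * y)

x, y = -2.0, 1.0
alpha = 0.1
for _ in range(100):
    gx, gy = grad_f(x, y)
    x, y = x - alpha * gx, y - alpha * gy  # step against the gradient

print(x, y)  # both coordinates end up very near the minimum at (0, 0)
```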
GRADIENT DESCENT
Model: y = θ_0 + θ_1 x
GRADIENT DESCENT
Start from θ_0 = 0.1, θ_1 = 0.1 for the model y = θ_0 + θ_1 x.
GRADIENT DESCENT
θ_0 = 0.1, θ_1 = 0.1, model ŷ = θ_0 + θ_1 x

  x     y      ŷ = θ_0 + θ_1 x
  0.2   0.44   0.12
  0.31  0.123  0.131
  0.45  0.75   0.145
  0.75  0.39   0.175
GRADIENT DESCENT
θ_0 = 0.1, θ_1 = 0.1, model ŷ = θ_0 + θ_1 x

Per-point loss SSE = ½(ŷ − y)², with ∂(SSE)/∂θ_0 = ŷ − y and ∂(SSE)/∂θ_1 = (ŷ − y)x:

  x     y      ŷ       SSE = ½(ŷ−y)²   ŷ − y    (ŷ − y)x
  0.2   0.44   0.12    0.0512          -0.32    -0.064
  0.31  0.123  0.131   0.000032         0.008    0.00248
  0.45  0.75   0.145   0.183           -0.605   -0.27225
  0.75  0.39   0.175   0.0231          -0.215   -0.16125

  Column sums: ∑ ∂(SSE)/∂θ_0 = -1.132, ∑ ∂(SSE)/∂θ_1 = -0.495
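The column sums in the table can be reproduced with a few lines of Python:

```python
# Reproduce the worked table: theta0 = theta1 = 0.1,
# per-point loss 0.5 * (yhat - y)^2, gradients summed over the 4 points.
xs = [0.2, 0.31, 0.45, 0.75]
ys = [0.44, 0.123, 0.75, 0.39]
theta0, theta1 = 0.1, 0.1

g0 = g1 = 0.0
for x, y in zip(xs, ys):
    yhat = theta0 + theta1 * x
    g0 += yhat - y        # d(SSE)/d(theta0) summed over points
    g1 += (yhat - y) * x  # d(SSE)/d(theta1) summed over points

print(round(g0, 3), round(g1, 3))  # -1.132 -0.495
```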
GRADIENT DESCENT
Update rules, with step size α = 0.01:

θ_0 := θ_0 − α ∑_{i=1}^{n} ∂(SSE)/∂θ_0
θ_1 := θ_1 − α ∑_{i=1}^{n} ∂(SSE)/∂θ_1
GRADIENT DESCENT
Apply the updates with α = 0.01:

θ_0 = 0.1 − 0.01 × (−1.132) = 0.11132
θ_1 = 0.1 − 0.01 × (−0.495) = 0.10495
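The whole step above, gradients and update together, as a runnable sketch:

```python
# One full gradient-descent step on the 4-point worked example.
xs = [0.2, 0.31, 0.45, 0.75]
ys = [0.44, 0.123, 0.75, 0.39]
theta0, theta1, alpha = 0.1, 0.1, 0.01

g0 = sum((theta0 + theta1 * x) - y for x, y in zip(xs, ys))
g1 = sum(((theta0 + theta1 * x) - y) * x for x, y in zip(xs, ys))

theta0 -= alpha * g0
theta1 -= alpha * g1
print(round(theta0, 5), round(theta1, 5))  # 0.11132 0.10495
```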
GRADIENT DESCENT
The same algorithm, instantiated for the SSE loss (step size α, repeat until satisfied):

θ_0 := θ_0 − α ∑_{i=1}^{n} ∂(SSE)/∂θ_0
θ_1 := θ_1 − α ∑_{i=1}^{n} ∂(SSE)/∂θ_1

*must be reasonably well behaved
GRADIENT DESCENT - MULTIVARIATE

θ_0 := θ_0 − α ∑_{i=1}^{n} ∂(SSE)/∂θ_0
θ_1 := θ_1 − α ∑_{i=1}^{n} ∂(SSE)/∂θ_1
θ_2 := θ_2 − α ∑_{i=1}^{n} ∂(SSE)/∂θ_2
…
θ_m := θ_m − α ∑_{i=1}^{n} ∂(SSE)/∂θ_m
GRADIENT DESCENT - MULTIVARIATE
Note that (1/n) ∑_{i=1}^{n} (h_θ(x_i) − y_i) x_{ji} = ∂f(θ)/∂θ_j; obtain all partial derivatives w.r.t. the θ_j’s first.

θ := 0
Repeat {
    θ_j := θ_j − α (1/n) ∑_{i=1}^{n} (h_θ(x_i) − y_i) x_{ji}
} (update θ_j for all j = 0 … m simultaneously)
PLOTTING LOSS OVER TIME
[Plot: loss value vs. iteration]
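A sketch of how such a curve can be produced: record the loss at every iteration, then plot the recorded values (toy data and step size are invented; the matplotlib line is left as a comment so the snippet runs anywhere):

```python
import numpy as np

# Track the loss at every step so it can be plotted over time.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # toy data
y = np.array([1.0, 3.0, 5.0, 7.0])
theta = np.zeros(2)
alpha, T = 0.01, 200

losses = []
for _ in range(T):
    r = X @ theta - y
    losses.append(0.5 * np.sum(r ** 2))  # loss before this step's update
    theta -= alpha * X.T @ r             # gradient step

print(losses[0], losses[-1])  # loss shrinks over time
# import matplotlib.pyplot as plt; plt.plot(losses); plt.show()
```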
STOCHASTIC GRADIENT DESCENT - MULTIVARIATE

Batch gradient descent (obtain all partial derivatives w.r.t. the θ_j’s first, using (1/n) ∑_{i=1}^{n} (h_θ(x_i) − y_i) x_{ji} = ∂f(θ)/∂θ_j):

θ := 0
Repeat {
    θ_j := θ_j − α (1/n) ∑_{i=1}^{n} (h_θ(x_i) − y_i) x_{ji}
} (update θ_j for all j = 0 … m simultaneously)

Stochastic gradient descent (one random example per update):

θ := 0
Repeat {
    i := random index between 1 and n
    θ_j := θ_j − α (h_θ(x_i) − y_i) x_{ji}
} (update θ_j for all j = 0 … m)
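A sketch of the stochastic variant in plain Python (data, step size, and iteration count are invented; each update touches exactly one randomly chosen example):

```python
import random

# Stochastic gradient descent: one randomly chosen example per update.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # first feature is the constant 1
ys = [1.0, 3.0, 5.0, 7.0]
theta = [0.0, 0.0]
alpha = 0.05
random.seed(0)

for _ in range(5000):
    i = random.randrange(len(xs))                 # random example index
    h = sum(t * x for t, x in zip(theta, xs[i]))  # h_theta(x_i)
    theta = [t - alpha * (h - ys[i]) * x for t, x in zip(theta, xs[i])]

print(theta)  # near [1, 2] (an exact fit exists, so SGD settles there)
```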
GRADIENT DESCENT
[Worked-example table repeated in full]
GRADIENT DESCENT
[One row of the worked-example table highlighted]

  x     y      ŷ       SSE = ½(ŷ−y)²   ŷ − y    (ŷ − y)x
  0.31  0.123  0.131   0.000032         0.008    0.00248
  …
STOCHASTIC GRADIENT DESCENT
STOCHASTIC GRADIENT DESCENT - MINI BATCH

θ := 0
Repeat {
    i_1, …, i_l := l random indices between 1 and n
    θ_j := θ_j − α (1/l) ∑_{k=1}^{l} (h_θ(x_{i_k}) − y_{i_k}) x_{j i_k}
} (update θ_j for all j = 0 … m)
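A sketch of the mini-batch variant (data, batch size l = 2, and step size are invented; each update averages the gradient over a small random batch):

```python
import random

# Mini-batch SGD: average the per-example gradient over a random batch of size l.
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [1.0, 3.0, 5.0, 7.0]
theta = [0.0, 0.0]
alpha, l = 0.05, 2
random.seed(0)

for _ in range(5000):
    batch = random.sample(range(len(xs)), l)  # l distinct random indices
    grad = [0.0, 0.0]
    for i in batch:
        h = sum(t * x for t, x in zip(theta, xs[i]))
        for j, x in enumerate(xs[i]):
            grad[j] += (h - ys[i]) * x
    theta = [t - alpha * g / l for t, g in zip(theta, grad)]

print(theta)  # near [1, 2]
```

Batch size interpolates between the two extremes: l = 1 recovers plain SGD, l = n recovers batch gradient descent.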
Gradient Descent
GRADIENT DESCENT IN PURE(-ISH) PYTHON
Implicitly using squared loss and linear hypothesis function above; drop in your favorite gradient for kicks!

import numpy as np

# Training data (X, y), T time steps, alpha step
def grad_descent(X, y, T, alpha):
    m, n = X.shape       # m = #examples, n = #features
    theta = np.zeros(n)  # initialize parameters
    f = np.zeros(T)      # track loss over time
    for i in range(T):
        # loss for current parameter vector theta
        f[i] = 0.5 * np.linalg.norm(X.dot(theta) - y) ** 2
        # compute steepest ascent at f(theta)
        g = np.transpose(X).dot(X.dot(theta) - y)
        # step down the gradient
        theta = theta - alpha * g
    return theta, f
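A quick way to exercise the routine on the slide (the toy data are invented; grad_descent is reproduced here so the snippet runs standalone):

```python
import numpy as np

def grad_descent(X, y, T, alpha):  # same routine as on the slide
    m, n = X.shape
    theta = np.zeros(n)
    f = np.zeros(T)
    for i in range(T):
        f[i] = 0.5 * np.linalg.norm(X.dot(theta) - y) ** 2
        g = np.transpose(X).dot(X.dot(theta) - y)
        theta = theta - alpha * g
    return theta, f

# Toy problem: labels from y = 1 + 2x, intercept column of ones in X
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
theta, f = grad_descent(X, y, T=1000, alpha=0.01)
print(theta)        # close to [1, 2]
print(f[0], f[-1])  # the tracked loss decreases over time
```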