25: Logistic Regression
Lisa Yan
June 3, 2020


Page 1:
25: Logistic Regression
Lisa Yan
June 3, 2020

Page 2:
Quick slide reference
3 Background 25a_background
9 Logistic Regression 25b_logistic_regression
27 Training: The big picture 25c_lr_training
56 Training: The details, Testing LIVE
59 Philosophy LIVE
63 Gradient Derivation 25e_derivation

Page 3:
Background
25a_background

Page 4:
1. Weighted sum
If $\mathbf{X} = (X_1, X_2, \ldots, X_m)$:
$$Z = \theta_1 X_1 + \theta_2 X_2 + \cdots + \theta_m X_m = \sum_{j=1}^{m} \theta_j X_j = \theta^T \mathbf{X}$$
This is a dot product, also called a weighted sum.

Page 5:
1. Weighted sum
Recall the linear regression model, where $\mathbf{X} = (X_1, X_2, \ldots, X_m)$ and $Y \in \mathbb{R}$:
$$g(\mathbf{X}) = \theta_0 + \sum_{j=1}^{m} \theta_j X_j$$
How would you rewrite this expression as a single dot product? πŸ€”
Dot product / weighted sum: $\theta^T \mathbf{X} = \sum_{j=1}^{m} \theta_j X_j$

Page 6:
1. Weighted sum
Recall the linear regression model, where $\mathbf{X} = (X_1, X_2, \ldots, X_m)$ and $Y \in \mathbb{R}$:
$$g(\mathbf{X}) = \theta_0 + \sum_{j=1}^{m} \theta_j X_j$$
How would you rewrite this expression as a single dot product?
Define $X_0 = 1$. Then, with the new $\mathbf{X} = (1, X_1, X_2, \ldots, X_m)$:
$$g(\mathbf{X}) = \theta_0 X_0 + \theta_1 X_1 + \theta_2 X_2 + \cdots + \theta_m X_m = \theta^T \mathbf{X}$$
Prepending $X_0 = 1$ to each feature vector $\mathbf{X}$ makes matrix operations more accessible.
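To make the $X_0 = 1$ trick concrete, here is a minimal NumPy sketch (the feature values and weights are made up for illustration):

    import numpy as np

    theta = np.array([-1.0, 0.8, 0.3, 0.5])   # (theta0, theta1, theta2, theta3)
    x = np.array([2.0, 0.5, 1.0])             # original features (x1, x2, x3)

    x = np.concatenate(([1.0], x))            # prepend x0 = 1
    z = theta @ x                             # single dot product theta^T x
    print(z)                                  # -1.0 + 0.8*2 + 0.3*0.5 + 0.5*1 = 1.25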

Page 7:
2. Sigmoid function $\sigma(z)$
β€’ The sigmoid function: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
β€’ Sigmoid squashes $z$ to a number between 0 and 1.
β€’ Recall the definition of probability: a number between 0 and 1. So $\sigma(z)$ can represent a probability.
[Plot: $\sigma(z)$ vs. $z$ for $z \in [-10, 10]$, an S-shaped curve rising from 0 to 1.]
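A one-line Python version makes the squashing behavior easy to check (a quick sketch, assuming NumPy):

    import numpy as np

    def sigmoid(z):
        """Squash any real z into (0, 1)."""
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(-10), sigmoid(0), sigmoid(10))
    # ~4.5e-05, 0.5, ~0.99995 -- always strictly between 0 and 1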

Page 8:
3. Conditional likelihood function (Review)
Training data ($n$ datapoints):
β€’ $(\mathbf{x}^{(i)}, y^{(i)})$ drawn i.i.d. from a distribution $f(\mathbf{X} = \mathbf{x}^{(i)}, Y = y^{(i)} \mid \theta) = f(\mathbf{x}^{(i)}, y^{(i)} \mid \theta)$
$$\theta_{MLE} = \arg\max_\theta \prod_{i=1}^{n} f\big(y^{(i)} \mid \mathbf{x}^{(i)}, \theta\big) = \arg\max_\theta \sum_{i=1}^{n} \log f\big(y^{(i)} \mid \mathbf{x}^{(i)}, \theta\big) = \arg\max_\theta LL(\theta)$$
β€’ The product is the conditional likelihood of the training data; its log is the log conditional likelihood.
β€’ The MLE in this lecture is the estimator that maximizes the conditional likelihood.
β€’ Confusingly, the log conditional likelihood is also written as $LL(\theta)$.

Page 9:
Logistic Regression
25b_logistic_regression

Page 10:
Prediction models so far (Review)
NaΓ―ve Bayes (Classification): $\hat{Y} = \arg\max_{y=0,1} P(Y \mid \mathbf{X}) = \arg\max_{y=0,1} P(\mathbf{X} \mid Y)\, P(Y)$, modeling $P(\mathbf{X}, Y)$ via $P(\mathbf{X} \mid Y)\, P(Y)$.
βœ… Tractable with the NB assumption, but…
⚠️ Realistically, the $X_j$ features are not necessarily conditionally independent.
πŸ€·β€β™€οΈ Actually models $P(\mathbf{X}, Y)$, not $P(Y \mid \mathbf{X})$?
Linear Regression (Regression): $\hat{Y} = \theta_0 + \sum_{j=1}^{m} \theta_j X_j$.
βœ… $\mathbf{X}$ can be dependent.
πŸ€·β€β™€οΈ Regression model ($\hat{Y} \in \mathbb{R}$, not discrete).

Page 11:
Introducing Logistic Regression!
Linear Regression ideas + Classification models + compute power

Page 12:
Logistic Regression
Logistic Regression Model: predict $\hat{Y}$ as the most likely $Y$ given our observation $\mathbf{X} = \mathbf{x}$:
$$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\Big(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\Big), \qquad \hat{Y} = \arg\max_{y=0,1} P(Y \mid \mathbf{X})$$
where $\sigma(z) = \dfrac{1}{1 + e^{-z}}$ is the sigmoid function.
β€’ Since $Y \in \{0, 1\}$: $P(Y = 0 \mid \mathbf{X} = \mathbf{x}) = 1 - \sigma\big(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\big)$
β€’ The sigmoid's inverse, the log-odds, is known as the "logit" function.

Page 13:
Logistic Regression
$$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\Big(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\Big)$$
Example: input features $\mathbf{x} = [0, 1, 1]$ yield conditional likelihood $P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = 0.81$, for some parameters $\theta$.
Slides courtesy of Chris Piech

Page 14:
Logistic Regression cartoon
[Diagram: the model as a box parameterized by $\theta$.]

Page 15:
Logistic Regression cartoon
$$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\Big(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\Big)$$
[Diagram: a weighted-sum node (+) produces $z$, which feeds $\sigma(z)$, which outputs $P(Y = 1 \mid \mathbf{x})$.]

Page 16:
Logistic Regression cartoon
(Same diagram.) Input features $\mathbf{X} = (0, 1, 1)$ enter on the left; the output $Y$ leaves on the right.

Page 17:
Components of Logistic Regression
(Same diagram.) Highlighted: the weights $\theta$ (aka parameters).

Page 18:
Components of Logistic Regression
(Same diagram.) Highlighted: the weighted sum $z$.

Page 19:
Components of Logistic Regression
(Same diagram.) Highlighted: the squashing function $\sigma(z)$, which maps $z$ to a value between 0 and 1.

Page 20:
Components of Logistic Regression
(Same diagram.) Highlighted: the prediction $P(Y = 1 \mid \mathbf{x})$.

Page 21:
Different predictions for different inputs
(Same diagram.) Input features $\mathbf{X} = (0, 1, 1)$.

Page 22:
Different predictions for different inputs
(Same diagram.) Input features $\mathbf{X} = (0, 0, 1)$.

Page 23:
Parameters affect prediction
(Same diagram, with one setting of $\theta$.)

Page 24:
Parameters affect prediction
(Same diagram, with a different setting of $\theta$: changing $\theta$ changes $P(Y = 1 \mid \mathbf{x})$.)

Page 25:
For simplicity
$$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\Big(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\Big) \;\Longrightarrow\; P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\Big(\sum_{j=0}^{m} \theta_j x_j\Big) = \sigma(\theta^T \mathbf{x}), \text{ where } x_0 = 1$$

Page 26:
Logistic regression classifier
Given an observation $\mathbf{X} = (X_1, X_2, \ldots, X_m)$, predict:
$$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\Big(\sum_{j=0}^{m} \theta_j x_j\Big) = \sigma(\theta^T \mathbf{x})$$
Training: estimate parameters $\theta = (\theta_0, \theta_1, \theta_2, \ldots, \theta_m)$ from training data.
Testing: $\hat{Y} = \arg\max_{y=0,1} P(Y \mid \mathbf{X})$

Page 27:
Training: The big picture
25c_lr_training

Page 28:
Logistic regression classifier
Given an observation $\mathbf{X} = (X_1, X_2, \ldots, X_m)$, predict $P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\big(\sum_{j=0}^{m} \theta_j x_j\big) = \sigma(\theta^T \mathbf{x})$.
Training: estimate parameters $\theta = (\theta_0, \theta_1, \theta_2, \ldots, \theta_m)$ from training data. Choose $\theta$ that optimizes some objective:
1. Determine the objective function.
2. Find the gradient with respect to $\theta$.
3. Solve analytically by setting the gradient to 0, or computationally with gradient ascent.
We are modeling $P(Y \mid \mathbf{X})$ directly, so we maximize the conditional likelihood of the training data.
Testing: $\hat{Y} = \arg\max_{y=0,1} P(Y \mid \mathbf{X})$

Page 29:
Estimating $\theta$
1. Determine objective function: $\theta_{MLE} = \arg\max_\theta \prod_{i=1}^{n} f\big(y^{(i)} \mid \mathbf{x}^{(i)}, \theta\big)$
2. Gradient w.r.t. $\theta_j$, for $j = 0, 1, \ldots, m$
3. Solve
β€’ There is no analytical derivation of $\theta_{MLE}$…
β€’ …but we can still compute $\theta_{MLE}$ with gradient ascent!

    initialize x
    repeat many times:
        compute gradient
        x += Ξ· * gradient

Page 30:
1. Determine objective function
$$\theta_{MLE} = \arg\max_\theta \prod_{i=1}^{n} f\big(y^{(i)} \mid \mathbf{x}^{(i)}, \theta\big) = \arg\max_\theta LL(\theta), \qquad P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\Big(\sum_{j=0}^{m} \theta_j x_j\Big) = \sigma(\theta^T \mathbf{x})$$
First: interpret the conditional likelihood with Logistic Regression.
Second: write a differentiable expression for the log conditional likelihood.

Page 31:
1. Determine objective function (interpret)
Suppose you have $n = 2$ training datapoints: $(\mathbf{x}^{(1)}, 1)$, $(\mathbf{x}^{(2)}, 0)$.
Consider the following expressions for a given $\theta$: πŸ€”
A. $\sigma(\theta^T \mathbf{x}^{(1)})\, \sigma(\theta^T \mathbf{x}^{(2)})$
B. $\big(1 - \sigma(\theta^T \mathbf{x}^{(1)})\big)\, \sigma(\theta^T \mathbf{x}^{(2)})$
C. $\sigma(\theta^T \mathbf{x}^{(1)})\, \big(1 - \sigma(\theta^T \mathbf{x}^{(2)})\big)$
D. $\big(1 - \sigma(\theta^T \mathbf{x}^{(1)})\big)\, \big(1 - \sigma(\theta^T \mathbf{x}^{(2)})\big)$
1. Interpret the above expressions as probabilities.
2. If we let $\theta = \theta_{MLE}$, which probability should be highest?

Page 32:
1. Determine objective function (interpret)
(Same prompt as Page 31, worked live.) With datapoints $(\mathbf{x}^{(1)}, 1)$ and $(\mathbf{x}^{(2)}, 0)$:
1. Each expression is the probability of one particular pair of labels given $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}$; expression C, $\sigma(\theta^T \mathbf{x}^{(1)})\big(1 - \sigma(\theta^T \mathbf{x}^{(2)})\big)$, is the conditional likelihood of the labels actually observed.
2. If we let $\theta = \theta_{MLE}$, expression C should be highest, since $\theta_{MLE}$ maximizes the conditional likelihood of the training data.

Page 33:
1. Determine objective function (write)
1. What is a differentiable expression for $P(Y = y \mid \mathbf{X} = \mathbf{x})$? πŸ€”
2. What is a differentiable expression for $LL(\theta)$, the log conditional likelihood?
Starting points:
$$P(Y = y \mid \mathbf{X} = \mathbf{x}) = \begin{cases} \sigma(\theta^T \mathbf{x}) & \text{if } y = 1 \\ 1 - \sigma(\theta^T \mathbf{x}) & \text{if } y = 0 \end{cases}
\qquad LL(\theta) = \log \prod_{i=1}^{n} f\big(y^{(i)} \mid \mathbf{x}^{(i)}, \theta\big)$$

Page 34:
1. Determine objective function (write)
(Same questions as Page 33.) Hint: recall the Bernoulli MLE!

Page 35:
1. Determine objective function (write)
1. A differentiable expression for $P(Y = y \mid \mathbf{X} = \mathbf{x})$:
$$P(Y = y \mid \mathbf{X} = \mathbf{x}) = \sigma(\theta^T \mathbf{x})^{y}\, \big(1 - \sigma(\theta^T \mathbf{x})\big)^{1-y}$$
2. A differentiable expression for $LL(\theta)$, the log conditional likelihood:
$$LL(\theta) = \sum_{i=1}^{n} y^{(i)} \log \sigma\big(\theta^T \mathbf{x}^{(i)}\big) + \big(1 - y^{(i)}\big) \log\Big(1 - \sigma\big(\theta^T \mathbf{x}^{(i)}\big)\Big)$$

Page 36:
2. Find gradient with respect to $\theta$
Optimization problem: $\theta_{MLE} = \arg\max_\theta \prod_{i=1}^{n} f\big(y^{(i)} \mid \mathbf{x}^{(i)}, \theta\big) = \arg\max_\theta LL(\theta)$, where
$$LL(\theta) = \sum_{i=1}^{n} y^{(i)} \log \sigma\big(\theta^T \mathbf{x}^{(i)}\big) + \big(1 - y^{(i)}\big) \log\Big(1 - \sigma\big(\theta^T \mathbf{x}^{(i)}\big)\Big)$$
Gradient w.r.t. $\theta_j$, for $j = 0, 1, \ldots, m$ (derived later):
$$\frac{\partial LL(\theta)}{\partial \theta_j} = \sum_{i=1}^{n} \Big(y^{(i)} - \sigma\big(\theta^T \mathbf{x}^{(i)}\big)\Big)\, x_j^{(i)}$$
How do we interpret the gradient contribution of the $i$-th training datapoint? πŸ€”

Page 37:
2. Find gradient with respect to $\theta$
(Same setup as Page 36.) In the gradient contribution $\big(y^{(i)} - \sigma(\theta^T \mathbf{x}^{(i)})\big)\, x_j^{(i)}$, the factor $x_j^{(i)}$ scales the contribution by the $j$-th feature.

Page 38:
2. Find gradient with respect to $\theta$
(Same setup as Page 36.) In the same term, the label $y^{(i)}$ is 1 or 0, and $\sigma(\theta^T \mathbf{x}^{(i)})$ is the model's $P(Y = 1 \mid \mathbf{X} = \mathbf{x}^{(i)})$.

Page 39:
2. Find gradient with respect to $\theta$
(Same setup as Page 36.) Suppose $y^{(i)} = 1$ (the true class label for the $i$-th datapoint):
β€’ If $\sigma(\theta^T \mathbf{x}^{(i)}) \ge 0.5$: correct.
β€’ If $\sigma(\theta^T \mathbf{x}^{(i)}) < 0.5$: incorrect β†’ change $\theta_j$ more.

Page 40:
3. Solve
1. Optimization problem: $\theta_{MLE} = \arg\max_\theta \prod_{i=1}^{n} f\big(y^{(i)} \mid \mathbf{x}^{(i)}, \theta\big) = \arg\max_\theta LL(\theta)$, where
$$LL(\theta) = \sum_{i=1}^{n} y^{(i)} \log \sigma\big(\theta^T \mathbf{x}^{(i)}\big) + \big(1 - y^{(i)}\big) \log\Big(1 - \sigma\big(\theta^T \mathbf{x}^{(i)}\big)\Big)$$
2. Gradient w.r.t. $\theta_j$, for $j = 0, 1, \ldots, m$:
$$\frac{\partial LL(\theta)}{\partial \theta_j} = \sum_{i=1}^{n} \Big(y^{(i)} - \sigma\big(\theta^T \mathbf{x}^{(i)}\big)\Big)\, x_j^{(i)}$$
3. Solve: stay tuned!

Page 41:
(live) 25: Logistic Regression
Slides by Lisa Yan
August 12, 2020

Page 42:
Logistic Regression Model (Review)
$$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\Big(\sum_{j=0}^{m} \theta_j x_j\Big) = \sigma(\theta^T \mathbf{x}), \quad \text{where } x_0 = 1 \text{ and } \sigma(z) = \frac{1}{1 + e^{-z}} \text{ is the sigmoid function}$$
$\hat{Y} = \arg\max_{y=0,1} P(Y \mid \mathbf{X})$, where $\hat{Y}$ is the prediction of $Y$.

Page 43:
Another view of Logistic Regression
Logistic Regression Model: $P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma(\theta^T \mathbf{x})$, where $\theta^T \mathbf{x} = \sum_{j=0}^{m} \theta_j x_j$.
For the "correct" parameters $\theta$, with $z = \theta^T \mathbf{x}$:
β€’ $(\mathbf{x}, 1)$ should have $\theta^T \mathbf{x} > 0$
β€’ $(\mathbf{x}, 0)$ should have $\theta^T \mathbf{x} \le 0$
[Plot: points $(z, 1)$ and $(z, 0)$ on either side of $z = 0$.]

Page 44:
Learning parameters (Review)
Training: learn parameters $\theta = (\theta_0, \theta_1, \ldots, \theta_m)$.
$$\theta_{MLE} = \arg\max_\theta LL(\theta), \qquad LL(\theta) = \sum_{i=1}^{n} y^{(i)} \log \sigma\big(\theta^T \mathbf{x}^{(i)}\big) + \big(1 - y^{(i)}\big) \log\Big(1 - \sigma\big(\theta^T \mathbf{x}^{(i)}\big)\Big)$$
β€’ No analytical derivation of $\theta_{MLE}$…
β€’ …but we can still compute $\theta_{MLE}$ with gradient ascent!
$$\frac{\partial LL(\theta)}{\partial \theta_j} = \sum_{i=1}^{n} \Big(y^{(i)} - \sigma\big(\theta^T \mathbf{x}^{(i)}\big)\Big)\, x_j^{(i)} \qquad \text{for } j = 0, 1, \ldots, m$$

Page 45:
Gradient Ascent (Review)
Walk uphill and you will find a local maximum (if your step is small enough).
Logistic regression's $LL(\theta)$ is concave, so a local maximum is the global maximum.
[Plot: a concave surface $LL(\theta)$ over $(\theta_1, \theta_2)$.]

Page 46:
Training: The details
LIVE

Page 47:
Training: Gradient ascent step
3. Optimize.
$$\frac{\partial LL(\theta)}{\partial \theta_j} = \sum_{i=1}^{n} \Big(y^{(i)} - \sigma\big(\theta^T \mathbf{x}^{(i)}\big)\Big)\, x_j^{(i)}$$
Gradient ascent update, repeated many times for all $\theta_j$:
$$\theta_j^{\text{new}} = \theta_j^{\text{old}} + \eta \cdot \frac{\partial LL(\theta^{\text{old}})}{\partial \theta_j} = \theta_j^{\text{old}} + \eta \cdot \sum_{i=1}^{n} \Big(y^{(i)} - \sigma\big(\theta^{\text{old}\,T} \mathbf{x}^{(i)}\big)\Big)\, x_j^{(i)}$$
What does this look like in code?

Page 48:
Training: Gradient Ascent
Gradient Ascent Step: $\theta_j^{\text{new}} = \theta_j^{\text{old}} + \eta \cdot \sum_{i=1}^{n} \big(y^{(i)} - \sigma(\theta^{\text{old}\,T} \mathbf{x}^{(i)})\big)\, x_j^{(i)}$

    initialize ΞΈ[j] = 0 for 0 ≀ j ≀ m
    repeat many times:
        gradient[j] = 0 for 0 ≀ j ≀ m
        // compute all gradient[j]'s
        // based on n training examples
        ΞΈ[j] += Ξ· * gradient[j] for all 0 ≀ j ≀ m

Page 49:
Training: Gradient Ascent
Gradient Ascent Step: (same update as Page 48)

    initialize ΞΈ[j] = 0 for 0 ≀ j ≀ m
    repeat many times:
        gradient[j] = 0 for 0 ≀ j ≀ m
        for each training example (x, y):
            for each 0 ≀ j ≀ m:
                // update gradient[j] for
                // current (x, y) example
        ΞΈ[j] += Ξ· * gradient[j] for all 0 ≀ j ≀ m

Page 50:
Training: Gradient Ascent
Gradient Ascent Step: (same update as Page 48)

    initialize ΞΈ[j] = 0 for 0 ≀ j ≀ m
    repeat many times:
        gradient[j] = 0 for 0 ≀ j ≀ m
        for each training example (x, y):
            for each 0 ≀ j ≀ m:
                gradient[j] += (y - 1/(1 + e^(-ΞΈα΅€x))) * x[j]
        ΞΈ[j] += Ξ· * gradient[j] for all 0 ≀ j ≀ m

What are the important details?

Pages 51–55:
Training: Gradient Ascent (the important details)
The pseudocode from Page 50 is repeated on each of these slides, with one detail highlighted at a time; a runnable version follows below.
β€’ $x_j$ is the $j$-th feature of the input $\mathbf{x} = (x_1, \ldots, x_m)$.
β€’ Insert $x_0 = 1$ before training.
β€’ Finish computing the whole gradient before updating any part of $\theta$.
β€’ The learning rate $\eta$ is a constant you set before training.
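Putting the pseudocode together, here is a minimal runnable sketch in Python/NumPy. It is not the course's reference implementation; the toy dataset, learning rate, and iteration count are made-up placeholders, and the inner double loop is vectorized as one matrix product:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_logistic_regression(X, y, eta=0.1, n_iters=1000):
        """Gradient ascent on the log conditional likelihood LL(theta)."""
        X = np.column_stack([np.ones(len(X)), X])   # insert x0 = 1 before training
        theta = np.zeros(X.shape[1])                # initialize theta_j = 0
        for _ in range(n_iters):                    # repeat many times
            # compute the whole gradient before updating any part of theta:
            # gradient_j = sum_i (y_i - sigma(theta^T x_i)) * x_ij
            gradient = X.T @ (y - sigmoid(X @ theta))
            theta += eta * gradient                 # eta is fixed before training
        return theta

    X = np.array([[0.1], [0.4], [0.6], [0.9]])      # toy one-feature dataset
    y = np.array([0., 0., 1., 1.])
    theta = train_logistic_regression(X, y)
    print(theta)   # theta[0] < 0 < theta[1]: P(Y=1|x) grows with the feature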

Page 56:
Testing
LIVE

Page 57:
Introducing notation $\hat{y}$
$$\hat{y} = P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma(\theta^T \mathbf{x})$$
$$P(Y = y \mid \mathbf{X} = \mathbf{x}) = \begin{cases} \hat{y} & \text{if } y = 1 \\ 1 - \hat{y} & \text{if } y = 0 \end{cases}$$
Small $\hat{y}$ is a conditional probability; $\hat{Y} = \arg\max_{y=0,1} P(Y \mid \mathbf{X})$ is the prediction of $Y$.

Page 58:
Testing: Classification with Logistic Regression
Training: learn parameters $\theta = (\theta_0, \theta_1, \ldots, \theta_m)$ via gradient ascent:
$$\theta_j^{\text{new}} = \theta_j^{\text{old}} + \eta \cdot \sum_{i=1}^{n} \Big(y^{(i)} - \sigma\big(\theta^{\text{old}\,T} \mathbf{x}^{(i)}\big)\Big)\, x_j^{(i)}$$
Testing:
β€’ Compute $\hat{y} = P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma(\theta^T \mathbf{x}) = \dfrac{1}{1 + e^{-\theta^T \mathbf{x}}}$
β€’ Classify the instance as: 1 if $\hat{y} > 0.5$ (equivalently, $\theta^T \mathbf{x} > 0$); 0 otherwise.
⚠️ Parameters $\theta_j$ are not updated during the testing phase.

Page 59:
Interlude for jokes/announcements

Page 60:
Announcements
1. Pset 6 due tomorrow at 1pm. No late days or on-time bonus for this pset.
2. Look out for extra office hours + a review session for the Final Quiz.
3. Final Quiz begins Friday 5pm and ends Sunday 5pm.
4. You're so close, you got this!

Page 61:
Ethics and datasets
Sometimes machine learning feels universally unbiased.
We can even prove our estimators are "unbiased" (mathematically).
Google/Nikon/HP had biased datasets.

Page 62:
Should your data be unbiased?
Dataset: Google News.
Bolukbasi et al., "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings." NIPS 2016.
Should our unbiased data collection reflect society's systemic bias?

Page 63:
How can we explain decisions?
If your task is image classification, reasoning about high-level features is relatively easy. Everything can be visualized.
What if you are trying to classify social outcomes?
β€’ Criminal recidivism
β€’ Job performance
β€’ Policing
β€’ Terrorist risk
β€’ At-risk kids

Page 64:
Ethics in Machine Learning is a whole new field. πŸ™‚

Page 65:
Philosophy
LIVE

Page 66:
Intuition about Logistic Regression
Logistic Regression Model: $P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma(\theta^T \mathbf{x})$, where $\theta^T \mathbf{x} = \sum_{j=0}^{m} \theta_j x_j$.
Logistic Regression is trying to fit a line, $\theta^T \mathbf{x} = 0$, that separates data instances where $y = 1$ from those where $y = 0$:
β€’ We call such data (or the functions generating the data) linearly separable.
β€’ NaΓ―ve Bayes is linear too, because there is no interaction between different features.

Page 67:
Data is often not linearly separable
β€’ It is not possible to draw a line that successfully separates all the $y = 1$ points (green) from the $y = 0$ points (red).
β€’ Despite this fact, Logistic Regression and NaΓ―ve Bayes still often work well in practice.

Page 68:
Many tradeoffs in choosing an algorithm
β€’ Modeling goal: NaΓ―ve Bayes models $P(\mathbf{X}, Y)$; Logistic Regression models $P(Y \mid \mathbf{X})$.
β€’ Generative or discriminative? NaΓ―ve Bayes is generative: it could use the joint distribution to generate new points (⚠️ but you might not need this extra effort). Logistic Regression is discriminative: it just tries to discriminate $y = 0$ vs. $y = 1$ (❌ it cannot generate new points because it has no $P(\mathbf{X}, Y)$).
β€’ Continuous input features: NaΓ―ve Bayes ⚠️ needs a parametric form (e.g., Gaussian) or discretized buckets; Logistic Regression βœ… handles them easily.
β€’ Discrete input features: NaΓ―ve Bayes βœ… handles multi-valued discrete data as a multinomial $P(X_i \mid Y)$; Logistic Regression ⚠️ finds multi-valued discrete data hard (e.g., if $X_i \in \{A, B, C\}$, it is not necessarily good to encode it as 1, 2, 3).

Page 69:
Gradient Derivation
25e_derivation

Page 70:
Background: Calculus
Calculus refresher #1: the derivative of a sum is the sum of the derivatives:
$$\frac{\partial}{\partial x} \sum_{i=1}^{n} f_i(x) = \sum_{i=1}^{n} \frac{\partial f_i(x)}{\partial x}$$
Calculus refresher #2: the Chain Rule 🌟🌟🌟 (aka decomposition of composed functions): if $f(x) = f(z(x))$, then
$$\frac{\partial f(x)}{\partial x} = \frac{\partial f(z)}{\partial z} \cdot \frac{\partial z}{\partial x}$$

Page 71:
Are you ready?
Right now!!!

Page 72:
Compute gradient of log conditional likelihood
Find $\dfrac{\partial LL(\theta)}{\partial \theta_j}$, where the log conditional likelihood is
$$LL(\theta) = \sum_{i=1}^{n} y^{(i)} \log \sigma\big(\theta^T \mathbf{x}^{(i)}\big) + \big(1 - y^{(i)}\big) \log\Big(1 - \sigma\big(\theta^T \mathbf{x}^{(i)}\big)\Big)$$

Page 73:
Aside: Sigmoid has a beautiful derivative
Sigmoid function: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$. Derivative: $\dfrac{d}{dz}\sigma(z) = \sigma(z)\big(1 - \sigma(z)\big)$.
What is $\dfrac{\partial}{\partial \theta_j} \sigma(\theta^T \mathbf{x})$? πŸ€”
A. $\sigma(x_j)\big(1 - \sigma(x_j)\big)\, x_j$
B. $\sigma(\theta^T \mathbf{x})\big(1 - \sigma(\theta^T \mathbf{x})\big)\, \mathbf{x}$
C. $\sigma(\theta^T \mathbf{x})\big(1 - \sigma(\theta^T \mathbf{x})\big)\, x_j$
D. $\sigma(\theta^T \mathbf{x})\, x_j \big(1 - \sigma(\theta^T \mathbf{x})\, x_j\big)$
E. None/other

Page 74:
Aside: Sigmoid has a beautiful derivative
Sigmoid function: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$. Derivative: $\dfrac{d}{dz}\sigma(z) = \sigma(z)\big(1 - \sigma(z)\big)$.
What is $\dfrac{\partial}{\partial \theta_j} \sigma(\theta^T \mathbf{x})$? Answer: C.
Let $z = \theta^T \mathbf{x}$. By the Chain Rule:
$$\frac{\partial}{\partial \theta_j} \sigma(\theta^T \mathbf{x}) = \frac{\partial \sigma(z)}{\partial z} \cdot \frac{\partial z}{\partial \theta_j} = \sigma(z)\big(1 - \sigma(z)\big) \cdot \frac{\partial}{\partial \theta_j} \sum_{k=0}^{m} \theta_k x_k = \sigma(\theta^T \mathbf{x})\big(1 - \sigma(\theta^T \mathbf{x})\big)\, x_j$$

Page 75:
Re-introducing notation $\hat{y}$
$$\hat{y} = P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma(\theta^T \mathbf{x})$$
$$P(Y = y \mid \mathbf{X} = \mathbf{x}) = \begin{cases} \hat{y} & \text{if } y = 1 \\ 1 - \hat{y} & \text{if } y = 0 \end{cases} \;=\; \hat{y}^{\,y}\, (1 - \hat{y})^{1-y}$$

Page 76:
Compute gradient of log conditional likelihood
Find $\partial LL(\theta) / \partial \theta_j$, where, substituting $\hat{y}^{(i)} = \sigma(\theta^T \mathbf{x}^{(i)})$:
$$LL(\theta) = \sum_{i=1}^{n} y^{(i)} \log \hat{y}^{(i)} + \big(1 - y^{(i)}\big) \log\big(1 - \hat{y}^{(i)}\big)$$

Page 77:
Compute gradient of log conditional likelihood
Let $\hat{y}^{(i)} = \sigma(\theta^T \mathbf{x}^{(i)})$. Then:
$$\frac{\partial LL(\theta)}{\partial \theta_j} = \sum_{i=1}^{n} \frac{\partial}{\partial \theta_j} \Big[ y^{(i)} \log \hat{y}^{(i)} + \big(1 - y^{(i)}\big) \log\big(1 - \hat{y}^{(i)}\big) \Big]$$
$$= \sum_{i=1}^{n} \frac{\partial}{\partial \hat{y}^{(i)}} \Big[ y^{(i)} \log \hat{y}^{(i)} + \big(1 - y^{(i)}\big) \log\big(1 - \hat{y}^{(i)}\big) \Big] \cdot \frac{\partial \hat{y}^{(i)}}{\partial \theta_j} \qquad \text{(Chain Rule)}$$
$$= \sum_{i=1}^{n} \left( \frac{y^{(i)}}{\hat{y}^{(i)}} - \frac{1 - y^{(i)}}{1 - \hat{y}^{(i)}} \right) \cdot \hat{y}^{(i)} \big(1 - \hat{y}^{(i)}\big)\, x_j^{(i)} \qquad \text{(calculus)}$$
$$= \sum_{i=1}^{n} \big( y^{(i)} - \hat{y}^{(i)} \big)\, x_j^{(i)} = \sum_{i=1}^{n} \Big( y^{(i)} - \sigma\big(\theta^T \mathbf{x}^{(i)}\big) \Big)\, x_j^{(i)} \qquad \text{(simplify)}$$

Page 78:
Compute gradient of log conditional likelihood
(Same derivation as Page 77, now complete.) πŸŽ‰
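Because derivations like this are easy to get wrong, a finite-difference check of the final gradient formula is cheap insurance. A sketch, reusing sigmoid and log_likelihood from the earlier snippets, with made-up random data:

    import numpy as np

    def gradient(theta, X, y):
        """Analytic gradient: sum_i (y_i - sigma(theta^T x_i)) x_i, vectorized."""
        return X.T @ (y - sigmoid(X @ theta))

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])  # x0 = 1 column
    y = np.array([1., 0., 1., 1., 0.])
    theta = rng.normal(size=3)

    j, h = 1, 1e-6
    e = np.zeros(3); e[j] = h
    numeric = (log_likelihood(theta + e, X, y) -
               log_likelihood(theta - e, X, y)) / (2 * h)
    print(numeric, gradient(theta, X, y)[j])  # the two values should match closely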

Page 79:
Wow. We did it!

Page 80:
CS109 Wrap-up
LIVE

Page 81:
What have we learned in CS109?

Page 82:
A wild journey
Computer science + Probability

Page 83:
From combinatorics to probability…
$$P(E) + P(E^C) = 1$$

Page 84:
…to random variables and the Central Limit Theorem…
Bernoulli, Poisson, Gaussian

Page 85:
…to statistics, parameter estimation, and machine learning
[Images: a happy Bhutanese person; $LL(\theta)$; a Bayes net with nodes Flu, Undergrad, Tired, Fever.]

Page 86:
Lots and lots of analysis
Climate sensitivity, Bloom filters, peer grading, biometric keystroke recognition, WebMD inference, Coursera A/B testing

Page 87:
Lots and lots of analysis
Heart, Ancestry, Netflix

Page 88:
After CS109
Statistics:
β€’ Stats 200 – Statistical Inference
β€’ Stats 208 – Intro to the Bootstrap
β€’ Stats 209 – Group Methods/Causal Inference
Theory:
β€’ CS161 – Algorithmic Analysis
β€’ CS168 – ~Modern~ Algorithmic Analysis
β€’ Stats 217 – Stochastic Processes
β€’ CS238 – Decision Making Under Uncertainty
β€’ CS228 – Probabilistic Graphical Models

Page 89:
After CS109
AI:
β€’ CS 221 – Intro to AI
β€’ CS 229 – Machine Learning
β€’ CS 230 – Deep Learning
β€’ CS 224N – Natural Language Processing
β€’ CS 231N – Conv Neural Nets for Visual Recognition
β€’ CS 234 – Reinforcement Learning
Applications:
β€’ CS 279 – Bio Computation
β€’ Literally any class with numbers in it

Page 90:
What do you want to remember in 5 years?

Page 91:
Why study probability + CS?

Page 92:
Why study probability + CS?
[Chart. Source: US Bureau of Labor Statistics.]

Page 93:
Why study probability + CS?
Interdisciplinary. Closest thing to magic.

Page 94:
Why study probability + CS?
Everyone is welcome!

Page 95:
Technology magnifies.
What do we want magnified?

Page 96:
You are all one step closer to improving the world.
(All of you!)

Page 97:
The end