CSE 217: Introduction to Data Science (M. Neumann, Spring 2019)
TRANSCRIPT
LINEAR CLASSIFICATION MODEL
• Recall: model = the mathematical dependency between variables
• the linear classification model predicts ŷ = sign(wᵀx + b)
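As a minimal sketch of the model above (the function name and example weights are illustrative, not from the lecture):

```python
import numpy as np

def predict(w, b, x):
    """Predict a label in {-1, +1} with a linear classification model.

    The model computes the score w.x + b and returns its sign
    (ties at score 0 are broken toward +1 here).
    """
    score = np.dot(w, x) + b
    return 1 if score >= 0 else -1

# Tiny example with made-up weights (illustrative values only):
w = np.array([2.0, -1.0])
b = 0.5
print(predict(w, b, np.array([1.0, 1.0])))   # score = 1.5 -> +1
print(predict(w, b, np.array([-1.0, 2.0])))  # score = -3.5 -> -1
```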
VISUALIZING THE PARAMETERS (1D)
[Figure: 1D data with a good and a bad model; the decision boundary is the point where wᵀx + b = 0]
DECISION BOUNDARY
• target (sentiment) is a binary class
Is this new review positive or negative?
[Figure: 2D feature space of reviews; the model's decision boundary separates predicted-positive from predicted-negative points]
VISUALIZING THE DECISION BOUNDARY (3D)
In 3D, the decision boundary is a plane; the model is a 4-dimensional hyperplane parameterized by w and b.
TOWARDS LEARNING THE WEIGHTS
• predicted labels come from a score squashed through the sign function
• score ≈ level of confidence in prediction
• score wᵀx + b large and positive → label is certainly positive
• score ≈ 0 → no idea
• score large and negative → label is certainly negative
TOWARDS LEARNING THE WEIGHTS
• This measures a probability!
• More precisely it measures:
the probability that the label y of the data point with features x is positive
Key properties of a probability:
• always between 0 and 1
• probabilities of all outcomes sum to 1

P(y = +1 | x) is a conditional probability; e.g. P(y = +1 | x) + P(y = −1 | x) = 1.
HOW TO GET P(Y|X)?
• turn score into probability
Logistic Regression model:

P(y = +1 | x) = 1 / (1 + e^(−(wᵀx + b)))
P(y = −1 | x) = 1 − P(y = +1 | x)

Sigmoid function:

sigm(score) = 1 / (1 + e^(−score))
[Figure: the sigmoid function maps the score wᵀx + b (new notation) to a probability P(y = +1 | x); sigm(0) = 0.5]
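A minimal Python sketch of the sigmoid and the resulting P(y = +1 | x); function names are illustrative, not from the lecture:

```python
import numpy as np

def sigmoid(score):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-score))

def prob_positive(w, b, x):
    """P(y = +1 | x) under the logistic regression model."""
    return sigmoid(np.dot(w, x) + b)

# sigm(0) = 0.5: a score of zero means "no idea"
print(sigmoid(0.0))  # 0.5
# P(y = -1 | x) = 1 - P(y = +1 | x), so the two probabilities sum to 1
```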
VISUALIZING LOGISTIC REGRESSION (1D)
[Figure: 1D logistic regression; the sigmoid curve gives the predicted probability P(y = +1 | x) for a test input x, and the prediction is label +1 when that probability exceeds 0.5]
VISUALIZING LOGISTIC REGRESSION (2D)
Heatmap visualization shows the contours (level-sets) of the logistic regression model: P(y = +1 | x) is high on one side of the decision boundary and very low on the other; the boundary itself is the contour where P(y = +1 | x) = 0.5.
TRAINING THE MODEL = LEARNING THE WEIGHTS
• sum of squared error is not a good quality metric
• use likelihood
Likelihood:
• measures how well a specific model maps training input to training observations
• measures how likely it is that a data point (x, y) was generated by the model
ℓᵢ(w, b) = P(y = yᵢ | xᵢ; w, b)   (likelihood for one data point)
L(w, b) = ∏ᵢ P(y = yᵢ | xᵢ; w, b)   (likelihood for the entire training dataset)

Choose the w (and b) that maximizes the likelihood: higher is better.
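The dataset likelihood can be sketched as follows. It uses the identity P(yᵢ | xᵢ) = sigm(yᵢ · (wᵀxᵢ + b)) for labels in {−1, +1}, which matches the two-case logistic regression definition above since sigm(−s) = 1 − sigm(s); names are illustrative:

```python
import numpy as np

def likelihood(w, b, X, y):
    """Likelihood L(w, b) = prod_i P(y = y_i | x_i; w, b), labels in {-1, +1}.

    P(y_i | x_i) = sigm(y_i * (w.x_i + b)): this equals sigm(score)
    for y_i = +1 and 1 - sigm(score) for y_i = -1.
    """
    scores = X @ w + b
    per_point = 1.0 / (1.0 + np.exp(-y * scores))  # P(y_i | x_i; w, b)
    return np.prod(per_point)

# Toy 1D dataset: positives on the right, negatives on the left.
X = np.array([[1.0], [2.0], [-1.0], [-2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
# A model pointing the right way gets a higher likelihood:
print(likelihood(np.array([1.0]), 0.0, X, y))
print(likelihood(np.array([-1.0]), 0.0, X, y))  # smaller
```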
TRAINING THE MODEL = LEARNING THE WEIGHTS
• data-point likelihoods ℓ(x, y, w) for 1D training data:

[Figure: 1D training points (x's and o's) with their individual likelihoods under the model]
TRAINING THE MODEL = LEARNING THE WEIGHTS
• likelihood L(w, b) for 1D training data:

Heatmap visualization shows the contours (level-sets) of the likelihood.

[Figure: heatmap of L(w, b) = ∏ᵢ P(y = yᵢ | xᵢ; w, b) over the (b, w) parameter space for the 10 observed training data points; the maximum marks the learned weights]
TRAINING THE MODEL = LEARNING THE WEIGHTS
• For this 2D training data:
• the likelihood L(w) looks like this (no bias term, to keep the visualization simple):

[Figure: likelihood surface L(w) over the 2D weight space, with its maximum marked]
TRAINING THE MODEL = LEARNING THE WEIGHTS
• Pick the weights w that maximize the likelihood! Solve: max_w L(w)
• How?
• no analytic solution
• use an iterative solution: hill climbing / gradient ascent (equivalently, gradient descent on the negative log-likelihood)
• need to compute the gradient vector ∇L(w): the vector of partial derivatives ∂L/∂wⱼ
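A sketch of the iterative solution: gradient ascent on the log-likelihood, which has the same maximizer as the likelihood itself. All names and hyperparameters here are illustrative, not from the lecture:

```python
import numpy as np

def sigm(s):
    return 1.0 / (1.0 + np.exp(-s))

def fit_logistic(X, y, steps=1000, lr=0.1):
    """Learn (w, b) by gradient ascent on the log-likelihood.

    Labels y are in {-1, +1}. The log-likelihood is
    sum_i log sigm(y_i * (w.x_i + b)), and its gradient w.r.t. w is
    sum_i (1 - sigm(y_i * s_i)) * y_i * x_i (similarly for b).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        s = X @ w + b
        coef = (1.0 - sigm(y * s)) * y   # per-point gradient weight
        w += lr * X.T @ coef             # step uphill in w
        b += lr * coef.sum()             # step uphill in b
    return w, b

# Toy 1D dataset: the learned w should be positive, so that
# sign(w * x + b) classifies every training point correctly.
X = np.array([[1.0], [2.0], [-1.0], [-2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = fit_logistic(X, y)
print(np.sign(X @ w + b))  # matches y
```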
CONFUSION MATRIX
                 true label +1        true label −1
prediction +1    ✓ true positive      ✘ false positive
prediction −1    ✘ false negative     ✓ true negative
Can you define accuracy using these measures?
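One way to answer the question above, sketched in Python: accuracy is the fraction of correct predictions, (TP + TN) / (TP + TN + FP + FN). Function names are my own:

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for labels in {-1, +1}."""
    tp = np.sum((y_pred == 1) & (y_true == 1))    # true positives
    tn = np.sum((y_pred == -1) & (y_true == -1))  # true negatives
    fp = np.sum((y_pred == 1) & (y_true == -1))   # false positives
    fn = np.sum((y_pred == -1) & (y_true == 1))   # false negatives
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, 1, 1])
print(accuracy(y_true, y_pred))  # 3 correct out of 5 -> 0.6
```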