CSE 217: Introduction to Data Science (M. Neumann, Spring 2019)
TRANSCRIPT
LINEAR CLASSIFICATION MODEL
• Recall: model = the mathematical dependency between variables
• the linear classification model predicts ŷ = sign(wᵀx + b)
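As a minimal sketch of the model above (the function name and example weights are illustrative, not from the lecture):

```python
import numpy as np

def predict(w, b, x):
    """Predict a label in {-1, +1} with a linear classification model.

    The model computes the score w.x + b and returns its sign
    (ties at score 0 are broken toward +1 here).
    """
    score = np.dot(w, x) + b
    return 1 if score >= 0 else -1

# Tiny example with made-up weights (illustrative values only):
w = np.array([2.0, -1.0])
b = 0.5
print(predict(w, b, np.array([1.0, 1.0])))   # score = 1.5 -> +1
print(predict(w, b, np.array([-1.0, 2.0])))  # score = -3.5 -> -1
```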
VISUALIZING THE PARAMETERS (1D)
[Figure: 1D data with a good and a bad model; the decision boundary is the point where wᵀx + b = 0]
DECISION BOUNDARY
• target (sentiment) is a binary class
Is this new review positive or negative?
[Figure: 2D feature space of reviews; the model's decision boundary separates predicted-positive from predicted-negative points]
VISUALIZING THE DECISION BOUNDARY (3D)
In 3D, the decision boundary is a plane; the model is a 4-dimensional hyperplane parameterized by w and b.
TOWARDS LEARNING THE WEIGHTS
• predicted labels come from a score squashed through the sign function
• score ≈ level of confidence in prediction
• score wᵀx + b large and positive → label is certainly positive
• score ≈ 0 → no idea
• score large and negative → label is certainly negative
TOWARDS LEARNING THE WEIGHTS
• This measures a probability!
• More precisely it measures:
the probability that the label y of the data point with features x is positive
Key properties of a probability:
• always between 0 and 1
• probabilities of all outcomes sum to 1

P(y = +1 | x) is a conditional probability; e.g. P(y = +1 | x) + P(y = −1 | x) = 1.
HOW TO GET P(Y|X)?
• turn score into probability
Logistic Regression model:

P(y = +1 | x) = 1 / (1 + e^(−(wᵀx + b)))
P(y = −1 | x) = 1 − P(y = +1 | x)

Sigmoid function:

sigm(score) = 1 / (1 + e^(−score))
[Figure: the sigmoid function maps the score wᵀx + b (new notation) to a probability P(y = +1 | x); sigm(0) = 0.5]
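A minimal Python sketch of the sigmoid and the resulting P(y = +1 | x); function names are illustrative, not from the lecture:

```python
import numpy as np

def sigmoid(score):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-score))

def prob_positive(w, b, x):
    """P(y = +1 | x) under the logistic regression model."""
    return sigmoid(np.dot(w, x) + b)

# sigm(0) = 0.5: a score of zero means "no idea"
print(sigmoid(0.0))  # 0.5
# P(y = -1 | x) = 1 - P(y = +1 | x), so the two probabilities sum to 1
```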
VISUALIZING LOGISTIC REGRESSION (1D)
[Figure: 1D logistic regression; the sigmoid curve gives the predicted probability P(y = +1 | x) for a test input x, and the prediction is label +1 when that probability exceeds 0.5]
VISUALIZING LOGISTIC REGRESSION (2D)
Heatmap visualization shows the contours (level-sets) of the logistic regression model: P(y = +1 | x) is high on one side of the decision boundary and very low on the other; the boundary itself is the contour where P(y = +1 | x) = 0.5.
TRAINING THE MODEL = LEARNING THE WEIGHTS
• sum of squared error is not a good quality metric
• use likelihood
Likelihood:
• measures how well a specific model maps training input to training observations
• measures how likely it is that a data point (x, y) was generated by the model
ℓᵢ(w, b) = P(y = yᵢ | xᵢ; w, b)   (likelihood for one data point)
L(w, b) = ∏ᵢ P(y = yᵢ | xᵢ; w, b)   (likelihood for the entire training dataset)

Choose the w (and b) that maximizes the likelihood: higher is better.
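The dataset likelihood can be sketched as follows. It uses the identity P(yᵢ | xᵢ) = sigm(yᵢ · (wᵀxᵢ + b)) for labels in {−1, +1}, which matches the two-case logistic regression definition above since sigm(−s) = 1 − sigm(s); names are illustrative:

```python
import numpy as np

def likelihood(w, b, X, y):
    """Likelihood L(w, b) = prod_i P(y = y_i | x_i; w, b), labels in {-1, +1}.

    P(y_i | x_i) = sigm(y_i * (w.x_i + b)): this equals sigm(score)
    for y_i = +1 and 1 - sigm(score) for y_i = -1.
    """
    scores = X @ w + b
    per_point = 1.0 / (1.0 + np.exp(-y * scores))  # P(y_i | x_i; w, b)
    return np.prod(per_point)

# Toy 1D dataset: positives on the right, negatives on the left.
X = np.array([[1.0], [2.0], [-1.0], [-2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
# A model pointing the right way gets a higher likelihood:
print(likelihood(np.array([1.0]), 0.0, X, y))
print(likelihood(np.array([-1.0]), 0.0, X, y))  # smaller
```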
TRAINING THE MODEL = LEARNING THE WEIGHTS
• data-point likelihoods ℓ(x, y, w) for 1D training data:

[Figure: 1D training points (x's and o's) with their individual likelihoods under the model]
TRAINING THE MODEL = LEARNING THE WEIGHTS
• likelihood L(w, b) for 1D training data:

Heatmap visualization shows the contours (level-sets) of the likelihood.

[Figure: heatmap of L(w, b) = ∏ᵢ P(y = yᵢ | xᵢ; w, b) over the (b, w) parameter space for the 10 observed training data points; the maximum marks the learned weights]
TRAINING THE MODEL = LEARNING THE WEIGHTS
• For this 2D training data:
• the likelihood L(w) looks like this (no bias term, to keep the visualization simple):

[Figure: likelihood surface L(w) over the 2D weight space, with its maximum marked]
TRAINING THE MODEL = LEARNING THE WEIGHTS
• Pick the weights w that maximize the likelihood! Solve: max_w L(w)
• How?
• no analytic solution
• use an iterative solution: hill climbing / gradient ascent (equivalently, gradient descent on the negative log-likelihood)
• need to compute the gradient vector ∇L(w): the vector of partial derivatives ∂L/∂wⱼ
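A sketch of the iterative solution: gradient ascent on the log-likelihood, which has the same maximizer as the likelihood itself. All names and hyperparameters here are illustrative, not from the lecture:

```python
import numpy as np

def sigm(s):
    return 1.0 / (1.0 + np.exp(-s))

def fit_logistic(X, y, steps=1000, lr=0.1):
    """Learn (w, b) by gradient ascent on the log-likelihood.

    Labels y are in {-1, +1}. The log-likelihood is
    sum_i log sigm(y_i * (w.x_i + b)), and its gradient w.r.t. w is
    sum_i (1 - sigm(y_i * s_i)) * y_i * x_i (similarly for b).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        s = X @ w + b
        coef = (1.0 - sigm(y * s)) * y   # per-point gradient weight
        w += lr * X.T @ coef             # step uphill in w
        b += lr * coef.sum()             # step uphill in b
    return w, b

# Toy 1D dataset: the learned w should be positive, so that
# sign(w * x + b) classifies every training point correctly.
X = np.array([[1.0], [2.0], [-1.0], [-2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = fit_logistic(X, y)
print(np.sign(X @ w + b))  # matches y
```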
CONFUSION MATRIX
                 true label +1        true label −1
prediction +1    ✓ true positive      ✘ false positive
prediction −1    ✘ false negative     ✓ true negative
Can you define accuracy using these measures?
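One way to answer the question above, sketched in Python: accuracy is the fraction of correct predictions, (TP + TN) / (TP + TN + FP + FN). Function names are my own:

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for labels in {-1, +1}."""
    tp = np.sum((y_pred == 1) & (y_true == 1))    # true positives
    tn = np.sum((y_pred == -1) & (y_true == -1))  # true negatives
    fp = np.sum((y_pred == 1) & (y_true == -1))   # false positives
    fn = np.sum((y_pred == -1) & (y_true == 1))   # false negatives
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, -1, -1, 1, 1])
print(accuracy(y_true, y_pred))  # 3 correct out of 5 -> 0.6
```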