06 binary classification - tufts cs · 2019-02-13 · objectives: classifier overview • 3 steps...

51
Binary Classification 1 Tufts COMP 135: Introduction to Machine Learning https://www.cs.tufts.edu/comp/135/2019s/ Many slides attributable to: Erik Sudderth (UCI) Finale Doshi-Velez (Harvard) James, Witten, Hastie, Tibshirani (ISL/ESL books) Prof. Mike Hughes

Upload: others

Post on 09-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Binary Classification

1

Tufts COMP 135: Introduction to Machine Learninghttps://www.cs.tufts.edu/comp/135/2019s/

Many slides attributable to:Erik Sudderth (UCI)Finale Doshi-Velez (Harvard)James, Witten, Hastie, Tibshirani (ISL/ESL books)

Prof. Mike Hughes

Page 2: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Logistics• Waitlist: We have some room, contact me

• HW2 due TONIGHT (Wed 2/6 at 11:59pm)• What you submit: PDF and zip• Please annotate pages in Gradescope!

• HW3 out later tonight, due a week from today• What you submit: PDF and zip• Please annotate pages in Gradescope!

• Next recitation is Mon 2/11• Practical binary classifiers in Python with sklearn• Numerical issues and how to address them

2Mike Hughes - Tufts COMP 135 - Spring 2019

Page 3: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Objectives: Classifier Overview• 3 steps of a classification task

• Prediction• Making hard binary decisions• Predicting class probabilities

• Training• Evaluation

• Performance Metrics

• A “taste” of 3 Methods• Logistic Regression• K-Nearest Neighbors• Decision Tree Regression

3Mike Hughes - Tufts COMP 135 - Spring 2019

Page 4: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

What will we learn?

4Mike Hughes - Tufts COMP 135 - Spring 2019

SupervisedLearning

Unsupervised Learning

Reinforcement Learning

Data, Label PairsPerformance

measureTask

data x

labely

{xn, yn}Nn=1

Training

Prediction

Evaluation

Page 5: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

5Mike Hughes - Tufts COMP 135 - Spring 2019

Before: RegressionSupervisedLearning

Unsupervised Learning

Reinforcement Learning

regression

x

y

y is a numeric variable e.g. sales in $$

Page 6: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

6Mike Hughes - Tufts COMP 135 - Spring 2019

y

x2

x1

is a binary variable (red or blue)

SupervisedLearning

binary classification

Unsupervised Learning

Reinforcement Learning

Task: Binary Classification

Page 7: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Example: Hotdog or Not

7Mike Hughes - Tufts COMP 135 - Spring 2019

Page 8: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

8Mike Hughes - Tufts COMP 135 - Spring 2019

y

x2

x1

is a discrete variable (red or blue or green or purple)

SupervisedLearning

multi-class classification

Unsupervised Learning

Reinforcement Learning

Task: Multi-class Classification

Page 9: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

9Mike Hughes - Tufts COMP 135 - Spring 2019

Classification Example: Swype

Many possible letters: Multi-class classification

Page 10: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Binary Prediction StepGoal: Predict label (0 or 1) given features x

• Input:

• Output:

10Mike Hughes - Tufts COMP 135 - Spring 2019

xi , [xi1, xi2, . . . xif . . . xiF ]Entries can be real-valued, or other numeric types (e.g. integer, binary)

Binary label (0 or 1)

“features”“covariates”“predictors”“attributes”

“responses”“labels”

yi 2 {0, 1}

Page 11: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Binary Prediction Step

11Mike Hughes - Tufts COMP 135 - Spring 2019

>>> # Given: pretrained regression object model>>> # Given: 2D array of features x

>>> x_NF.shape(N, F)

>>> yhat_N = model.predict(x_NF)>>> yhat_N[:5] # peek at predictions[0, 0, 1, 0, 1]

>>> yhat_N.shape(N,)

Page 12: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Types of binary predictions

12

FP : false positiveTP : true positive

TN : true negativeFN : false negative

Page 13: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Example:

13

Which outcome is this?

FP : false positiveTP : true positive

TN : true negativeFN : false negative

Page 14: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Example:

14

Which outcome is this?

Answer:True Positive

FP : false positiveTP : true positive

TN : true negativeFN : false negative

Page 15: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Example:

15

FP : false positiveTP : true positive

TN : true negativeFN : false negative

Which outcome is this?

Page 16: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Example:

16

Which outcome is this?

Answer:True Negative (TN)

FP : false positiveTP : true positive

TN : true negativeFN : false negative

Page 17: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Example:

17

Which outcome is this?

FP : false positiveTP : true positive

TN : true negativeFN : false negative

Page 18: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Example:

18

Which outcome is this?

FP : false positiveTP : true positive

TN : true negativeFN : false negative

Answer:False Negative (FN)

Page 19: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Example:

19

Which outcome is this?

FP : false positiveTP : true positive

TN : true negativeFN : false negative

Page 20: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Example:

20

Which outcome is this?

FP : false positiveTP : true positive

TN : true negativeFN : false negative

Answer:False Positive (FP)

Page 21: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Probability Prediction StepGoal: Predict probability p(Y=1) given features x

• Input:

• Output:

21Mike Hughes - Tufts COMP 135 - Spring 2019

xi , [xi1, xi2, . . . xif . . . xiF ]Entries can be real-valued, or other numeric types (e.g. integer, binary)

Probability between 0 and 1e.g. 0.001, 0.513, 0.987

“features”“covariates”“predictors”“attributes”

“probabilities”pi

Page 22: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Probability Prediction Step

22Mike Hughes - Tufts COMP 135 - Spring 2019

>>> # Given: pretrained regression object model>>> # Given: 2D array of features x

>>> x_NF.shape(N, F)

>>> yproba_N2 = model.predict_proba(x_NF)>>> yproba_N2.shape(N, 2)

>>> yproba_N2[:, 1][0.003, 0.358, 0.987, 0.111, 0.656]

Column index 1 gives probability of positive label

p(Y = 1)

Page 23: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

23

Credit: Wattenberg, Viégas, Hardt

Thresholding to get Binary Decisions

23Mike Hughes - Tufts COMP 135 - Spring 2019

Page 24: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

24

Credit: Wattenberg, Viégas, Hardt

Thresholding to get Binary Decisions

Mike Hughes - Tufts COMP 135 - Spring 2019

Page 25: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

25

Credit: Wattenberg, Viégas, Hardt

Thresholding to get Binary Decisions

Mike Hughes - Tufts COMP 135 - Spring 2019

Page 26: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Pair ExerciseInteractive Demo:https://research.google.com/bigpicture/attacking-discrimination-in-ml/

Loan and pay back: +$300Loan and not pay back: -$700

Goals:• What threshold maximizes accuracy?• What threshold maximizes profit?• What needs to be true of costs so threshold is the same

for profit and accuracy?

26Mike Hughes - Tufts COMP 135 - Spring 2019

Page 27: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Classifier: Training StepGoal: Given a labeled dataset, learn a function that can perform prediction well

• Input: Pairs of features and labels/responses

• Output:

27Mike Hughes - Tufts COMP 135 - Spring 2019

{xn, yn}Nn=1

y(·) : RF ! {0, 1}Useful to break into two steps:1) Produce probabilities in [0, 1] OR real-valued scores2) Threshold to make binary decisions

Page 28: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

28Mike Hughes - Tufts COMP 135 - Spring 2019

>>> # Given: 2D array of features x >>> # Given: 1D array of binary labels y

>>> y_N.shape(N,)>>> x_NF.shape(N, F)

>>> model = BinaryClassifier()>>> model.fit(x_NF, y_N)>>> # Now can call predict or predict_proba

Classifier: Training Step

Page 29: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Classifier: Evaluation StepGoal: Assess quality of predictions

29Mike Hughes - Tufts COMP 135 - Spring 2019

Many ways in practice:1) Evaluate probabilities / scores directly

logistic loss, hinge loss, …

2) Evaluate binary decisions at specific thresholdaccuracy, TPR, TNR, PPV, NPV, etc.

3) Evaluate across range of thresholdsROC curve, Precision-Recall curve

Page 30: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Metric: Confusion MatrixCounting mistakes in binary predictions

30

#TP : num. true positive#FP : num. false positive

#TN : num. true negative#FN : num. false negative

#FN#FP#TP#TP

Page 31: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Metric: Accuracy

accuracy = fraction of correct predictions

31Mike Hughes - Tufts COMP 135 - Spring 2019

Potential problem:

Suppose your dataset has 1 positive example and 99 negative examples

What is the accuracy of the classifier that always predicts ”negative”?

=TP + TN

TP + TN + FN + FP

Page 32: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Metric: Accuracy

accuracy = fraction of correct predictions

32Mike Hughes - Tufts COMP 135 - Spring 2019

Potential problem:

Suppose your dataset has 1 positive example and 99 negative examples

What is the accuracy of the classifier that always predicts ”negative”?99%!

=TP + TN

TP + TN + FN + FP

Page 33: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

33

Metrics for Binary Decisions

Emphasize the metrics appropriate for your application.

“sensitivity”, “recall”

“specificity”, 1 - FPR

“precision”

Page 34: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

34Mike Hughes - Tufts COMP 135 - Spring 2019

Goal: App to classify cats vs. dogs from images

Which metric might be most important? Could we just use accuracy?

Page 35: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

35Mike Hughes - Tufts COMP 135 - Spring 2019

Goal: Classifier to find relevant tweets to list on website

Which metric might be most important? Could we just use accuracy?

Page 36: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

36Mike Hughes - Tufts COMP 135 - Spring 2019

Goal: Detector for tumors based on medical image

Which metric might be most important? Could we just use accuracy?

Page 37: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

ROC Curve (across thresholds)

37

FPR(1 – specificity)

TPR(sensitivity) random guess

perfect

Specific thresh

Page 38: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Area under ROC curve(aka AUROC or AUC or “C statistic”)

38

FPR(1 – specificity)

TPR(sensitivity)

AUROC , Pr(y(xi) > y(xj)|yi = 1, yj = 0)

Graphical:

Probabilistic:

For random pair of examples, one positive and one negative,What is probability classifier will rank positive one higher?

Area varies from 0.0 – 1.0.0.5 is random guess.1.0 is perfect.

Page 39: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Precision-Recall Curve

39Mike Hughes - Tufts COMP 135 - Spring 2019

recall (= TPR)

prec

isio

n

Page 40: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

AUROC not always best choice

40

AUROC: red is better Blue much better for alarm fatigue

Page 41: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Classifier: Evaluation Metrics

41Mike Hughes - Tufts COMP 135 - Spring 2019

https://scikit-learn.org/stable/modules/model_evaluation.html

1) To evaluate predicted scores / probabilities

2) To evaluate specific binary decisions

3) To make ROC or PR curves and compute areas

Page 42: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Objectives: Classifier Overview• 3 steps of a classification task

• Prediction• Making hard binary decisions• Predicting class probabilities

• Training• Evaluation

• Performance Metrics

• A “taste” of 3 Methods• Logistic Regression• K-Nearest Neighbors• Decision Tree Regression

42Mike Hughes - Tufts COMP 135 - Spring 2019

Page 43: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Logistic Sigmoid Function

43Mike Hughes - Tufts COMP 135 - Spring 2019

sigmoid(z) =1

1 + e�z

Goal: Transform real values into probabilitiespr

obab

ility

Page 44: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Logistic RegressionParameters:

Prediction:

Training:find weights and bias that minimize error

44Mike Hughes - Tufts COMP 135 - Spring 2019

w = [w1, w2, . . . wf . . . wF ]b

weight vector

bias scalar

p(yi = 1|xi) , sigmoid

0

@FX

f=1

wfxif + b

1

Ap(xi, w, b) =

Page 45: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Measuring prediction qualityfor a probabilistic classifier

45Mike Hughes - Tufts COMP 135 - Spring 2019

Use the log loss (aka “binary cross entropy”, related to“logistic loss”)

log loss(y, p) = �y log p� (1� y) log(1� p)

from sklearn.metrics import log_loss

Advantages:• smooth• easy to take

derivatives!

Page 46: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Logistic Regression: Training

46Mike Hughes - Tufts COMP 135 - Spring 2019

Optimization: Minimize total log loss on train set

min

w,b

NX

n=1

log loss(yn, p(xn, w, b))

Algorithm: Gradient descent

More in next class!

Avoid overfitting: Use L2 or L1 penalty on weights

Page 47: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Nearest Neighbor Classifier

47Mike Hughes - Tufts COMP 135 - Spring 2019

Parameters:none

Prediction:- find “nearest” training vector to given input x- predict y value of this neighbor

Training:none needed (use training data as lookup table)

Page 48: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

K nearest neighbor classifier

48Mike Hughes - Tufts COMP 135 - Spring 2019

Parameters:K : number of neighbors

Prediction:- find K “nearest” training vectors to input x- predict: vote most common y in neighborhood- predict_proba: report fraction of labels

Training:none needed (use training data as lookup table)

Page 49: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Decision Tree Classifier

49Mike Hughes - Tufts COMP 135 - Spring 2019

Leaves make binary predictions! (but can be made probabilistic)

Goal: Does patient have heart disease?

Page 50: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

50Mike Hughes - Tufts COMP 135 - Spring 2019

Decision Tree ClassifierParameters:

- at each internal node: x variable id and threshold- at each leaf: probability of positive y label

Prediction:- identify rectangular region for input x - predict: most common y value in region- predict_proba: report fraction of each label in regtion

Training:- minimize error on training set- often, use greedy heuristics

Page 51: 06 binary classification - Tufts CS · 2019-02-13 · Objectives: Classifier Overview • 3 steps of a classification task • Prediction • Making hard binary decisions • Predicting

Summary of Methods

51Mike Hughes - Tufts COMP 135 - Spring 2019

Function classflexibility

Knobs to tune Interpret?

LogisticRegression

Linear L2/L1 penalty on weights Inspect weights

Decision TreeClassifier

Axis-alignedPiecewise constant

Max. depthMin. leaf sizeGoal criteria

Inspecttree

K Nearest NeighborsClassifier

Piecewise constant Number of NeighborsDistance metricHow neighbors vote

Inspect neighbors