INTRODUCTION TO MACHINE LEARNING Decision tree learning


Post on 27-May-2020


Introduction to Machine Learning

Task of classification

● Automatically assign a class to observations with features
● Observation: a vector of features, with a class
● Automatically assign a class to a new observation with features, using previous observations
● Binary classification: two classes
● Multiclass classification: more than two classes

Example

● A dataset consisting of persons
● Features: age, weight and income
● Class:
  ● binary: happy or not happy
  ● multiclass: happy, satisfied or not happy

Examples of features

● Features can be numerical
  ● age: 23, 25, 75, …
  ● height: 175.3, 179.5, …
● Features can be categorical
  ● travel_class: first class, business class, coach class
  ● smokes?: yes, no

The decision tree

● Suppose you’re classifying patients as sick or not sick
● Intuitive way of classifying: ask questions

Is the patient young or old?
  Young: Vaccinated against the measles?
    Yes: …
    No: …
  Old: Smoked for more than 10 years?
    Yes: …
    No: …

It’s a decision tree!

Define the tree

A (root)
├── B (child of A)
│   ├── D (leaf)
│   └── E (leaf)
└── C (child of A)
    ├── F (leaf)
    └── G (leaf)

● Nodes: A to G
● Edges: the links between a node and its children
● Root: A
● Children of A: B and C
● Leaves: D, E, F and G (children of B and C, grandchildren of A)

Questions to ask

age <= 18?
  yes: vaccinated?
    yes: not sick
    no: sick
  no: smoked?
    yes: sick
    no: not sick

Categorical feature

● Can be a feature test on its own
● travel_class: coach, business or first

travel_class
├── coach
├── business
└── first

Classifying with the tree

age <= 18?
  yes: vaccinated?
    yes: not sick
    no: sick
  no: smoked?
    yes: sick
    no: not sick

Observation: patient of 40 years, vaccinated and didn’t smoke

Path through the tree: age <= 18? no, smoked? no

Prediction: not sick
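
The walk through the tree can be sketched in Python as nested questions. This is only an illustration of the example tree from the slides; the function name `classify_patient` and its argument names are made up for the sketch.

```python
def classify_patient(age, vaccinated, smoked):
    """Walk the example tree from the root down to a leaf."""
    if age <= 18:
        # Young patients: the next question is about vaccination.
        return "not sick" if vaccinated else "sick"
    # Older patients: the next question is about smoking.
    return "sick" if smoked else "not sick"


# The observation from the slide: 40 years old, vaccinated, didn't smoke.
prediction = classify_patient(age=40, vaccinated=True, smoked=False)
print(prediction)
```

Note that for this patient the vaccination answer is never looked at: only the questions on the path from the root to the leaf matter.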

Learn a tree

● Use training set
● Come up with queries (feature tests) at each node

training set
  age <= 18?
    TRUE (yes): part of training set
    FALSE (no): part of training set

● Split the training set into parts: 2 parts for a binary test
● Apply a new feature test to each part, splitting it again
● Keep splitting until the leaves contain a small portion of the training set

Learn the tree

● Goal: end up with pure leaves, i.e. leaves that contain observations of one particular class
● In practice: almost never the case, because of noise
● When classifying new instances
  ● end up in a leaf
  ● assign the class of the majority of training instances in that leaf

[Figure: a leaf holding part of the training set, containing observations of class 1 and class 2]
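
A minimal sketch of the majority rule at a leaf, assuming the leaf is represented simply as a list of the class labels of the training instances that ended up in it. The helper names `is_pure` and `leaf_prediction` are hypothetical.

```python
from collections import Counter


def is_pure(leaf_labels):
    """A leaf is pure when all its training instances share one class."""
    return len(set(leaf_labels)) == 1


def leaf_prediction(leaf_labels):
    """Predict the class held by the majority of training instances in the leaf."""
    return Counter(leaf_labels).most_common(1)[0][0]


# A noisy leaf: mostly "sick", with one "not sick" instance.
leaf = ["sick", "sick", "not sick", "sick"]
print(is_pure(leaf), leaf_prediction(leaf))
```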

Learn the tree

● At each node
  ● Iterate over different feature tests
  ● Choose the best one
● Comes down to two parts
  ● Make list of feature tests
  ● Choose test with best split

Construct list of tests

● Categorical features
  ● Only tests that the parents/grandparents/… didn’t use yet
● Numerical features
  ● Choose feature
  ● Choose threshold
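
The slides do not say how thresholds for a numerical feature are chosen. One common option, shown here purely as an assumption, is to try the midpoints between consecutive distinct values seen in the training set. The function name `candidate_thresholds` is hypothetical.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct values of a numerical feature.

    This is one possible way to build the list of tests, not necessarily
    the one used in the course.
    """
    unique = sorted(set(values))
    return [(low + high) / 2 for low, high in zip(unique, unique[1:])]


ages = [23, 25, 75, 25]
thresholds = candidate_thresholds(ages)
print(thresholds)  # each value t gives a feature test "age <= t"
```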

Choose best feature test

● More complex
● Use splitting criteria to decide which test to use
● Information gain, which is based on entropy

Information gain

● Information gained from a split based on a feature test
● Test leads to nicely divided classes -> high information gain
● Test leads to scrambled classes -> low information gain
● Test with highest information gain will be chosen
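
To make "nicely divided" versus "scrambled" concrete, here is a small sketch using the usual entropy-based definition: information gain is the entropy of the labels before the split minus the size-weighted entropy of the parts after the split. The function names and the toy labels are assumptions made for the illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(parent, parts):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(parent)
    after = sum(len(part) / total * entropy(part) for part in parts)
    return entropy(parent) - after


parent = ["sick"] * 4 + ["not sick"] * 4

# A test that separates the classes perfectly.
clean_split = [["sick"] * 4, ["not sick"] * 4]
# A test that leaves both parts as mixed as the parent.
scrambled_split = [["sick", "sick", "not sick", "not sick"]] * 2

print(information_gain(parent, clean_split))
print(information_gain(parent, scrambled_split))
```

The first test would be preferred, since it yields the higher gain.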

Pruning

● Number of nodes influences the chance of overfitting
● Restrict size: higher bias
  ● Decreases the chance of overfitting
  ● Pruning the tree


Let’s practice!


k-Nearest Neighbors

Instance-based learning

● Save training set in memory
● No real model like a decision tree
● Compare unseen instances to training set
● Predict using the comparison of unseen data and the training set

k-Nearest Neighbor

● Form of instance-based learning
● Simplest form: 1-Nearest Neighbor or Nearest Neighbor

Nearest Neighbor - example

● 2 features: X1 and X2
● Class: red or blue
● Binary classification
● Save complete training set
● Given: unseen observation with features X = (1.3, -2)
● Compare training set with new observation
● Find closest observation (the nearest neighbor) and assign the same class
  ● just Euclidean distance, nothing fancy
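
A minimal sketch of the nearest-neighbor step for the unseen observation X = (1.3, -2). The training points below are invented for the illustration (the slide's scatter plot is not available), and `nearest_neighbor` is a hypothetical helper name. `math.dist` computes the Euclidean distance and needs Python 3.8 or later.

```python
from math import dist


def nearest_neighbor(training_set, new_point):
    """Return the class of the training observation closest to new_point."""
    _, label = min(training_set, key=lambda row: dist(row[0], new_point))
    return label


# Made-up training set: ((X1, X2), class)
training_set = [
    ((1.0, -1.5), "red"),
    ((-2.0, 0.5), "blue"),
    ((0.5, 2.0), "blue"),
]

print(nearest_neighbor(training_set, (1.3, -2)))
```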

k-Nearest Neighbors

● k is the number of neighbors
● If k = 5
  ● Use 5 most similar observations (neighbors)
  ● Assigned class will be the most represented class within the 5 neighbors
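
The voting rule can be sketched as follows, assuming Euclidean distance as the similarity measure. The training points and the name `knn_predict` are made up for the example.

```python
from collections import Counter
from math import dist


def knn_predict(training_set, new_point, k=5):
    """Majority vote among the k training observations closest to new_point."""
    neighbors = sorted(training_set, key=lambda row: dist(row[0], new_point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up training set: ((X1, X2), class)
training_set = [
    ((1.0, -1.5), "red"),
    ((1.5, -2.5), "red"),
    ((2.0, -1.0), "red"),
    ((0.0, -2.0), "blue"),
    ((1.0, -3.5), "blue"),
    ((-3.0, 2.0), "blue"),
    ((4.0, 3.0), "blue"),
]

print(knn_predict(training_set, (1.3, -2), k=5))  # 3 red vs 2 blue neighbors
print(knn_predict(training_set, (1.3, -2), k=7))  # all points vote: 3 red vs 4 blue
```

The two calls give different answers, which shows that the choice of k matters.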

Distance metric

● Important aspect of k-NN
● Euclidean distance: $d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$
● Manhattan distance: $d(a, b) = \sum_{i=1}^{n} |a_i - b_i|$
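
Both metrics are short to write out. The sketch below uses the standard definitions (square root of the summed squared differences, and sum of the absolute differences); the function names are chosen for the example.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of the absolute differences per feature."""
    return sum(abs(x - y) for x, y in zip(a, b))


print(euclidean((0, 0), (3, 4)))
print(manhattan((0, 0), (3, 4)))
```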

Scaling - example

● Dataset with
  ● 2 features: weight and height
  ● 3 observations

Height in meters:

     height (m)   weight (kg)
1    1.83         80
2    1.83         80.5
3    1.70         80

distance between 1 and 2: 0.5
distance between 1 and 3: 0.13

Height in centimeters:

     height (cm)  weight (kg)
1    183          80
2    183          80.5
3    170          80

distance between 1 and 2: 0.5
distance between 1 and 3: 13

Scale influences distance!
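
The distances in the two tables can be checked with a few lines, assuming Euclidean distance (`math.dist`, Python 3.8+). The variable names are made up for the check.

```python
from math import dist

# (height, weight) for observations 1, 2 and 3
in_meters = [(1.83, 80.0), (1.83, 80.5), (1.70, 80.0)]
in_centimeters = [(183, 80.0), (183, 80.5), (170, 80.0)]

for name, data in [("m", in_meters), ("cm", in_centimeters)]:
    d12 = dist(data[0], data[1])
    d13 = dist(data[0], data[2])
    print(f"height in {name}: d(1,2) = {d12:.2f}, d(1,3) = {d13:.2f}")
```

With height in meters, observation 3 is the nearest neighbor of observation 1; with height in centimeters, it is observation 2. The data is the same, only the unit changed.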

Scaling

● Normalize all features
  ● e.g. rescale values between 0 and 1
● Gives better measure of real distance
● Don’t forget to scale new observations
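
A minimal sketch of rescaling to the range 0 to 1 (min-max scaling), assuming the minimum and maximum are taken from the training set and then reused for new observations. The helper names and the new height of 176.5 cm are invented for the example; a feature whose minimum equals its maximum would need separate handling.

```python
def fit_min_max(column):
    """Remember the smallest and largest training value of a feature."""
    return min(column), max(column)


def rescale(value, low, high):
    """Map value to [0, 1] relative to the training minimum and maximum."""
    return (value - low) / (high - low)


train_heights = [183, 183, 170]
low, high = fit_min_max(train_heights)

scaled_train = [rescale(h, low, high) for h in train_heights]
# A new observation is scaled with the SAME low/high as the training set.
scaled_new = rescale(176.5, low, high)

print(scaled_train, scaled_new)
```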

Categorical features

● How to use in distance metric?
● Dummy variables
  ● Convert 1 categorical feature with N possible outcomes into N binary features (2 outcomes each)

Dummy variables - Example

mother tongue: Spanish, Italian or French

mother_tongue   spanish   italian   french
Spanish         1         0         0
Italian         0         1         0
Italian         0         1         0
Spanish         1         0         0
French          0         0         1
French          0         0         1
French          0         0         1
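
The conversion in this example can be sketched in plain Python. The function name `dummy_variables` is hypothetical; columns are created in order of first appearance so that they line up with the Spanish, Italian, French order of the example.

```python
def dummy_variables(values):
    """Turn one categorical feature into one 0/1 feature per category."""
    categories = list(dict.fromkeys(values))  # unique values, first-seen order
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


mother_tongue = ["Spanish", "Italian", "Italian", "Spanish",
                 "French", "French", "French"]
categories, rows = dummy_variables(mother_tongue)

print(categories)
for value, row in zip(mother_tongue, rows):
    print(value, row)
```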


Let’s practice!


Introducing: The ROC curve

Introducing

● Very powerful performance measure
● For binary classification
● Receiver Operating Characteristic curve (ROC curve)

Probabilities as output

● Used decision trees and k-NN to predict class
● They can also output the probability that an instance belongs to a class

Probabilities as output - example

● Binary classification
● Decide whether patient is sick or not sick
● Define probability threshold from which you decide patient to be sick
● Decision tree output for a new patient: 70% for one class, 30% for the other
● Decision function: classify as sick when the probability of sick is higher than 50%
● Avoid sending a sick patient home: lower the threshold to 30%
  ● More sick patients classified as sick
  ● but also more healthy patients classified as sick
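
The decision function is just a comparison against the chosen threshold. In this sketch the name `diagnose` is hypothetical, and a probability equal to the threshold is treated as sick, which is an assumption about the boundary case.

```python
def diagnose(p_sick, threshold=0.5):
    """Classify a patient from the predicted probability of being sick."""
    return "sick" if p_sick >= threshold else "not sick"


# A patient with a 30% predicted probability of being sick:
print(diagnose(0.30))                  # default 50% threshold
print(diagnose(0.30, threshold=0.30))  # lowered threshold
```

Lowering the threshold flips this patient to sick, which is the trade-off described above: fewer sick patients are sent home, at the cost of flagging more healthy ones.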

Confusion matrix

● Other performance measure for classification
● Important to construct the ROC curve

Confusion matrix

● Binary classifier: positive or negative (1 or 0)

              Prediction
              P      N
Truth    p    TP     FN
         n    FP     TN

● True Positives (TP): prediction P, truth P
● False Negatives (FN): prediction N, truth P
● False Positives (FP): prediction P, truth N
● True Negatives (TN): prediction N, truth N
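
Counting the four cells is a matter of comparing each prediction with the truth. The labels below are made up, with 1 for positive and 0 for negative, and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(truth, predicted):
    """Return (TP, FN, FP, TN) for binary labels coded as 1 and 0."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fn, fp, tn


truth     = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fn, fp, tn = confusion_counts(truth, predicted)
print(f"TP={tp} FN={fn}")
print(f"FP={fp} TN={tn}")
```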

Ratios in the confusion matrix

              Prediction
              P      N
Truth    p    TP     FN
         n    FP     TN

● True positive rate (TPR) = recall
  ● TPR = TP / (TP + FN)
● False positive rate (FPR)
  ● FPR = FP / (FP + TN)
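
Both ratios follow directly from the confusion matrix cells. The counts used here (TP = 3, FN = 1, FP = 1, TN = 3) are invented for the example.

```python
def true_positive_rate(tp, fn):
    """Share of truly positive observations that were predicted positive."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Share of truly negative observations that were predicted positive."""
    return fp / (fp + tn)


tpr = true_positive_rate(tp=3, fn=1)
fpr = false_positive_rate(fp=1, tn=3)
print(tpr, fpr)
```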

ROC curve

● Horizontal axis: FPR
● Vertical axis: TPR
● How to draw the curve?

[Figure: empty ROC plot; false positive rate on the horizontal axis and true positive rate on the vertical axis, both from 0.0 to 1.0]

Draw the curve

● Need classifier which outputs probabilities
● The decision function

[Figure: probability axis with a threshold set by the decision function; the side of the threshold on which a probability falls decides the diagnosis]

[Figure: ROC plot, false positive rate against true positive rate (both from 0.0 to 1.0), with one point added per probability threshold]

● Threshold 50%: probability >= 50% is sick, < 50% is healthy
● Threshold 0%: all sick (TPR = 1, FPR = 1)
● Threshold 100%: all healthy (TPR = 0, FPR = 0)
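
Sweeping the threshold and recording one (FPR, TPR) point per setting can be sketched like this. The true labels and predicted probabilities are made up (1 means sick), and `roc_points` is a hypothetical helper. The sketch assumes no predicted probability is exactly 1.0, so that a 100% threshold classifies everyone as healthy.

```python
def roc_points(truth, p_sick, thresholds):
    """One (FPR, TPR) point per probability threshold."""
    points = []
    for threshold in thresholds:
        predicted = [1 if p >= threshold else 0 for p in p_sick]
        pairs = list(zip(truth, predicted))
        tp = sum(t == 1 and p == 1 for t, p in pairs)
        fn = sum(t == 1 and p == 0 for t, p in pairs)
        fp = sum(t == 0 and p == 1 for t, p in pairs)
        tn = sum(t == 0 and p == 0 for t, p in pairs)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


truth  = [1, 1, 1, 0, 0, 0]
p_sick = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

points = roc_points(truth, p_sick, thresholds=[0.0, 0.5, 1.0])
for threshold, (fpr, tpr) in zip([0.0, 0.5, 1.0], points):
    print(f"threshold {threshold:.0%}: FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Using more thresholds between 0% and 100% fills in the rest of the curve.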

Interpreting the curve

● Is it a good curve?
● Closer to upper left corner = better
● Good classifiers have big area under the curve

[Figure: ROC curve; false positive rate on the horizontal axis and true positive rate on the vertical axis, both from 0.0 to 1.0]

Area under the curve (AUC)

● AUC = 0.905
● > 0.9 = very good

[Figure: ROC curve with the area under it marked; false positive rate on the horizontal axis and true positive rate on the vertical axis, both from 0.0 to 1.0]
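
Given a list of (FPR, TPR) points, the area under the curve can be approximated with the trapezoid rule. This is a sketch under that assumption, not necessarily how the 0.905 on the slide was computed; the function name `auc` and the two toy curves are made up.

```python
def auc(points):
    """Trapezoid-rule area under a curve given as (FPR, TPR) points."""
    pts = sorted(points)  # order by FPR, then TPR
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # hugs the upper left corner
diagonal = [(0.0, 0.0), (1.0, 1.0)]             # no better than guessing

print(auc(perfect), auc(diagonal))
```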


Let’s practice!