Lecture 08: Classification Wrap-Up · SDS 293: Machine Learning (jcrouser/SDS293/lectures/08...)
LECTURE 08: CLASSIFICATION WRAP-UP
October 4, 2017
SDS 293: Machine Learning
Outline
• Lab: comparing classification approaches
• Evaluating models
  - Today: validation set
  - Monday: resampling
60 min: comparing classification methods
• Work in teams of 2-4 people (3 is ideal)
• Goal: to deepen our understanding of how the 4 classification methods we’ve discussed measure up
  - K-nearest neighbors
  - Logistic regression
  - LDA
  - QDA
• Instructions: http://www.science.smith.edu/~jcrouser/SDS293/labs/lab6.html
• You can check your intuition against p.153-4 of ISLR
Discussion: classification methods

| | Logistic Regression | K-nearest neighbors | LDA | QDA |
|---|---|---|---|---|
| Scenario 1 | | | | |
| Scenario 2 | | | | |
| Scenario 3 | | | | |
| Scenario 4 | | | | |
| Scenario 5 | | | | |
| Scenario 6 | | | | |
Rough guide to choosing a method
• Is the decision boundary ~linear?
  - YES (linear methods): Are the observations normally distributed?
    - YES: start with LDA
    - NO: start with Logistic Regression
  - NO (non-linear methods): Do you have limited training data?
    - YES: start with QDA
    - NO: start with K-Nearest Neighbors
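The rough guide can be written down as a small decision function. This is only a sketch. It assumes the flowchart reads as follows: a roughly linear boundary leads to LDA when the observations are normally distributed and to logistic regression otherwise, while a non-linear boundary leads to QDA when training data is limited and to K-nearest neighbors otherwise. The function name and its parameters are hypothetical and not part of the lecture.

```python
def suggest_starting_method(boundary_roughly_linear,
                            normally_distributed=None,
                            limited_training_data=None):
    """Return a starting method following the rough guide (hypothetical helper).

    Only the question relevant to the chosen branch needs an answer.
    """
    if boundary_roughly_linear:
        # Linear methods: the second question is about normality.
        if normally_distributed is None:
            raise ValueError("need normally_distributed for a linear boundary")
        return "LDA" if normally_distributed else "Logistic Regression"
    # Non-linear methods: the second question is about training set size.
    if limited_training_data is None:
        raise ValueError("need limited_training_data for a non-linear boundary")
    return "QDA" if limited_training_data else "K-Nearest Neighbors"


print(suggest_starting_method(True, normally_distributed=False))
print(suggest_starting_method(False, limited_training_data=True))
```

The result is a place to start, not a verdict: the guide says "start with", so comparing test error across methods is still needed.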
Test vs. training
• In all of these methods, we evaluate performance using test error rather than training error (why?)
• Real life: if the data comes in stages (e.g. trying to predict election results), we have a natural “test set”
• Question: when that’s not the case, what do we do?
• Answer: set aside a “validation set”
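One way to see why training error is a poor yardstick is to hold out a validation set and compare the two error rates. The sketch below is illustrative only: the data is synthetic (an assumption, not one of the lab datasets), the classifier is a bare-bones 1-nearest-neighbor on a single predictor, and the helper names are hypothetical.

```python
import random


def one_nn_predict(train, x):
    """Return the label of the training point closest to x (1-nearest neighbor)."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def error_rate(train, data):
    """Fraction of (x, y) pairs in data that a 1-NN fit on train misclassifies."""
    wrong = sum(1 for x, y in data if one_nn_predict(train, x) != y)
    return wrong / len(data)


rng = random.Random(0)

# Synthetic data (assumption): class 1 tends to have larger x, with overlap.
data = []
for _ in range(200):
    y = rng.randint(0, 1)
    x = rng.gauss(1.0 if y == 1 else 0.0, 1.0)
    data.append((x, y))

# Set aside a validation set: shuffle, then hold out the last 50 observations.
rng.shuffle(data)
train, validation = data[:150], data[150:]

train_error = error_rate(train, train)            # each point is its own neighbor
validation_error = error_rate(train, validation)  # points the model has not seen

print("training error:  ", train_error)
print("validation error:", validation_error)
```

With 1-NN every training point is its own nearest neighbor, so the training error comes out as zero however noisy the data is; only the held-out observations say anything about how the model does on new data.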
What to set aside?
• Smarket: used years 2001-2004 to train, tested on 2005
• Caravan: used the first 1000 records as test
• Carseats:
  - some of you split the data in half
  - some sampled randomly
  - some split on one of the predictors (population>300, sales>7, etc.)
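The three styles of split listed above can be sketched side by side. The records here are a toy stand-in (an assumption, not the actual Carseats data), and the function names are hypothetical.

```python
import random


def split_first_n(records, n):
    """Hold out the first n records as the test set; train on the rest."""
    return records[n:], records[:n]


def split_random(records, test_fraction, seed):
    """Hold out a random sample of the records as the test set."""
    shuffled = records[:]  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def split_on_predicate(records, is_test):
    """Hold out every record for which is_test(record) is true."""
    train = [r for r in records if not is_test(r)]
    test = [r for r in records if is_test(r)]
    return train, test


# Toy records (assumption): 100 rows with a single "population" predictor.
records = [{"id": i, "population": 10 * i} for i in range(100)]

for name, (train, test) in {
    "first 10": split_first_n(records, 10),
    "random half": split_random(records, 0.5, seed=1),
    "population > 300": split_on_predicate(records, lambda r: r["population"] > 300),
}.items():
    print(name, "-> train:", len(train), "test:", len(test))
```

Splitting on a predictor means the model is trained on one range of that predictor and tested on another, which is one plausible reason the choice of split can change the result.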
Question: did it make a difference?
Answer: absolutely
• Example: Auto dataset (392 observations)
• Goal: use linear regression to predict mpg using polynomial f(horsepower)
• Randomly split into 2 sets of 196 obs. (test and training)
[Figure: Mean Squared Error vs. Degree of Polynomial; slide annotation: "10x?"]
Huge variation in prediction accuracy!
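The effect can be reproduced in miniature by repeating the random half-split with different seeds and refitting each time. The sketch below uses synthetic data (an assumption: it is not the Auto dataset, and the numbers will not match the slide), a pure-Python least-squares fit via the normal equations, and hypothetical helper names. Degrees are kept low because the normal equations become numerically fragile for high-degree polynomials.

```python
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first)."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y for design columns x^0 ... x^degree.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    m = [row + [b] for row, b in zip(ata, aty)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - s) / m[i][i]
    return coeffs


def mse(coeffs, xs, ys):
    """Mean squared error of the fitted polynomial on (xs, ys)."""
    preds = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def validation_mse(xs, ys, degree, seed):
    """Randomly split in half, fit on one half, report MSE on the other."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    coeffs = fit_polynomial([xs[i] for i in train], [ys[i] for i in train], degree)
    return mse(coeffs, [xs[i] for i in test], [ys[i] for i in test])


# Synthetic data (assumption): 392 observations from a noisy quadratic.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(392)]
ys = [1 - 2 * x + 3 * x ** 2 + rng.gauss(0, 1) for x in xs]

# Same data, same models; only the random split changes.
results = {seed: [validation_mse(xs, ys, d, seed) for d in (1, 2, 3)]
           for seed in range(4)}
for seed, errors in results.items():
    print("split", seed, [round(e, 3) for e in errors])
```

Each row of output is one random split of 196 training and 196 test observations; any difference between rows comes purely from which observations landed on which side.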
Issues with the “validation set” approach
1. The test error rate depends heavily on which observations are used for training vs. testing
2. We’re only training on a subset of the data (why is this a problem?)
We need a new approach...
Coming up
• Monday: Resampling Methods
• A2 due tonight
• A3 posted, due Wednesday Oct. 11th by 11:59pm