Machine Learning using Matlab (Uni Konstanz)
Lecture 8: Advice on ML application
Presentation schedule
Time slot
10:00 - 10:20 Presentation 1
10:25 - 10:45 Presentation 2
10:50 - 11:10 Presentation 3
11:15 - 11:35 Presentation 4
● 20 minutes for each group (15-minute talk and 5 minutes of questions)
● Each member should speak for at least 3 minutes
Outline
● Evaluating your machine learning model
● Bias vs. variance
○ Feature parameter, e.g., degree of polynomial in linear regression
○ Regularization parameter, e.g., C in SVM
○ Number of training examples
● Handling skewed/unbalanced classes
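Debugging a learning model
Suppose you have implemented regularized linear regression to predict housing prices:
$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$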
However, when you test your hypothesis on a new set of houses, you find that it makes unacceptably large errors in its predictions. What should you try next?
● Get more training examples
● Try smaller sets of features
● Try getting additional features
● Try adding polynomial features
● Try decreasing lambda
● Try increasing lambda
Evaluate your model
To evaluate the performance of your ML model, you should:
● Randomly shuffle your dataset and divide it into a training set (70%) and a test set (30%)
● Learn the hypothesis from the training data only, namely, find the parameters $\theta$ that minimize the training cost $J(\theta)$
● Predict results on the test set and measure the performance of your model
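The shuffle-then-split step can be sketched in a few lines. This is Python rather than Matlab. The function name `train_test_split`, the fixed `seed`, and rounding the training-set size are illustrative assumptions, not part of the lecture. The data are the ten (size, price) pairs from the example that follows.

```python
import random


def train_test_split(data, train_fraction=0.7, seed=0):
    """Randomly shuffle a copy of `data` and split it into (training set, test set)."""
    rows = list(data)                   # copy, so the caller's list keeps its order
    random.Random(seed).shuffle(rows)   # seeded shuffle for a reproducible split
    n_train = round(train_fraction * len(rows))
    return rows[:n_train], rows[n_train:]


# (size, price) pairs from the example
houses = [(2104, 400), (1600, 330), (2400, 369), (1416, 232), (3000, 540),
          (1985, 300), (1534, 315), (1427, 199), (1380, 212), (1494, 243)]
train, test = train_test_split(houses)
print(len(train), len(test))  # 7 3
```

Shuffling before splitting keeps any ordering in the raw data (e.g., rows sorted by size) from ending up entirely in one of the two sets.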
Example - linear regression
Size   Price
2104   400
1600   330
2400   369
1416   232
3000   540
1985   300
1534   315
1427   199
1380   212
1494   243
Randomly shuffled and split into:
Training set (70%)
Size   Price
2104   400
2400   369
1416   232
3000   540
1534   315
1427   199
1380   212
Test set (30%)
Size   Price
1600   330
1985   300
1494   243
Example - linear regression
Size Price
2104 400
2400 369
1416 232
3000 540
1534 315
1427 199
1380 212
Training set
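Minimize the following cost function using the training set (here m = 7 examples) to obtain the optimal parameters $\theta^*$:
$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right], \qquad \theta^* = \arg\min_\theta J(\theta)$$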
Example - linear regression
Size Price
1600 330
1985 300
1494 243
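Test set
Mean squared error on the test set (here $m_{test} = 3$), using the optimal parameters $\theta^*$:
$$J_{test}(\theta^*) = \frac{1}{2m_{test}}\sum_{i=1}^{m_{test}}\left(h_{\theta^*}(x_{test}^{(i)}) - y_{test}^{(i)}\right)^2$$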
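The three example slides (fit on the training set, then measure the error on the test set) can be sketched as follows. This is a minimal sketch under several assumptions:
- there is a single feature (size);
- it uses the closed-form solution for that one-feature case instead of an iterative optimizer;
- the intercept θ0 is not regularized.

`fit_linear` and `squared_error` are hypothetical helper names.

```python
def fit_linear(rows, lam=0.0):
    """Closed-form minimiser of
    J(theta) = 1/(2m) * [sum (theta0 + theta1*x - y)^2 + lam * theta1^2]
    for a single feature x; the intercept theta0 is not regularised."""
    m = len(rows)
    x_mean = sum(x for x, _ in rows) / m
    y_mean = sum(y for _, y in rows) / m
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in rows)
    sxx = sum((x - x_mean) ** 2 for x, _ in rows)
    theta1 = sxy / (sxx + lam)
    theta0 = y_mean - theta1 * x_mean
    return theta0, theta1


def squared_error(theta, rows):
    """1/(2m) * sum (h(x) - y)^2 with h(x) = theta0 + theta1*x (no regularisation term)."""
    theta0, theta1 = theta
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in rows) / (2 * len(rows))


train = [(2104, 400), (2400, 369), (1416, 232), (3000, 540),
         (1534, 315), (1427, 199), (1380, 212)]
test = [(1600, 330), (1985, 300), (1494, 243)]

theta = fit_linear(train)           # learn theta on the training set only (lam = 0 here)
print(squared_error(theta, test))   # J_test(theta) on the held-out test set
```

The test set is never touched while fitting. Its error is therefore an estimate of how the hypothesis performs on houses it has not seen.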
Question: how would you evaluate the performance of a logistic regression model?
Parameter selection
Take linear regression as an example: you may need to choose the degree of the polynomial, d, in
$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_d x^d$$
● Suppose you try d = 1, 2, …, 10 and find that d = 3 gives the lowest mean squared error on the test data, so you claim that d = 3 is the optimal parameter of your model. Is anything wrong?
Yes. Because d was chosen to fit the test data, the test error is an optimistic estimate, and the performance may drop when you apply your model to other data. In other words, you no longer know how well your model generalizes to new examples.
Parameter selection (cont.)
To select the optimal parameters, there are two options:
● K-fold cross validation (CV), when you have a small dataset
● Divide your data into three parts (training, validation, and test), when you have a large dataset
K-fold Cross Validation
● Divide your training set into K parts (folds)
● In each of the K iterations, train on K-1 folds, test on the remaining fold, and measure the performance
● Average the performance over those K iterations
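The K-fold procedure can be sketched in Python as follows. The sketch assumes contiguous folds and caller-supplied `fit(rows)` and `error(model, rows)` functions, which are hypothetical hooks. The constant-mean model in the usage lines is a deliberately simple stand-in, not a model from the lecture.

```python
def k_fold_cross_validate(data, k, fit, error):
    """Average validation error over K folds.

    `data` is a list cut into k contiguous folds whose sizes differ by at most
    one (shuffle it first if its order is not random). In iteration i the model
    is fitted on the other k - 1 folds and evaluated on fold i.
    """
    n = len(data)
    bounds = [i * n // k for i in range(k + 1)]   # fold i is data[bounds[i]:bounds[i + 1]]
    errors = []
    for i in range(k):
        val = data[bounds[i]:bounds[i + 1]]
        train = data[:bounds[i]] + data[bounds[i + 1]:]
        errors.append(error(fit(train), val))
    return sum(errors) / k


# Deliberately simple stand-in model: predict the mean price for every house
def fit_mean(rows):
    return sum(y for _, y in rows) / len(rows)


def half_mse(c, rows):
    return sum((c - y) ** 2 for _, y in rows) / (2 * len(rows))


houses = [(2104, 400), (1600, 330), (2400, 369), (1416, 232), (3000, 540),
          (1985, 300), (1534, 315), (1427, 199), (1380, 212), (1494, 243)]
print(k_fold_cross_validate(houses, 5, fit_mean, half_mse))
```

Every example is used for validation exactly once and for training K-1 times. The averaged error therefore depends less on one lucky or unlucky split.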
Parameter selection with K-fold CV
● Procedure:
○ For each candidate parameter value, e.g., each degree of polynomial, compute the average performance using K-fold CV
○ Pick the parameter value with the best average performance
● Pros and cons:
○ Less bias
○ Computationally intensive (K × d models are trained for d candidate values)
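Selecting the polynomial degree d with K-fold CV might look like this. The sketch makes several assumptions:
- polynomials are fitted by unregularized least squares via the normal equations, solved with Gauss-Jordan elimination;
- fold i takes every K-th example starting at index i;
- the data are synthetic points around y = 1 + 2x - 3x², not the lecture's housing data.

```python
def fit_poly(d, rows):
    """Least-squares fit of h(x) = theta_0 + theta_1 x + ... + theta_d x^d.

    Solves the normal equations (X^T X) theta = X^T y by Gauss-Jordan
    elimination with partial pivoting; returns [theta_0, ..., theta_d].
    """
    n = d + 1
    # Augmented matrix [X^T X | X^T y]
    a = [[sum(x ** (i + j) for x, _ in rows) for j in range(n)]
         + [sum(y * x ** i for x, y in rows)] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def half_mse(theta, rows):
    """1/(2m) * sum (h(x) - y)^2 for the polynomial with coefficients theta."""
    def h(x):
        return sum(t * x ** i for i, t in enumerate(theta))
    return sum((h(x) - y) ** 2 for x, y in rows) / (2 * len(rows))


def select_degree_kfold(data, degrees, k):
    """Return (best degree, {degree: average validation error}).

    Fold i holds every k-th example starting at index i; in total
    k * len(degrees) models are trained.
    """
    avg_error = {}
    for d in degrees:
        total = 0.0
        for i in range(k):
            val = data[i::k]
            train = [row for j, row in enumerate(data) if j % k != i]
            total += half_mse(fit_poly(d, train), val)
        avg_error[d] = total / k
    return min(avg_error, key=avg_error.get), avg_error


# Synthetic data: y = 1 + 2x - 3x^2 plus a small alternating perturbation
data = [(i / 10, 1 + 2 * (i / 10) - 3 * (i / 10) ** 2 + 0.05 * (-1) ** i)
        for i in range(-10, 11)]
best_d, cv_error = select_degree_kfold(data, range(1, 6), k=3)
print(best_d, cv_error)
```

With 5 candidate degrees and K = 3, the loop trains 15 models. This is the K × d cost listed above.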
Parameter selection with big data
● Procedure:
○ Divide your dataset into three parts: training set (60%), validation set (20%), and test set (20%)
○ Train your model on the training set with each candidate parameter and measure its performance on the validation set; choose the optimal parameter, i.e., the one with the best performance on the validation set
○ Measure the performance of your model on the test set with the optimal parameter
● Pros and cons:
○ Lower computational cost (d models are trained)
○ More bias
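The 60%/20%/20% procedure is sketched below. `fit(param, rows)` and `error(model, rows)` are hypothetical caller-supplied hooks. The toy model predicts a single regularized constant c = Σy / (m + λ), which minimizes Σ(c - y)² + λc². It is only there to give λ something to tune and is not from the lecture.

```python
import random


def holdout_select(data, candidates, fit, error, seed=0):
    """Parameter selection with a 60% / 20% / 20% training/validation/test split.

    Each candidate is trained once on the training set (d trainings in total),
    the best one on the validation set is kept, and only that model is
    evaluated on the test set. Returns (best parameter, test error).
    """
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_train = round(0.6 * len(rows))
    n_val = round(0.2 * len(rows))
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    models = {p: fit(p, train) for p in candidates}
    best = min(candidates, key=lambda p: error(models[p], val))
    return best, error(models[best], test)


# Toy regularised model: predict one constant c = sum(y) / (m + lam) for every
# example, which minimises sum((c - y)^2) + lam * c^2
def fit_const(lam, rows):
    return sum(y for _, y in rows) / (len(rows) + lam)


def half_mse(c, rows):
    return sum((c - y) ** 2 for _, y in rows) / (2 * len(rows))


houses = [(2104, 400), (1600, 330), (2400, 369), (1416, 232), (3000, 540),
          (1985, 300), (1534, 315), (1427, 199), (1380, 212), (1494, 243)]
print(holdout_select(houses, [0, 1, 10], fit_const, half_mse))
```

The test set is touched only once, after the choice has been made. Its error therefore is not biased by the parameter search.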
Example - bias vs. variance on regression
[Figure: three fits of price vs. size: underfitting (high bias), just right, and overfitting (high variance)]
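Bias vs. variance on degree of polynomial
Using the mean squared error defined before, we have the training error and the validation error:
$$J_{train}(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
$$J_{cv}(\theta) = \frac{1}{2m_{cv}}\sum_{i=1}^{m_{cv}}\left(h_\theta(x_{cv}^{(i)}) - y_{cv}^{(i)}\right)^2$$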
Q: if we change the degree of polynomial d, what will the training error and validation error look like?
Diagnosing bias vs. variance on degree of polynomial
Suppose your machine learning model is performing less well than you were hoping. Is it a bias problem or a variance problem?
● Bias (underfitting): both the training error and the validation error are high ($J_{cv} \approx J_{train}$)
● Variance (overfitting): the training error is low, but the validation error is much higher ($J_{cv} \gg J_{train}$)
[Figure: training error and validation error vs. degree of polynomial]
Bias vs. variance on regularization
Let's fix the degree of polynomial at d = 4. What will the hypothesis look like for different values of λ?
[Figure: fits of price vs. size for small, intermediate, and large λ]
Diagnosing bias vs. variance on regularization
If we change the value of the regularization parameter λ, what will the training error and validation error look like?
[Figure: training error and validation error vs. λ, with the "just right" value of λ marked]
Q: now suppose you want to tune both the degree of polynomial d and the regularization parameter λ. What should you do?
Grid search
● Pick a set of candidate values for parameter A
● Pick a set of candidate values for parameter B
● For each pair of values of A and B, evaluate the validation error, using either K-fold CV on the training set or the error on the validation set
● Pick the pair that gives the minimum validation error
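Grid search - regularized linear regression
Validation error for each combination of degree d (rows) and regularization parameter λ (columns):
d \ λ   0.05   2      10
2       0.22   0.10   0.34
4       0.32   0.05   0.21
6       0.52   0.12   0.43
Optimal parameters: (d, λ) = (4, 2)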
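The grid-search procedure can be sketched with `itertools.product`. `validation_error(a, b)` is a hypothetical hook that would return either the K-fold CV error or the validation-set error. Here it simply looks up the values from the table above, which reproduces the optimum (4, 2).

```python
from itertools import product


def grid_search(values_a, values_b, validation_error):
    """Evaluate validation_error(a, b) for every pair and return the
    (a, b, error) triple with the lowest error."""
    return min(((a, b, validation_error(a, b)) for a, b in product(values_a, values_b)),
               key=lambda triple: triple[2])


# Validation errors from the table, keyed by (d, lambda)
table = {
    (2, 0.05): 0.22, (2, 2): 0.10, (2, 10): 0.34,
    (4, 0.05): 0.32, (4, 2): 0.05, (4, 10): 0.21,
    (6, 0.05): 0.52, (6, 2): 0.12, (6, 10): 0.43,
}
best = grid_search([2, 4, 6], [0.05, 2, 10], lambda d, lam: table[(d, lam)])
print(best)  # (4, 2, 0.05)
```

In a real run, each lookup would be replaced by training a model with that (d, λ) pair. The cost therefore grows with the product of the two grid sizes.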
Bias vs. variance on size of data
If a learning algorithm is suffering from high bias, what will the training error and validation error look like as the number of training examples increases?
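[Figure: fits of price vs. size with few and with many training examples; learning curves of training error and validation error vs. number of training examples, both leveling off at a high error]
Increasing the number of training examples will not help much if the model suffers from high bias.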
Bias vs. variance on size of training examples
If a learning algorithm is suffering from high variance, what will the training error and validation error look like as the number of training examples increases?
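[Figure: fits of price vs. size with few and with many training examples; learning curves of training error and validation error vs. number of training examples]
Increasing the number of training examples is likely to help if the model suffers from high variance.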
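Learning curves like the ones described above can be computed by training on the first m examples for growing m. The sketch makes two assumptions:
- it uses a 1-nearest-neighbour regressor, chosen because it memorizes its training set (zero training error), which is typical high-variance behaviour; it is not a model from the lecture;
- the 30% test set from the earlier example stands in for a validation set.

```python
def learning_curve(train, val, fit, error, sizes):
    """For each m in sizes, fit on the first m training examples and return
    (m, training error on those m examples, validation error on all of val)."""
    curve = []
    for m in sizes:
        subset = train[:m]
        model = fit(subset)
        curve.append((m, error(model, subset), error(model, val)))
    return curve


# Assumed high-variance model: 1-nearest-neighbour regression, which
# memorises its training set and predicts the price of the closest size.
def fit_1nn(rows):
    return list(rows)


def half_mse(model, rows):
    def predict(x):
        return min(model, key=lambda row: abs(row[0] - x))[1]
    return sum((predict(x) - y) ** 2 for x, y in rows) / (2 * len(rows))


train = [(2104, 400), (2400, 369), (1416, 232), (3000, 540),
         (1534, 315), (1427, 199), (1380, 212)]
val = [(1600, 330), (1985, 300), (1494, 243)]
for m, train_err, val_err in learning_curve(train, val, fit_1nn, half_mse, range(1, 8)):
    print(m, train_err, val_err)
```

Plotting the second and third columns against m gives the two curves. A persistent gap between them, with the training error near zero, is the high-variance pattern.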
Debugging a learning model (revisited)
Suppose you have implemented regularized linear regression to predict housing prices. However, when you test your hypothesis on a new set of houses, you find that it makes unacceptably large errors in its predictions. What should you try next?
● Get more training examples ➡ fixes high variance
● Try smaller sets of features ➡ fixes high variance
● Try getting additional features ➡ fixes high bias
● Try adding polynomial features ➡ fixes high bias
● Try decreasing lambda ➡ fixes high bias
● Try increasing lambda ➡ fixes high variance
Is your error metric fair?
Suppose you have trained a logistic regression model to predict cancer. In your test set, only 0.5% of patients have cancer (skewed classes). You got 1% error on the test set. Is your model a good classifier?
Positive example (y = 1): patient has cancer
Negative example (y = 0): patient does not have cancer
function y = predictCancer(x)
    % Ignore the features and predict 0 ("no cancer") for every example (row) of x
    y = zeros(size(x, 1), 1);
end
This trivial classifier achieves 0.5% error, lower than your model's 1%, without learning anything!
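The same point can be shown in Python on a hypothetical test set of 1000 patients, 5 of whom (0.5%) have cancer. `predict_cancer` mirrors the Matlab `predictCancer` above, and `error_rate` is an illustrative helper.

```python
def predict_cancer(x):
    """Python analogue of predictCancer: ignore x and always predict 0 (no cancer)."""
    return 0


def error_rate(y_true, y_pred):
    """Fraction of examples whose prediction differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical test set of 1000 patients in which 5 (0.5%) have cancer;
# the patient features are irrelevant here, so None stands in for them.
y_true = [1] * 5 + [0] * 995
y_pred = [predict_cancer(None) for _ in y_true]
print(error_rate(y_true, y_pred))  # 0.005
```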
Precision/Recall
Confusion matrix (rows: true condition, columns: predicted condition)
                   Predicted positive    Predicted negative
Actual positive    True Positive (TP)    False Negative (FN)
Actual negative    False Positive (FP)   True Negative (TN)
Precision = TP/(TP+FP)
Recall = TP/(TP+FN)
Precision: of all patients we predicted to have cancer, what fraction actually have cancer?
Recall: of all patients who actually have cancer, what fraction did we correctly detect as having cancer?
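Precision and recall follow directly from the confusion-matrix counts. The Python sketch below assumes 0/1 labels with 1 = cancer. Reporting 0.0 when a denominator is zero is a convention chosen here, since the ratio is undefined in that case. Applied to the always-0 classifier, the sketch exposes what the 0.5% error hides: recall is zero.

```python
def precision_recall(y_true, y_pred):
    """Precision = TP/(TP+FP) and recall = TP/(TP+FN) for 0/1 labels (1 = cancer).

    Returns 0.0 for a ratio whose denominator is zero (a convention; the
    metric is undefined in that case).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# The "always predict 0" classifier on a hypothetical test set where 5 of 1000 patients have cancer
y_true = [1] * 5 + [0] * 995
print(precision_recall(y_true, [0] * 1000))  # (0.0, 0.0)
```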
Tradeoff between precision and recall
● Logistic regression, with $0 \le h_\theta(x) \le 1$:
○ Predict 1 if $h_\theta(x) \ge$ threshold
○ Predict 0 if $h_\theta(x) <$ threshold
● Suppose we want to predict cancer (y = 1) only if we are very confident
○ Higher precision, lower recall (large threshold)
● Suppose we want to avoid missing too many cases of cancer (avoid false negatives)
○ Higher recall, lower precision (small threshold)
● Generate the precision-recall curve by varying the threshold
[Figure: precision vs. recall curve; a large threshold gives high precision and low recall, a small threshold gives high recall and low precision]
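The curve can be traced by sweeping the threshold over the logistic-regression outputs h_θ(x). The scores and labels below are hypothetical, and `precision_recall_curve` is an illustrative name. Precision is reported as 0.0 when nothing is predicted positive, which is a convention adopted for this sketch.

```python
def precision_recall_curve(scores, y_true, thresholds):
    """Return (threshold, precision, recall) for each threshold.

    scores are logistic-regression outputs h_theta(x) in [0, 1]; an example is
    predicted positive (1) when its score is >= the threshold.
    """
    curve = []
    for t in thresholds:
        y_pred = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        curve.append((t, precision, recall))
    return curve


# Hypothetical scores h_theta(x) for five patients and their true labels
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
y_true = [1, 0, 1, 0, 0]
for t, p, r in precision_recall_curve(scores, y_true, [0.85, 0.35]):
    print(t, p, r)
```

With these numbers, the larger threshold 0.85 gives precision 1.0 and recall 0.5. The smaller threshold 0.35 gives precision 2/3 and recall 1.0, matching the tradeoff described above.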
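F1-measure
Suppose you have the precision (P) and recall (R) of three learning algorithms. Which one is better? The F1-measure combines both into a single number:
$$F_1 = 2\,\frac{P \cdot R}{P + R}$$
              Precision   Recall   F1
Algorithm 1   0.6         0.3      0.40
Algorithm 2   0.2         0.9      0.33
Algorithm 3   0.9         0.1      0.18
Algorithm 1 has the highest F1-measure.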
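A quick check of the table using F1 = 2PR / (P + R). Returning 0.0 when P + R = 0 is a convention added in this sketch.

```python
def f1_score(precision, recall):
    """F1 = 2PR / (P + R); defined here as 0.0 when P + R = 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


algorithms = {
    "Algorithm 1": (0.6, 0.3),
    "Algorithm 2": (0.2, 0.9),
    "Algorithm 3": (0.9, 0.1),
}
f1 = {name: f1_score(p, r) for name, (p, r) in algorithms.items()}
for name, score in f1.items():
    print(f"{name}: F1 = {score:.2f}")
print(max(f1, key=f1.get))  # Algorithm 1
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low. This is why Algorithms 2 and 3 score below Algorithm 1 despite each having one very high value.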
Summary
The procedure of a machine learning project:
1. Collect data and divide it into training, validation, and test sets.
2. Choose the machine learning model you would like to use.
3. Select the optimal parameters using the training and validation sets.
4. With the optimal parameters, predict results on the test set.
5. Measure and analyze your results, and improve your model if possible.
6. Write your project report.