# Important Issues in Machine Learning

Post on 21-Jan-2018

## Machine Learning for Data Mining: Important Issues

Andres Mendez-Vazquez, July 3, 2015
### Outline

1. Bias-Variance Dilemma
   - Introduction
   - Measuring the difference between optimal and learned
   - The Bias-Variance
   - Extreme Example
2. Confusion Matrix
   - The Confusion Matrix
3. K-Cross Validation
   - Introduction
   - How to choose K
### Introduction

What did we see until now? The design of learning machines from two main points of view:

- Statistical Point of View
- Linear Algebra and Optimization Point of View

Going back to the probability models, we might think of the machine to be learned as a function g(x|D), something like curve fitting, under a data set

D = {(x_i, y_i) | i = 1, 2, ..., N}    (1)

Remark: where the x_i ~ p(x|θ).
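As a concrete sketch of this setup (the Gaussian p(x|θ), the sin(x) target standing in for the optimal regression, and the cubic-polynomial learning algorithm are all illustrative assumptions, not choices made in the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_dataset(N=30):
    """Draw a data set D = {(x_i, y_i) | i = 1, ..., N} with x_i ~ p(x|theta)."""
    x = rng.normal(size=N)                          # here p(x|theta) is N(0, 1)
    y = np.sin(x) + rng.normal(scale=0.1, size=N)   # noisy targets around sin(x)
    return x, y

# Learn g(x|D): one possible algorithm is a least-squares cubic polynomial fit
x, y = sample_dataset()
coeffs = np.polyfit(x, y, deg=3)
g = np.poly1d(coeffs)
print(g(0.5))  # the learned machine's prediction at x = 0.5
```

The fitted polynomial g plays the role of g(x|D): a function produced by an algorithm from one particular draw of D.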
### Thus, we have that

Two main functions:

- A function g(x|D), obtained using some algorithm.
- E[y|x], the optimal regression.

Important: the key factor here is the dependence of the approximation on D. Why? The approximation may be very good for a specific training data set but very bad for another. This is the reason for studying fusion of information at the decision level.
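To see the dependence on D, one can run the same algorithm on two independently drawn training sets; the sin(x) data model below is an illustrative assumption, not part of the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_dataset(N=20):
    x = rng.normal(size=N)
    y = np.sin(x) + rng.normal(scale=0.1, size=N)
    return x, y

# Same algorithm (cubic least squares), two independent training sets D1, D2
x1, y1 = sample_dataset()
x2, y2 = sample_dataset()
g1 = np.poly1d(np.polyfit(x1, y1, deg=3))
g2 = np.poly1d(np.polyfit(x2, y2, deg=3))

# The two learned machines disagree: g(x|D) depends on the draw of D
print(g1(1.0), g2(1.0))
```

Both fits approximate the same target, yet their predictions differ; this dataset-to-dataset variability is exactly what the next sections quantify.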
### How do we measure the difference?

We have that

Var(X) = E[(X − μ)²]

We can do that for our data:

Var_D(g(x|D)) = E_D[(g(x|D) − E[y|x])²]

Now, we add and subtract E_D[g(x|D)].    (2)

Remark: E_D[g(x|D)] is the expected output of the machine g(x|D).
### Thus, we have that

Expanding the original variance:

Var_D(g(x|D)) = E_D[(g(x|D) − E[y|x])²]
             = E_D[(g(x|D) − E_D[g(x|D)] + E_D[g(x|D)] − E[y|x])²]
             = E_D[(g(x|D) − E_D[g(x|D)])²]
               + 2 E_D[(g(x|D) − E_D[g(x|D)]) (E_D[g(x|D)] − E[y|x])]
               + (E_D[g(x|D)] − E[y|x])²

Finally, what is the cross term?

E_D[(g(x|D) − E_D[g(x|D)]) (E_D[g(x|D)] − E[y|x])] = ?    (3)

It is zero: the factor E_D[g(x|D)] − E[y|x] does not depend on D, so it comes out of the expectation, and E_D[g(x|D) − E_D[g(x|D)]] = 0.
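The cross term, and the decomposition it enables, can be checked numerically by Monte Carlo over many training sets D, holding a test point x0 fixed; the sin(x) data model and the degree-1 fit below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
x0 = 0.5
optimal = np.sin(x0)   # stand-in for the optimal regression E[y|x0]

# Monte Carlo over many training sets D, always predicting at the same x0
preds = []
for _ in range(5000):
    x = rng.normal(size=30)
    y = np.sin(x) + rng.normal(scale=0.1, size=30)
    g = np.poly1d(np.polyfit(x, y, deg=1))   # a deliberately rigid machine
    preds.append(g(x0))
preds = np.array(preds)

mse      = np.mean((preds - optimal) ** 2)        # E_D[(g(x|D) - E[y|x])^2]
variance = np.mean((preds - preds.mean()) ** 2)   # E_D[(g(x|D) - E_D[g(x|D)])^2]
bias_sq  = (preds.mean() - optimal) ** 2          # (E_D[g(x|D)] - E[y|x])^2
cross    = np.mean((preds - preds.mean()) * (preds.mean() - optimal))

print(cross)                       # numerically zero
print(mse - (variance + bias_sq))  # the decomposition holds
```

The cross term vanishes for the same reason as in equation (3): the second factor is a constant, and deviations from a mean average to zero.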
### We have the Bias-Variance decomposition

Our final equation:

E_D[(g(x|D) − E[y|x])²] = E_D[(g(x|D) − E_D[g(x|D)])²]   ← VARIANCE
                        + (E_D[g(x|D)] − E[y|x])²        ← BIAS²

The variance: it measures the error between our machine g(x|D) and the expected output of the machine, under x_i ~ p(x|θ).

The bias: it is the quadratic error between the expected output of the machine under x_i ~ p(x|θ) and the output of the optimal regression E[y|x].
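The dilemma itself can be estimated empirically: a rigid model shows high bias and low variance, a flexible one the reverse. The polynomial degrees and the sin(x) data model below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def bias_variance(deg, trials=2000, N=30, x0=0.5):
    """Monte Carlo estimate of bias^2 and variance of a degree-`deg` fit at x0."""
    preds = []
    for _ in range(trials):
        x = rng.uniform(-2.0, 2.0, size=N)
        y = np.sin(x) + rng.normal(scale=0.3, size=N)
        preds.append(np.poly1d(np.polyfit(x, y, deg))(x0))
    preds = np.array(preds)
    bias_sq = (preds.mean() - np.sin(x0)) ** 2
    return bias_sq, preds.var()

b1, v1 = bias_variance(deg=1)   # rigid machine: high bias, low variance
b9, v9 = bias_variance(deg=9)   # flexible machine: low bias, high variance
print(b1, v1)
print(b9, v9)
```

Neither extreme minimizes the total error by itself; the sum bias² + variance is what model selection (for example, the cross validation discussed later in the outline) tries to control.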
