11. Machine Learning: Important Issues in Machine Learning


Slide 1: Machine Learning for Data Mining, Important Issues
Andres Mendez-Vazquez, July 3, 2015
Slide 2: Outline
1 Bias-Variance Dilemma
  - Introduction
  - Measuring the difference between optimal and learned
  - The Bias-Variance
  - Extreme Example
2 Confusion Matrix
  - The Confusion Matrix
3 K-Cross Validation
  - Introduction
  - How to choose K
Slide 4: Introduction

What did we see until now?
The design of learning machines from two main points of view:
  - the statistical point of view
  - the linear algebra and optimization point of view

Going back to the probability models
We might think of the machine to be learned as a function g(x|D), something like curve fitting, under a data set

    D = {(x_i, y_i) | i = 1, 2, ..., N}    (1)

Remark: here x_i ∼ p(x|θ), i.e., the training inputs are samples from an underlying distribution.
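To make the curve-fitting picture concrete, here is a minimal sketch of learning a g(x|D) from a sampled data set. The sine target, noise level, and polynomial degree are illustrative assumptions, not choices made in the lecture.

    # Learn g(x|D) from a data set D = {(x_i, y_i)} by least-squares
    # curve fitting. Target, noise, and degree are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(42)

    # Draw the data set D; the x_i play the role of samples from p(x|θ)
    N = 25
    x = rng.uniform(0.0, 2.0 * np.pi, N)
    y = np.sin(x) + rng.normal(0.0, 0.2, N)   # noisy observations of E[y|x]

    # g(x|D): a degree-5 polynomial fitted to this particular D
    coeffs = np.polyfit(x, y, deg=5)
    print(np.polyval(coeffs, [0.5, 1.0, 2.0]))  # predictions of the learned machine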
Slide 11: Thus, we have that

Two main functions
  - A function g(x|D), obtained using some algorithm.
  - E[y|x], the optimal regression.

Important
The key factor here is the dependence of the approximation on D.

Why?
The approximation may be very good for a specific training data set but very bad for another. This is the reason for studying fusion of information at the decision level.
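This dependence on D is easy to see empirically: refit the same model on several independently drawn training sets and compare the predictions at one query point. A small sketch, reusing the illustrative setup above:

    # Refit the same model on independent draws of D; the spread of
    # g(x0|D) across trials is the dependence on D described above.
    import numpy as np

    rng = np.random.default_rng(7)
    x0 = 1.0   # a fixed query point

    for trial in range(5):
        x = rng.uniform(0.0, 2.0 * np.pi, 25)
        y = np.sin(x) + rng.normal(0.0, 0.2, 25)
        coeffs = np.polyfit(x, y, deg=5)
        print(f"D #{trial}: g({x0}|D) = {np.polyval(coeffs, x0):+.3f}")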
Slide 16: Outline (continuing with: Measuring the difference between optimal and learned)
Slide 17: How do we measure the difference?

We have that
    Var(X) = E[(X - μ)^2]

We can do that for our data
    Var_D(g(x|D)) = E_D[(g(x|D) - E[y|x])^2]

Now, if we add and subtract
    E_D[g(x|D)]    (2)

Remark: equation (2) is the expected output of the machine g(x|D).
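As a quick numeric sanity check of the variance definition (the sample values are arbitrary, chosen only for illustration):

    # Var(X) = E[(X - μ)^2], computed directly and via np.var
    import numpy as np

    X = np.array([1.0, 2.0, 4.0, 7.0])   # arbitrary illustrative sample
    mu = X.mean()
    print(np.mean((X - mu) ** 2))         # 5.25
    print(np.var(X))                      # 5.25, the same quantity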
Slide 21: Thus, we have that

Expanding the original variance:

    Var_D(g(x|D)) = E_D[(g(x|D) - E[y|x])^2]
                  = E_D[(g(x|D) - E_D[g(x|D)] + E_D[g(x|D)] - E[y|x])^2]
                  = E_D[(g(x|D) - E_D[g(x|D)])^2]
                    + 2 E_D[(g(x|D) - E_D[g(x|D)]) (E_D[g(x|D)] - E[y|x])]
                    + (E_D[g(x|D)] - E[y|x])^2

Finally, what is the cross term?

    E_D[(g(x|D) - E_D[g(x|D)]) (E_D[g(x|D)] - E[y|x])] = ?    (3)

It is zero: E_D[g(x|D)] - E[y|x] does not depend on D, so it factors out of the expectation, and E_D[g(x|D) - E_D[g(x|D)]] = 0.
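A Monte Carlo sketch that checks this decomposition numerically; the data-generating process and the degree-3 polynomial learner are assumptions made for illustration:

    # Estimate E_D[(g(x0|D) - E[y|x0])^2] over many training sets and
    # verify that it equals variance + squared bias at the query point.
    import numpy as np

    rng = np.random.default_rng(0)
    x0, f = 1.0, np.sin        # f stands in for the optimal regression E[y|x]

    preds = np.empty(2000)
    for t in range(preds.size):
        x = rng.uniform(0.0, np.pi, 30)        # a fresh training set D
        y = f(x) + rng.normal(0.0, 0.3, 30)
        preds[t] = np.polyval(np.polyfit(x, y, 3), x0)

    mse      = np.mean((preds - f(x0)) ** 2)   # left-hand side
    variance = np.var(preds)                   # E_D[(g - E_D[g])^2]
    bias_sq  = (preds.mean() - f(x0)) ** 2     # (E_D[g] - E[y|x0])^2
    print(mse, variance + bias_sq)             # the two numbers agree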
Slide 25: Outline (continuing with: The Bias-Variance)
Slide 26: We have the Bias-Variance

Our final equation

    E_D[(g(x|D) - E[y|x])^2] = E_D[(g(x|D) - E_D[g(x|D)])^2] + (E_D[g(x|D)] - E[y|x])^2
                                        (VARIANCE)                    (BIAS)

Where the variance
It is the measure of the error between our machine g(x|D) and the expected output of the machine under x_i ∼ p(x|θ).

Where the bias
It is the quadratic error between the expected output of the machine under x_i ∼ p(x|θ) and the output of the optimal regression E[y|x].
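The dilemma behind this equation is that the two terms typically trade off against each other as model complexity grows. A sketch under the same illustrative assumptions as above, varying the polynomial degree:

    # Bias-variance tradeoff: low-degree fits have high bias and low
    # variance; high-degree fits the reverse. Estimated at one point x0.
    import numpy as np

    rng = np.random.default_rng(1)
    x0, f = 1.0, np.sin

    for degree in (1, 3, 7):
        preds = np.empty(1000)
        for t in range(preds.size):
            x = rng.uniform(0.0, np.pi, 30)
            y = f(x) + rng.normal(0.0, 0.3, 30)
            preds[t] = np.polyval(np.polyfit(x, y, degree), x0)
        bias_sq = (preds.mean() - f(x0)) ** 2
        print(f"degree {degree}: bias^2 = {bias_sq:.5f}, variance = {preds.var():.5f}")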
