lecture7 - ibk

Post on 24-Jan-2015




  • 1. Introduction to Machine Learning
Lecture 7: Instance Based Learning
Albert Orriols i Puig (aorriols@salle.url.edu)
Artificial Intelligence: Machine Learning
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull

2. Recap of Lecture 6
Let's start with data classification.

3. Recap of Lecture 6
Data set → Classification model. How?
We are going to deal with:
  • Data described by nominal and continuous attributes
  • Data that may have instances with missing values

4. Recap of Lecture 6
We want to build decision trees. How can I automatically generate these types of trees?
  • Decide which attribute we should put in each node
  • Decide a split point
  • Rely on information theory
We also saw many other improvements.

5. Today's Agenda
  • Classification without building a model
  • K-Nearest Neighbor (kNN)
  • Effect of K
  • Distance functions
  • Variants of K-NN
  • Strengths and weaknesses

6. Classification without Building a Model
Forget about a global model!
  • Simply store all the training examples
  • Build a local model for each new test instance
These methods are referred to as lazy learners.
Some approaches to IBL:
  • Nearest neighbors
  • Locally weighted regression
  • Case-based reasoning

7. k-Nearest Neighbors
Algorithm:
  • Store all the training data
  • Given a new test instance, recover the k neighbors of the test instance
  • Predict the majority class among the neighbors
Voronoi cells: the feature space is decomposed into several cells, e.g., for k=1.

8. k-Nearest Neighbors
But where is the learning process? Is selecting the k neighbors and returning the majority class learning? No, that's just retrieving.
But still, some important issues remain:
  • Which k should I use?
  • Which distance functions should I use?
  • Should I maintain all instances of the training data set?

9. Which k Should I Use?
The effect of k
[Figure: 15-NN vs. 1-NN]
Do you remember the discussion about overfitting in C4.5? Apply the same concepts here!

10. Which k Should I Use?
Some experimental results on the use of different k
[Figure: 7-NN; axis: number of neighbors]
Notice that the test error decreases as k increases, but at around k = 5-7 it starts increasing again.
Rule of thumb: k=3, k=5, and k=7 seem to work OK in the majority of problems.

11. Distance Functions
Distance functions must be able to handle:
  • Nominal attributes
  • Continuous attributes
  • Missing values
The key: they must return a low value for similar objects and a high value for different objects.
Seems obvious, right? But still, it is domain dependent.
There are many of them. Let's see some of the most used.

12. Distance Functions
Distance between two points in the same space: $d(x, y)$
Some properties expected to be satisfied in general:
  • $d(x, y) \geq 0$ and $d(x, x) = 0$
  • $d(x, y) = d(y, x)$
  • $d(x, y) + d(y, z) \geq d(x, z)$

13. Distances for Continuous Variables
Given $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$:
  • Euclidean: $d_E(x, y) = \left[ \sum_{i=1}^{n} (x_i - y_i)^2 \right]^{1/2}$
  • Minkowski: $d_M(x, y) = \left[ \sum_{i=1}^{n} |x_i - y_i|^q \right]^{1/q}$
  • Absolute-value distance: $d_{ABS}(x, y) = \sum_{i=1}^{n} |x_i - y_i|$

14. Distances for Continuous Variables
What if attributes are measured over different scales?
  • Attribute 1 ranging in [0,1]
  • Attribute 2 ranging in [0,1000]
Can you detect any potential problem in the aforementioned distance functions?
[Figure: x in [0,1], y in [0,1000] vs. x in [0,1000], y in [0,1000]]
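As a rough illustration of the k-NN procedure and the Minkowski family of distances covered in the slides above, here is a minimal Python sketch. The function names (`minkowski`, `knn_predict`) and the toy data are assumptions made for this example; they do not come from the lecture.

```python
from collections import Counter


def minkowski(x, y, q=2):
    """Minkowski distance of order q (q=2: Euclidean, q=1: absolute value)."""
    return sum(abs(xi - yi) ** q for xi, yi in zip(x, y)) ** (1.0 / q)


def knn_predict(train, query, k=3, q=2):
    """Return the majority class among the k stored instances closest to query.

    train is a list of (attribute_vector, class_label) pairs.
    """
    # Lazy learning: nothing is precomputed, every stored instance is scanned.
    neighbors = sorted(train, key=lambda inst: minkowski(inst[0], query, q))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration: two well-separated groups.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
prediction = knn_predict(train, (1.1, 1.0), k=3)
```

Note that all the work happens at prediction time: each query scans the whole stored training set, which is the cost that the "lazy learner" label refers to. Ties between classes are not handled specially in this sketch.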
15. Distances for Continuous Variables
The larger the scale, the larger the influence of the attribute in the distance function.
Solution: normalize each attribute. How?
  • Normalization by means of the range: $d_a^{norm}(ex_1^a, ex_2^a) = \frac{d(ex_1^a, ex_2^a)}{max_a - min_a}$
  • Normalization by means of the standard deviation: $d_a^{norm}(ex_1^a, ex_2^a) = \frac{d(ex_1^a, ex_2^a)}{4\sigma_a}$

16. Distances for Nominal Attributes
Several metrics deal with nominal attributes.
Overlap distance function
  • Idea: two nominal attribute values are considered equal only if they are the same value

17. Distances for Nominal Attributes
Several metrics deal with nominal attributes.
Value difference metric (VDM)
  • $C$ = number of classes
  • $P(a, ex_i^a, c)$ = conditional probability that the output class is $c$ given that the attribute $a$ has the value $ex_i^a$
  • Idea: two nominal values are similar if they have more similar correlations with the output classes
See (Wilson & Martinez) for more distance functions.

18. Distances for Heterogeneous Attributes
What if my data set is described by both nominal and continuous attributes?
  • Apply the same distance function:
    • Use nominal distance functions for nominal attributes
    • Use continuous distance functions for continuous attributes

19. Variants of kNN
Different variants of kNN:
  • Distance-weighted kNN
  • Attribute-weighted kNN
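The distance ideas from slides 15 to 18 can be combined in a short Python sketch. It is only an illustration under stated assumptions. Continuous attributes use range normalization and nominal attributes use the overlap distance. A missing value (`None`) is treated as maximally different, which is an assumption of this sketch rather than something the slides specify. The `vdm` function follows the commonly cited form of the value difference metric, a sum over classes of the absolute differences between conditional class probabilities; the slide's own formula did not survive extraction. All function names and data are hypothetical.

```python
import math
from collections import Counter


def attribute_ranges(instances, continuous):
    """Compute max - min for each continuous attribute index, skipping missing values."""
    ranges = {}
    for a in continuous:
        values = [inst[a] for inst in instances if inst[a] is not None]
        ranges[a] = (max(values) - min(values)) or 1.0  # avoid division by zero
    return ranges


def heterogeneous_distance(x, y, ranges):
    """Range-normalized difference for continuous attributes, overlap for nominal ones."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is None or ya is None:
            d = 1.0  # assumption: a missing value counts as maximally different
        elif a in ranges:
            d = abs(xa - ya) / ranges[a]  # normalization by the range
        else:
            d = 0.0 if xa == ya else 1.0  # overlap distance
        total += d * d
    return math.sqrt(total)


def vdm(column, labels, v1, v2, q=1):
    """Value difference metric between two nominal values of one attribute."""
    def class_probs(v):
        counts = Counter(lab for val, lab in zip(column, labels) if val == v)
        n = sum(counts.values())
        return {c: counts[c] / n for c in counts}

    p1, p2 = class_probs(v1), class_probs(v2)
    return sum(abs(p1.get(c, 0.0) - p2.get(c, 0.0)) ** q for c in set(labels))


# Toy data, invented for illustration: two continuous attributes and one nominal.
instances = [(0.0, 0.0, "red"), (1.0, 1000.0, "blue"), (0.5, None, "red")]
ranges = attribute_ranges(instances, continuous=[0, 1])
d01 = heterogeneous_distance(instances[0], instances[1], ranges)
d02 = heterogeneous_distance(instances[0], instances[2], ranges)

colors = ["red", "red", "blue", "blue", "green", "green"]
classes = ["+", "+", "-", "-", "+", "-"]
red_vs_blue = vdm(colors, classes, "red", "blue")
red_vs_green = vdm(colors, classes, "red", "green")
```

After normalization, the attribute ranging over [0, 1000] contributes no more to the distance than the one ranging over [0, 1]. With VDM, "red" ends up closer to "green" than to "blue" because their class distributions overlap more.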
20. Distance-Weighted kNN
Inference of original kNN:
  • The k nearest neighbors vote for the class
Shouldn't the closest examples have a higher influence in the decision process?
  • Weight the contribution of each of the k neighbors w.r.t. their distance
  • E.g., for discrete class values: $\hat{f}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$, where $\delta(a, b) = 1$ if $a = b$ and $0$ otherwise
  • E.g., for real-valued targets: $\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$
  • In both cases, $w_i = \frac{1}{d(x_q, x_i)^2}$
More robust to noisy instances and outliers.
E.g.: Shepard's method (Shepard, 1968)

21. Attribute-Weighted kNN
What if some attributes are irrelevant or misleading?
  • If irrelevant: cost increases, but accuracy is not affected
  • If misleading: cost increases and accuracy may decrease
Weight attributes: $d_w(x, y) = \sum_{i=1}^{n} w_i (x_i - y_i)^2$
How to determine the weights?
  • Option 1: the expert provides us with the weights
  • Option 2: use a machine learning approach
More will be said in the next lecture!

22. Strengths and Weaknesses
Strengths of kNN:
  • Building of a new local model for each test instance
  • Learning has no cost
  • Empirical results show that the method is highly accurate compared with other machine learning techniques
Weaknesses:
  • Retrieving approach, but does not learn
  • No global model. The knowledge is not legible
  • Test cost increases linearly with the number of input instances
  • No generalization
  • Curse of dimensionality: what happens if we have many attributes?
  • Noise and outliers may have a very negative effect

23. Next Class
From instance-based to case-based reasoning
  • A little bit more on learning
  • Distance functions
  • Prototype selection
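To make the distance-weighted vote from slide 20 concrete, the following Python sketch implements both the classification and the weighted-average variants with weights of 1/d². It is a sketch under assumptions. The function names and toy data are invented for illustration, and returning the stored target directly when a neighbor lies at distance zero is one common way to avoid dividing by zero, not something the slides prescribe.

```python
import math
from collections import defaultdict


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def k_nearest(train, query, k):
    """Return the k (distance, target) pairs closest to query."""
    scored = [(euclidean(x, query), target) for x, target in train]
    return sorted(scored, key=lambda pair: pair[0])[:k]


def weighted_knn_classify(train, query, k=3):
    """Each of the k neighbors votes for its class with weight 1 / d^2."""
    scores = defaultdict(float)
    for d, label in k_nearest(train, query, k):
        if d == 0.0:
            return label  # assumption: an exact match decides the class outright
        scores[label] += 1.0 / d ** 2
    return max(scores, key=scores.get)


def weighted_knn_regress(train, query, k=3):
    """Weighted average of the neighbors' numeric targets, with weights 1 / d^2."""
    num = den = 0.0
    for d, value in k_nearest(train, query, k):
        if d == 0.0:
            return value  # assumption: an exact match returns its stored target
        w = 1.0 / d ** 2
        num += w * value
        den += w
    return num / den


# Toy data, invented for illustration. A plain majority vote with k=3 would
# pick "B" (two votes against one), but the single close "A" outweighs them.
train_cls = [((0.0,), "A"), ((3.0,), "B"), ((4.0,), "B")]
label = weighted_knn_classify(train_cls, (1.0,), k=3)

train_reg = [((0.0,), 10.0), ((2.0,), 20.0)]
estimate = weighted_knn_regress(train_reg, (1.0,), k=2)
```

In the classification example the neighbor at distance 1 gets weight 1, while the two neighbors at distances 2 and 3 get 1/4 and 1/9, so the closest example dominates the decision.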