Machine Learning

1. Machine Learning, Data Mining (INFO 629, Dr. R. Weber)

2. The picnic game
- How did you reason to find the rule?
- According to Michalski (1983), "A theory and methodology of inductive learning" (Machine Learning, chapter 4), inductive learning is a heuristic search through a space of symbolic descriptions (i.e., generalizations) generated by the application of rules to training instances.

3. Learning
- Rote learning
  - Learn multiplication tables
- Supervised learning
  - Examples are used to help a program identify a concept
  - Examples are typically represented with attribute-value pairs
  - The notion of supervision originates from the guidance the examples provide
- Unsupervised learning
  - Human efforts at scientific discovery, theory formation

4. Inductive Learning
- Learning by generalization
- Performance of classification tasks
  - Classification, categorization, clustering
- Rules indicate categories
- Goal: characterize a concept

5. Concept Learning is a Form of Inductive Learning
- The learner uses:
  - positive examples (instances that ARE examples of a concept) and
  - negative examples (instances that ARE NOT examples of a concept)

6. Concept Learning
- Needs empirical validation
- Dense or sparse data determine the quality of different methods

7. Validation of Concept Learning (i)
- The learned concept should correctly classify new instances of the concept
  - When it succeeds on a real instance of the concept, it finds true positives
  - When it fails on a real instance of the concept, it produces false negatives

8. Validation of Concept Learning (ii)
- The learned concept should correctly classify new instances of the concept
  - When it succeeds on a counterexample, it finds true negatives
  - When it fails on a counterexample, it produces false positives

9. Basic classification tasks
- Classification
- Categorization
- Clustering

10. Categorization

11. Classification

12. Clustering

13. Clustering
- A data analysis method
- The data should naturally possess groupings
- Goal: group the data into clusters
- Resulting clusters are collections where objects within a cluster are similar to each other
- Objects outside a cluster are dissimilar to those inside it; objects from one cluster are dissimilar to objects in other clusters
- Distance measures are used to compute similarity (a short sketch follows below)
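To make the clustering slide concrete, here is a minimal sketch of distance-based grouping in Python. The slides only say that distance measures are used to compute similarity; the choice of a k-means-style procedure, Euclidean distance, and the sample points below are illustrative assumptions, not part of the course material.

```python
import math
import random

def euclidean(a, b):
    """Distance measure used to compute similarity between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=20, seed=0):
    """Group data into k clusters so that objects within a cluster are
    similar to each other and dissimilar to objects in other clusters."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iterations):
        # Assign each point to the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
    return clusters

# Hypothetical 2-D data with two natural groupings.
data = [(1.0, 1.1), (0.9, 0.8), (1.2, 1.0), (8.0, 8.2), (7.9, 8.1), (8.3, 7.8)]
for i, cluster in enumerate(kmeans(data, k=2)):
    print(f"cluster {i}: {cluster}")
```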
14. Rule Learning
- Rule learning is widely used in data mining
- Version space learning is a search method for learning rules
- Decision trees

15. Version Space (i)
- A=1, B=1, C=1 → Outcome=1
- A=0, B=.5, C=.5 → Outcome=0
- A=0, B=0, C=.3 → Outcome=.5
- Creates a tree that includes all possible combinations
- Does not learn rules with disjunctions (i.e., OR statements)
- Incremental method: trains on additional data without the need to retrain on all the data

16. Decision trees
- Knowledge representation formalism
- Represent mutually exclusive rules (disjunction)
- A way of breaking up a data set into classes or categories
- Classification rules that determine, for each instance with attribute values, whether it belongs to one class or another

17. Decision trees consist of:
- leaf nodes (classes)
- decision nodes (tests on attribute values)
- branches growing from decision nodes, one for each possible outcome of the test
(From Cawsey, 1997)

18. Decision tree induction
- Goal is to correctly classify all example data
- Several algorithms induce decision trees: ID3 (Quinlan 1979), CLS, ACLS, ASSISTANT, IND, C4.5
- Constructs the decision tree from past data
- Not incremental
- Attempts to find the simplest tree (not guaranteed, because it is based on heuristics)

19. ID3 algorithm
- From:
  - a set of target classes
  - training data containing objects of more than one class
- ID3 uses tests to refine the training data set into subsets that contain objects of only one class each
- Choosing the right test is the key

20. How does ID3 choose tests?
- Information gain or minimum entropy
- Maximizing information gain corresponds to minimizing entropy
- Predictive features (good indicators of the outcome)

21-27. ID3 algorithm (worked example; the same table is shown repeatedly as the tree is built step by step, and a sketch of the information-gain computation follows the table)

| Student No. | Student | First last year? | Male? | Works hard? | Drinks? | First this year? |
|-------------|---------|------------------|-------|-------------|---------|------------------|
| 1           | Richard | yes              | yes   | no          | yes     | yes              |
| 2           | Alan    | yes              | yes   | yes         | no      | yes              |
| 3           | Alison  | no               | no    | yes         | no      | yes              |
| 4           | Jeff    | no               | yes   | no          | yes     | no               |
| 5           | Gail    | yes              | no    | yes         | yes     | yes              |
| 6           | Simon   | no               | yes   | yes         | yes     | no               |
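As a rough illustration of how ID3 might choose its first test using the "maximize information gain / minimize entropy" criterion from slide 20, here is a small Python sketch over the student table. The attribute names and values come from the table; the code itself is an assumption-laden sketch, not the course's implementation of ID3.

```python
import math
from collections import Counter

# The student table from the slides; the target attribute is "first_this_year".
students = [
    {"first_last_year": "yes", "male": "yes", "works_hard": "no",  "drinks": "yes", "first_this_year": "yes"},  # Richard
    {"first_last_year": "yes", "male": "yes", "works_hard": "yes", "drinks": "no",  "first_this_year": "yes"},  # Alan
    {"first_last_year": "no",  "male": "no",  "works_hard": "yes", "drinks": "no",  "first_this_year": "yes"},  # Alison
    {"first_last_year": "no",  "male": "yes", "works_hard": "no",  "drinks": "yes", "first_this_year": "no"},   # Jeff
    {"first_last_year": "yes", "male": "no",  "works_hard": "yes", "drinks": "yes", "first_this_year": "yes"},  # Gail
    {"first_last_year": "no",  "male": "yes", "works_hard": "yes", "drinks": "yes", "first_this_year": "no"},   # Simon
]

def entropy(rows, target="first_this_year"):
    """Shannon entropy of the target class distribution in `rows`."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute, target="first_this_year"):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    total = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r for r in rows if r[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

for attr in ("first_last_year", "male", "works_hard", "drinks"):
    print(f"{attr}: gain = {information_gain(students, attr):.3f}")
# "first_last_year" gives the largest gain, so ID3 would test it first.
```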
28. Explanation-based learning
- Incorporates domain knowledge into the learning process
- Feature values are assigned a relevance factor if they are consistent with domain knowledge
- Features that are assigned relevance factors are considered in the learning process

29. Familiar Learning Task
- Learn the relative importance of features
- Goal: learn individual weights
- Commonly used in case-based reasoning
- Feedback methods: use a similarity measure to obtain feedback about the relative importance of features
- Search methods: gradient descent
- ID3

30. Classification using Naive Bayes
- A Naive Bayes classifier uses two sources of information to classify a new instance:
  - the distribution of the training dataset (prior probability)
  - the region surrounding the new instance in the dataset (likelihood)
- "Naive" because it assumes conditional independence of the features, which is not always applicable; the assumption is made to simplify the computation, and in this sense the method is considered naive
- Conditional independence reduces the requirement for a large number of observations
- Bias in estimating probabilities often makes no difference in practice: it is the order of the probabilities, not their exact values, that determines the classifications
- Comparable in performance with classification trees and with neural networks
- Highly accurate and fast when applied to large databases
- Some links:
  - http://www.resample.com/xlminer/help/NaiveBC/classiNB_intro.htm
  - http://www.statsoft.com/textbook/stnaiveb.html
- (A minimal Naive Bayes sketch follows this slide.)
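The prior-times-likelihood combination under the conditional-independence assumption can be sketched in a few lines of Python. This is a minimal illustration only; reusing the student table as training data and the add-one smoothing are assumptions for the example, not something the slides prescribe.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, target):
    """Estimate class priors and per-attribute value counts from rows
    (list of dicts). Conditional independence between attributes is assumed."""
    priors = Counter(r[target] for r in rows)
    likelihoods = defaultdict(Counter)  # keyed by (attribute, class)
    for r in rows:
        for attr, value in r.items():
            if attr != target:
                likelihoods[(attr, r[target])][value] += 1
    return priors, likelihoods

def classify(instance, priors, likelihoods, total):
    """Pick the class with the largest prior * product of likelihoods.
    Rough add-one smoothing avoids zero probabilities for unseen values."""
    best_class, best_score = None, -1.0
    for cls, prior_count in priors.items():
        score = prior_count / total
        for attr, value in instance.items():
            counts = likelihoods[(attr, cls)]
            score *= (counts[value] + 1) / (prior_count + len(counts) + 1)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class

# Hypothetical use with the student table from the ID3 example:
# priors, likes = train_naive_bayes(students, target="first_this_year")
# new_student = {"first_last_year": "yes", "male": "no", "works_hard": "yes", "drinks": "no"}
# print(classify(new_student, priors, likes, total=len(students)))
```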
31. KDD: definition
- "Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, and potentially useful and understandable patterns in data." (R. Feldman, 2000)
- "KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data." (Fayyad, Piatetsky-Shapiro, Smyth 1996, p. 6)
- Data mining is one of the steps in the KDD process
- Text mining concerns applying data mining techniques to unstructured text

32. The KDD Process
DATA → [selection/filtering] → SELECTED DATA → [preprocessing] → PROCESSED DATA → [transformation] → TRANSFORMED DATA → [data mining] → patterns → [interpretation/browsing] → KNOWLEDGE

33. Data mining tasks (i)
- Predictive modeling / risk assessment: classification, decision trees
- Database segmentation: Kohonen nets, clustering techniques

34. Data mining tasks (ii)
- Link analysis: rules, association generation, relationships between entities
- Deviation detection: how things change over time, trends

35. KDD applications
- Fraud detection
  - Telecom (calling cards, cell phones)
  - Credit cards
  - Health insurance
- Loan approval
- Investment analysis
- Marketing and sales data analysis
  - Identify potential customers
  - Effectiveness of sales campaigns
  - Store layout

36. Text mining
- The problem starts with a query, and the solution is a set of information (e.g., patterns, connections, profiles, trends) contained in several different texts that are potentially relevant to the initial query.

37. Text mining applications
- IBM Text Navigator
  - Clusters documents by content
  - Each document is annotated with the 2 most frequently used words in its cluster
- Concept Extraction (Los Alamos)
  - Text analysis of medical records
  - Uses a clustering approach based on a trigram representation
  - Documents are represented as vectors; the cosine is used for comparison (see the sketch below)
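The last bullet, documents as trigram vectors compared by cosine, can be illustrated briefly. This is a minimal Python sketch under assumptions: the sample snippets, the use of character trigrams, and raw frequency weighting are illustrative choices, not details of the Los Alamos system.

```python
import math
from collections import Counter

def trigrams(text):
    """Character trigrams of a document, as a sparse frequency vector."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(v1, v2):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(v1[t] * v2[t] for t in v1 if t in v2)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

# Hypothetical medical-record snippets for comparison.
doc_a = trigrams("patient reports chest pain and shortness of breath")
doc_b = trigrams("chest pain with shortness of breath reported by patient")
doc_c = trigrams("routine dental cleaning, no complaints")

print(cosine(doc_a, doc_b))  # relatively high: similar content
print(cosine(doc_a, doc_c))  # lower: dissimilar content
```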