Three kinds of learning


Page 1: Three kinds of learning

Three kinds of learning
- Supervised learning: learning some mapping from inputs to outputs
- Unsupervised learning: given “data”, what kinds of patterns can you find?
- Reinforcement learning: learn from positive/negative reinforcement

Page 2: Three kinds of learning

Categorical data example

Example from Ross Quinlan, Decision Tree Induction; graphics from Tom Mitchell, Machine Learning

Page 3: Three kinds of learning

Decision Tree Classification

Page 4: Three kinds of learning

Which feature to split on?

Try to classify as many as possible with each split. (This is a good split.)

Page 5: Three kinds of learning

Which feature to split on?

These are bad splits – no classifications obtained

Page 6: Three kinds of learning

Improving a good split

Page 7: Three kinds of learning

Decision Tree Algorithm Framework

- Use the splitting criterion to decide on the best attribute to split on
- Each child is a new decision tree; recurse with the parent feature removed
- If all data points in a child node are the same class, classify the node as that class
- If no attributes are left, classify by majority rule
- If no data points are left, no such example was seen: classify as the majority class from the entire dataset
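A minimal Python sketch of this recursive framework (not from the slides; the data layout and the choose_attribute callback are assumptions, with the splitting criterion left abstract until slide 10):

```python
from collections import Counter

def build_tree(examples, attributes, default_class, choose_attribute):
    """Recursive decision-tree framework from this slide.

    examples:        list of (features_dict, label) pairs
    attributes:      attribute names still available for splitting
    default_class:   majority class of the entire dataset (used for empty nodes)
    choose_attribute: splitting criterion, e.g. information gain (slide 10)
    """
    # No data points left: no such example seen, use the dataset-wide majority.
    if not examples:
        return default_class

    labels = [label for _, label in examples]

    # All data points in this node have the same class: make a leaf.
    if len(set(labels)) == 1:
        return labels[0]

    # No attributes left: classify by majority rule at this node.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Use the splitting criterion to pick the best attribute, then recurse
    # on each child with the chosen (parent) attribute removed.
    best = choose_attribute(examples, attributes)
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        tree[best][value] = build_tree(subset, remaining, default_class, choose_attribute)
    return tree
```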

Page 8: Three kinds of learning

How do we know which splits are good?

- Want nodes as “pure” as possible
- How do we quantify the “randomness” of a node?
- Want:
  - All elements +: “randomness” = 0
  - All elements -: “randomness” = 0
  - Half +, half -: “randomness” = 1
- Draw plot: what should the “randomness” function look like?

Page 9: Three kinds of learning

Typical solution: Entropy
- p_P = proportion of + examples
- p_N = proportion of - examples

A collection with low entropy is good.

Entropy = -p_P lg p_P - p_N lg p_N    (where lg is log base 2)
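A small Python sketch of this entropy measure (not from the slides; it also checks the desiderata from the previous slide, with 0 lg 0 treated as 0 by convention):

```python
from math import log2

def entropy(pos, neg):
    """Entropy = -p_P lg p_P - p_N lg p_N for a node with pos positive
    and neg negative examples (0 lg 0 is taken to be 0)."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            result -= p * log2(p)
    return result

# Matches the desiderata from the previous slide:
print(entropy(4, 0))  # all +        -> 0.0
print(entropy(0, 4))  # all -        -> 0.0
print(entropy(2, 2))  # half +, half - -> 1.0
```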

Page 10: Three kinds of learning

ID3 Criterion
- Split on the feature with the most information gain
- Gain = entropy in the original node minus the weighted sum of entropy in the child nodes

Gain(split) = Entropy(parent) - Σ_child (size of child / size of parent) × Entropy(child)
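A sketch of this gain computation in Python (not from the slides; representing nodes as (positive, negative) counts is an assumption about the data layout):

```python
from math import log2

def node_entropy(pos, neg):
    """-p_P lg p_P - p_N lg p_N, with 0 lg 0 taken as 0."""
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c > 0)

def information_gain(parent, children):
    """ID3 gain: Entropy(parent) minus the size-weighted sum of the
    children's entropies. parent and each child are (pos, neg) counts."""
    parent_size = sum(parent)
    weighted = sum(
        (sum(child) / parent_size) * node_entropy(*child) for child in children
    )
    return node_entropy(*parent) - weighted
```

For example, information_gain((9, 5), [(3, 4), (6, 1)]) reproduces, up to rounding, the 0.151 computed on the next slide.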

Page 11: Three kinds of learning

How good is this split?

Reading off the class counts: the parent node is [9+, 5-], and this split produces two children, [3+, 4-] and [6+, 1-].

Entropy([3+, 4-]) = -(3/7) lg(3/7) - (4/7) lg(4/7) = 0.985
Entropy([6+, 1-]) = -(6/7) lg(6/7) - (1/7) lg(1/7) = 0.592
Weighted average = (7/14)(0.985) + (7/14)(0.592) = 0.789
Entropy([9+, 5-]) = -(9/14) lg(9/14) - (5/14) lg(5/14) = 0.940
Gain = 0.940 - 0.789 = 0.151

Page 12: Three kinds of learning

How good is this split?

Here the same parent [9+, 5-] splits into three children: [2+, 3-], [4+, 0-], and [3+, 2-].

Entropy([2+, 3-]) = -(2/5) lg(2/5) - (3/5) lg(3/5) = 0.971
Entropy([4+, 0-]) = -(4/4) lg(4/4) - (0/4) lg(0/4) = 0
Entropy([3+, 2-]) = -(3/5) lg(3/5) - (2/5) lg(2/5) = 0.971
Weighted average = (5/14)(0.971) + (4/14)(0) + (5/14)(0.971) = 0.694
Entropy([9+, 5-]) = -(9/14) lg(9/14) - (5/14) lg(5/14) = 0.940
Gain = 0.940 - 0.694 = 0.246

This split has higher gain than the previous one (0.246 > 0.151), so the ID3 criterion prefers it.
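A quick standalone check of both worked examples (not from the slides; H is a hypothetical helper recomputing node entropy):

```python
from math import log2

def H(pos, neg):
    # Node entropy, with 0 lg 0 treated as 0.
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c > 0)

# Slide 11: parent [9+,5-] split into [3+,4-] and [6+,1-]
gain_1 = H(9, 5) - (7/14 * H(3, 4) + 7/14 * H(6, 1))
print(round(gain_1, 3))  # 0.152 (the slide rounds intermediates: 0.940 - 0.789 = 0.151)

# Slide 12: parent [9+,5-] split into [2+,3-], [4+,0-], [3+,2-]
gain_2 = H(9, 5) - (5/14 * H(2, 3) + 4/14 * H(4, 0) + 5/14 * H(3, 2))
print(round(gain_2, 3))  # 0.247 (the slide rounds intermediates: 0.940 - 0.694 = 0.246)
```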

Page 13: Three kinds of learning

The big picture
- Start with the root
- Find the attribute to split on with the most gain
- Recurse

Page 14: Three kinds of learning

Assessment: how do I know how well my decision tree works?
- Training set: data that you use to build the decision tree
- Test set: data that you did not use for training, used to assess the quality of the decision tree

Page 15: Three kinds of learning

Issues on training and test sets

Do you know the correct classification for the test set?

If you do, why not include it in the training set to get a better classifier?

If you don’t, how can you measure the performance of your classifier?

Page 16: Three kinds of learning

Cross Validation
- Tenfold cross-validation (sketched below):
  - Ten iterations
  - Pull a different tenth of the dataset out each time to act as a test set
  - Train on the remaining training set
  - Measure performance on the test set
- Leave-one-out cross-validation: similar, but leave only one point out each time, then count correct vs. incorrect
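A sketch of tenfold cross-validation in Python (not from the slides; train_model and accuracy are placeholders for whatever learner and metric are being evaluated):

```python
import random

def cross_validate(data, train_model, accuracy, folds=10, seed=0):
    """k-fold cross-validation: each fold takes a turn as the test set
    while the model is trained on the remaining folds."""
    data = data[:]
    random.Random(seed).shuffle(data)
    scores = []
    for i in range(folds):
        test = data[i::folds]                                   # every folds-th point, offset i
        train = [x for j, x in enumerate(data) if j % folds != i]
        model = train_model(train)
        scores.append(accuracy(model, test))
    return sum(scores) / folds

# Leave-one-out cross-validation is the special case folds == len(data).
```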

Page 17: Three kinds of learning

Noise and Overfitting
- Can we always obtain a decision tree that is consistent with the data?
- Do we always want a decision tree that is consistent with the data?
- Example: predict Carleton students who become CEOs
  - Features: state/country of origin, GPA letter, major, age, high school GPA, junior high GPA, ...
  - What happens with only a few features? What happens with many features?

Page 18: Three kinds of learning

Overfitting
- Fitting a classifier “too closely” to the data: finding patterns that aren’t really there
- Prevented in decision trees by pruning:
  - When building trees, stop recursion on irrelevant attributes
  - Do statistical tests at each node to determine whether to continue or not (one possible test is sketched below)
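The slide does not say which statistical test to use; one common choice is a chi-squared test on the class counts in the candidate children. A sketch using SciPy (the function name, table layout, and alpha threshold are illustrative):

```python
from scipy.stats import chi2_contingency

def split_is_significant(child_counts, alpha=0.05):
    """Pre-pruning test: only keep a split if the class distribution
    across children differs from what chance alone would produce.

    child_counts: list of (positive, negative) counts, one per child node.
    Returns False for splits that look like noise (stop recursing there).
    """
    # Contingency table: rows are children, columns are classes (+ / -).
    table = [list(counts) for counts in child_counts]
    chi2, p_value, dof, expected = chi2_contingency(table)
    return p_value < alpha

print(split_is_significant([(8, 2), (1, 9)]))  # True: the children differ strongly
print(split_is_significant([(3, 4), (6, 1)]))  # False: too little evidence at this sample size
```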

Page 19: Three kinds of learning

Examples of decision trees using Weka

Page 20: Three kinds of learning

Preventing overfitting by cross-validation
- Another technique to prevent overfitting (is this valid?): keep recursing on the decision tree as long as you continue to get improved accuracy on the test set

Page 21: Three kinds of learning

Ensemble Methods
- Many “weak” learners, when combined, can perform more strongly than any one by itself
- Bagging & boosting: many different learners voting on the classification
- Multiple algorithms, or different features, or both

Page 22: Three kinds of learning

Bagging / Boosting
- Bagging: vote to determine the answer (sketched below)
  - Run one algorithm on random subsets of the data to obtain multiple classifiers
- Boosting: weighted vote to determine the answer
  - Each iteration, weight more heavily the data that the learner got wrong
  - What does it mean to “weight more heavily” for k-NN? For decision trees?
- AdaBoost is recent (1997) and has quickly become popular
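A minimal sketch of bagging in Python (not from the slides; base_learner is a placeholder, and bootstrap sampling with replacement is one common way to draw the random subsets):

```python
import random
from collections import Counter

def bag(data, base_learner, n_models=25, seed=0):
    """Bagging: train one algorithm on random (bootstrap) subsets of the
    data, then let the resulting classifiers vote on each answer."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in range(len(data))]  # sample with replacement
        models.append(base_learner(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]                      # majority vote
    return predict
```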

Page 23: Three kinds of learning

Computational Learning Theory

Page 24: Three kinds of learning

Chapter 20 up next
- Moving on to Chapter 20: statistical learning methods
- Skipping ahead; will revisit earlier topics (perhaps) near the end of the course
  - 20.5: Neural networks
  - 20.6: Support vector machines