INTRODUCTION TO ARTIFICIAL INTELLIGENCE Massimo Poesio Machine Learning: Decision Trees


Page 1:

INTRODUCTION TO ARTIFICIAL INTELLIGENCE

Massimo Poesio

Machine Learning: Decision Trees

Page 2:

A DEFINITION OF LEARNING: LEARNING AS IMPROVEMENT

Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the task or tasks drawn from the same population more efficiently and more effectively the next time. -- Herb Simon

Page 3:


LEARNING AS IMPROVEMENT, 2

Improve on task T, with respect to performance metric P, based on experience E.

T: Assign to words their senses.
P: Percentage of words correctly classified.
E: Corpus of words, some with human-given labels.

T: Categorize email messages as spam or legitimate.
P: Percentage of email messages correctly classified.
E: Database of emails, some with human-given labels.

T: Playing checkers.
P: Percentage of games won against an arbitrary opponent.
E: Playing practice games against itself.

T: Recognizing hand-written words.
P: Percentage of words correctly classified.
E: Database of human-labeled images of handwritten words.

Page 4:


SPECIFYING A LEARNING SYSTEM

• Choose the training experience.
• Choose exactly what is to be learned, i.e. the target function.
• Choose how to represent the target function.
• Choose a learning algorithm to infer the target function from the experience.

[Diagram: Environment/Experience -> Learner -> Knowledge -> Performance Element]

Page 5:

FEATURES

• The functions learned by ML algorithms specify a mapping from input FEATURES to output FEATURES

Page 6:

EXAMPLE 1: CHECKERS

• Features used in the linear function seen last time:
  – bp(b): number of black pieces on board b
  – rp(b): number of red pieces on board b
  – bk(b): number of black kings on board b
  – rk(b): number of red kings on board b
  – bt(b): number of black pieces threatened (i.e. which can be immediately taken by red on its next turn)
  – rt(b): number of red pieces threatened
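
As a reminder of the form of that linear function, here is a minimal Python sketch. The board is represented directly by its feature values, and the weights are hypothetical placeholders; in the actual checkers learner they would be estimated from playing experience, not set by hand.

    FEATURES = ["bp", "rp", "bk", "rk", "bt", "rt"]

    def evaluate(board, weights):
        """Linear evaluation: V(b) = w0 + w1*bp(b) + ... + w6*rt(b)."""
        w0, *ws = weights
        return w0 + sum(w * board[f] for w, f in zip(ws, FEATURES))

    # Example: black is a piece up and threatens one red piece.
    board = {"bp": 12, "rp": 11, "bk": 0, "rk": 0, "bt": 0, "rt": 1}
    weights = [0.0, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5]   # hypothetical values
    print(evaluate(board, weights))                    # -> 1.5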

Page 7:

EXAMPLE 2: WHEN THE NEIGHBOR DRIVES

• Suppose we are trying to learn when our neighbor goes to work by car, so that we can ask for a ride.

• Their decision appears to be influenced by:
  – Temperature
  – Whether it’s going to rain or not
  – Day of the week
  – Whether they need to stop at a shop on the way back
  – How they are dressed

Page 8:

PAST EXPERIENCE IN TERMS OF FEATURES
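
(The feature table on this slide can be reconstructed from the labeled examples that reappear on Page 21; “+” = drive, “-” = walk. The column names are our best guess.)

    #   Day   Temp   Precip.  Clothing   Keys      Decision
    1   Sat   hot    no       casual     keys      drive (+)
    2   Mon   cold   snow     casual     no-keys   walk (-)
    3   Tue   hot    no       casual     no-keys   walk (-)
    4   Tue   cold   rain     casual     no-keys   walk (-)
    5   Wed   hot    rain     casual     keys      drive (+)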

Page 9:

PREDICTION TASK

Page 10:

PREDICTION

Page 11:

THE NEED FOR AVERAGING

Page 12:

THE NEED FOR GENERALIZATION

Page 13:

FIRST EXAMPLE OF ML ALGORITHM: DECISION TREES

• A method developed independently by Quinlan in AI and by Breiman et al. in statistics

Page 14:


DECISION TREES

• Tree-based classifiers for instances represented as feature vectors. Nodes test features, there is one branch for each value of the feature, and leaves specify the category.

• Can represent arbitrary conjunctions and disjunctions. Can represent any classification function over discrete feature vectors.

color
├─ red → shape
│         ├─ circle → pos
│         ├─ square → neg
│         └─ triangle → neg
├─ blue → neg
└─ green → pos

color
├─ red → shape
│         ├─ circle → A
│         ├─ square → B
│         └─ triangle → C
├─ blue → B
└─ green → C
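
Later slides traverse such a tree to classify an instance. As a concrete illustration (our own encoding, not from the slides), the first tree above can be written as nested Python dictionaries, with a classifier that walks from the root down to a leaf:

    # An internal node is {"feature": ..., "branches": {value: subtree}};
    # a leaf is simply the category label.
    tree = {
        "feature": "color",
        "branches": {
            "red": {
                "feature": "shape",
                "branches": {"circle": "pos", "square": "neg", "triangle": "neg"},
            },
            "blue": "neg",
            "green": "pos",
        },
    }

    def classify(tree, instance):
        """Follow the branch matching the instance's value at each node."""
        while isinstance(tree, dict):
            tree = tree["branches"][instance[tree["feature"]]]
        return tree

    print(classify(tree, {"color": "red", "shape": "square"}))   # -> neg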

Page 15:

A DECISION TREE FOR THE DRIVING PROBLEM

Page 16:

LEARNING DECISION TREES FROM DATA

• Use the data in the training set to build a decision tree that will then be used to make decisions on unseen data

• The decision tree specifies a function

Page 17:

TRAVERSING THE DECISION TREE

Page 18:

TRAVERSING THE DECISION TREE

Page 19:

TRAVERSING THE DECISION TREE

Page 20:

Decision Tree Learning

• Discrete class values
  – Slight changes in the input have either no effect or a full effect on the classification
• Discrete feature values (or discretized)
• Fast
• Modern DT induction algorithms handle:
  – Noisy feature values
  – Noisy labels
  – Missing feature values

Page 21:

Top-down DT induction

• Partition training examples into good “splits”, based on values of a single “good” feature:

(1) Sat, hot, no, casual, keys -> +
(2) Mon, cold, snow, casual, no-keys -> -
(3) Tue, hot, no, casual, no-keys -> -
(4) Tue, cold, rain, casual, no-keys -> -
(5) Wed, hot, rain, casual, keys -> +

Page 22:

Top-down DT induction

keys?
├─ yes → Drive: 1, 5
└─ no → Walk: 2, 3, 4

Page 23:

Top-down DT induction

• Partition training examples into good “splits”, based on values of a single “good” feature

(1) Sat, hot, no, casual -> +
(2) Mon, cold, snow, casual -> -
(3) Tue, hot, no, casual -> -
(4) Tue, cold, rain, casual -> -
(5) Wed, hot, rain, casual -> +

• No acceptable classification: proceed recursively

Page 24:

Top-down DT induction

t?
├─ cold → Walk: 2, 4
└─ hot → Drive: 1, 5; Walk: 3

Page 25:

Top-down DT induction

t?
├─ cold → Walk: 2, 4
└─ hot → day?
         ├─ Sat → Drive: 1
         ├─ Tue → Walk: 3
         └─ Wed → Drive: 5

Page 26:

Top-down DT induction

t?
├─ cold → Walk: 2, 4
└─ hot → day?
         ├─ Sat → Drive: 1
         ├─ Tue → Walk: 3
         ├─ Wed → Drive: 5
         └─ Mon, Thu, Fri, Sun → ? (Drive)

Page 27:

Top-down DT induction: divide-and-conquer algorithm

• Pick a feature.
• Split your examples into subsets based on the values of the feature.
• For each subset, examine the examples:
  – Zero examples: assign the most popular class of the parent.
  – All from the same class: assign this class.
  – Otherwise, process recursively.
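
A minimal Python sketch of this divide-and-conquer procedure, reusing the nested-dictionary tree encoding from earlier. The choice of split feature is deliberately naive here (just the first one still available); the following slides replace it with an information-gain heuristic.

    from collections import Counter

    def majority(labels):
        return Counter(labels).most_common(1)[0][0]

    def induce(examples, features, domains):
        """examples: list of (feature_dict, label) pairs;
        features: feature names still available for splitting;
        domains: feature name -> set of its possible values."""
        labels = [y for _, y in examples]
        if len(set(labels)) == 1:      # all from the same class: assign it
            return labels[0]
        if not features:               # nothing left to split on
            return majority(labels)
        f, rest = features[0], features[1:]   # naive choice of split feature
        node = {"feature": f, "branches": {}}
        for v in domains[f]:
            subset = [(x, y) for x, y in examples if x[f] == v]
            if not subset:             # zero examples: parent's most popular class
                node["branches"][v] = majority(labels)
            else:
                node["branches"][v] = induce(subset, rest, domains)
        return node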

Page 28:

Top-Down DT induction

Different trees can be built for the same data, depending on the order of features:

t?
├─ cold → Walk: 2, 4
└─ hot → day?
         ├─ Sat → Drive: 1
         ├─ Tue → Walk: 3
         ├─ Wed → Drive: 5
         └─ Mon, Thu, Fri, Sun → ? (Drive)

Page 29:

Top-down DT induction

Different trees can be built for the same data, depending on the order of features:

t?
├─ cold → Walk: 2, 4
└─ hot → day?
         ├─ Sat → Drive: 1
         ├─ Tue → Walk: 3
         ├─ Wed → Drive: 5
         └─ Mon → clothing?
                  ├─ casual → Walk: ?
                  └─ halloween → Drive: ?

Page 30:

Selecting features

• Intuitively:
  – We want more “informative” features to be higher in the tree:
    • Is it Monday? Is it raining? Good political news? No Halloween clothes? Hat on? Coat on? Car keys? Yes?? -> Driving! (this does not look like a good learning job)
  – We want a nice, compact tree.

Page 31:

Selecting features, 2

• Formally:
  – Define “tree size” (number of nodes, number of leaves, depth, ...)
  – Try all the possible trees and find the smallest one: NP-hard
• Top-down DT induction is a greedy search, and depends on heuristics for feature ordering (so there is no optimality guarantee):
  – Information gain

Page 32:

Entropy

Information theory: entropy is the number of bits needed to encode some piece of information.

S: a set of N examples, p*N positive (“Walk”) and q*N negative (“Drive”).

Entropy(S) = -p*lg(p) - q*lg(q)

p = 1, q = 0 => Entropy(S) = 0
p = 1/2, q = 1/2 => Entropy(S) = 1
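
A direct transcription of the formula into Python; the guard against zero probabilities follows the usual convention 0*lg(0) = 0, which the slide leaves implicit.

    from math import log2

    def entropy(p, q):
        """Entropy of a two-class set with class proportions p and q (p + q = 1)."""
        return -sum(r * log2(r) for r in (p, q) if r > 0)

    print(entropy(1.0, 0.0))   # 0.0  : a pure set needs no bits
    print(entropy(0.5, 0.5))   # 1.0  : a 50/50 split needs one full bit
    print(entropy(0.6, 0.4))   # 0.97 : the E(S) used on the next slides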

Page 33:

Entropy and Decision Trees

keys?
├─ no → Walk: 2, 3, 4
└─ yes → Drive: 1, 5

E(S) = -0.6*lg(0.6) - 0.4*lg(0.4) = 0.97

E(S_no) = 0    E(S_keys) = 0

Page 34:

Entropy and Decision Trees

t?
├─ cold → Walk: 2, 4
└─ hot → Drive: 1, 5; Walk: 3

E(S) = -0.6*lg(0.6) - 0.4*lg(0.4) = 0.97

E(S_cold) = 0    E(S_hot) = -(1/3)*lg(1/3) - (2/3)*lg(2/3) = 0.92

Page 35:

Information gain

• For each feature f, compute the reduction in entropy on the split:

Gain(S, f) = E(S) - Σ_i E(S_i) * |S_i| / |S|

f = keys?: Gain(S, f) = 0.97
f = t?: Gain(S, f) = 0.97 - 0*(2/5) - 0.92*(3/5) = 0.42
f = clothing?: Gain(S, f) = ?
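
The same computation in Python, run on the five examples from Page 21 (the feature names are our own labels for the tuple positions). It reproduces the two values above, and running it on “clothing” answers the question on this slide.

    from collections import Counter
    from math import log2

    def set_entropy(labels):
        n = len(labels)
        return -sum(c / n * log2(c / n) for c in Counter(labels).values())

    def gain(examples, f):
        """Gain(S, f) = E(S) minus the size-weighted entropies of the split."""
        g = set_entropy([y for _, y in examples])
        for v in {x[f] for x, _ in examples}:
            sub = [y for x, y in examples if x[f] == v]
            g -= len(sub) / len(examples) * set_entropy(sub)
        return g

    S = [({"day": "Sat", "t": "hot",  "prec": "no",   "clothing": "casual", "keys": "yes"}, "+"),
         ({"day": "Mon", "t": "cold", "prec": "snow", "clothing": "casual", "keys": "no"},  "-"),
         ({"day": "Tue", "t": "hot",  "prec": "no",   "clothing": "casual", "keys": "no"},  "-"),
         ({"day": "Tue", "t": "cold", "prec": "rain", "clothing": "casual", "keys": "no"},  "-"),
         ({"day": "Wed", "t": "hot",  "prec": "rain", "clothing": "casual", "keys": "yes"}, "+")]

    print(round(gain(S, "keys"), 2))   # 0.97
    print(round(gain(S, "t"), 2))      # 0.42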

Page 36:

Divide-and-conquer with information gain

• Batch learning (read all the input data, compute information gain based on all the examples simultaneously)
• Greedy search; may find local optima
• Outputs a single solution
• Optimizes depth

Page 37:

Complexity

• Worst case: build a complete tree
  – Compute gains on all nodes: at level i, we have already examined i features; m - i remain

• In practice the tree is rarely complete, and induction is linear in the number of features and the number of examples (i.e., very fast)

Page 38:

Overfitting

• Suppose we build a very complex tree... is it good?

• Last lecture: we measure the quality (“goodness”) of the prediction, not the performance on the training data

• Why complex trees can yield mistakes:
  – Noise in the data
  – Even without noise, decisions at the last levels are based on too few observations

Page 39:

Overfitting

Mon: Walk (50 observations), Drive (5)
Tue: Walk (40), Drive (3)
Wed: Drive (1)
Thu: Walk (42), Drive (14)
Fri: Walk (50)
Sat: Drive (20), Walk (20)
Sun: Drive (10)

• Can we conclude that “Wed -> Drive”?

Page 40:

Overfitting

• A hypothesis H is said to overfit the training data if there exists another hypothesis H' such that:

  Error(H, train data) <= Error(H', train data)
  Error(H, unseen data) > Error(H', unseen data)

• Overfitting is related to hypothesis complexity: a more complex hypothesis (e.g., a larger decision tree) overfits more

Page 41:

Overfitting Prevention for DT: Pruning

• “Prune” a complex tree: produce a smaller tree that is less accurate on the training data

  Original tree: ... Mon: hot -> drive (2), cold -> walk (100)
  Pruned tree: ... Mon -> walk (100/2)

• Post-pruning / pre-pruning

Page 42:

Pruning criteria

• Cross-validation
  – Reserve some training data to evaluate the utility of the subtrees

• Statistical tests: use a test to determine whether the observations at a given level could be random

• MDL (minimum description length): compare the added complexity against memorizing exceptions
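
As an illustration of the cross-validation criterion, here is a sketch of a single pruning decision in the reduced-error style (our example, not from the slides): a subtree is kept only if it beats the bare majority-class leaf on held-out examples. A full pruner applies this test bottom-up to every subtree.

    def classify(tree, x):
        """Walk a nested-dict tree (as in the earlier sketches) to a leaf label."""
        while isinstance(tree, dict):
            tree = tree["branches"][x[tree["feature"]]]
        return tree

    def prune_node(subtree, majority_label, heldout):
        """Replace the subtree by a majority leaf unless it helps on held-out data."""
        keep = sum(classify(subtree, x) == y for x, y in heldout)
        cut = sum(majority_label == y for _, y in heldout)
        return subtree if keep > cut else majority_label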

Page 43:

DT: issues

• Splitting criteria
  – Information gain is biased towards features with many values
• Non-discrete features
• Non-discrete outputs (“regression trees”)
• Costs
• Missing values
• Incremental learning
• Memory issues

Page 44:

ACKNOWLEDGMENTS

• Some of the slides are from:
  – Ray Mooney’s UTexas ML course
  – MIT OpenCourseWare AI course