1 Decision Trees: Representation Machine Learning Spring 2018

Post on 19-Jul-2020

Page 1: Machine Learning - School of Computingzhe/pdf/lec-3-decision-trees-representation... · constructa decision tree that represents it 14. What are decision trees? • Decision trees

Decision Trees: Representation

Machine Learning Fall 2017

Supervised Learning: The Setup

Machine Learning Spring 2018

Last Lecture: Supervised Learning Settings

1. What is our instance space? What are the inputs to the problem? What are the features?

2. What is our label space? What is the prediction task?

3. What is our hypothesis space? What functions should the learning algorithm search over?

4. What is our learning algorithm? How do we learn from the labeled data?

5. What is our loss function or evaluation metric? What is success?

Formulation

Implementation

Coming up… (the rest of the semester)

Different hypothesis spaces and learning algorithms
– Decision trees and the ID3 algorithm
– Linear classifiers
  • Perceptron
  • Winnow
  • SVM
  • Logistic regression
– Combining multiple classifiers
  • Boosting, bagging
– Non-linear classifiers
– Nearest neighbors

Important issues to consider

1. What do these hypotheses represent?

2. Implicit assumptions and tradeoffs

3. Generalization?

4. How do we learn?

This lecture: Learning Decision Trees

1. Representation: What are decision trees?

2. Algorithm: Learning decision trees
   – The ID3 algorithm: A greedy heuristic

3. Some extensions

Representing data

Data can be represented as a big table, with columns denoting different attributes

Name               Label
Claire Cardie      -
Peter Bartlett     +
Eric Baum          +
Haym Hirsh         +
Shai Ben-David     +
Michael I. Jordan  -

Representing data

Data can be represented as a big table, with columns denoting different features/attributes

Name               Special character  Second character  Length of first  Gender  Label
                   in name?           of first name     name > 5?
Claire Cardie      No                 l                 Yes              Female  -
Peter Bartlett     No                 e                 No               Male    +
Eric Baum          No                 r                 No               Male    +
Haym Hirsh         No                 a                 No               Male    +
Shai Ben-David     Yes                h                 No               Male    +
Michael I. Jordan  Yes                i                 Yes              Male    -
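As a rough illustration of how such a table could be produced, the sketch below derives three of the columns from a name string. The function name `name_features`, the dictionary keys, and the rule used for "special character" (anything other than a letter or a space) are assumptions, not taken from the lecture. Gender and Label cannot be computed from the string, so they are left out.

```python
def name_features(name):
    """Derive three illustrative attribute values from a full name."""
    first_name = name.split()[0]
    return {
        # Assumed rule: any character that is neither a letter nor a space.
        "special_character": any(
            not (ch.isalpha() or ch.isspace()) for ch in name
        ),
        "second_character": first_name[1].lower(),
        "first_name_longer_than_5": len(first_name) > 5,
    }


for person in ["Claire Cardie", "Shai Ben-David", "Michael I. Jordan"]:
    print(person, name_features(person))
```

Each row of the table is then one such dictionary plus the label.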

Representing data

Data can be represented as a big table, with columns denoting different attributes

Name               Special character  Second character  Length of first  Gender  Label
                   in name?           of first name     name > 5?
Claire Cardie      No                 l                 Yes              Female  -
Peter Bartlett     No                 e                 No               Male    +
Eric Baum          No                 r                 No               Male    -
Haym Hirsh         No                 a                 No               Male    +
Shai Ben-David     Yes                h                 No               Male    -
Michael I. Jordan  Yes                i                 Yes              Male    +

With these four attributes, how many unique rows are possible? 2 × 26 × 2 × 2 = 208

If there are 100 attributes, all binary, how many unique rows are possible for a function? 2^100

We need to figure out how to represent it in a better, more efficient way
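The two counts can be checked with a few lines of Python. This is only a sketch: the dictionary keys are hypothetical names for the four columns, and the value 26 assumes the second character is always one of the 26 English letters.

```python
import math

# Number of distinct values each attribute can take (key names are illustrative).
domain_sizes = {
    "special_character_in_name": 2,   # Yes / No
    "second_character": 26,           # assumes one of the 26 English letters
    "first_name_longer_than_5": 2,    # Yes / No
    "gender": 2,                      # two values appear in the table
}

# The number of unique rows is the product of the domain sizes.
rows_four_attributes = math.prod(domain_sizes.values())

# With 100 binary attributes the product is 2 multiplied by itself 100 times.
rows_100_binary = 2 ** 100

print(rows_four_attributes)
print(rows_100_binary)
```

The second number has 31 decimal digits, which is why listing every row in a table stops being practical.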

What are decision trees?

A hierarchical data structure that represents data using a divide-and-conquer strategy

Can be used as a flexible hypothesis class for classification or regression

General idea: Given a collection of labeled examples, construct a decision tree that represents it

What are decision trees?

• Decision trees are a family of classifiers for instances that are represented by collections of attributes (i.e. features)

• Nodes are tests for feature values

• There is one branch for every value that the feature can take

• Leaves of the tree specify the class labels
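The four points above map naturally onto a small recursive data structure. The sketch below is one possible encoding, not the lecture's own: the class names `Leaf` and `Node`, the helper `count_leaves`, and the toy "Raining?" tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """A leaf specifies a class label."""
    label: str


@dataclass
class Node:
    """An internal node tests one feature and has one branch per feature value."""
    feature: str
    children: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def count_leaves(tree):
    """Count the leaves reachable from this node."""
    if isinstance(tree, Leaf):
        return 1
    return sum(count_leaves(child) for child in tree.children.values())


# Hypothetical toy tree: one test, two branches, two leaves.
toy = Node("Raining?", {"yes": Leaf("umbrella"), "no": Leaf("no umbrella")})
print(count_leaves(toy))
```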

Let’s build a decision tree for classifying shapes

[Figure: example shapes labeled A, B, and C]

Before building a decision tree:

What is the label for a red triangle? And why?

What are some attributes of the examples? Color, Shape

[Figure: decision tree. The root node tests Color? with branches Blue, Red, and Green. Two internal nodes test Shape?, one with branches triangle, square, and circle, the other with branches circle and square. Leaves carry the labels A, B, and C.]

1. How to use a decision tree for prediction?
• What is the label for a red triangle?
• Just follow a path from the root to a leaf
• What about a green triangle?
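Following a path from the root to a leaf can be sketched as a short loop. The nested-dict encoding and the names `shape_tree` and `predict` are hypothetical. The blue subtree follows the rules written out on the expressivity slide of this deck; the red leaf and the green subtree are assumptions, because the figure layout did not survive in this transcript.

```python
# An internal node is a dict with a feature to test and one branch per value.
# A leaf is simply a label string.
shape_tree = {
    "test": "color",
    "branches": {
        "blue": {
            "test": "shape",
            "branches": {"triangle": "B", "square": "A", "circle": "C"},
        },
        "red": "B",  # assumed leaf label
        "green": {   # assumed subtree
            "test": "shape",
            "branches": {"circle": "A", "square": "B"},
        },
    },
}


def predict(tree, example):
    """Walk from the root to a leaf; return None if no branch matches."""
    node = tree
    while isinstance(node, dict):
        value = example[node["test"]]
        if value not in node["branches"]:
            return None  # the tree has no branch for this feature value
        node = node["branches"][value]
    return node


print(predict(shape_tree, {"color": "red", "shape": "triangle"}))
print(predict(shape_tree, {"color": "green", "shape": "triangle"}))
```

Under these assumptions the red triangle reaches a leaf right after the color test, while the green triangle falls off the tree because the green subtree has no triangle branch.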

Expressivity of Decision trees

What Boolean functions can decision trees represent?

(Color=blue AND Shape=triangle ⇒ Label=B) AND
(Color=blue AND Shape=square ⇒ Label=A) AND
(Color=blue AND Shape=circle ⇒ Label=C) AND …

Every path from the root to a leaf is a rule

The full tree is equivalent to the conjunction of all the rules

Any Boolean function can be represented as a decision tree.

Why?
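One way to see why: a complete tree that tests each variable in turn has one leaf per row of the truth table, so copying the function's outputs into the leaves reproduces the function exactly. The sketch below illustrates this argument; the names `build_full_tree` and `evaluate` are hypothetical, and XOR is used only as an example function.

```python
from itertools import product


def build_full_tree(f, n, prefix=()):
    """Build a complete depth-n tree whose leaves store f's truth table."""
    if len(prefix) == n:
        return f(*prefix)  # leaf: value of f for this assignment
    # Internal node: tests variable number len(prefix), one branch per value.
    return {
        value: build_full_tree(f, n, prefix + (value,))
        for value in (False, True)
    }


def evaluate(tree, assignment):
    """Follow the branch chosen by each variable's value, root to leaf."""
    node = tree
    for value in assignment:
        node = node[value]
    return node


def xor(a, b):
    return a != b


tree = build_full_tree(xor, 2)
agrees = all(
    evaluate(tree, bits) == xor(*bits)
    for bits in product((False, True), repeat=2)
)
print(agrees)
```

This construction shows that a tree always exists, not that it is small: the complete tree has 2^n leaves for n variables.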

Decision Trees

• Outputs are discrete categories

• But real-valued outputs are also possible (regression trees)

• Methods for handling noisy data (noise in the label or in the features) and for handling missing attributes
  – Pruning trees helps with noise
  – More on this later…

Numeric attributes and decision boundaries

• We have seen instances represented as attribute-value pairs (color=blue, second letter=e, etc.)
  – Values have been categorical

• How do we deal with numeric feature values? (e.g., length = ?)
  – Discretize them or use thresholds on the numeric values
  – This example divides the feature space into axis-parallel rectangles

[Figure: points labeled + and - in the X-Y plane, split by lines at X = 1, X = 3, Y = 5, and Y = 7]

[Figure: the corresponding decision tree, with yes/no branches at nodes testing X < 3, Y < 5, Y > 7, and X < 1, and leaves labeled + or -]

Decision boundaries can be non-linear
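A threshold tree of this kind is just a set of nested comparisons. The sketch below uses the four tests named on the slide (X < 3, Y < 5, Y > 7, X < 1), but the way they are nested and the label at each leaf are assumptions made for illustration, since the figure itself is not recoverable from this transcript. The function name `classify` is hypothetical.

```python
def classify(x, y):
    """Illustrative threshold tree; nesting and leaf labels are assumed."""
    if x < 3:
        if y > 7:
            return "+"
        if x < 1:
            return "-"
        return "+"
    if y < 5:
        return "-"
    return "+"


for point in [(0, 0), (2, 0), (2, 8), (4, 4), (4, 6)]:
    print(point, classify(*point))
```

Each root-to-leaf path is a conjunction of threshold tests on single features, so every leaf covers an axis-parallel rectangle. The region labeled + is a union of such rectangles, which is why the overall boundary need not be a single straight line.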

Summary: Decision trees

• Decision trees can represent any Boolean function
• A way to represent a lot of data
• A natural representation (think 20 questions)
• Predicting with a decision tree is easy

• Clearly, given a dataset, there are many decision trees that can represent it. Why?

• Learning a good representation from data is the next question

Exercise

Write down the decision tree for the shapes data if the root node was Shape instead of Color.

Will the two trees make the same predictions for unseen shape/color combinations?

[Figure: example shapes labeled A, B, and C]