Machine Learning Decision Trees. Exercise Solutions

Upload: lynn-shields

Post on 15-Jan-2016



Exercise 1

a) Machine learning methods are often categorised into three main types: supervised, unsupervised and reinforcement learning. Explain each in no more than one sentence, then state which category Decision Tree Learning falls into and why.

Answer: Supervised learning is learning with a teacher, i.e. input-output examples are given to the system in the training phase. After training, the system is asked to predict the output for new inputs. E.g. classification.

Unsupervised learning is in fact learning for structure discovery with no teacher. Only input data are seen in both the training and the testing phase. E.g. ICA, clustering.

Reinforcement learning is learning with no teacher but with feedback from the environment. The feedback consists of rewards, which are typically delayed. E.g. Q-learning.

Decision Trees are supervised learning methods: they learn to classify from given input-output examples.

c) For the sunbathers example given in the lecture, calculate the Disorder function for the attribute ‘height’ at the root node.

Disorder of height

Split on Height (class attribute: is_sunburned):

Short: Alex, Annie, Katie
Average: Sarah, Emily, John
Tall: Dana, Pete

Weighted disorder terms:

$$\frac{|S_{short}|}{|S|} D(S_{short}), \qquad \frac{|S_{av}|}{|S|} D(S_{av})$$
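For reference, the weighted terms on this slide are instances of the usual average-disorder formula. The notation below is an assumption made for this note rather than taken from the slides: $b$ runs over the branches of the candidate attribute, $c$ over the classes, and $n_{b,c}$ is the number of samples of class $c$ in branch $b$, with $0 \log_2 0$ taken as 0.

$$\text{Average disorder} = \sum_{b} \frac{|S_b|}{|S|}\, D(S_b), \qquad D(S_b) = -\sum_{c} \frac{n_{b,c}}{|S_b|} \log_2 \frac{n_{b,c}}{|S_b|}$$

Under this reading, $D(1,2)$ is shorthand for the disorder of a branch holding one sample of one class and two of the other.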

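Disorder of height (contd)

Short: Alex, Annie, Katie; weighted disorder $\frac{3}{8} D(1,2)$
Average: Sarah, Emily, John; weighted disorder $\frac{3}{8} D(2,1)$

$$D(1,2) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = -3.322\left(\frac{1}{3}\log_{10}\frac{1}{3} + \frac{2}{3}\log_{10}\frac{2}{3}\right) = 0.918$$

Average disorder of height $= 2 \cdot \frac{3}{8} \cdot D(1,2) = 0.69$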
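The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the helper name `disorder` and its counts-based signature are assumptions made for illustration, not something defined in the lecture. It also assumes the Tall branch is pure, which is what the slide's total of 2 * (3/8) * D(1,2) implies.

```python
import math

def disorder(*counts):
    """Disorder (entropy in bits) of a set given its per-class counts.

    Hypothetical helper: disorder(1, 2) plays the role of D(1,2) on the slide.
    """
    total = sum(counts)
    # Empty classes are skipped, i.e. 0 * log 0 is taken as 0.
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

# D(1,2) computed directly with base-2 logs.
d12 = disorder(1, 2)

# The same value via base-10 logs, using log2(x) = 3.322 * log10(x).
d12_log10 = -3.322 * (1/3 * math.log10(1/3) + 2/3 * math.log10(2/3))

# Short and Average each hold 3 of the 8 samples; Tall (2 samples) is
# assumed pure, so it adds nothing to the weighted sum.
avg_height = 3/8 * disorder(1, 2) + 3/8 * disorder(2, 1) + 2/8 * disorder(0, 2)

print(round(d12, 3), round(d12_log10, 3), round(avg_height, 2))  # 0.918 0.918 0.69
```

Since D(2,1) = D(1,2), the two non-zero terms collapse into the factor 2 used on the slide.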

Exercise 2

For the sunbathers example given in the lecture, calculate the Disorder function associated with the possible branches of the decision tree once the root node (hair colour) has been chosen.

Answer: 1st branch

Hair colour = Blonde: Sarah, Annie, Dana, Katie (class attribute: is_sunburned)

Candidate attributes for the next split, with their average disorder:

Height: Short: Annie, Katie; Average: Sarah; Tall: Dana. Average disorder = 0.5
Weight: Light: Sarah, Katie; Average: Annie, Dana. Average disorder = 1.0
Lotion used: No: Sarah, Annie; Yes: Dana, Katie. Average disorder = 0

So in this branch (the 1st branch, blonde hair) “Lotion used” is the next attribute to split on, since it gives the lowest average disorder (0).

This split also completes the branch: an average disorder of 0 means that every resulting subset is pure.

The method of computation for the other 2 branches (red and brown) is exactly the same.
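The same computation can be sketched in Python for the blonde branch. The attribute values follow the slide, but the class labels are an assumption (Sarah and Annie sunburned, Dana and Katie not), since this transcript does not list them for the lecture's data set; they are consistent with the disorders 0.5, 1.0 and 0 given above. The function names are illustrative only.

```python
import math
from collections import Counter

def disorder(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def average_disorder(rows, attribute):
    """Weighted disorder of the subsets obtained by splitting rows on attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row["sunburned"])
    return sum(len(g) / len(rows) * disorder(g) for g in groups.values())

# Blonde branch. The "sunburned" values are assumed, not taken from the slides.
blonde = [
    {"name": "Sarah", "height": "average", "weight": "light",   "lotion": "no",  "sunburned": True},
    {"name": "Annie", "height": "short",   "weight": "average", "lotion": "no",  "sunburned": True},
    {"name": "Dana",  "height": "tall",    "weight": "average", "lotion": "yes", "sunburned": False},
    {"name": "Katie", "height": "short",   "weight": "light",   "lotion": "yes", "sunburned": False},
]

scores = {a: average_disorder(blonde, a) for a in ("height", "weight", "lotion")}
print(scores)  # {'height': 0.5, 'weight': 1.0, 'lotion': 0.0}
```

Running the same two functions on the rows of another branch gives its candidate disorders in the same way.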

Exercise 3

Using the decision tree learning algorithm, construct the decision tree for the following data set.

Data for Exercise 3

Name   Hair    Height   Weight   Lotion  Result
Sarah  Blonde  Average  Light    No      Sunburned
Dana   Blonde  Tall     Average  Yes     None
Alex   Brown   Short    Average  Yes     None
Annie  Blonde  Short    Average  No      Sunburned
Julie  Blonde  Average  Light    No      None
Pete   Brown   Tall     Heavy    No      None
John   Brown   Average  Heavy    No      None
Ruth   Blonde  Average  Light    No      None

Ex 3: Search for Root. Candidate: Hair Colour

Split on Hair colour (class attribute: is_sunburned):

Blonde: Sarah, Annie, Dana, Julie, Ruth; weighted disorder $\frac{|S_{blonde}|}{|S|} D(S_{blonde}) = \frac{5}{8} D(2,3)$
Brown: Alex, Pete, John; weighted disorder $\frac{|S_{brown}|}{|S|} D(S_{brown}) = 0$

Av Disorder = (5/8) * 0.971 = 0.6069

Ex 3: Search for Root. Candidate: Height

Split on Height (class attribute: is_sunburned):

Short: Alex, Annie; weighted disorder $\frac{|S_{short}|}{|S|} D(S_{short}) = \frac{2}{8} D(1,1)$
Average: Sarah, Julie, John, Ruth; weighted disorder $\frac{|S_{av}|}{|S|} D(S_{av}) = \frac{4}{8} D(1,3)$
Tall: Dana, Pete; weighted disorder $\frac{|S_{tall}|}{|S|} D(S_{tall}) = 0$

Av Disorder = 1/4 + (1/2) * 0.8113 + 0 = 0.6556

Ex 3: Search for Root. Candidate: Weight

Split on Weight (class attribute: is_sunburned):

Light: Sarah, Julie, Ruth; weighted disorder $\frac{|S_{light}|}{|S|} D(S_{light}) = \frac{3}{8} D(1,2)$
Average: Dana, Alex, Annie; weighted disorder $\frac{|S_{av}|}{|S|} D(S_{av}) = \frac{3}{8} D(1,2)$
Heavy: Pete, John; weighted disorder $\frac{|S_{heavy}|}{|S|} D(S_{heavy}) = 0$

Av Disorder = 2 * (3/8) * 0.9183 = 0.6887

Ex 3: Search for Root. Candidate: Lotion

Split on Lotion used (class attribute: is_sunburned):

No: Sarah, Annie, Julie, Pete, John, Ruth; weighted disorder $\frac{|S_{no}|}{|S|} D(S_{no}) = \frac{6}{8} D(2,4)$
Yes: Dana, Alex; weighted disorder $\frac{|S_{yes}|}{|S|} D(S_{yes}) = 0$

Av Disorder = (3/4) * 0.9183 = 0.6887
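The four root candidates can be compared in one pass. The following is a minimal sketch over the Exercise 3 table; the names `DATA`, `COLUMNS`, `disorder` and `average_disorder` are chosen here for illustration and do not come from the lecture.

```python
import math
from collections import Counter

# Exercise 3 data: (name, hair, height, weight, lotion, result).
DATA = [
    ("Sarah", "Blonde", "Average", "Light",   "No",  "Sunburned"),
    ("Dana",  "Blonde", "Tall",    "Average", "Yes", "None"),
    ("Alex",  "Brown",  "Short",   "Average", "Yes", "None"),
    ("Annie", "Blonde", "Short",   "Average", "No",  "Sunburned"),
    ("Julie", "Blonde", "Average", "Light",   "No",  "None"),
    ("Pete",  "Brown",  "Tall",    "Heavy",   "No",  "None"),
    ("John",  "Brown",  "Average", "Heavy",   "No",  "None"),
    ("Ruth",  "Blonde", "Average", "Light",   "No",  "None"),
]
COLUMNS = {"Hair": 1, "Height": 2, "Weight": 3, "Lotion": 4}  # tuple index per attribute

def disorder(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())

def average_disorder(rows, column):
    """Weighted disorder after splitting rows on the given tuple column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[-1])  # row[-1] is the result
    return sum(len(g) / len(rows) * disorder(g) for g in groups.values())

scores = {name: average_disorder(DATA, col) for name, col in COLUMNS.items()}
root = min(scores, key=scores.get)  # attribute with the lowest average disorder

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
print("root:", root)
```

This prints 0.6068 for Hair, 0.6556 for Height and 0.6887 for both Weight and Lotion, and selects Hair as the root. The Hair value differs from the slide's 0.6069 in the last digit only because the slide rounds D(2,3) to 0.971 before multiplying.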

Ex 3: Next

Hair colour (class attribute: is_sunburned):

Blonde: Sarah, Annie, Dana, Julie, Ruth; next attribute: ???
Brown: No

Candidate attributes for the Blonde branch:

Height: Short: Annie; Av: Sarah, Julie, Ruth; Tall: Dana. Weighted disorder: $\frac{3}{5} D(1,2)$
Weight: Light: Sarah, Julie, Ruth; Av: Dana, Annie; Heavy: (none). Weighted disorder: $\frac{3}{5} D(1,2) + \frac{2}{5} D(1,1)$
Lotion used: No: Sarah, Annie, Julie, Ruth; Yes: Dana. Weighted disorder: $\frac{4}{5} D(2,2)$
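The slide leaves these three expressions unevaluated. As a worked check, using $D(1,2) = 0.9183$ from the root search and $D(1,1) = D(2,2) = 1$ (an even two-class split has one bit of disorder), the values come out as:

$$\text{Height: } \frac{3}{5} \cdot 0.9183 = 0.551$$

$$\text{Weight: } \frac{3}{5} \cdot 0.9183 + \frac{2}{5} \cdot 1 = 0.951$$

$$\text{Lotion used: } \frac{4}{5} \cdot 1 = 0.8$$

Height has the lowest average disorder, so it is the attribute to split on in the Blonde branch.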

Ex 3: Next

Hair colour (class attribute: is_sunburned):

Blonde: split on Height
    Short: Yes
    Av: Sarah, Julie, Ruth
    Tall: No
Brown: No

No further split will improve the classification accuracy on the training data. We can assign a decision to this leaf node based on the majority. That gives a ‘No’.
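The whole of Exercise 3 can be reproduced with a short recursive procedure. This is a minimal sketch of the greedy approach worked through above, not necessarily the exact algorithm from the lecture: the function names, the nested-dict tree representation and the stopping tolerance are all assumptions made for illustration. Note that "None" below is the class label from the data table (not sunburned), stored as a string.

```python
import math
from collections import Counter

ATTRS = ("Hair", "Height", "Weight", "Lotion")
FIELDS = ("Name",) + ATTRS + ("Result",)
ROWS = [dict(zip(FIELDS, line.split())) for line in """
Sarah Blonde Average Light No Sunburned
Dana Blonde Tall Average Yes None
Alex Brown Short Average Yes None
Annie Blonde Short Average No Sunburned
Julie Blonde Average Light No None
Pete Brown Tall Heavy No None
John Brown Average Heavy No None
Ruth Blonde Average Light No None
""".strip().splitlines()]

def disorder(rows):
    """Entropy in bits of the Result labels in rows."""
    counts = Counter(r["Result"] for r in rows)
    return sum(-(c / len(rows)) * math.log2(c / len(rows)) for c in counts.values())

def split(rows, attr):
    """Group rows by their value of attr."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r)
    return groups

def average_disorder(rows, attr):
    """Weighted disorder of the subsets produced by splitting on attr."""
    return sum(len(g) / len(rows) * disorder(g) for g in split(rows, attr).values())

def build(rows, attrs):
    """Return a leaf label (str) or a nested dict {attribute: {value: subtree}}."""
    majority = Counter(r["Result"] for r in rows).most_common(1)[0][0]
    if disorder(rows) == 0 or not attrs:
        return majority
    best = min(attrs, key=lambda a: average_disorder(rows, a))
    # Stop when even the best split leaves the disorder unchanged,
    # and fall back to the majority label for this leaf.
    if average_disorder(rows, best) >= disorder(rows) - 1e-12:
        return majority
    rest = [a for a in attrs if a != best]
    return {best: {value: build(group, rest) for value, group in split(rows, best).items()}}

tree = build(ROWS, list(ATTRS))
print(tree)
```

The resulting tree splits on Hair at the root and on Height in the Blonde branch. The Average leaf (Sarah, Julie, Ruth) is labelled "None" by majority, because those three rows agree on every remaining attribute and so cannot be separated.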