Machine Learning
Decision Trees.
Exercise Solutions
Exercise 1
a) Machine learning methods are often categorised into three main types: supervised, unsupervised and reinforcement learning. Explain each in no more than one sentence, and explain which category Decision Tree Learning falls into and why.
Answer: Supervised learning is learning with a teacher, i.e. input-output examples are given to the system in the training phase. After training, the system is asked to predict the output for new inputs. E.g. classification.
Unsupervised learning is in fact learning for structure discovery with no teacher. Only input data are seen in both the training and the testing phase. E.g. ICA, clustering.
Reinforcement learning is learning with no teacher but with feedback from the environment. The feedback consists of rewards, which are typically delayed. E.g. Q-learning.
Decision Trees are supervised learning methods: they learn to classify from given input-output examples.
c) For the sunbathers example given in the lecture, calculate the Disorder function for the attribute ‘height’ at the root node.
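Disorder of height
Splitting on Height (target: is_sunburned) gives three branches:
Short: Alex, Annie, Katie
Average: Sarah, Emily, John
Tall: Dana, Pete

$$\text{Average disorder of height} = \frac{|S_{short}|}{|S|} D(S_{short}) + \frac{|S_{av}|}{|S|} D(S_{av}) + \frac{|S_{tall}|}{|S|} D(S_{tall}) = \frac{3}{8} D(1,2) + \frac{3}{8} D(2,1) + 0$$

Disorder of height (contd)

$$D(1,2) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = -3.322\left(\frac{1}{3}\log_{10}\frac{1}{3} + \frac{2}{3}\log_{10}\frac{2}{3}\right) = 0.918$$

Since D(2,1) = D(1,2):

$$\text{Average disorder of height} = 2 \cdot \frac{3}{8} D(1,2) = 0.69$$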
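The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names `disorder` and `average_disorder` are illustrative, and the class counts for the Tall branch (0 and 2) are an assumption consistent with that branch contributing nothing to the total.

```python
import math

def disorder(counts):
    """Disorder (entropy in bits) of a list of class counts, e.g. [1, 2]."""
    total = sum(counts)
    result = 0.0
    for c in counts:
        if c > 0:  # an empty class contributes nothing (0 * log 0 is taken as 0)
            p = c / total
            result -= p * math.log2(p)
    return result

def average_disorder(branches):
    """Weighted average of the branch disorders; one count list per branch."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * disorder(b) for b in branches)

# Class counts per Height branch as they appear in the solution:
# Short -> D(1,2), Average -> D(2,1), Tall -> pure (assumed counts 0 and 2).
height_branches = [[1, 2], [2, 1], [0, 2]]

d12 = disorder([1, 2])
avg_height = average_disorder(height_branches)
print(round(d12, 3), round(avg_height, 2))  # prints: 0.918 0.69
```

Using `math.log2` directly avoids the 3.322 conversion factor that the hand calculation needs for base-10 logarithms.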
Exercise 2
For the sunbathers example given in the lecture, calculate the Disorder function associated with the possible branches of the decision tree once the root node (hair colour) has been chosen.
Answer: 1st branch
Hair colour = Blonde: Sarah, Annie, Dana, Katie (target: is_sunburned)

Attribute     Branches                                          Average disorder
Height        Short: Annie, Katie; Average: Sarah; Tall: Dana   0.5
Weight        Light: Sarah, Katie; Average: Annie, Dana         1.0
Lotion used   No: Sarah, Annie; Yes: Dana, Katie                0
So in this branch (1st branch) “Lotion used” is the next attribute to split on, because it has the lowest average disorder (0).
After this split both new leaves are pure, so this branch is finished.
The method of computation for the other 2 branches (red and brown) is exactly the same.
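The choice of attribute in the Blonde branch can be sketched as follows. The class counts per branch are not listed on the slide; the ones below are assumptions chosen to be consistent with the reported average disorders (0.5, 1.0 and 0), and the helper names are illustrative.

```python
import math

def disorder(counts):
    """Disorder (entropy in bits) of a list of class counts."""
    total = sum(counts)
    return sum(-c / total * math.log2(c / total) for c in counts if c > 0)

def average_disorder(branches):
    """Weighted average of the branch disorders."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * disorder(b) for b in branches)

# Assumed (sunburned, not sunburned) counts for each branch of each candidate.
candidates = {
    "Height": [[1, 1], [1, 0], [0, 1]],   # Short, Average, Tall
    "Weight": [[1, 1], [1, 1]],           # Light, Average
    "Lotion used": [[2, 0], [0, 2]],      # No, Yes
}

scores = {name: average_disorder(b) for name, b in candidates.items()}
best = min(scores, key=scores.get)  # attribute with the lowest average disorder
print(scores, best)
```

The same comparison would be repeated for the red and brown branches with their own counts.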
Exercise 3
Using the decision tree learning algorithm, calculate the decision tree for the following data set
Data for Exercise 3
Name   Hair    Height   Weight   Lotion  Result
Sarah  Blonde  Average  Light    No      Sunburned
Dana   Blonde  Tall     Average  Yes     None
Alex   Brown   Short    Average  Yes     None
Annie  Blonde  Short    Average  No      Sunburned
Julie  Blonde  Average  Light    No      None
Pete   Brown   Tall     Heavy    No      None
John   Brown   Average  Heavy    No      None
Ruth   Blonde  Average  Light    No      None
Ex 3: Search for Root. Candidate: Hair Colour
Split on Hair colour (target: is_sunburned):
Blonde: Sarah, Annie, Dana, Julie, Ruth
Brown: Alex, Pete, John

$$\frac{|S_{blonde}|}{|S|} D(S_{blonde}) + \frac{|S_{brown}|}{|S|} D(S_{brown}) = \frac{5}{8} D(2,3) + 0$$

Av Disorder = (5/8) * 0.971 = 0.6069
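For reference, the value 0.971 used above follows from the same disorder formula as in Exercise 1, applied to the 2 sunburned and 3 non-sunburned blondes. The intermediate logarithm values are rounded to four decimals.

```latex
\begin{align*}
D(2,3) &= -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} \\
       &= 0.4 \times 1.3219 + 0.6 \times 0.7370 \\
       &= 0.5288 + 0.4422 \approx 0.971
\end{align*}
```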
Ex 3: Search for Root. Candidate: Height
Split on Height (target: is_sunburned):
Short: Alex, Annie
Average: Sarah, Julie, John, Ruth
Tall: Dana, Pete

$$\frac{|S_{short}|}{|S|} D(S_{short}) + \frac{|S_{av}|}{|S|} D(S_{av}) + \frac{|S_{tall}|}{|S|} D(S_{tall}) = \frac{2}{8} D(1,1) + \frac{4}{8} D(1,3) + 0$$

Av Disorder = 1/4 + 1/2 * 0.8113 + 0 = 0.6556
Ex 3: Search for Root. Candidate: Weight
Split on Weight (target: is_sunburned):
Light: Sarah, Julie, Ruth
Average: Dana, Alex, Annie
Heavy: Pete, John

$$\frac{|S_{light}|}{|S|} D(S_{light}) + \frac{|S_{av}|}{|S|} D(S_{av}) + \frac{|S_{heavy}|}{|S|} D(S_{heavy}) = \frac{3}{8} D(1,2) + \frac{3}{8} D(1,2) + 0$$

Av Disorder = 2*(3/8)*0.9183 = 0.6887
Ex 3: Search for Root. Candidate: Lotion
Split on Lotion used (target: is_sunburned):
No: Sarah, Annie, Julie, Pete, John, Ruth
Yes: Dana, Alex

$$\frac{|S_{no}|}{|S|} D(S_{no}) + \frac{|S_{yes}|}{|S|} D(S_{yes}) = \frac{6}{8} D(2,4) + 0$$

Av Disorder = (3/4)*0.9183 = 0.6887
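The root search over all four candidates can be reproduced directly from the Exercise 3 data table. This is a minimal sketch; the names `DATA`, `disorder` and `average_disorder` are illustrative and not part of the lecture material.

```python
import math
from collections import Counter, defaultdict

ATTRIBUTES = ["Hair", "Height", "Weight", "Lotion"]
ROWS = [
    ("Sarah", "Blonde", "Average", "Light", "No", "Sunburned"),
    ("Dana", "Blonde", "Tall", "Average", "Yes", "None"),
    ("Alex", "Brown", "Short", "Average", "Yes", "None"),
    ("Annie", "Blonde", "Short", "Average", "No", "Sunburned"),
    ("Julie", "Blonde", "Average", "Light", "No", "None"),
    ("Pete", "Brown", "Tall", "Heavy", "No", "None"),
    ("John", "Brown", "Average", "Heavy", "No", "None"),
    ("Ruth", "Blonde", "Average", "Light", "No", "None"),
]
DATA = [dict(zip(["Name"] + ATTRIBUTES + ["Result"], row)) for row in ROWS]

def disorder(rows):
    """Disorder (entropy in bits) of the Result column of the given rows."""
    counts = Counter(r["Result"] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def average_disorder(rows, attribute):
    """Weighted average disorder of the subsets produced by splitting on attribute."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attribute]].append(r)
    return sum(len(g) / len(rows) * disorder(g) for g in groups.values())

scores = {a: average_disorder(DATA, a) for a in ATTRIBUTES}
root = min(scores, key=scores.get)  # lowest average disorder wins
for a in ATTRIBUTES:
    print(f"{a}: {scores[a]:.4f}")
print("root:", root)
```

Hair comes out lowest (about 0.607), with Height at about 0.656 and Weight and Lotion both at about 0.689, so Hair colour is selected as the root.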
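Ex 3: Next
Hair colour has the lowest average disorder (0.6069), so it becomes the root. The Brown branch (Alex, Pete, John) is pure and becomes a ‘No’ leaf. The Blonde branch (Sarah, Annie, Dana, Julie, Ruth) is still mixed, so we search for the next attribute (target: is_sunburned):

Attribute     Branches                                           Average disorder
Height        Short: Annie; Av: Sarah, Julie, Ruth; Tall: Dana   (3/5) D(1,2) = 0.551
Weight        Light: Sarah, Julie, Ruth; Av: Dana, Annie         (3/5) D(1,2) + (2/5) D(1,1) = 0.951
Lotion used   No: Sarah, Annie, Julie, Ruth; Yes: Dana           (4/5) D(2,2) = 0.8

Height has the lowest average disorder, so it is the next attribute to split on.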
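The second-level search works the same way as the root search, restricted to the Blonde rows. A small sketch under the same assumptions as before (helper names are illustrative; only the five Blonde rows of the data table are included):

```python
import math
from collections import Counter, defaultdict

# Blonde rows of the Exercise 3 table: (Name, Height, Weight, Lotion, Result)
BLONDE = [
    ("Sarah", "Average", "Light", "No", "Sunburned"),
    ("Dana", "Tall", "Average", "Yes", "None"),
    ("Annie", "Short", "Average", "No", "Sunburned"),
    ("Julie", "Average", "Light", "No", "None"),
    ("Ruth", "Average", "Light", "No", "None"),
]
COLUMNS = {"Height": 1, "Weight": 2, "Lotion": 3}
RESULT = 4

def disorder(rows):
    """Disorder (entropy in bits) of the Result column."""
    counts = Counter(r[RESULT] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def average_disorder(rows, column):
    """Weighted average disorder after splitting on the given column index."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[column]].append(r)
    return sum(len(g) / len(rows) * disorder(g) for g in groups.values())

blonde_scores = {a: average_disorder(BLONDE, i) for a, i in COLUMNS.items()}
next_attribute = min(blonde_scores, key=blonde_scores.get)
print(blonde_scores, next_attribute)
```

Height scores about 0.551, Weight about 0.951 and Lotion 0.8, so Height is chosen for the Blonde branch.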
Ex 3: Next
Hair colour (target: is_sunburned)
  Blonde: Height
    Short: Yes
    Av: Sarah, Julie, Ruth
    Tall: No
  Brown: No
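The remaining Average-height leaf contains Sarah, Julie and Ruth, who have identical values for every attribute (Blonde, Average, Light, No) but different results. No further split will therefore improve the classification accuracy on the training data. We can assign a decision to this leaf node based on the majority. That gives a ‘No’.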
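The whole procedure, including the majority rule for a leaf that cannot be split further, can be sketched as a short recursive function. This is one possible reading of the decision tree learning algorithm used in these exercises, not necessarily the exact formulation from the lecture; `build_tree` and `classify` are illustrative names, and the string "None" is the class label from the data table, not Python's `None`.

```python
import math
from collections import Counter, defaultdict

ATTRIBUTES = ["Hair", "Height", "Weight", "Lotion"]
ROWS = [
    ("Sarah", "Blonde", "Average", "Light", "No", "Sunburned"),
    ("Dana", "Blonde", "Tall", "Average", "Yes", "None"),
    ("Alex", "Brown", "Short", "Average", "Yes", "None"),
    ("Annie", "Blonde", "Short", "Average", "No", "Sunburned"),
    ("Julie", "Blonde", "Average", "Light", "No", "None"),
    ("Pete", "Brown", "Tall", "Heavy", "No", "None"),
    ("John", "Brown", "Average", "Heavy", "No", "None"),
    ("Ruth", "Blonde", "Average", "Light", "No", "None"),
]
DATA = [dict(zip(["Name"] + ATTRIBUTES + ["Result"], row)) for row in ROWS]

def disorder(rows):
    """Disorder (entropy in bits) of the Result column."""
    counts = Counter(r["Result"] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def split(rows, attribute):
    """Group rows by their value of the given attribute."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attribute]].append(r)
    return groups

def average_disorder(rows, attribute):
    """Weighted average disorder of the subsets after splitting on attribute."""
    return sum(len(g) / len(rows) * disorder(g)
               for g in split(rows, attribute).values())

def build_tree(rows, attributes):
    """Return a class label (leaf) or a nested dict {attribute: {value: subtree}}."""
    labels = Counter(r["Result"] for r in rows)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1 or not attributes:
        return majority
    best = min(attributes, key=lambda a: average_disorder(rows, a))
    if average_disorder(rows, best) >= disorder(rows):
        return majority  # no split reduces the disorder: use the majority class
    remaining = [a for a in attributes if a != best]
    return {best: {value: build_tree(group, remaining)
                   for value, group in split(rows, best).items()}}

def classify(tree, row):
    """Follow the branches matching the row until a leaf label is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[row[attribute]]
    return tree

tree = build_tree(DATA, ATTRIBUTES)
print(tree)
```

On this data the sketch splits on Hair, then on Height within the Blonde branch, and labels the Average-height leaf "None" by majority, matching the tree above (where ‘No’ means not sunburned).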