Homework 1: Solutions
CS4445/B12
Provided by: Kenneth J. Loomis
Entropy of the original set
genre   critics-reviews  rating  IMAX   likes
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
comedy  neutral          R       FALSE  no
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
comedy  thumbs-down      PG-13   FALSE  yes
comedy  neutral          PG-13   TRUE   yes
drama   thumbs-up        R       FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
drama   thumbs-up        PG-13   FALSE  yes
action  neutral          R       FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          PG-13   FALSE  yes
Entropy (target attribute)
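The entropy of the target attribute can be checked with a few lines of Python. This is a minimal sketch rather than part of the original solution: the function name `entropy` is our own, and the label list simply encodes the 5 "no" and 9 "yes" values of the likes column.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

# The likes column of the 14 training instances: 5 "no" and 9 "yes".
likes = ["no"] * 5 + ["yes"] * 9
print(round(entropy(likes), 4))  # 0.9403
```

Every split considered afterwards is judged by how far it pushes the weighted entropy below this starting value.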
Determine the root node attribute
genre
=comedy =drama =action
genre   critics-reviews  rating  IMAX   likes
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
action  neutral          R       FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          PG-13   FALSE  yes
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
comedy  neutral          R       FALSE  no
comedy  thumbs-down      PG-13   FALSE  yes
comedy  neutral          PG-13   TRUE   yes
drama   thumbs-up        R       FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
drama   thumbs-up        PG-13   FALSE  yes
Entropy (genre) = .6935
Determine the root node attribute
critics-reviews
=thumbs-up =neutral =thumbs-down
genre   critics-reviews  rating  IMAX   likes
action  neutral          R       TRUE   no
comedy  neutral          R       FALSE  no
action  neutral          R       FALSE  yes
action  neutral          PG-13   FALSE  yes
comedy  neutral          PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
action  thumbs-down      PG-13   TRUE   no
action  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-down      PG-13   FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
drama   thumbs-up        R       FALSE  yes
drama   thumbs-up        PG-13   FALSE  yes
Entropy (critics-reviews) = .9111
Determine the root node attribute
rating
=PG-13 =R
genre   critics-reviews  rating  IMAX   likes
action  thumbs-down      PG-13   TRUE   no
action  neutral          PG-13   FALSE  yes
comedy  neutral          PG-13   TRUE   yes
action  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-down      PG-13   FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
drama   thumbs-up        PG-13   FALSE  yes
action  neutral          R       TRUE   no
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
action  neutral          R       FALSE  yes
drama   neutral          R       TRUE   yes
drama   thumbs-up        R       FALSE  yes
Entropy (rating) = .7885
Determine the root node attribute
IMAX
=FALSE =TRUE
genre   critics-reviews  rating  IMAX   likes
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-down      PG-13   FALSE  yes
drama   thumbs-up        PG-13   FALSE  yes
action  neutral          R       FALSE  yes
drama   thumbs-up        R       FALSE  yes
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
comedy  thumbs-up        R       TRUE   no
comedy  neutral          PG-13   TRUE   yes
drama   thumbs-down      PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
Entropy (IMAX) = .8922
Determine the root node attribute
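Entropy (genre) = .6935
Entropy (critics-reviews) = .9111
Entropy (rating) = .7885
Entropy (IMAX) = .8922
genre
=comedy =drama =action
• We can see that genre provides us with the lowest weighted entropy after the split (and therefore the highest information gain), thus it becomes the root node of our ID3 tree.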
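The four figures above can be reproduced with a short script. The sketch below is illustrative only: the names `COLUMNS`, `ROWS`, `entropy` and `split_entropy` are our own choices, and the rows are copied from the training table.

```python
from collections import Counter, defaultdict
from math import log2

COLUMNS = ("genre", "critics-reviews", "rating", "IMAX", "likes")
ROWS = [
    ("comedy", "thumbs-up", "R", "FALSE", "no"),
    ("comedy", "thumbs-up", "R", "TRUE", "no"),
    ("comedy", "neutral", "R", "FALSE", "no"),
    ("action", "thumbs-down", "PG-13", "TRUE", "no"),
    ("action", "neutral", "R", "TRUE", "no"),
    ("comedy", "thumbs-down", "PG-13", "FALSE", "yes"),
    ("comedy", "neutral", "PG-13", "TRUE", "yes"),
    ("drama", "thumbs-up", "R", "FALSE", "yes"),
    ("drama", "thumbs-down", "PG-13", "TRUE", "yes"),
    ("drama", "neutral", "R", "TRUE", "yes"),
    ("drama", "thumbs-up", "PG-13", "FALSE", "yes"),
    ("action", "neutral", "R", "FALSE", "yes"),
    ("action", "thumbs-down", "PG-13", "FALSE", "yes"),
    ("action", "neutral", "PG-13", "FALSE", "yes"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def split_entropy(rows, attribute):
    """Weighted average entropy of likes after partitioning rows on attribute."""
    col = COLUMNS.index(attribute)
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])  # collect the likes label per value
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

scores = {a: round(split_entropy(ROWS, a), 4) for a in COLUMNS[:-1]}
print(scores)                        # one weighted entropy per attribute
print(min(scores, key=scores.get))   # the attribute ID3 would pick: genre
```

Minimizing the weighted entropy is equivalent to maximizing information gain, because the entropy of the unsplit set is the same for every candidate attribute.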
Determine the left child attribute
genre
=comedy =drama =action
We now move on to the left child node of our tree (genre = comedy). What attribute do we choose for this node?
Options: critics-reviews, rating, IMAX
Determine the left child attribute
genre
=comedy =drama =action
critics-reviews
=thumbs-up =neutral =thumbs-down
genre   critics-reviews  rating  IMAX   likes
comedy  neutral          R       FALSE  no
comedy  neutral          PG-13   TRUE   yes
comedy  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
Entropy (critics-reviews) = .4000
Determine the left child attribute
genre
=comedy =drama =action
rating
=R =PG-13
genre   critics-reviews  rating  IMAX   likes
comedy  neutral          PG-13   TRUE   yes
comedy  thumbs-down      PG-13   FALSE  yes
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
Determine the left child attribute
genre
=comedy =drama =action
IMAX
=FALSE =TRUE
genre   critics-reviews  rating  IMAX   likes
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-up        R       TRUE   no
comedy  neutral          PG-13   TRUE   yes
Determine the left child attribute
genre
=comedy =drama =action
rating
=R=PG-13
Entropy (critics-reviews) = .4000
• We can see that rating provides us with the lowest entropy (both of its subsets are homogeneous, so its entropy is 0), thus it becomes the left child node of our ID3 tree.
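The comparison for the comedy branch can be repeated on just the five comedy rows. This is a hedged sketch, not the original solution's code: the helper names are our own, and the rows are copied from the comedy subset with the genre column left out.

```python
from collections import Counter, defaultdict
from math import log2

COLUMNS = ("critics-reviews", "rating", "IMAX", "likes")
# The five genre = comedy instances (genre column omitted).
COMEDY = [
    ("neutral", "R", "FALSE", "no"),
    ("neutral", "PG-13", "TRUE", "yes"),
    ("thumbs-down", "PG-13", "FALSE", "yes"),
    ("thumbs-up", "R", "FALSE", "no"),
    ("thumbs-up", "R", "TRUE", "no"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def split_entropy(rows, attribute):
    """Weighted average entropy of likes after partitioning rows on attribute."""
    col = COLUMNS.index(attribute)
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

for attribute in COLUMNS[:-1]:
    print(attribute, round(split_entropy(COMEDY, attribute), 4))
```

Run as written, this should report .4 for critics-reviews, 0.0 for rating and .951 for IMAX, so rating is the attribute with the lowest weighted entropy in this branch.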
Determine the left child attribute
genre
=comedy =drama =action
rating
=R: [no]
=PG-13: [yes]
• This split is homogeneous (every R comedy is a no and every PG-13 comedy is a yes), so we can add our leaf nodes here.
genre   critics-reviews  rating  IMAX   likes
comedy  neutral          PG-13   TRUE   yes
comedy  thumbs-down      PG-13   FALSE  yes
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
Determine the center child attribute
genre
=comedy =drama =action
=comedy: rating (=R: [no], =PG-13: [yes])
=drama: [yes]
• We can see that genre = drama provides us with a homogeneous subset, so we can add a leaf node here.
genre   critics-reviews  rating  IMAX   likes
drama   thumbs-up        R       FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
drama   thumbs-up        PG-13   FALSE  yes
Determine the right child attribute
genre
=comedy =drama =action
=comedy: rating (=R: [no], =PG-13: [yes])
=drama: [yes]
=action: ?
We now move on to the right child node of our tree (genre = action). What attribute do we choose for this node?
Options: critics-reviews, rating, IMAX
Determine the right child attribute
genre
=comedy =drama =action
=comedy: rating (=R: [no], =PG-13: [yes])
=drama: [yes]
=action: critics-reviews? (=thumbs-up =neutral =thumbs-down)
genre   critics-reviews  rating  IMAX   likes
action  neutral          R       TRUE   no
action  neutral          R       FALSE  yes
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   TRUE   no
action  thumbs-down      PG-13   FALSE  yes
Determine the right child attribute
genre
=comedy =drama =action
=comedy: rating (=R: [no], =PG-13: [yes])
=drama: [yes]
=action: rating? (=R =PG-13)
genre   critics-reviews  rating  IMAX   likes
action  thumbs-down      PG-13   TRUE   no
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          R       TRUE   no
action  neutral          R       FALSE  yes
Determine the right child attribute
genre
=comedy =drama =action
=comedy: rating (=R: [no], =PG-13: [yes])
=drama: [yes]
=action: IMAX? (=TRUE =FALSE)
genre   critics-reviews  rating  IMAX   likes
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          R       FALSE  yes
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
Determine the right child attribute
genre
=comedy =drama =action
=comedy: rating (=R: [no], =PG-13: [yes])
=drama: [yes]
=action: IMAX (=TRUE =FALSE)
Entropy (critics-reviews) = .9510
Entropy (rating) = .9510
Entropy (IMAX) = 0.0
• We can see that IMAX provides us with the lowest entropy, thus it becomes the right child node of our ID3 tree.
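As a check on the .9510 and 0.0 figures for the action branch, the weighted entropies can be worked out by hand from the five action rows. The shorthand H(p, q) for the entropy of a two-class subset is our own notation, with the usual convention that 0 log 0 = 0.

```latex
\begin{align*}
H(p, q) &= -p \log_2 p - q \log_2 q \\
\text{Entropy(critics-reviews)} &= \tfrac{3}{5}\, H\!\left(\tfrac{2}{3}, \tfrac{1}{3}\right)
  + \tfrac{2}{5}\, H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right)
  = \tfrac{3}{5}(0.9183) + \tfrac{2}{5}(1.0) \approx 0.9510 \\
\text{Entropy(rating)} &= \tfrac{3}{5}\, H\!\left(\tfrac{2}{3}, \tfrac{1}{3}\right)
  + \tfrac{2}{5}\, H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) \approx 0.9510 \\
\text{Entropy(IMAX)} &= \tfrac{3}{5}\, H(1, 0) + \tfrac{2}{5}\, H(0, 1) = 0
\end{align*}
```

The critics-reviews and rating splits happen to produce subsets with the same class proportions (one subset of three with two yes and one no, one subset of two with one of each), which is why their values coincide.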
Determine the right child attribute
genre
=comedy =drama =action
=comedy: rating (=R: [no], =PG-13: [yes])
=drama: [yes]
=action: IMAX (=TRUE: [no], =FALSE: [yes])
• This also makes this split homogeneous, so we can add our leaf nodes here.
genre   critics-reviews  rating  IMAX   likes
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          R       FALSE  yes
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
ID3 Decision tree is complete
genre
  =comedy: rating
    =R: [no]
    =PG-13: [yes]
  =drama: [yes]
  =action: IMAX
    =TRUE: [no]
    =FALSE: [yes]
• Since we have only leaf nodes remaining, we are finished building our tree.
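The whole construction can be sketched as a short recursive function. This is a minimal illustration under stated assumptions, not a reference implementation of ID3: the names `id3`, `partition` and `weighted_entropy` are our own, ties between attributes are broken by list order, and a node that runs out of attributes falls back to the majority class.

```python
from collections import Counter, defaultdict
from math import log2

ATTRIBUTES = ("genre", "critics-reviews", "rating", "IMAX")
RAW = """
comedy thumbs-up R FALSE no
comedy thumbs-up R TRUE no
comedy neutral R FALSE no
action thumbs-down PG-13 TRUE no
action neutral R TRUE no
comedy thumbs-down PG-13 FALSE yes
comedy neutral PG-13 TRUE yes
drama thumbs-up R FALSE yes
drama thumbs-down PG-13 TRUE yes
drama neutral R TRUE yes
drama thumbs-up PG-13 FALSE yes
action neutral R FALSE yes
action thumbs-down PG-13 FALSE yes
action neutral PG-13 FALSE yes
"""
ROWS = [dict(zip(ATTRIBUTES + ("likes",), line.split()))
        for line in RAW.strip().splitlines()]

def entropy(rows):
    """Shannon entropy (in bits) of the likes label over rows."""
    counts = Counter(r["likes"] for r in rows)
    return sum(-(n / len(rows)) * log2(n / len(rows)) for n in counts.values())

def partition(rows, attr):
    """Group rows by their value for attr."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r)
    return groups

def weighted_entropy(rows, attr):
    return sum(len(g) / len(rows) * entropy(g)
               for g in partition(rows, attr).values())

def id3(rows, attrs):
    """Return a leaf label, or {attribute: {value: subtree}}."""
    labels = Counter(r["likes"] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # homogeneous, or majority fallback
    best = min(attrs, key=lambda a: weighted_entropy(rows, a))
    rest = [a for a in attrs if a != best]
    return {best: {value: id3(group, rest)
                   for value, group in partition(rows, best).items()}}

tree = id3(ROWS, list(ATTRIBUTES))
print(tree)
```

On this data set the result has genre at the root, rating under comedy, a yes leaf under drama and IMAX under action, matching the tree built by hand.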
Handling missing values during prediction
• How can we handle missing values using this decision tree?
• Given an instance:
• Genre = action
• Critics-reviews = ?
• Rating = R
• IMAX = ?
• How do we classify it?
Handling missing values during prediction: a solution
• Consider adding frequency counts to each leaf node, shown here in curly braces.
genre
  =comedy: rating
    =R: [no] {3}
    =PG-13: [yes] {2}
  =drama: [yes] {4}
  =action: IMAX
    =TRUE: [no] {2}
    =FALSE: [yes] {3}
Handling missing values during prediction: a solution
• Genre = action
• Critics-reviews = ?
• Rating = R
• IMAX = ?
• Traverse the tree.
Handling missing values during prediction: a solution
• Genre = action
• Critics-reviews = ?
• Rating = R
• IMAX = ?
• Traverse the decision tree normally when the attribute value is known: Genre = action takes us from the root to the IMAX node.
Handling missing values during prediction: a solution
• Genre = action
• Critics-reviews = ?
• Rating = R
• IMAX = ?
• Traverse every possible path when a missing value is encountered: IMAX is missing, so we follow both =TRUE, reaching [no] {2}, and =FALSE, reaching [yes] {3}.
Handling missing values during prediction: a solution
• Genre = action
• Critics-reviews = ?
• Rating = R
• IMAX = ?
• Traverse every possible path when a missing value is encountered.
• Sum the frequency counts of all like leaf nodes that are reached: yes = 3, no = 2.
Handling missing values during prediction: a solution
• Genre = action
• Critics-reviews = ?
• Rating = R
• IMAX = ?
• Follow every possible path when a missing value is encountered.
• Determine the frequency count by summing like classification frequencies: yes = 3, no = 2.
• Classify based on the highest frequency count: likes = yes.
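The procedure above can be sketched in Python. This is an illustrative sketch, not the author's code: the tuple encoding of the tree, the use of `None` for a missing value, and the names `leaf_counts` and `classify` are all our own assumptions. Unseen attribute values and ties between classes are not handled.

```python
from collections import Counter

# Internal nodes are (attribute, {value: subtree}); leaves are (label, count).
TREE = ("genre", {
    "comedy": ("rating", {"R": ("no", 3), "PG-13": ("yes", 2)}),
    "drama": ("yes", 4),
    "action": ("IMAX", {"TRUE": ("no", 2), "FALSE": ("yes", 3)}),
})

def leaf_counts(node, instance):
    """Sum leaf frequency counts over every path consistent with instance."""
    head, rest = node
    if not isinstance(rest, dict):       # leaf: (label, count)
        return Counter({head: rest})
    value = instance.get(head)           # None means the value is missing
    # Known value: follow one branch. Missing value: follow all of them.
    branches = rest.values() if value is None else [rest[value]]
    totals = Counter()
    for child in branches:
        totals += leaf_counts(child, instance)
    return totals

def classify(instance):
    """Predict the class with the highest summed frequency count."""
    return leaf_counts(TREE, instance).most_common(1)[0][0]

instance = {"genre": "action", "critics-reviews": None, "rating": "R", "IMAX": None}
print(leaf_counts(TREE, instance))  # Counter({'yes': 3, 'no': 2})
print(classify(instance))           # yes
```

Attributes that never appear on the path taken, such as critics-reviews here, have no effect on the result, whether or not their values are missing.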
Handling missing values during prediction: 2nd example
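• Consider this 2nd example:
• Genre = ?
• Critics-reviews = ?
• Rating = R
• IMAX = TRUE
• Genre is missing, so we follow all three branches: =comedy with Rating = R reaches [no] {3}, =drama reaches [yes] {4}, and =action with IMAX = TRUE reaches [no] {2}.
• Summing like leaf nodes gives yes = 4 and no = 5, so we classify likes = no.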