Homework 1: Solutions CS4445/B12 Provided by: Kenneth J. Loomis

Post on 22-Feb-2016

Entropy of the original set

genre   critics-reviews  rating  IMAX   likes
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
comedy  neutral          R       FALSE  no
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
comedy  thumbs-down      PG-13   FALSE  yes
comedy  neutral          PG-13   TRUE   yes
drama   thumbs-up        R       FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
drama   thumbs-up        PG-13   FALSE  yes
action  neutral          R       FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          PG-13   FALSE  yes

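Entropy (target attribute likes): 9 of the 14 instances are yes and 5 are no, so Entropy (likes) = -(9/14) log2(9/14) - (5/14) log2(5/14) = .9403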
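The entropy of the original set can be checked with a few lines of Python. This is a minimal sketch rather than part of the original solution: the function name `entropy` is chosen here for illustration, and the counts (9 yes, 5 no) are read off the likes column of the table above.

```python
from math import log2

def entropy(counts):
    """Shannon entropy, in bits, of a class distribution given as raw counts."""
    total = sum(counts)
    # Empty classes are skipped because 0 * log2(0) is taken to be 0.
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

# likes = yes for 9 instances and likes = no for 5 instances
target_entropy = entropy([9, 5])
print(f"{target_entropy:.4f}")
```

A perfectly mixed set (equal yes and no counts) would give 1.0 and a homogeneous set would give 0.0, so this value indicates a fairly mixed starting set.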

Determine the root node attribute

Split on genre (=comedy, =drama, =action): entropy .6935

genre   critics-reviews  rating  IMAX   likes
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
action  neutral          R       FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          PG-13   FALSE  yes
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
comedy  neutral          R       FALSE  no
comedy  thumbs-down      PG-13   FALSE  yes
comedy  neutral          PG-13   TRUE   yes
drama   thumbs-up        R       FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
drama   thumbs-up        PG-13   FALSE  yes

Determine the root node attribute

Split on critics-reviews (=thumbs-up, =neutral, =thumbs-down): entropy .9111

genre   critics-reviews  rating  IMAX   likes
action  neutral          R       TRUE   no
comedy  neutral          R       FALSE  no
action  neutral          R       FALSE  yes
action  neutral          PG-13   FALSE  yes
comedy  neutral          PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
action  thumbs-down      PG-13   TRUE   no
action  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-down      PG-13   FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
drama   thumbs-up        R       FALSE  yes
drama   thumbs-up        PG-13   FALSE  yes

Determine the root node attribute

Split on rating (=PG-13, =R): entropy .7885

genre   critics-reviews  rating  IMAX   likes
action  thumbs-down      PG-13   TRUE   no
action  neutral          PG-13   FALSE  yes
comedy  neutral          PG-13   TRUE   yes
action  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-down      PG-13   FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
drama   thumbs-up        PG-13   FALSE  yes
action  neutral          R       TRUE   no
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
action  neutral          R       FALSE  yes
drama   neutral          R       TRUE   yes
drama   thumbs-up        R       FALSE  yes

Determine the root node attribute

Split on IMAX (=FALSE, =TRUE): entropy .8922

genre   critics-reviews  rating  IMAX   likes
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-down      PG-13   FALSE  yes
drama   thumbs-up        PG-13   FALSE  yes
action  neutral          R       FALSE  yes
drama   thumbs-up        R       FALSE  yes
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no
comedy  thumbs-up        R       TRUE   no
comedy  neutral          PG-13   TRUE   yes
drama   thumbs-down      PG-13   TRUE   yes
drama   neutral          R       TRUE   yes

Determine the root node attribute

Entropy (genre) = .6935
Entropy (critics-reviews) = .9111
Entropy (rating) = .7885
Entropy (IMAX) = .8922

genre

=comedy =drama =action

• Genre gives the lowest entropy, so it becomes the root node of our ID3 tree.
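The comparison of the four candidate attributes can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the original author's code: the names `DATA`, `ATTRS`, `entropy` and `split_entropy` are made up here, and the rows are copied from the training table at the start of the deck. Each score is the weighted average entropy of likes over the subsets that a split produces.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["genre", "critics-reviews", "rating", "IMAX"]

# The 14 training instances; the last field is the target attribute, likes.
DATA = [line.split() for line in """\
comedy thumbs-up R FALSE no
comedy thumbs-up R TRUE no
comedy neutral R FALSE no
action thumbs-down PG-13 TRUE no
action neutral R TRUE no
comedy thumbs-down PG-13 FALSE yes
comedy neutral PG-13 TRUE yes
drama thumbs-up R FALSE yes
drama thumbs-down PG-13 TRUE yes
drama neutral R TRUE yes
drama thumbs-up PG-13 FALSE yes
action neutral R FALSE yes
action thumbs-down PG-13 FALSE yes
action neutral PG-13 FALSE yes
""".splitlines()]

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def split_entropy(rows, col):
    """Weighted average entropy of likes after splitting rows on column col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

scores = {name: split_entropy(DATA, i) for i, name in enumerate(ATTRS)}
for name, value in scores.items():
    print(f"{name:16s}{value:.4f}")
print("root:", min(scores, key=scores.get))
```

ID3 is usually described as maximizing information gain. Because the entropy of the parent set is the same for every candidate, choosing the lowest weighted entropy selects the same attribute.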

Determine the left child attribute

genre

=comedy =drama =action

We now move on to the left child node of our tree (genre = comedy). What attribute do we choose for this node?

Options: critics-reviews, rating, IMAX

Determine the left child attribute

genre

=comedy =drama =action

Split comedy on critics-reviews (=thumbs-up, =neutral, =thumbs-down): entropy .4000

genre   critics-reviews  rating  IMAX   likes
comedy  neutral          R       FALSE  no
comedy  neutral          PG-13   TRUE   yes
comedy  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no

Determine the left child attribute

genre

=comedy =drama =action

Split comedy on rating (=R, =PG-13): entropy 0.0, since both subsets are homogeneous

genre   critics-reviews  rating  IMAX   likes
comedy  neutral          PG-13   TRUE   yes
comedy  thumbs-down      PG-13   FALSE  yes
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no

Determine the left child attribute

genre

=comedy =drama =action

Split comedy on IMAX (=FALSE, =TRUE): entropy .9510

genre   critics-reviews  rating  IMAX   likes
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-down      PG-13   FALSE  yes
comedy  thumbs-up        R       TRUE   no
comedy  neutral          PG-13   TRUE   yes

Determine the left child attribute

genre
  =comedy: rating (=R, =PG-13)
  =drama
  =action

Entropy (critics-reviews) = .4000
Entropy (rating) = 0.0
Entropy (IMAX) = .9510

• Rating gives the lowest entropy, so it becomes the left child node of our ID3 tree.

Determine the left child attribute

genre
  =comedy: rating
    =R: [no]
    =PG-13: [yes]
  =drama
  =action

• The rating split is also homogeneous (every R instance is no, every PG-13 instance is yes), so we can add our leaf nodes here.

genre   critics-reviews  rating  IMAX   likes
comedy  neutral          PG-13   TRUE   yes
comedy  thumbs-down      PG-13   FALSE  yes
comedy  neutral          R       FALSE  no
comedy  thumbs-up        R       FALSE  no
comedy  thumbs-up        R       TRUE   no
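The three candidate splits of the comedy subset can be compared directly from their (yes, no) counts. The sketch below is illustrative only: the helper names and the `comedy` dictionary are assumptions introduced here, and the counts are read off the five comedy instances in the table above.

```python
from math import log2

def entropy(counts):
    """Shannon entropy, in bits, of a class distribution given as raw counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def split_entropy(branches):
    """Weighted average entropy over branches given as (yes, no) counts."""
    n = sum(yes + no for yes, no in branches)
    return sum((yes + no) / n * entropy((yes, no)) for yes, no in branches)

# (yes, no) counts per branch for the five genre = comedy instances
comedy = {
    "critics-reviews": [(0, 2), (1, 1), (1, 0)],  # thumbs-up, neutral, thumbs-down
    "rating": [(0, 3), (2, 0)],                   # R, PG-13
    "IMAX": [(1, 2), (1, 1)],                     # FALSE, TRUE
}

scores = {attr: split_entropy(branches) for attr, branches in comedy.items()}
for attr, value in scores.items():
    print(f"{attr:16s}{value:.4f}")
print("left child:", min(scores, key=scores.get))
```

A score of zero means every branch contains a single class, which is why the rating split can be closed off with leaf nodes immediately.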

Determine the center child attribute

genre
  =comedy: rating
    =R: [no]
    =PG-13: [yes]
  =drama: [yes]
  =action

• Genre = drama gives a homogeneous subset, so we can place a leaf node here.

genre   critics-reviews  rating  IMAX   likes
drama   thumbs-up        R       FALSE  yes
drama   thumbs-down      PG-13   TRUE   yes
drama   neutral          R       TRUE   yes
drama   thumbs-up        PG-13   FALSE  yes

Determine the right child attribute

genre
  =comedy: rating
    =R: [no]
    =PG-13: [yes]
  =drama: [yes]
  =action: ?

We now move on to the right child node of our tree (genre = action). What attribute do we choose for this node?

Options: critics-reviews, rating, IMAX

Determine the right child attribute

Split action on critics-reviews (=thumbs-up, =neutral, =thumbs-down)

genre   critics-reviews  rating  IMAX   likes
action  neutral          R       TRUE   no
action  neutral          R       FALSE  yes
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   TRUE   no
action  thumbs-down      PG-13   FALSE  yes

Determine the right child attribute

Split action on rating (=R, =PG-13)

genre   critics-reviews  rating  IMAX   likes
action  thumbs-down      PG-13   TRUE   no
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          R       TRUE   no
action  neutral          R       FALSE  yes

Determine the right child attribute

Split action on IMAX (=TRUE, =FALSE)

genre   critics-reviews  rating  IMAX   likes
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          R       FALSE  yes
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no

Determine the right child attribute

genre
  =comedy: rating
    =R: [no]
    =PG-13: [yes]
  =drama: [yes]
  =action: IMAX (=TRUE, =FALSE)

Entropy (critics-reviews) = .9510
Entropy (rating) = .9510
Entropy (IMAX) = 0.0

• IMAX gives the lowest entropy, so it becomes the right child node of our ID3 tree.

Determine the right child attribute

genre
  =comedy: rating
    =R: [no]
    =PG-13: [yes]
  =drama: [yes]
  =action: IMAX
    =TRUE: [no]
    =FALSE: [yes]

• The IMAX split is also homogeneous (every TRUE instance is no, every FALSE instance is yes), so we can add our leaf nodes here.

genre   critics-reviews  rating  IMAX   likes
action  neutral          PG-13   FALSE  yes
action  thumbs-down      PG-13   FALSE  yes
action  neutral          R       FALSE  yes
action  thumbs-down      PG-13   TRUE   no
action  neutral          R       TRUE   no

ID3 Decision tree is complete

genre
  =comedy: rating
    =R: [no]
    =PG-13: [yes]
  =drama: [yes]
  =action: IMAX
    =TRUE: [no]
    =FALSE: [yes]

• Since only leaf nodes remain, we are finished building our tree.
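The whole construction can be sketched as a short recursive procedure. This is a minimal illustration of ID3 under stated assumptions, not the original author's implementation: all names (`DATA`, `partition`, `split_entropy`, `id3`) are hypothetical, the tree is returned as nested dictionaries, and ties between attributes are broken by list order, which does not matter for this dataset.

```python
from collections import Counter, defaultdict
from math import log2

ATTRS = ["genre", "critics-reviews", "rating", "IMAX"]

# The 14 training instances as dictionaries keyed by attribute name.
DATA = [dict(zip(ATTRS + ["likes"], line.split())) for line in """\
comedy thumbs-up R FALSE no
comedy thumbs-up R TRUE no
comedy neutral R FALSE no
action thumbs-down PG-13 TRUE no
action neutral R TRUE no
comedy thumbs-down PG-13 FALSE yes
comedy neutral PG-13 TRUE yes
drama thumbs-up R FALSE yes
drama thumbs-down PG-13 TRUE yes
drama neutral R TRUE yes
drama thumbs-up PG-13 FALSE yes
action neutral R FALSE yes
action thumbs-down PG-13 FALSE yes
action neutral PG-13 FALSE yes
""".splitlines()]

def entropy(rows):
    """Shannon entropy, in bits, of the likes column of rows."""
    n = len(rows)
    counts = Counter(r["likes"] for r in rows)
    return sum(-(c / n) * log2(c / n) for c in counts.values())

def partition(rows, attr):
    """Group rows by their value of attr."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r)
    return groups

def split_entropy(rows, attr):
    """Weighted average entropy after splitting rows on attr."""
    return sum(len(g) / len(rows) * entropy(g)
               for g in partition(rows, attr).values())

def id3(rows, attrs):
    """Return a class label for a leaf, or {attribute: {value: subtree}}."""
    labels = Counter(r["likes"] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # homogeneous, or majority vote
    best = min(attrs, key=lambda a: split_entropy(rows, a))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3(g, rest) for v, g in partition(rows, best).items()}}

tree = id3(DATA, ATTRS)
print(tree)
```

On this dataset the recursion stops after two levels because every second-level split is homogeneous. The majority-vote fallback is never reached here, but it is included so the sketch terminates on data where attributes run out first.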

Handling missing values during prediction

• How can we handle missing values using this decision tree?

• Given an instance:
• Genre = action
• Critics-reviews = ?
• Rating = R
• IMAX = ?

How do we classify it?

Handling missing values during prediction: a solution

• Consider adding frequency counts to each leaf node, shown here in curly braces.

genre
  =comedy: rating
    =R: [no] {3}
    =PG-13: [yes] {2}
  =drama: [yes] {4}
  =action: IMAX
    =TRUE: [no] {2}
    =FALSE: [yes] {3}

Handling missing values during prediction: a solution

• Genre = action
• Critics-reviews = ?
• Rating = R
• IMAX = ?

• Traverse the tree.

Handling missing values during prediction: a solution

• Genre = action
• Critics-reviews = ?
• Rating = R
• IMAX = ?

• Traverse the decision tree normally when the attribute value is known. Here genre = action is known, so we follow the =action branch to the IMAX node.

Handling missing values during prediction: a solution

• Genre = action
• Critics-reviews = ?
• Rating = R
• IMAX = ?

• Traverse every possible path when a missing value is encountered. Here IMAX is missing, so we follow both the =TRUE and =FALSE branches.

Handling missing values during prediction: a solution

• Genre = action
• Critics-reviews = ?
• Rating = R
• IMAX = ?

• Traverse every possible path when a missing value is encountered.
• Sum the frequency counts of all like leaf nodes that are reached: yes = 3, no = 2

Handling missing values during prediction: a solution

• Genre = action
• Critics-reviews = ?
• Rating = R
• IMAX = ?

• Follow every possible path when a missing value is encountered.
• Determine the frequency count by summing like classification frequencies: yes = 3, no = 2
• Classify based on the highest frequency count.
• likes = yes
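The traversal described in these steps can be sketched in Python. This is an illustration under stated assumptions: the nested-dictionary `TREE`, the `(label, count)` leaf tuples and the function names are all made up here, and a missing value is represented by `None`.

```python
from collections import Counter

# The finished tree, with each leaf stored as (label, frequency count).
TREE = {"genre": {
    "comedy": {"rating": {"R": ("no", 3), "PG-13": ("yes", 2)}},
    "drama": ("yes", 4),
    "action": {"IMAX": {"TRUE": ("no", 2), "FALSE": ("yes", 3)}},
}}

def leaf_counts(node, instance):
    """Sum the leaf frequency counts reachable for this instance."""
    if isinstance(node, tuple):              # leaf: (label, count)
        label, count = node
        return Counter({label: count})
    attr, branches = next(iter(node.items()))
    value = instance.get(attr)               # None means the value is missing
    if value is None:
        children = branches.values()         # follow every possible path
    else:
        children = [branches[value]]         # follow the known branch only
    total = Counter()
    for child in children:
        total += leaf_counts(child, instance)
    return total

def classify(tree, instance):
    """Return the label with the highest summed count, plus the counts."""
    counts = leaf_counts(tree, instance)
    return counts.most_common(1)[0][0], counts

first = {"genre": "action", "critics-reviews": None, "rating": "R", "IMAX": None}
label, counts = classify(TREE, first)
print(label, dict(counts))
```

The sketch raises a `KeyError` for an attribute value that never appeared in training, and it leaves the outcome of an exact tie between counts to `most_common`; a fuller version would need an explicit rule for both cases.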

Handling missing values during prediction: 2nd example

• Consider this 2nd example:
• Genre = ?
• Critics-reviews = ?
• Rating = R
• IMAX = TRUE

• Genre is missing, so all three branches are followed: comedy with rating = R reaches [no] {3}, drama reaches [yes] {4}, and action with IMAX = TRUE reaches [no] {2}.
• yes = 4, no = 3 + 2 = 5
• likes = no

Handling missing values during prediction: 3rd example

• Consider if all attribute values are unknown:
• Genre = ?
• Critics-reviews = ?
• Rating = ?
• IMAX = ?

• Every leaf node is reached: yes = 2 + 4 + 3 = 9, no = 3 + 2 = 5
• likes = yes
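The 2nd and 3rd examples can also be checked with an equivalent formulation: a leaf is reached exactly when every test on its path is either satisfied or involves a missing value. The sketch below uses that view. The `LEAVES` list and the `vote` function are hypothetical names introduced here, and attributes absent from the instance dictionary are treated as missing.

```python
from collections import Counter

# Each leaf as (tests on the path from the root, label, frequency count).
LEAVES = [
    ({"genre": "comedy", "rating": "R"}, "no", 3),
    ({"genre": "comedy", "rating": "PG-13"}, "yes", 2),
    ({"genre": "drama"}, "yes", 4),
    ({"genre": "action", "IMAX": "TRUE"}, "no", 2),
    ({"genre": "action", "IMAX": "FALSE"}, "yes", 3),
]

def vote(instance):
    """Sum the counts of every leaf whose path agrees with the known values."""
    counts = Counter()
    for path, label, n in LEAVES:
        # A test passes if the value is missing (None) or equals the branch value.
        if all(instance.get(attr) in (None, value) for attr, value in path.items()):
            counts[label] += n
    return counts

second = vote({"rating": "R", "IMAX": "TRUE"})   # genre and critics-reviews missing
third = vote({})                                 # every attribute missing

print(dict(second), "->", second.most_common(1)[0][0])
print(dict(third), "->", third.most_common(1)[0][0])
```

When every value is missing, the vote reduces to the class distribution of the training set, so the prediction is simply the majority class.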