ID3 and Decision tree

by Tuan Nguyen

May 2008




ID3 algorithm

ID3 is an algorithm for constructing a decision tree from a set of training examples. At each node it uses entropy to compute the information gain of every candidate attribute, and the attribute with the highest gain is selected for the split.


Entropy

The complete formula for the entropy of a sample set S is:

E(S) = -(p+) * log2(p+) - (p-) * log2(p-)

where p+ is the proportion of positive samples in S and p- is the proportion of negative samples in S.
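As a quick sketch, this formula can be written in Python (the `entropy` helper and its name are ours, not from the slides):

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy of a set with pos positive and neg negative samples."""
    total = pos + neg
    if pos == 0 or neg == 0:
        return 0.0  # a pure set has zero entropy (0 * log2(0) is taken as 0)
    p_pos, p_neg = pos / total, neg / total
    return -p_pos * log2(p_pos) - p_neg * log2(p_neg)

print(entropy(7, 7))   # a 50/50 split is maximally uncertain: 1.0
print(entropy(14, 0))  # a pure set: 0.0
```

The pure-set guard matters: log2(0) is undefined, so the convention 0 * log2(0) = 0 is handled explicitly.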


Example

The entropy of attribute A1 is computed as follows:

Attribute A1 splits the sample S = [29+, 35-] into a True branch [21+, 5-] and a False branch [8+, 30-].

E(A1) = -(29/64) * log2(29/64) - (35/64) * log2(35/64) = 0.9937

The entropy of the True branch:

E(True) = -(21/26) * log2(21/26) - (5/26) * log2(5/26) = 0.7063

The entropy of the False branch:

E(False) = -(8/38) * log2(8/38) - (30/38) * log2(30/38) = 0.7425
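These values can be checked numerically; a sketch (the `entropy` helper is our naming):

```python
from math import log2

def entropy(pos, neg):
    # E = -p*log2(p) - q*log2(q), with a pure set defined as 0
    total = pos + neg
    if pos == 0 or neg == 0:
        return 0.0
    p, q = pos / total, neg / total
    return -p * log2(p) - q * log2(q)

print(round(entropy(29, 35), 3))  # E(A1)    -> 0.994
print(round(entropy(21, 5), 3))   # E(True)  -> 0.706
print(round(entropy(8, 30), 3))   # E(False) -> 0.742
```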


Information Gain

Gain(Sample, Attributes), written Gain(S, A), is the expected reduction in entropy from partitioning S on attribute A.

For the previous example, the information gain is calculated as:

G(A1) = E(A1) - (21+5)/(29+35) * E(True) - (8+30)/(29+35) * E(False)

= E(A1) - 26/64 * E(True) - 38/64 * E(False)

= 0.9937 - 26/64 * 0.7063 - 38/64 * 0.7425 = 0.2659


In general: Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv|/|S|) * Entropy(Sv)
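The general formula can be sketched over (positive, negative) counts; `entropy` and `info_gain` are our own helper names:

```python
from math import log2

def entropy(pos, neg):
    total = pos + neg
    if pos == 0 or neg == 0:
        return 0.0
    p, q = pos / total, neg / total
    return -p * log2(p) - q * log2(q)

def info_gain(parent, branches):
    """Gain(S, A) = Entropy(S) - sum over branches of |Sv|/|S| * Entropy(Sv).
    parent and each branch are (pos, neg) counts."""
    total = sum(parent)
    g = entropy(*parent)
    for pos, neg in branches:
        g -= (pos + neg) / total * entropy(pos, neg)
    return g

# The A1 example: S = [29+, 35-] split into [21+, 5-] and [8+, 30-]
print(round(info_gain((29, 35), [(21, 5), (8, 30)]), 3))  # -> 0.266
```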


The complete example

Day  Outlook   Temp.  Humidity  Wind    Play Tennis

D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Weak    Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Strong  Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No


We will build the tree from these 14 training examples.
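The table can be written down directly as data; a sketch (the tuple layout is our choice):

```python
# One tuple per day, D1..D14: (Outlook, Temperature, Humidity, Wind, PlayTennis)
DATA = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),   # D1
    ("Sunny",    "Hot",  "High",   "Strong", "No"),   # D2
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),  # D3
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),  # D4
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),  # D5
    ("Rain",     "Cool", "Normal", "Strong", "No"),   # D6
    ("Overcast", "Cool", "Normal", "Weak",   "Yes"),  # D7
    ("Sunny",    "Mild", "High",   "Weak",   "No"),   # D8
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),  # D9
    ("Rain",     "Mild", "Normal", "Strong", "Yes"),  # D10
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),  # D11
    ("Overcast", "Mild", "High",   "Strong", "Yes"),  # D12
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),  # D13
    ("Rain",     "Mild", "High",   "Strong", "No"),   # D14
]

yes = sum(1 for row in DATA if row[-1] == "Yes")
print(f"S = [{yes}+, {len(DATA) - yes}-]")  # S = [9+, 5-]
```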


Decision tree

We want to build a decision tree that decides whether tennis is played. The decision depends on the weather attributes (Outlook, Temperature, Humidity, and Wind), so we apply the ID3 procedure to the table above.


Example

We calculate the information gain for each of the weather attributes: Wind, Humidity, and Outlook.


For the Wind

Wind splits S = [9+, 5-] (E = 0.940) into Weak = [6+, 2-] and Strong = [3+, 3-].

Gain(S, Wind) = 0.940 - (8/14) * 0.811 - (6/14) * 1.0 = 0.048


For the Humidity

Humidity splits S = [9+, 5-] (E = 0.940) into High = [3+, 4-] and Normal = [6+, 1-].

Gain(S, Humidity) = 0.940 - (7/14) * 0.985 - (7/14) * 0.592 = 0.151


For the Outlook

Outlook splits S = [9+, 5-] (E = 0.940) into Sunny = [2+, 3-] (E = 0.971), Overcast = [4+, 0-] (E = 0.0), and Rain = [3+, 2-] (E = 0.971).

Gain(S, Outlook) = 0.940 - (5/14) * 0.971 - (4/14) * 0.0 - (5/14) * 0.971 = 0.247
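All three gains (and Temperature's, which the slides skip) can be reproduced from the table; a sketch using our own helper names:

```python
from math import log2
from collections import Counter

ATTRS = ["Outlook", "Temp", "Humidity", "Wind"]
DATA = [  # (Outlook, Temp, Humidity, Wind, PlayTennis), days D1..D14
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, i):
    """Information gain of splitting rows on attribute column i."""
    parts = {}
    for r in rows:
        parts.setdefault(r[i], []).append(r[-1])
    labels = [r[-1] for r in rows]
    return entropy(labels) - sum(
        len(sub) / len(rows) * entropy(sub) for sub in parts.values())

for i, name in enumerate(ATTRS):
    print(name, round(gain(DATA, i), 3))
# Outlook 0.247, Temp 0.029, Humidity 0.152, Wind 0.048
```

(The slide's 0.151 for Humidity comes from rounding the intermediate entropies; at full precision the gain is closer to 0.152.)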


Complete tree

Outlook has the highest information gain (0.247), so it becomes the root; recursing on each branch gives the complete tree:

Outlook
- Sunny -> Humidity
    - High -> No    [D1, D2, D8]
    - Normal -> Yes [D9, D11]
- Overcast -> Yes   [D3, D7, D12, D13]
- Rain -> Wind
    - Strong -> No  [D6, D14]
    - Weak -> Yes   [D4, D5, D10]


Reference:

- Dr. Lee's slides, San Jose State University, Spring 2007
- Andrew Colin, "Building Decision Trees with the ID3 Algorithm", Dr. Dobb's Journal, June 1996
- Paul E. Utgoff, "Incremental Induction of Decision Trees", Kluwer Academic Publishers, 1989
- http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm
- http://decisiontrees.net/node/27