ID3 and Decision tree
by Tuan Nguyen
May 2008
ID3 algorithm
- An algorithm for constructing a decision tree
- Uses entropy to compute the information gain of each attribute
- The attribute with the best (highest) gain is then selected
Entropy
The complete formula for entropy is:
E(S) = -(p+)*log2(p+) - (p-)*log2(p-)
where p+ is the proportion of positive samples in S, p- is the proportion of negative samples in S, and S is the set of samples.
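As a quick illustration (not part of the original slides), the formula can be written as a short Python function; the name `entropy` and the count-based interface are my own choices:

```python
import math

def entropy(pos, neg):
    """Two-class entropy E(S), given counts of positive and negative samples."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count > 0:            # treat 0 * log2(0) as 0
            p = count / total    # proportion of this class in S
            e -= p * math.log2(p)
    return e

# A sample with 9 positives and 5 negatives:
print(round(entropy(9, 5), 3))   # 0.94
```

A pure sample (all positive or all negative) has entropy 0; an evenly split sample has entropy 1.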
Example
The entropy of attribute A1 is computed as follows. A1 splits the sample S = [29+, 35-] into a True branch [21+, 5-] and a False branch [8+, 30-].

The entropy of the whole sample:
E(A1) = -29/(29+35)*log2(29/(29+35)) - 35/(29+35)*log2(35/(29+35)) = 0.9937

The entropy of the True branch:
E(TRUE) = -21/(21+5)*log2(21/(21+5)) - 5/(21+5)*log2(5/(21+5)) = 0.7063

The entropy of the False branch:
E(FALSE) = -8/(8+30)*log2(8/(8+30)) - 30/(8+30)*log2(30/(8+30)) = 0.7426

Information Gain
Gain(Sample, Attributes), or Gain(S, A), is the expected reduction in entropy due to sorting S on attribute A:

Gain(S, A) = Entropy(S) - Σ_(v ∈ Values(A)) (|S_v|/|S|) * Entropy(S_v)

So, for the previous example, the information gain is calculated:
G(A1) = E(A1) - (21+5)/(29+35) * E(TRUE) - (8+30)/(29+35) * E(FALSE)
      = E(A1) - 26/64 * E(TRUE) - 38/64 * E(FALSE)
      = 0.9937 - 26/64 * 0.7063 - 38/64 * 0.7426 = 0.2659
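The G(A1) computation can be checked numerically; here is a minimal sketch (the helper `entropy` is my own, not from the slides):

```python
import math

def entropy(pos, neg):
    """Two-class entropy from counts of positive and negative samples."""
    total = pos + neg
    return -sum((c / total) * math.log2(c / total)
                for c in (pos, neg) if c > 0)

# Attribute A1: parent sample [29+, 35-], split into
# True = [21+, 5-] and False = [8+, 30-].
e_parent = entropy(29, 35)   # ~0.9937
e_true   = entropy(21, 5)    # ~0.7063
e_false  = entropy(8, 30)    # ~0.7426

# Gain = parent entropy minus the size-weighted branch entropies.
gain = e_parent - (26 / 64) * e_true - (38 / 64) * e_false
print(round(gain, 4))        # ~0.2659
```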
The complete example
Consider the following table:

Day  Outlook   Temp.  Humidity  Wind    Play Tennis
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Weak    Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Strong  Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No
Decision tree
We want to build a decision tree for the tennis matches. Whether a match is played depends on the weather attributes (Outlook, Temperature, Humidity, and Wind). We now apply what we know to build a decision tree from this table.
Example
Calculate the information gain for each of the weather attributes:
- Wind
- Humidity
- Outlook
For the Wind
S = [9+, 5-], E(S) = 0.940. Wind splits S into Weak = [6+, 2-] (E = 0.811) and Strong = [3+, 3-] (E = 1.0).

Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048
For the Humidity
S = [9+, 5-], E(S) = 0.940. Humidity splits S into High = [3+, 4-] (E = 0.985) and Normal = [6+, 1-] (E = 0.592).

Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151
For the Outlook
S = [9+, 5-], E(S) = 0.940. Outlook splits S into Sunny = [2+, 3-] (E = 0.971), Overcast = [4+, 0-] (E = 0.0), and Rain = [3+, 2-] (E = 0.971).

Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
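These gains can be recomputed directly from the table; below is a small Python sketch (the tuple encoding in `DATA` and the helper names are mine, not the author's):

```python
import math
from collections import Counter

# Rows from the table: (Outlook, Temp, Humidity, Wind, PlayTennis)
DATA = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Wind": 3}

def entropy(rows):
    """Entropy of the PlayTennis label over a set of rows."""
    total = len(rows)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(r[-1] for r in rows).values())

def gain(rows, attr):
    """Information gain from splitting rows on the named attribute."""
    i = ATTRS[attr]
    e = entropy(rows)
    for v in {r[i] for r in rows}:
        subset = [r for r in rows if r[i] == v]
        e -= len(subset) / len(rows) * entropy(subset)
    return e

for a in ATTRS:
    print(a, round(gain(DATA, a), 3))
# Outlook has the largest gain (~0.247), so ID3 selects it for the root.
```

Note that rounding the intermediate entropies as the slides do gives 0.151 for Humidity; computing at full precision gives approximately 0.152.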
Complete tree
Then here is the complete tree:

Outlook
  Sunny -> Humidity
    High -> No      [D1, D2, D8]
    Normal -> Yes   [D9, D11]
  Overcast -> Yes   [D3, D7, D12, D13]
  Rain -> Wind
    Strong -> No    [D6, D14]
    Weak -> Yes     [D4, D5, D10]
References:
- Dr. Lee's slides, San Jose State University, Spring 2007
- Andrew Colin, "Building Decision Trees with the ID3 Algorithm", Dr. Dobb's Journal, June 1996
- Paul E. Utgoff, "Incremental Induction of Decision Trees", Kluwer Academic Publishers, 1989
- http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm
- http://decisiontrees.net/node/27