decision trees in the big picture classification (vs. rule pattern discovery) supervised learning...
Post on 19-Dec-2015
![Page 1: Decision Trees in the Big Picture Classification (vs. Rule Pattern Discovery) Supervised Learning (vs. Unsupervised) Inductive Generation (vs. Discrimination)](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/1.jpg)
Decision Trees in the Big Picture
• Classification (vs. Rule Pattern Discovery)
• Supervised Learning (vs. Unsupervised)
• Inductive
• Generation (vs. Discrimination)
![Page 2](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/2.jpg)
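Example

| age | income | veteran | college_educated | support_hillary |
|---|---|---|---|---|
| youth | low | no | no | no |
| youth | low | yes | no | no |
| middle_aged | low | no | no | yes |
| senior | low | no | no | yes |
| senior | medium | no | yes | no |
| senior | medium | yes | no | yes |
| middle_aged | medium | no | yes | no |
| youth | low | no | yes | no |
| youth | low | no | yes | no |
| senior | high | no | yes | yes |
| youth | low | no | no | no |
| middle_aged | high | no | yes | no |
| middle_aged | medium | yes | yes | yes |
| senior | high | no | yes | no |
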
![Page 3](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/3.jpg)
Example: the same training set as on Page 2; the support_hillary column holds the Class-labels.
![Page 4](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/4.jpg)
Example

| age | income | veteran | college_educated | support_hillary |
|---|---|---|---|---|
| middle_aged | medium | no | no | ? |

[Figure: decision tree with age at the root (branches youth, middle_aged, senior), inner nodes testing income (low, medium, high) and college_educated (no, yes), and yes/no leaves]
Inner nodes are ATTRIBUTES
Branches are attribute VALUES
Leaves are class-label VALUES
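To make the node/branch/leaf vocabulary concrete, here is a minimal Python sketch of one possible representation, an assumption for illustration rather than the deck's own code: a leaf is a class-label string, and an inner node is an (attribute, branches) pair whose dict maps each attribute value to a subtree. `TREE` and `classify` are hypothetical names. The middle_aged branch of `TREE` mirrors the veteran and college_educated splits built later in this deck; the senior branch is collapsed to a single leaf for brevity.

```python
from typing import Dict, Tuple, Union

# A tree is either a leaf (class-label string) or an inner node:
# (attribute name, {attribute value: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]

# Hypothetical tree, for illustration only.
TREE: Tree = ("age", {
    "youth": "no",
    "middle_aged": ("veteran", {
        "yes": "yes",
        "no": ("college_educated", {"yes": "no", "no": "yes"}),
    }),
    "senior": "yes",  # collapsed to one leaf for brevity
})


def classify(tree: Tree, example: Dict[str, str]) -> str:
    """Follow the branch matching the example's value at each inner node."""
    while not isinstance(tree, str):      # inner node: test its attribute
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree                           # leaf: the predicted class label


query = {"age": "middle_aged", "income": "medium",
         "veteran": "no", "college_educated": "no"}
print(classify(TREE, query))  # yes
```

The example tuple follows age = middle_aged, veteran = no, college_educated = no and lands on a yes leaf.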
![Page 5](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/5.jpg)
Example

| age | income | veteran | college_educated | support_hillary |
|---|---|---|---|---|
| middle_aged | medium | no | no | yes (predicted) |

[Figure: the same decision tree as on Page 4]
ANSWER: following the tree from the root, the example is classified as support_hillary = yes.
![Page 6](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/6.jpg)
Example
[Figure: the same decision tree as on Page 4]
Induced Rules:
The youth do not support Hillary.
All who are middle-aged and low-income support Hillary.
Seniors support Hillary.
Etc. One rule is generated for each leaf, i.e., for each path from the root to a leaf.
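“One rule per leaf” can be made concrete by walking every root-to-leaf path and joining the tests along it into an IF-THEN rule. A minimal sketch under assumptions: the tree is encoded as nested (attribute, branches) pairs with class-label strings at the leaves, and `TREE` and `extract_rules` are hypothetical. The youth, middle_aged/low and senior leaves of `TREE` match the rules stated above; the medium and high income leaves under middle_aged are invented for illustration.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]

# Hypothetical tree; medium/high income leaves are illustrative only.
TREE: Tree = ("age", {
    "youth": "no",
    "middle_aged": ("income", {"low": "yes", "medium": "no", "high": "no"}),
    "senior": "yes",
})


def extract_rules(tree: Tree, target: str = "support_hillary",
                  conditions: Tuple[str, ...] = ()) -> List[str]:
    """Return one IF-THEN rule per leaf (one per root-to-leaf path)."""
    if isinstance(tree, str):  # leaf: the tests on the path form the condition
        return [f"IF {' & '.join(conditions)} THEN {target} = {tree}"]
    attribute, branches = tree
    rules: List[str] = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, target,
                               conditions + (f"{attribute} == {value}",))
    return rules


for rule in extract_rules(TREE):
    print(rule)
```

Because each rule corresponds to exactly one leaf, the rules are mutually exclusive and together cover every combination of values the tree branches on.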
![Page 7](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/7.jpg)
Example
Nested IF-THEN:
IF age == youth THEN support_hillary = no
ELSE IF age == middle_aged & income == low THEN support_hillary = yes
ELSE IF age == senior THEN support_hillary = yes
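The nested rules translate directly into a chain of if/elif tests, where the first matching rule wins. In this sketch `predict_support` is a hypothetical name, tuples are dicts keyed by attribute name, and returning None for tuples that none of the three rules cover (such as a middle-aged, medium-income voter) is an assumption, since the slide lists only some of the rules.

```python
from typing import Dict, Optional


def predict_support(t: Dict[str, str]) -> Optional[str]:
    """Evaluate the nested IF-THEN rules in order; the first match wins."""
    if t["age"] == "youth":
        return "no"
    elif t["age"] == "middle_aged" and t["income"] == "low":
        return "yes"
    elif t["age"] == "senior":
        return "yes"
    return None  # not covered by the three rules shown ("Etc.")


print(predict_support({"age": "youth", "income": "low"}))           # no
print(predict_support({"age": "middle_aged", "income": "low"}))     # yes
print(predict_support({"age": "middle_aged", "income": "medium"}))  # None
```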
![Page 8](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/8.jpg)
How do you construct one?
1. Select an attribute to place at the root node and make one branch for each possible value.
[Figure: root node age splits the entire training set (14 tuples) into youth (5 tuples), middle_aged (4 tuples) and senior (5 tuples)]
![Page 9](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/9.jpg)
How do you construct one?
2. For each branch, recursively process the remaining training examples by choosing an attribute to split them on. The chosen attribute cannot be one already used in an ancestor node. If at any time all the training examples in a branch have the same class, stop processing that part of the tree.
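Steps 1 and 2 together describe a recursive procedure. The sketch below is one possible rendering under stated assumptions, not the deck's code: `build_tree`, `majority` and the `choose` callback are hypothetical names; the attribute-selection heuristic is left pluggable because the deck introduces heuristics later; each attribute's possible values are taken from the training set; and a branch with no tuples, or a node with no attributes left, becomes a leaf labeled by majority vote, which is how the worked example handles those cases.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple, Union

Row = Dict[str, str]
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]  # leaf label or (attribute, branches)

COLUMNS = ["age", "income", "veteran", "college_educated", "support_hillary"]
DATA = """\
youth low no no no
youth low yes no no
middle_aged low no no yes
senior low no no yes
senior medium no yes no
senior medium yes no yes
middle_aged medium no yes no
youth low no yes no
youth low no yes no
senior high no yes yes
youth low no no no
middle_aged high no yes no
middle_aged medium yes yes yes
senior high no yes no"""
ROWS: List[Row] = [dict(zip(COLUMNS, line.split())) for line in DATA.splitlines()]
TARGET, ATTRS = COLUMNS[-1], COLUMNS[:-1]
# Possible values of each attribute, in order of first appearance.
DOMAINS = {a: list(dict.fromkeys(r[a] for r in ROWS)) for a in ATTRS}


def majority(rows: List[Row]) -> str:
    """Most common class label (ties go to the label seen first)."""
    return Counter(r[TARGET] for r in rows).most_common(1)[0][0]


def build_tree(rows: List[Row], attrs: Sequence[str],
               choose: Callable[[List[Row], Sequence[str]], str]) -> Tree:
    labels = {r[TARGET] for r in rows}
    if len(labels) == 1:                   # all tuples share one class: stop
        return labels.pop()
    if not attrs:                          # no attributes left: majority vote
        return majority(rows)
    attr = choose(rows, attrs)
    rest = [a for a in attrs if a != attr]  # ancestors' attributes are not reused
    branches: Dict[str, Tree] = {}
    for value in DOMAINS[attr]:            # step 1: one branch per possible value
        subset = [r for r in rows if r[attr] == value]
        # Step 2: recurse on the subset; an empty branch gets the parent's majority.
        branches[value] = build_tree(subset, rest, choose) if subset else majority(rows)
    return (attr, branches)


# Step 1 at the root: partition sizes for age.
print({v: sum(r["age"] == v for r in ROWS) for v in DOMAINS["age"]})
# Hypothetical chooser: take the first unused attribute in column order.
tree = build_tree(ROWS, ATTRS, lambda rows, attrs: attrs[0])
print(tree[0], tree[1]["youth"])  # age no
```

The first-unused-attribute chooser only makes the sketch runnable; it does not reproduce the splits chosen in the walkthrough. Ties in `majority` simply go to the label seen first, whereas the slides mark tied leaves as ???.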
![Page 10](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/10.jpg)
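How do you construct one?
Tuples with age = youth:

| age | income | veteran | college_educated | support_hillary |
|---|---|---|---|---|
| youth | low | no | no | no |
| youth | low | yes | no | no |
| youth | low | no | yes | no |
| youth | low | no | yes | no |
| youth | low | no | no | no |

[Figure: tree so far: age at the root; age = youth → leaf no]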
![Page 11](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/11.jpg)
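How do you construct one?
Tuples with age = middle_aged:

| age | income | veteran | college_educated | support_hillary |
|---|---|---|---|---|
| middle_aged | low | no | no | yes |
| middle_aged | medium | no | yes | no |
| middle_aged | high | no | yes | no |
| middle_aged | medium | yes | yes | yes |

[Figure: tree so far: age = youth → no; age = middle_aged → split on veteran (yes, no)]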
![Page 12](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/12.jpg)
[Figure: tree so far: age = middle_aged, veteran = yes → leaf yes]
![Page 13](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/13.jpg)
[Figure: tree so far: age = middle_aged, veteran = no → split on college_educated (yes, no)]
![Page 14](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/14.jpg)
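Tuples with age = middle_aged and veteran = no:

| age | income | veteran | college_educated | support_hillary |
|---|---|---|---|---|
| middle_aged | low | no | no | yes |
| middle_aged | medium | no | yes | no |
| middle_aged | high | no | yes | no |

[Figure: tree so far: age = middle_aged, veteran = no, college_educated = yes → leaf no]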
![Page 15](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/15.jpg)
[Figure: tree so far: age = middle_aged, veteran = no, college_educated = no → leaf yes]
![Page 16](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/16.jpg)
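[Figure: tree so far: the youth and middle_aged subtrees are complete; the senior branch is not yet expanded]
Tuples with age = senior:

| age | income | veteran | college_educated | support_hillary |
|---|---|---|---|---|
| senior | low | no | no | yes |
| senior | medium | no | yes | no |
| senior | medium | yes | no | yes |
| senior | high | no | yes | yes |
| senior | high | no | yes | no |
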
![Page 17](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/17.jpg)
[Figure: tree so far: age = senior → split on college_educated (yes, no)]
![Page 18](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/18.jpg)
Tuples with age = senior and college_educated = yes:

| age | income | veteran | college_educated | support_hillary |
|---|---|---|---|---|
| senior | medium | no | yes | no |
| senior | high | no | yes | yes |
| senior | high | no | yes | no |

[Figure: tree so far: age = senior, college_educated = yes → split on income (low, medium, high)]
![Page 19](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/19.jpg)
[Figure: tree so far, unchanged from Page 18]
No low-income college-educated seniors…
![Page 20](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/20.jpg)
No low-income college-educated seniors, so the income = low leaf is labeled by “Majority Vote” over its parent’s tuples (2 no vs. 1 yes): no.
[Figure: tree so far: age = senior, college_educated = yes, income = low → leaf no]
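The “Majority Vote” rule for an empty branch labels the new leaf with the most common class among the parent node's tuples. A minimal sketch; `majority_vote` is a hypothetical name, and returning None to stand for the slides' ??? when the top count is tied is an assumption of this sketch.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(labels: List[str]) -> Optional[str]:
    """Most common label, or None (the slides' ???) if the top count is tied."""
    ranked = Counter(labels).most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


# Parent of the empty income = low branch: seniors with college_educated = yes.
print(majority_vote(["no", "yes", "no"]))  # no
# Parent of the veteran leaves under income = high: one yes, one no.
print(majority_vote(["yes", "no"]))        # None
```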
![Page 21](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/21.jpg)
Tuples with age = senior, income = medium and college_educated = yes:

| age | income | veteran | college_educated | support_hillary |
|---|---|---|---|---|
| senior | medium | no | yes | no |

[Figure: tree so far, unchanged from Page 20]
![Page 22](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/22.jpg)
[Figure: tree so far: age = senior, college_educated = yes, income = medium → leaf no]
![Page 23](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/23.jpg)
Tuples with age = senior, income = high and college_educated = yes:

| age | income | veteran | college_educated | support_hillary |
|---|---|---|---|---|
| senior | high | no | yes | yes |
| senior | high | no | yes | no |

[Figure: tree so far, unchanged from Page 22]
![Page 24](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/24.jpg)
[Figure: tree so far: age = senior, college_educated = yes, income = high → split on veteran (yes, no)]
![Page 25](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/25.jpg)
“Majority Vote” split… No Veterans: the veteran = yes branch has no tuples, and the veteran = no branch holds one yes and one no tuple with no attributes left to split on, so the vote is tied in both cases and both leaves are marked ???.
[Figure: tree so far: income = high, veteran = yes → ???; income = high, veteran = no → ???]
![Page 26](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/26.jpg)
Tuples with age = senior and college_educated = no:

| age | income | veteran | college_educated | support_hillary |
|---|---|---|---|---|
| senior | low | no | no | yes |
| senior | medium | yes | no | yes |

[Figure: tree so far, unchanged from Page 25]
![Page 27](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/27.jpg)
[Figure: tree so far: age = senior, college_educated = no → leaf yes]
![Page 28](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/28.jpg)
[Figure: completed tree: age = youth → no; age = middle_aged → veteran (yes → yes; no → college_educated: yes → no, no → yes); age = senior → college_educated (no → yes; yes → income: low → no, medium → no, high → veteran: yes → ???, no → ???)]
![Page 29](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/29.jpg)
Cost to grow?
n = number of attributes
D = training set of tuples
$O(n \cdot |D| \cdot \log |D|)$
![Page 30](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/30.jpg)
Cost to grow?
$O(\underbrace{n \cdot |D|}_{\text{amount of work at each tree level}} \cdot \underbrace{\log |D|}_{\text{max height of the tree}})$
![Page 31](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/31.jpg)
How do we minimize the cost?
• Constructing an optimal decision tree is NP-complete (shown by Hyafil and Rivest)
![Page 32](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/32.jpg)
• Need a heuristic to pick the “best” attribute to split on.
![Page 33](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/33.jpg)
[Figure: the completed tree from Page 28]
![Page 34](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/34.jpg)
• The most common approach is “greedy”: pick the locally best split at each node.
• The “best” attribute results in the “purest” split (pure = all tuples belong to the same class).
![Page 35](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/35.jpg)
… A good split increases the purity of all child nodes.
![Page 36](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/36.jpg)
Three Heuristics
1. Information gain
2. Gain Ratio
3. Gini Index
![Page 37](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/37.jpg)
Information Gain
• Ross Quinlan’s ID3 (Iterative Dichotomiser 3) uses information gain as its heuristic.
• Heuristic based on Claude Shannon’s information theory.
![Page 38](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/38.jpg)
[Figure: HIGH ENTROPY vs. LOW ENTROPY]
![Page 39](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/39.jpg)
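Calculate Entropy for D
• D = training set; |D| = 14
• m = number of classes; m = 2
• i = 1, …, m
• $C_i$ = a distinct class; $C_1$ = yes, $C_2$ = no
• $C_{i,D}$ = the tuples in D of class $C_i$; $C_{1,D}$ = the yes tuples, $C_{2,D}$ = the no tuples
• $p_i$ = probability that a random tuple in D belongs to class $C_i$ = $|C_{i,D}|/|D|$; $p_1$ = 5/14, $p_2$ = 9/14

$$\text{Entropy}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

All logarithms here are base 2, so entropy is measured in bits.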
![Page 40](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/40.jpg)
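$$\text{Entropy}(D) = -\left[\tfrac{5}{14}\log_2\tfrac{5}{14} + \tfrac{9}{14}\log_2\tfrac{9}{14}\right] = -[.3571 \times -1.4854 + .6429 \times -.6374] = -[-.5304 + -.4098] \approx .940 \text{ bits}$$

Extremes:

$$-\left[\tfrac{7}{14}\log_2\tfrac{7}{14} + \tfrac{7}{14}\log_2\tfrac{7}{14}\right] = 1 \text{ bit}$$

$$-\left[\tfrac{1}{14}\log_2\tfrac{1}{14} + \tfrac{13}{14}\log_2\tfrac{13}{14}\right] = .3712 \text{ bits}$$

$$-\left[\tfrac{0}{14}\log_2\tfrac{0}{14} + \tfrac{14}{14}\log_2\tfrac{14}{14}\right] = 0 \text{ bits} \quad (\text{taking } 0 \cdot \log_2 0 = 0)$$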
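A minimal Python sketch of the entropy formula, computed from per-class counts. The name `entropy` is hypothetical; it uses p · log2(1/p), which equals -p · log2 p, and skips zero counts, which implements the 0 · log2 0 = 0 convention the last extreme relies on. At full precision the 5 yes / 9 no split gives about 0.9403 bits; the slides' four-decimal intermediate values account for the small difference.

```python
import math
from typing import Sequence


def entropy(counts: Sequence[int]) -> float:
    """Entropy in bits of a class distribution given as per-class counts."""
    total = sum(counts)
    # p * log2(1/p) == -p * log2(p); zero counts are skipped (0 * log2 0 = 0).
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


for counts in [(5, 9), (7, 7), (1, 13), (0, 14)]:
    print(counts, round(entropy(counts), 4))
```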
![Page 41](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/41.jpg)
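Entropy for D split by A
• A = attribute to split D on; e.g. age
• v = number of distinct values of A; e.g. youth, middle_aged, senior (v = 3)
• j = 1, …, v
• $D_j$ = subset of D where A takes its j-th value; e.g. all tuples where age = youth

$$\text{Entropy}_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \, \text{Entropy}(D_j)$$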
![Page 42](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/42.jpg)
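$$\text{Entropy}_{age}(D) = \tfrac{5}{14} \cdot -\left[\tfrac{0}{5}\log_2\tfrac{0}{5} + \tfrac{5}{5}\log_2\tfrac{5}{5}\right] + \tfrac{4}{14} \cdot -\left[\tfrac{2}{4}\log_2\tfrac{2}{4} + \tfrac{2}{4}\log_2\tfrac{2}{4}\right] + \tfrac{5}{14} \cdot -\left[\tfrac{3}{5}\log_2\tfrac{3}{5} + \tfrac{2}{5}\log_2\tfrac{2}{5}\right] = .6324 \text{ bits}$$

$$\text{Entropy}_{income}(D) = \tfrac{7}{14} \cdot -\left[\tfrac{2}{7}\log_2\tfrac{2}{7} + \tfrac{5}{7}\log_2\tfrac{5}{7}\right] + \tfrac{4}{14} \cdot -\left[\tfrac{2}{4}\log_2\tfrac{2}{4} + \tfrac{2}{4}\log_2\tfrac{2}{4}\right] + \tfrac{3}{14} \cdot -\left[\tfrac{1}{3}\log_2\tfrac{1}{3} + \tfrac{2}{3}\log_2\tfrac{2}{3}\right] = .9140 \text{ bits}$$

$$\text{Entropy}_{veteran}(D) = \tfrac{3}{14} \cdot -\left[\tfrac{2}{3}\log_2\tfrac{2}{3} + \tfrac{1}{3}\log_2\tfrac{1}{3}\right] + \tfrac{11}{14} \cdot -\left[\tfrac{3}{11}\log_2\tfrac{3}{11} + \tfrac{8}{11}\log_2\tfrac{8}{11}\right] = .8609 \text{ bits}$$

$$\text{Entropy}_{college\_educated}(D) = \tfrac{8}{14} \cdot -\left[\tfrac{6}{8}\log_2\tfrac{6}{8} + \tfrac{2}{8}\log_2\tfrac{2}{8}\right] + \tfrac{6}{14} \cdot -\left[\tfrac{3}{6}\log_2\tfrac{3}{6} + \tfrac{3}{6}\log_2\tfrac{3}{6}\right] = .8921 \text{ bits}$$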
![Page 43](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/43.jpg)
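Information Gain

$$\text{Gain}(A) = \text{Entropy}(D) - \text{Entropy}_A(D)$$

Entropy(D) is computed over the whole set of tuples D; $\text{Entropy}_A(D)$ over the subsets of D produced by splitting on attribute A.
Choose the A with the highest Gain, i.e. the split that decreases entropy the most.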
![Page 44](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/44.jpg)
$$\text{Gain}(age) = \text{Entropy}(D) - \text{Entropy}_{age}(D) = .9400 - .6324 = .3076 \text{ bits}$$
Gain(income) = .0259 bits
Gain(veteran) = .0790 bits
Gain(college_educated) = .0479 bits
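ID3's selection step can be sketched directly on the training table: compute $\text{Entropy}_A(D)$ and Gain(A) for every attribute and keep the largest. This is a minimal sketch, not Quinlan's implementation; `entropy`, `split_entropy` and `gain` are hypothetical helper names, and tuples are assumed to be dicts keyed by column name.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

COLUMNS = ["age", "income", "veteran", "college_educated", "support_hillary"]
DATA = """\
youth low no no no
youth low yes no no
middle_aged low no no yes
senior low no no yes
senior medium no yes no
senior medium yes no yes
middle_aged medium no yes no
youth low no yes no
youth low no yes no
senior high no yes yes
youth low no no no
middle_aged high no yes no
middle_aged medium yes yes yes
senior high no yes no"""
ROWS = [dict(zip(COLUMNS, line.split())) for line in DATA.splitlines()]
TARGET = COLUMNS[-1]


def entropy(labels: List[str]) -> float:
    """Entropy(D) in bits: sum of p_i * log2(1/p_i) over the classes."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def split_entropy(rows: List[Dict[str, str]], attr: str) -> float:
    """Entropy_A(D): entropy of each subset D_j weighted by |D_j| / |D|."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[TARGET])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def gain(rows: List[Dict[str, str]], attr: str) -> float:
    """Gain(A) = Entropy(D) - Entropy_A(D)."""
    return entropy([r[TARGET] for r in rows]) - split_entropy(rows, attr)


gains = {a: gain(ROWS, a) for a in COLUMNS[:-1]}
for a, g in gains.items():
    print(f"Gain({a}) = {g:.4f}")
best = max(gains, key=gains.get)
print("split on:", best)
```

At full precision this prints Gain(age) ≈ 0.3078, Gain(income) ≈ 0.0262, Gain(veteran) ≈ 0.0793 and Gain(college_educated) ≈ 0.0481, so age is chosen for the root, matching the slide; the fourth-decimal differences come from the slides' rounded intermediate values.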
![Page 45](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/45.jpg)
Entropy with more than 2 class values
Entropy = -[7/13*log(7/13) + 2/13*log(2/13) + 2/13*log(2/13) + 2/13*log(2/13)] = 1.7272 bits
Entropy = -[5/13*log(5/13) + 1/13*log(1/13) + 6/13*log(6/13) + 1/13*log(1/13)] = 1.6143 bits
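Nothing in the formula limits it to two classes: with m classes the sum has m terms, and entropy peaks at log2 m bits when all classes are equally likely (2 bits for m = 4). A short sketch reproducing both four-class values above; `entropy` is a hypothetical helper that takes per-class counts.

```python
import math
from typing import Sequence


def entropy(counts: Sequence[int]) -> float:
    """Entropy in bits for any number of classes (zero counts contribute 0)."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


print(round(entropy([7, 2, 2, 2]), 4))  # 1.7272
print(round(entropy([5, 1, 6, 1]), 4))  # 1.6143
print(entropy([1, 1, 1, 1]))            # 2.0, the maximum log2(4)
```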
![Page 46](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/46.jpg)
| ss | age | income | veteran | college_educated | support_hillary |
|---|---|---|---|---|---|
| 215-98-9343 | youth | low | no | no | no |
| 238-34-3493 | youth | low | yes | no | no |
| 234-28-2434 | middle_aged | low | no | no | yes |
| 243-24-2343 | senior | low | no | no | yes |
| 634-35-2345 | senior | medium | no | yes | no |
| 553-32-2323 | senior | medium | yes | no | yes |
| 554-23-4324 | middle_aged | medium | no | yes | no |
| 523-43-2343 | youth | low | no | yes | no |
| 553-23-1223 | youth | low | no | yes | no |
| 344-23-2321 | senior | high | no | yes | yes |
| 212-23-1232 | youth | low | no | no | no |
| 112-12-4521 | middle_aged | high | no | yes | no |
| 423-13-3425 | middle_aged | medium | yes | yes | yes |
| 423-53-4817 | senior | high | no | yes | no |

Added social security number attribute
![Page 47](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/47.jpg)
[Figure: ss at the root with one branch per value (215-98-9343 … 423-53-4817), each branch ending in a single-tuple yes or no leaf]
Will Information Gain split on ss?
![Page 48](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/48.jpg)
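Yes, because $\text{Entropy}_{ss}(D) = 0$:

$$\text{Entropy}_{ss}(D) = \sum_{j=1}^{14} \tfrac{1}{14} \cdot -\left[\tfrac{1}{1}\log_2\tfrac{1}{1} + \tfrac{0}{1}\log_2\tfrac{0}{1}\right] = 0$$

Every ss value occurs in exactly one tuple, so each subset $D_j$ is pure (taking 0 · log2 0 = 0). Hence Gain(ss) = Entropy(D), the largest gain any attribute can reach.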
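The same effect shows up for any attribute that is unique per tuple. The sketch below uses hypothetical IDs `id0` … `id13` as stand-ins for the ss values (only their uniqueness matters) together with the 14 class labels in table order; every subset then holds a single tuple, so the split entropy is 0 and the gain equals Entropy(D). Helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Class labels of the 14 training tuples, in table order.
LABELS = ["no", "no", "yes", "yes", "no", "yes", "no",
          "no", "no", "yes", "no", "no", "yes", "no"]


def entropy(labels: List[str]) -> float:
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def split_entropy(pairs: List[Tuple[str, str]]) -> float:
    """Entropy_A(D) for (attribute value, class label) pairs."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for value, label in pairs:
        groups[value].append(label)
    return sum(len(g) / len(pairs) * entropy(g) for g in groups.values())


# Hypothetical unique identifiers standing in for ss.
ids = [f"id{i}" for i in range(len(LABELS))]
e_ss = split_entropy(list(zip(ids, LABELS)))
print(e_ss)                              # 0.0
print(round(entropy(LABELS) - e_ss, 4))  # 0.9403, i.e. Gain = Entropy(D)
```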
![Page 49](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/49.jpg)
Gain ratio
• C4.5, a successor of ID3, uses this heuristic.
• Attempts to overcome Information Gain’s bias in favor of attributes with a large number of values.
![Page 50](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/50.jpg)
Gain ratio
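The Gain ratio formula on this slide did not survive extraction. C4.5’s standard definitions, assumed here, divide Gain(A) by the split information of A, which is the entropy of the subset sizes $|D_j|$ rather than of the classes:

$$\text{SplitInfo}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}$$

$$\text{GainRatio}(A) = \frac{\text{Gain}(A)}{\text{SplitInfo}_A(D)}$$

For ss, each of the 14 subsets holds one tuple, so $\text{SplitInfo}_{ss}(D) = \log_2 14 \approx 3.8074$; for age, the subsets hold 5, 4 and 5 tuples, giving $\text{SplitInfo}_{age}(D) \approx 1.5774$.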
![Page 51](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/51.jpg)
Gain ratio
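$\text{Gain}(ss) = .9400$
$\text{SplitInfo}_{ss}(D) = \log_2 14 = 3.8074$
$\text{GainRatio}(ss) = .9400 / 3.8074 = .2469$
$\text{Gain}(age) = .3076$
$\text{SplitInfo}_{age}(D) = 1.5774$
$\text{GainRatio}(age) = .3076 / 1.5774 = .1950$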
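A sketch reproducing the gain-ratio figures from class counts read off the training table. The helper names `entropy` and `split_info` are hypothetical; `split_info` reuses `entropy` because SplitInfo is the same formula applied to the subset sizes instead of the class counts.

```python
import math
from typing import List


def entropy(counts: List[int]) -> float:
    """Entropy in bits from per-class counts (zero counts contribute 0)."""
    n = sum(counts)
    return sum((c / n) * math.log2(n / c) for c in counts if c > 0)


def split_info(sizes: List[int]) -> float:
    """SplitInfo_A(D): the entropy formula applied to subset sizes |D_j|."""
    return entropy(sizes)


entropy_d = entropy([5, 9])                   # Entropy(D): 5 yes, 9 no
# Entropy_age(D) from (yes, no) counts: youth (0, 5), middle_aged (2, 2), senior (3, 2)
entropy_age = (5 * entropy([0, 5]) + 4 * entropy([2, 2]) + 5 * entropy([3, 2])) / 14
gain_age = entropy_d - entropy_age
gain_ss = entropy_d                           # Entropy_ss(D) = 0, so Gain(ss) = Entropy(D)

ratio_ss = gain_ss / split_info([1] * 14)     # 14 distinct ss values, one tuple each
ratio_age = gain_age / split_info([5, 4, 5])  # youth, middle_aged, senior
print(f"SplitInfo_ss = {split_info([1] * 14):.4f}, GainRatio(ss) = {ratio_ss:.4f}")
print(f"SplitInfo_age = {split_info([5, 4, 5]):.4f}, GainRatio(age) = {ratio_age:.4f}")
```

This prints SplitInfo values of 3.8074 and 1.5774 and gain ratios of about 0.2470 for ss and 0.1951 for age; the slide's ratios differ slightly because its gains are rounded. On this data ss still ranks above age: normalizing shrinks ss's advantage but does not remove it.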
![Page 52](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/52.jpg)
Gini Index
• CART uses this heuristic.
• Binary splits.
• Every split is binary, so multi-value attributes do not get the many-way-split advantage they have under Info Gain.
[Figure: a three-way split on age (youth, middle_aged, senior) vs. a binary split on age (senior vs. youth, middle_aged)]
![Page 53](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/53.jpg)
Gini Index
For the attribute age, the possible subsets are:
{youth, middle_aged, senior}, {youth, middle_aged}, {youth, senior},
{middle_aged, senior}, {youth}, {middle_aged}, {senior} and {}.
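We exclude the full set {youth, middle_aged, senior} and the empty set, since neither one splits the data.
So we have to examine $2^v - 2$ subsets, where v is the number of values of the attribute (here $2^3 - 2 = 6$).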
![Page 54](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/54.jpg)
CALCULATE GINI INDEX ON EACH SUBSET
![Page 55](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/55.jpg)
Gini Index
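The Gini Index formulas on this slide did not survive extraction. The standard CART definitions, assumed here, are

$$\text{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^2, \qquad \text{Gini}_A(D) = \frac{|D_1|}{|D|}\,\text{Gini}(D_1) + \frac{|D_2|}{|D|}\,\text{Gini}(D_2),$$

where $D_1$ and $D_2$ are the two sides of a binary split on A, and the split with the lowest $\text{Gini}_A(D)$ is chosen. The Python sketch below (hypothetical names throughout) scores all $2^v - 2 = 6$ subsets of age's values; since a subset and its complement describe the same split, only three distinct scores come out.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# (age, support_hillary) for the 14 training tuples, in table order.
PAIRS = [("youth", "no"), ("youth", "no"), ("middle_aged", "yes"),
         ("senior", "yes"), ("senior", "no"), ("senior", "yes"),
         ("middle_aged", "no"), ("youth", "no"), ("youth", "no"),
         ("senior", "yes"), ("youth", "no"), ("middle_aged", "no"),
         ("middle_aged", "yes"), ("senior", "no")]


def gini(labels: List[str]) -> float:
    """Gini(D) = 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_split(subset: FrozenSet[str]) -> float:
    """Gini_age(D) for the binary split: age in subset vs. age not in subset."""
    d1 = [y for a, y in PAIRS if a in subset]
    d2 = [y for a, y in PAIRS if a not in subset]
    n = len(PAIRS)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)


values = ["youth", "middle_aged", "senior"]
# Every subset except the empty set and the full set: 2**3 - 2 = 6.
subsets = [frozenset(c) for k in range(1, len(values)) for c in combinations(values, k)]
scores: Dict[FrozenSet[str], float] = {s: gini_split(s) for s in subsets}
for s, g in scores.items():
    print(sorted(s), round(g, 4))
best = min(scores, key=scores.get)  # lowest Gini_age(D) wins
print("best split:", sorted(best), "vs. the rest")
```

On the training table, {youth} vs. {middle_aged, senior} scores lowest (about 0.3175, against about 0.3937 for {senior} and 0.4429 for {middle_aged}), so it would be the chosen binary split on age.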
![Page 56](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/56.jpg)
Miscellaneous thoughts
• Widely applicable to data exploration, classification and scoring tasks
• Generate understandable rules
• Better for predicting discrete outcomes than continuous ones (continuous predictions come out “lumpy”)
• Error-prone when the number of training examples for a class is small
• Most business cases try to predict a few broad categories
![Page 57](https://reader036.vdocuments.mx/reader036/viewer/2022062320/56649d2f5503460f94a06fa3/html5/thumbnails/57.jpg)