SEEM4630 2013-2014 Tutorial 2
Classification:
Decision tree, Naïve Bayes & k-NN
Wentao TIAN, wttian@se.cuhk.edu.hk
Classification: Definition

- Given a collection of records (the training set), each record contains a set of attributes; one of the attributes is the class.
- Find a model for the class attribute as a function of the values of the other attributes: decision tree, naïve Bayes, k-NN.
- Goal: previously unseen records should be assigned a class as accurately as possible.
Decision Tree

- Goal: construct a tree so that instances belonging to different classes are separated.
- Basic algorithm (a greedy algorithm):
  - The tree is constructed in a top-down, recursive manner.
  - At the start, all the training examples are at the root.
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
  - Examples are partitioned recursively based on the selected attributes.
Attribute Selection Measure 1: Information Gain

- Let p_i be the probability that a tuple belongs to class C_i, estimated by |C_{i,D}|/|D|.
- Expected information (entropy) needed to classify a tuple in D:

$$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$

- Information needed (after using A to split D into v partitions) to classify D:

$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\, Info(D_j)$$

- Information gained by branching on attribute A:

$$Gain(A) = Info(D) - Info_A(D)$$
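The two formulas above reduce to a few lines of code. Below is a minimal Python sketch (the helper names `entropy` and `info_gain` are mine, not from the tutorial) that reproduces the numbers derived in the worked example later in this tutorial:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Info(D): expected bits needed to classify a tuple in D."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Gain(A) = Info(D) - Info_A(D), for one attribute column paired with the class column."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    # Weighted entropy of the partitions induced by the attribute
    info_a = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - info_a

# Play-Tennis class column: 9 Yes, 5 No (same row order as the example table)
play = ["N","N","Y","Y","Y","N","Y","N","Y","Y","Y","Y","Y","N"]
print(round(entropy(play), 2))  # 0.94

outlook = ["Sunny","Sunny","Overcast","Rain","Rain","Rain","Overcast",
           "Sunny","Sunny","Rain","Sunny","Overcast","Overcast","Rain"]
print(round(info_gain(outlook, play), 2))  # 0.25
```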
Attribute Selection Measure 2: Gain Ratio

- The information gain measure is biased towards attributes with a large number of values.
- C4.5 (a successor of ID3) uses the gain ratio to overcome this problem (a normalization of information gain):

GainRatio(A) = Gain(A) / SplitInfo_A(D)

$$SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2\!\left(\frac{|D_j|}{|D|}\right)$$
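A short sketch of the normalization, reusing the 5/4/5 Outlook partition from the example data (function names are mine, assuming the `Gain(Outlook) = 0.25` value computed later):

```python
from collections import Counter
from math import log2

def split_info(values):
    """SplitInfo_A(D): entropy of the partition sizes induced by attribute A."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(gain, values):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D)."""
    return gain / split_info(values)

# Outlook splits the 14 records 5/4/5 (Sunny/Overcast/Rain)
outlook = ["Sunny"] * 5 + ["Overcast"] * 4 + ["Rain"] * 5
print(round(split_info(outlook), 2))        # 1.58
print(round(gain_ratio(0.25, outlook), 2))  # 0.16
```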
Attribute Selection Measure 3: Gini Index

- If a data set D contains examples from n classes, the gini index gini(D) is defined as

$$gini(D) = 1 - \sum_{j=1}^{n} p_j^2$$

where p_j is the relative frequency of class j in D.

- If D is split on A into two subsets D1 and D2, the gini index gini_A(D) is defined as

$$gini_A(D) = \frac{|D_1|}{|D|}\, gini(D_1) + \frac{|D_2|}{|D|}\, gini(D_2)$$

- Reduction in impurity:

$$\Delta gini(A) = gini(D) - gini_A(D)$$
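These three formulas can be checked against the Play-Tennis data (9 Yes / 5 No) from the next slide. A minimal sketch, with function names of my own choosing:

```python
from collections import Counter

def gini(labels):
    """gini(D) = 1 - sum_j p_j^2, where p_j is class j's relative frequency."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(d1, d2):
    """gini_A(D) for a binary split of D into subsets D1 and D2."""
    n = len(d1) + len(d2)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)

labels = ["Y"] * 9 + ["N"] * 5          # 9 Yes, 5 No
print(round(gini(labels), 3))           # 0.459

# Binary split on Humidity: High [3+,4-] vs Normal [6+,1-]
high, normal = ["Y"] * 3 + ["N"] * 4, ["Y"] * 6 + ["N"] * 1
print(round(gini(labels) - gini_split(high, normal), 3))  # 0.092 reduction in impurity
```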
Example

| Outlook | Temperature | Humidity | Wind | Play Tennis |
|---|---|---|---|---|
| Sunny | >25 | High | Weak | No |
| Sunny | >25 | High | Strong | No |
| Overcast | >25 | High | Weak | Yes |
| Rain | 15-25 | High | Weak | Yes |
| Rain | <15 | Normal | Weak | Yes |
| Rain | <15 | Normal | Strong | No |
| Overcast | <15 | Normal | Strong | Yes |
| Sunny | 15-25 | High | Weak | No |
| Sunny | <15 | Normal | Weak | Yes |
| Rain | 15-25 | Normal | Weak | Yes |
| Sunny | 15-25 | Normal | Strong | Yes |
| Overcast | 15-25 | High | Strong | Yes |
| Overcast | >25 | Normal | Weak | Yes |
| Rain | 15-25 | High | Strong | No |
Tree induction example

Entropy of data S [9+, 5-]:
Info(S) = -9/14(log2(9/14)) - 5/14(log2(5/14)) = 0.94

Split data by attribute Outlook: Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]
Gain(Outlook) = 0.94 – 5/14[-2/5(log2(2/5)) - 3/5(log2(3/5))] – 4/14[-4/4(log2(4/4)) - 0/4(log2(0/4))] – 5/14[-3/5(log2(3/5)) - 2/5(log2(2/5))] = 0.94 – 0.69 = 0.25
Tree induction example

Split data by attribute Temperature: <15 [3+,1-], 15-25 [5+,1-], >25 [2+,2-]
Gain(Temperature) = 0.94 – 4/14[-3/4(log2(3/4)) - 1/4(log2(1/4))] – 6/14[-5/6(log2(5/6)) - 1/6(log2(1/6))] – 4/14[-2/4(log2(2/4)) - 2/4(log2(2/4))] = 0.94 – 0.80 = 0.14
Tree induction example

Split data by attribute Humidity: High [3+,4-], Normal [6+,1-]
Gain(Humidity) = 0.94 – 7/14[-3/7(log2(3/7)) - 4/7(log2(4/7))] – 7/14[-6/7(log2(6/7)) - 1/7(log2(1/7))] = 0.94 – 0.79 = 0.15

Split data by attribute Wind: Weak [6+,2-], Strong [3+,3-]
Gain(Wind) = 0.94 – 8/14[-6/8(log2(6/8)) - 2/8(log2(2/8))] – 6/14[-3/6(log2(3/6)) - 3/6(log2(3/6))] = 0.94 – 0.89 = 0.05
Tree induction example

Gain(Outlook) = 0.25, Gain(Temperature) = 0.14, Gain(Humidity) = 0.15, Gain(Wind) = 0.05

Outlook yields the largest information gain, so it is chosen as the root:

Outlook
- Sunny → ??
- Overcast → Yes
- Rain → ??
Tree induction example

Entropy of branch Sunny [2+,3-]:
Info(Sunny) = -2/5(log2(2/5)) - 3/5(log2(3/5)) = 0.97

Split Sunny branch by attribute Temperature: <15 [1+,0-], 15-25 [1+,1-], >25 [0+,2-]
Gain(Temperature) = 0.97 – 1/5[-1/1(log2(1/1)) - 0/1(log2(0/1))] – 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))] – 2/5[-0/2(log2(0/2)) - 2/2(log2(2/2))] = 0.97 – 0.4 = 0.57

Split Sunny branch by attribute Humidity: High [0+,3-], Normal [2+,0-]
Gain(Humidity) = 0.97 – 3/5[-0/3(log2(0/3)) - 3/3(log2(3/3))] – 2/5[-2/2(log2(2/2)) - 0/2(log2(0/2))] = 0.97 – 0 = 0.97

Split Sunny branch by attribute Wind: Weak [1+,2-], Strong [1+,1-]
Gain(Wind) = 0.97 – 3/5[-1/3(log2(1/3)) - 2/3(log2(2/3))] – 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))] = 0.97 – 0.95 = 0.02
Tree induction example

On the Sunny branch, Humidity yields the largest gain:

Outlook
- Sunny → Humidity
  - High → No
  - Normal → Yes
- Overcast → Yes
- Rain → ??
Tree induction example

Entropy of branch Rain [3+,2-]:
Info(Rain) = -3/5(log2(3/5)) - 2/5(log2(2/5)) = 0.97

Split Rain branch by attribute Temperature: <15 [1+,1-], 15-25 [2+,1-], >25 [0+,0-]
Gain(Temperature) = 0.97 – 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))] – 3/5[-2/3(log2(2/3)) - 1/3(log2(1/3))] – 0/5·0 = 0.97 – 0.95 = 0.02 (the empty >25 partition contributes nothing)

Split Rain branch by attribute Humidity: High [1+,1-], Normal [2+,1-]
Gain(Humidity) = 0.97 – 2/5[-1/2(log2(1/2)) - 1/2(log2(1/2))] – 3/5[-2/3(log2(2/3)) - 1/3(log2(1/3))] = 0.97 – 0.95 = 0.02

Split Rain branch by attribute Wind: Weak [3+,0-], Strong [0+,2-]
Gain(Wind) = 0.97 – 3/5[-3/3(log2(3/3)) - 0/3(log2(0/3))] – 2/5[-0/2(log2(0/2)) - 2/2(log2(2/2))] = 0.97 – 0 = 0.97
Tree induction example (final tree)

Outlook
- Sunny → Humidity
  - High → No
  - Normal → Yes
- Overcast → Yes
- Rain → Wind
  - Weak → Yes
  - Strong → No
Bayesian Classification

- A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities P(C_i | x_1, x_2, ..., x_n), where x_j is the value of attribute A_j.
- Choose the class label that has the highest probability.
- Foundation: based on Bayes' theorem:

$$P(C_i \mid x_1, x_2, \ldots, x_n) = \frac{P(x_1, x_2, \ldots, x_n \mid C_i)\, P(C_i)}{P(x_1, x_2, \ldots, x_n)}$$

Here P(C_i | x_1, ..., x_n) is the posterior probability, P(x_1, ..., x_n | C_i) the likelihood, and P(C_i) the prior probability.

- Model: compute P(x_1, x_2, ..., x_n | C_i) from the data.
Naïve Bayes Classifier

- Problem: joint probabilities are difficult to estimate.
- Naïve Bayes assumption: attributes are conditionally independent given the class:

$$P(x_1, x_2, \ldots, x_n \mid C_i) = P(x_1 \mid C_i) \cdots P(x_n \mid C_i) = \prod_{j=1}^{n} P(x_j \mid C_i)$$

$$P(C_i \mid x_1, x_2, \ldots, x_n) = \frac{\prod_{j=1}^{n} P(x_j \mid C_i)\, P(C_i)}{P(x_1, x_2, \ldots, x_n)}$$
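Under the independence assumption, training reduces to counting. A minimal Python sketch (function names are mine; no smoothing is applied, so unseen attribute values get probability zero) run on the ten-record A/B/C table that follows:

```python
from collections import Counter, defaultdict

def nb_train(records, labels):
    """Estimate P(C) and each P(x_j | C) by frequency counts (no smoothing)."""
    n = len(labels)
    prior = Counter(labels)      # class -> count
    cond = defaultdict(Counter)  # (attribute index, class) -> value counts
    for rec, c in zip(records, labels):
        for j, v in enumerate(rec):
            cond[(j, c)][v] += 1

    def score(rec, c):
        # Proportional to P(C=c | rec); the evidence P(x1,...,xn) is omitted,
        # since it is the same for every class.
        p = prior[c] / n
        for j, v in enumerate(rec):
            p *= cond[(j, c)][v] / prior[c]
        return p

    return score

# The ten records (attributes A, B; class C) from the table below
records = [("m","b"),("m","s"),("g","q"),("h","s"),("g","q"),
           ("g","q"),("g","s"),("h","b"),("h","q"),("m","b")]
labels  = ["t","t","t","t","t","f","f","f","f","f"]
score = nb_train(records, labels)
print(round(score(("m","q"), "t"), 2), round(score(("m","q"), "f"), 2))  # 0.08 0.04
```

The two printed scores are 2/25 and 1/25, matching the worked example two slides below: the test record A=m, B=q is assigned class t.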
Example: Naïve Bayes Classifier

| A | B | C |
|---|---|---|
| m | b | t |
| m | s | t |
| g | q | t |
| h | s | t |
| g | q | t |
| g | q | f |
| g | s | f |
| h | b | f |
| h | q | f |
| m | b | f |

P(C=t) = 1/2, P(C=f) = 1/2
P(A=m|C=t) = 2/5, P(A=m|C=f) = 1/5
P(B=q|C=t) = 2/5, P(B=q|C=f) = 2/5

Test record: A=m, B=q, C=?
Example: Naïve Bayes Classifier

For C = t:
P(A=m|C=t) × P(B=q|C=t) × P(C=t) = 2/5 × 2/5 × 1/2 = 2/25
P(C=t|A=m, B=q) = (2/25) / P(A=m, B=q)

For C = f:
P(A=m|C=f) × P(B=q|C=f) × P(C=f) = 1/5 × 2/5 × 1/2 = 1/25
P(C=f|A=m, B=q) = (1/25) / P(A=m, B=q)

The posterior for C = t is higher, so conclude: A=m, B=q → C=t.
Nearest Neighbor Classification

- Input: a set of stored records; k, the number of nearest neighbors.
- Output: the class label of the unknown record, obtained in three steps:
  1. Compute the distance to each stored record, e.g., the Euclidean distance

$$d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$$

  2. Identify the k nearest neighbors.
  3. Determine the class label of the unknown record from the class labels of its nearest neighbors (i.e., by majority vote).
Nearest Neighbor Classification: A Discrete Example

Input: 8 training instances, with k = 1 and k = 3
- P1 (4, 2) Orange
- P2 (0.5, 2.5) Orange
- P3 (2.5, 2.5) Orange
- P4 (3, 3.5) Orange
- P5 (5.5, 3.5) Orange
- P6 (2, 4) Black
- P7 (4, 5) Black
- P8 (2.5, 5.5) Black

New instance: Pn (4, 4), class?

Calculate the distances:
- d(P1, Pn) = √((4−4)² + (4−2)²) = 2
- d(P2, Pn) = 3.80
- d(P3, Pn) = 2.12
- d(P4, Pn) = 1.12
- d(P5, Pn) = 1.58
- d(P6, Pn) = 2
- d(P7, Pn) = 1
- d(P8, Pn) = 2.12
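The whole procedure fits in a few lines. A sketch using the points and labels above (`math.dist` computes the Euclidean distance; Python 3.8+):

```python
from collections import Counter
from math import dist

# The 8 stored records from the example: name -> (coordinates, class label)
points = {"P1": ((4, 2), "Orange"), "P2": ((0.5, 2.5), "Orange"),
          "P3": ((2.5, 2.5), "Orange"), "P4": ((3, 3.5), "Orange"),
          "P5": ((5.5, 3.5), "Orange"), "P6": ((2, 4), "Black"),
          "P7": ((4, 5), "Black"), "P8": ((2.5, 5.5), "Black")}

def knn(query, k):
    """Majority vote among the k stored records nearest to the query."""
    nearest = sorted(points.values(), key=lambda pc: dist(pc[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn((4, 4), 1))  # Black  (P7 at distance 1)
print(knn((4, 4), 3))  # Orange (P7, P4, P5: two Orange vs one Black)
```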
Nearest Neighbor Classification

With k = 1, the single nearest neighbor is P7 (Black), so Pn is classified Black. With k = 3, the nearest neighbors are P7 (Black), P4 (Orange), and P5 (Orange), so the majority vote classifies Pn as Orange.
Nearest Neighbor Classification…

- Scaling issues: attributes may have to be scaled to prevent the distance measure from being dominated by one of the attributes. Each attribute should be mapped to the same range, e.g., with min-max normalization.
- Example: two data records a = (1, 1000), b = (0.5, 1). dis(a, b) = ?
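The example can be worked out numerically. A sketch of min-max normalization (the helper name `min_max` is mine; with only two records each value maps to an endpoint of [0, 1], so this is purely illustrative):

```python
from math import dist

def min_max(column):
    """Min-max normalization: v' = (v - min) / (max - min), mapping onto [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

a, b = (1, 1000), (0.5, 1)
# Raw Euclidean distance is dominated entirely by the second attribute
print(round(dist(a, b), 1))  # 999.0

# Normalize each attribute (column) across the records; both attributes
# now contribute on the same [0, 1] scale
a_s, b_s = zip(*(min_max(col) for col in zip(a, b)))
print(round(dist(a_s, b_s), 2))  # 1.41
```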
Classification: Lazy & Eager Learning

Two types of learning methodologies:
- Lazy learning: instance-based learning, e.g., k-NN.
- Eager learning: decision tree and Bayesian classification; also ANN & SVM.
Differences Between Lazy & Eager Learning

Lazy learning:
a. Does not require model building.
b. Less time training but more time predicting.
c. Effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function.

Eager learning:
a. Requires model building.
b. More time training but less time predicting.
c. Must commit to a single hypothesis that covers the entire instance space.
Thank you & Questions?