Rule-Based Classification
TRANSCRIPT
-
Rule-Based Classification
Lecture 21/10-09-09 (no class because of placements)
Lecture 22/11-09-09 (no class)
Lecture 23/12-09-09 (no class)
Lecture 24/14-09-09
-
Building Classification Rules
Direct Method: extract rules directly from data. Examples: RIPPER, CN2, Holte's 1R.
Indirect Method: extract rules from other classification models (e.g., decision trees, neural networks, etc.). Example: C4.5rules.
-
Direct Method: Sequential Covering Algorithm
Extracts rules directly from data.
When there are multiple classes, rules are extracted one class at a time.
The criterion for selecting which class to consider first depends on a number of factors, such as class prevalence.
-
Algorithm (Sequential Covering)
1: Let E be the training records and A the set of attribute-value pairs {(Aj, vj)}.
2: Let Yo be an ordered set of classes {y1, y2, y3, ..., yk}.
3: Let R = { } be the initial rule (decision) list.
4: for each class y ∈ Yo − {yk} do
5:   while stopping condition is not met do
6:     r ← Learn-One-Rule(E, A, y).
       Remove training records from E that are covered by r.
       Add r to the bottom of the rule list: R ← R ∨ r.
     end while
   end for
   Insert the default rule, { } → yk, at the bottom of the rule list R.
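To make the control flow concrete, here is a minimal Python sketch of the loop above; `learn_one_rule` and `covers` are hypothetical helpers standing in for the Learn-One-Rule function and rule matching, which the slides do not specify.

```python
def sequential_covering(records, classes, learn_one_rule, covers):
    """Greedy sequential covering: learn rules for one class at a time.

    records: list of (attrs, label) training examples (the set E)
    classes: ordered list of class labels Yo; the last one, yk, is the default
    learn_one_rule(records, y): returns a rule predicting y, or None if no
        acceptable rule can be grown (hypothetical helper)
    covers(rule, attrs): True if every conjunct of the rule matches attrs
    """
    rule_list = []                              # R = { }
    for y in classes[:-1]:                      # for each y in Yo - {yk}
        while True:                             # until stopping condition
            r = learn_one_rule(records, y)
            if r is None:
                break
            # Remove training records from E that are covered by r.
            records = [(a, l) for a, l in records if not covers(r, a)]
            rule_list.append((r, y))            # R <- R v r (append to bottom)
    rule_list.append(({}, classes[-1]))         # default rule { } -> yk
    return rule_list
```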
-
Sequential Covering Algorithm (in short, for your quick reference)
1. Start from an empty rule set.
2. Extract a rule using the Learn-One-Rule function.
3. Remove training records covered by the rule.
4. Repeat steps (2) and (3) until the stopping criterion is met.
-
Learn-One-Rule function
Objective: extract a rule that covers as many positive examples as possible and none or few negative examples in the training dataset.
Computationally expensive because of the exponential size of the search space.
Generates an initial rule and keeps growing (refining) it until the stopping criterion is met.
-
Example of Sequential Covering
[Figure, panels (i)-(ii): (i) the original data; (ii) step 1, the first rule is grown.]
-
Example of Sequential Covering
[Figure, panels (iii)-(iv): (iii) step 2, rule R1 covers part of the data; (iv) step 3, rules R1 and R2 cover the data.]
-
Aspects of Sequential Covering
Rule Growing Strategy
Instance Elimination
Rule Evaluation
Stopping Criterion
Rule Pruning
-
Rule Growing
Two common strategies:
1. General-to-specific
2. Specific-to-general
-
General-to-Specific
The initial rule r: { } → y has poor quality, as it covers all examples in the training set.
Conjuncts are subsequently added to improve the quality of the rule.
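A minimal Python sketch of general-to-specific growing, assuming categorical attributes and a Laplace-corrected accuracy as the quality measure (both are my assumptions; the slides leave the quality measure open until the Rule Evaluation section):

```python
def grow_rule(records, target, candidates):
    """Grow r: { } -> target by greedily adding the best conjunct.

    records: list of (attrs_dict, label) pairs
    candidates: list of (attribute, value) conjuncts that may be added
    """
    def quality(rule):
        covered = [l for a, l in records
                   if all(a.get(k) == v for k, v in rule.items())]
        pos = sum(1 for l in covered if l == target)
        return (pos + 1) / (len(covered) + 2)   # Laplace estimate, k = 2

    rule = {}                                   # start from the empty rule
    while True:
        best_q, best_rule = quality(rule), None
        for attr, val in candidates:
            if attr in rule:
                continue
            cand = dict(rule)
            cand[attr] = val
            if quality(cand) > best_q:          # keep the best refinement
                best_q, best_rule = quality(cand), cand
        if best_rule is None:                   # no conjunct improves the rule
            return rule
        rule = best_rule
```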
-
Specific-to-General
Refund=No, Status=Single, Income=85K (Class=Yes)
Refund=No, Status=Single, Income=90K (Class=Yes)
generalize (by dropping the Income conjunct) to:
Refund=No, Status=Single (Class=Yes)
(b) Specific-to-general
-
Specific-to-General
One of the positive examples is chosen randomly as the initial seed:
Body temp=warm-blooded, Skin cover=hair, Gives birth=yes, Aquatic creature=no, Aerial creature=no, Has legs=yes, Hibernates=no ⇒ Mammals
One of the conjuncts is then removed so that the rule can cover more positive examples:
Skin cover=hair, Gives birth=yes, Aquatic creature=no, Aerial creature=no, Has legs=yes, Hibernates=no ⇒ Mammals
Body temp=warm-blooded, Skin cover=hair, Gives birth=yes, Aquatic creature=no, Aerial creature=no, Has legs=yes ⇒ Mammals
-
Rule Evaluation
Suppose a training set contains 60 positive and 100 negative examples. Consider two rules:
R1: covers 50 positive examples and 5 negative examples
R2: covers 2 positive examples and 0 negative examples
The accuracy of R1 is 90.9% and that of R2 is 100%.
Still, R1 is better because of its coverage; the other measures below make this clear.
-
Rule Evaluation
Metrics:
Accuracy $= \frac{n_c}{n}$
Likelihood ratio statistic: $R = 2 \sum_{i=1}^{k} f_i \log_2(f_i / e_i)$
Laplace $= \frac{n_c + 1}{n + k}$
M-estimate $= \frac{n_c + kp}{n + k}$  OR  $\frac{n_c + mp}{n + m}$
where
n: number of instances covered by the rule
n_c: number of covered instances classified correctly by the rule
k: number of classes
p: prior probability of the class
f_i: observed frequency of class i among the covered instances; e_i: its expected frequency
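A small Python sketch of these measures (the function names are mine), applied to R1 and R2 from the previous slide; with k = 2 classes, the Laplace estimate favors R1 despite R2's perfect accuracy:

```python
def accuracy(nc, n):
    """nc: covered instances classified correctly; n: instances covered."""
    return nc / n

def laplace(nc, n, k):
    """Laplace-corrected accuracy with k classes."""
    return (nc + 1) / (n + k)

def m_estimate(nc, n, m, p):
    """m-estimate with prior probability p and equivalent sample size m."""
    return (nc + m * p) / (n + m)

# R1 covers 50 positive and 5 negative examples; R2 covers 2 positive, 0 negative.
print(accuracy(50, 55), accuracy(2, 2))        # 0.909..., 1.0
print(laplace(50, 55, 2), laplace(2, 2, 2))    # 0.894... > 0.75, so R1 wins
```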
-
FOIL's Information Gain (Rule Evaluation contd.)
r: A → + covers p_0 positive examples and n_0 negative examples.
Suppose we add a new conjunct B; the extended rule becomes
r': A ∧ B → +, covering p_1 positive examples and n_1 negative examples.
Then FOIL's information gain $= p_1 \left( \log_2 \frac{p_1}{p_1 + n_1} - \log_2 \frac{p_0}{p_0 + n_0} \right)$
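Transcribed directly into Python (the function name is mine):

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for extending rule A (covering p0 positive,
    n0 negative examples) with conjunct B, giving A^B (p1 positive, n1 negative)."""
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))
```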
-
Q2. Consider two rules: R1: A → C, R2: A ∧ B → C.
Suppose R1 covers 350 positive examples and 150 negative examples, while R2 covers 300 positive examples and 50 negative examples. Compute FOIL's information gain for rule R2 with respect to R1.
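As a quick check of Q2 with the `foil_gain` sketch above (R1 plays the role of the original rule A → C, R2 the extended rule A ∧ B → C):

```python
print(foil_gain(p0=350, n0=150, p1=300, n1=50))
# 300 * (log2(300/350) - log2(350/500)) ≈ 87.65
```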
Q4. Page no. 317
-
Aspects of Sequential Covering Algorithm
Rule Growing Strategy
Rule Evaluation
Stopping Criterion
Rule Pruning
Instance Elimination
-
Stopping Criterion and Rule Pruning
Stopping criterion:
Compute the gain.
If the gain is not significant, discard the new rule.
Rule pruning:
Remove one of the conjuncts in the rule.
Compare the error rate on a validation set before and after pruning.
If the error improves, prune the conjunct.
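A minimal sketch of the pruning step, assuming a rule is a dict of conjuncts and `error_rate` is a hypothetical helper that measures the rule's error on the validation set:

```python
def prune_rule(rule, validation, error_rate):
    """Greedily drop conjuncts while the validation error improves.

    rule: dict of attribute -> value conjuncts
    error_rate(rule, validation): fraction of validation records the rule
        misclassifies (hypothetical helper)
    """
    improved = True
    while improved and rule:
        improved = False
        base = error_rate(rule, validation)
        for attr in list(rule):
            cand = {k: v for k, v in rule.items() if k != attr}
            if error_rate(cand, validation) < base:  # error improves: prune
                rule, improved = cand, True
                break
    return rule
```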
-
Aspects of Sequential Covering Algorithm
Rule Growing Strategy
Rule Evaluation
Stopping Criterion
Rule Pruning
Instance Elimination
-
Instance Elimination
Why do we need to eliminate instances? Otherwise, the next rule is identical to the previous rule.
[Figure: a two-dimensional scatter of class = + and class = − training instances, with regions R1 and R3 marking the coverage of successive rules.]
Why do we remove positive and negative instances?
Ensure that the next rule is different.
Prevent underestimating the accuracy of the rule.
-
Indirect Methods for Rule-Based Classifiers and Instance-Based Classifiers
Lecture 26/17-09-09
-
Indirect Methods (generating a rule set from a decision tree)
[Figure: decision tree with root P; P=No leads to a Q test, P=Yes leads to an R test, and R=Yes leads to a second Q test; the leaves carry the class labels in the rules below.]
Rule Set:
r1: (P=No, Q=No) ==> −
r2: (P=No, Q=Yes) ==> +
r3: (P=Yes, R=No) ==> +
r4: (P=Yes, R=Yes, Q=No) ==> −
r5: (P=Yes, R=Yes, Q=Yes) ==> +
Consider r2, r3, and r5: the class label is always predicted as + when Q=Yes.
So we can simplify r2 to: (Q=Yes) ==> +
Simplified rules:
r2': (Q=Yes) ==> +
r3: (P=Yes) ∧ (R=No) ==> +
-
Classification Rules Extracted from a Decision Tree
C4.5rules:
(Give Birth=No, Live in Water=No, Can Fly=Yes) → Birds
(Give Birth=No, Live in Water=Yes) → Fishes
(Give Birth=Yes) → Mammals
(Give Birth=No, Live in Water=No, Can Fly=No) → Reptiles
( ) → Amphibians
[Figure: decision tree — Give Birth? Yes → Mammals; No → Live in Water? Yes → Fishes, Sometimes → Amphibians, No → Can Fly? Yes → Birds, No → Reptiles.]
-
Advantages of Rule-Based Classifiers
As highly expressive as decision trees
Easy to interpret
Easy to generate
Can classify new instances rapidly
Performance comparable to decision trees
-
Instance-Based Classifiers
-
Eager Learners vs. Lazy Learners
Eager learners:
Decision-tree and rule-based classifiers are examples of eager learners.
They are designed to learn a model that maps the input attributes to the class label as soon as training data becomes available.
Lazy learners:
They delay the process of modeling the training data until an unseen instance to be classified is provided.
Instance-based classifiers belong to this class.
They memorize the entire training data and perform classification only when a test instance's attributes are matched against it.
-
Instance-Based Classifiers
[Figure: a set of stored cases — a table with attributes Atr1 ... AtrN and class labels A, B, B, C, A, C, B — and an unseen case with attributes Atr1 ... AtrN.]
Store the training records.
Use the training records to predict the class label of unseen cases.
-
Instance-Based Classifiers: Examples
Rote-learner (classifier):
Memorizes the entire training data and performs classification only if the attributes of a record match one of the training examples exactly.
Its drawback is that some test records may not be classified at all because they don't match any instance in the training data.
Solution?
Nearest neighbor:
Uses the k closest points (nearest neighbors) to perform classification.
-
Nearest Neighbor Classifiers
Basic idea: the main justification of the nearest-neighbor classifier is captured by the following example:
If it walks like a duck, quacks like a duck, and looks like a duck, then it's probably a duck.
[Figure: training records and a test record — compute the distance from the test record to the training records, then choose the k nearest records.]
-
A nearest-neighbor classifier represents each instance as a data point in a d-dimensional space, where d is the number of attributes.
-
Nearest-Neighbor Classifiers
Requires three things:
The set of stored records
A distance metric to compute the distance between records
The value of k, the number of nearest neighbors to retrieve
To classify an unknown record:
Compute its distance to the other training records
Identify the k nearest neighbors
Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
[Figure: an unknown record plotted among the labeled training records.]
-
Definition of Nearest Neighbor
[Figure: panels (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor — circles around a test point X enclosing its 1, 2, and 3 closest training points.]
The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
-
Nearest Neighbor Classification
Compute the distance between two points, e.g., the Euclidean distance:
$d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$
Determine the class from the nearest-neighbor list: take the majority vote of the class labels among the k nearest neighbors.
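In Python, the Euclidean distance is a one-liner (a minimal sketch):

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
```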
-
Nearest Neighbor Classification
Choosing the value of k:
If k is too small, the classifier is sensitive to noise points.
If k is too large, the neighborhood may include points from other classes.
[Figure: k-nearest-neighbor classification with a large k.]
-
Nearest Neighbor Classification
Scaling issues:
Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
Example:
the height of a person may vary from 1.5 m to 1.8 m
the weight of a person may vary from 90 lb to 300 lb
the income of a person may vary from $10K to $1M
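One common remedy (my choice; the slide only names the problem) is min-max scaling each attribute to [0, 1] before computing distances:

```python
def min_max_scale(rows):
    """Scale each column of a list of numeric vectors to [0, 1]."""
    cols = list(zip(*rows))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in rows]

# Height (m), weight (lb), income ($): wildly different ranges.
data = [[1.5, 90, 10_000], [1.8, 300, 1_000_000], [1.6, 150, 50_000]]
print(min_max_scale(data))  # income no longer dominates the distance
```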
-
Nearest Neighbor Classification
k-NN classifiers are lazy learners:
They do not build models explicitly,
unlike eager learners such as decision-tree induction and rule-based systems.
Classifying unknown records is relatively expensive.
-
Algorithm (k-nearest neighbor classification)
1: Let k be the number of nearest neighbors and D the set of training examples.
2: for each test example z = (x', y') do
3:   Compute d(x', x), the distance between z and every example (x, y) ∈ D.
4:   Select Dz ⊆ D, the set of the k closest training examples to z.
5:   $y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} I(v = y_i)$
6: end for
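Putting the algorithm together as a minimal, self-contained Python sketch (majority voting via `collections.Counter`; ties are broken arbitrarily):

```python
import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(train, x_new, k, dist=euclidean):
    """Classify x_new by majority vote of its k nearest neighbors.

    train: list of (vector, label) pairs (the set D)
    """
    # Steps 3-4: distance to every training example; keep the k closest (Dz).
    neighbors = sorted(train, key=lambda xy: dist(xy[0], x_new))[:k]
    # Step 5: majority vote, argmax_v of sum I(v = y_i).
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Example usage:
train = [([1.0, 1.0], '+'), ([1.2, 0.8], '+'), ([0.9, 1.1], '+'),
         ([5.0, 5.0], '-'), ([4.8, 5.2], '-')]
print(knn_classify(train, [1.1, 1.0], k=3))   # '+'
```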
-
Once the nearest-neighbor list is obtained, the test sample is classified based on the majority class of its nearest neighbors.
Majority voting: $y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} I(v = y_i)$
where v is a class label, y_i is the class label of one of the nearest neighbors, and I(·) is an indicator function that returns 1 if its argument is true and 0 otherwise.
-
In the majority voting approach, every neighbor has the same impact on the classification. (Refer to the figure on slide 15.)
This makes the classification algorithm sensitive to the choice of k.
To reduce this impact of k, we assign a weight to each nearest neighbor x_i according to its distance: $w_i = 1 / d(x', x_i)^2$.
-
As a result of applying weights to the distances, training examples located far away from z have a weaker impact on the classification.
Using the distance-weighted voting scheme, the class label can be determined as
Distance-weighted voting: $y' = \arg\max_v \sum_{(x_i, y_i) \in D_z} w_i \times I(v = y_i)$
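The change from the majority-vote sketch is small: each neighbor votes with weight 1/d² instead of counting once (the epsilon guard against zero distance is my addition):

```python
from collections import defaultdict

def knn_classify_weighted(train, x_new, k, dist):
    """Distance-weighted k-NN: neighbor x_i votes with weight 1 / d(x', x_i)^2."""
    neighbors = sorted(train, key=lambda xy: dist(xy[0], x_new))[:k]
    votes = defaultdict(float)
    for x_i, y_i in neighbors:
        d = dist(x_i, x_new)
        votes[y_i] += 1.0 / (d * d + 1e-12)   # epsilon avoids division by zero
    return max(votes, key=votes.get)          # argmax_v sum w_i * I(v = y_i)
```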
-
Characteristics
1. Nearest-neighbor classification is a part of instance-based learning.
2. Lazy learners like nearest-neighbor classifiers do not need model building.
3. Nearest-neighbor classifiers make their predictions based on local information, whereas decision-tree and rule-based classifiers attempt to find a global model that fits the entire input space.
4. Appropriate proximity measures play a significant role in nearest-neighbor classifiers.