Rule Based Classification


  • Rule Based Classification

    Lecture 21/10-09-09 (no class because of placements)

    Lecture 22/11-09-09 (no class)

    Lecture 23/12-09-09 (no class)

    Lecture 24/14-09-09

  • Building Classification Rules

    Direct Method: extract rules directly from the data, e.g. RIPPER, CN2, Holte's 1R.

    Indirect Method: extract rules from other classification models (e.g. decision trees, neural networks), e.g. C4.5rules.

  • Direct Method: Sequential Covering Algorithm

    Extracts rules directly from the data.

    Rules are extracted one class at a time when there are multiple classes.

    The criterion for deciding which class to consider first depends on factors such as class prevalence.

  • Algorithm

    1: Let E be the training records and A the set of attribute-value pairs {(Aj, vj)}.
    2: Let Yo be an ordered set of classes {y1, y2, y3, ..., yk}.
    3: Let R = { } be the initial rule (decision) list.
    4: for each class y in Yo - {yk} do
    5:   while stopping condition is not met do
    6:     r <- Learn-One-Rule(E, A, y).
           Remove training records from E that are covered by r.
           Add r to the bottom of the rule list: R <- R v r.
         end while
       end for
       Insert the default rule, { } => yk, at the bottom of the rule list R.

  • Sequential Covering Algorithm (in short, for quick reference)

    1. Start from an empty rule set.
    2. Extract a rule using the Learn-One-Rule function.
    3. Remove the training records covered by the rule.
    4. Repeat steps (2) and (3) until the stopping criterion is met.
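    A minimal sketch of this loop, assuming rules are represented as sets of attribute-value conjuncts and that a learn_one_rule helper is supplied; both the representation and the helper name are illustrative, not the slides' exact procedure.

```python
# Minimal sketch of sequential covering; learn_one_rule and the rule
# representation (a dict of attribute -> value conjuncts) are assumptions.

def covers(rule, record):
    """A record is covered when it satisfies every conjunct in the rule."""
    return all(record.get(attr) == val for attr, val in rule.items())

def sequential_covering(records, classes, learn_one_rule):
    rule_list = []                          # R starts empty
    remaining = list(records)
    for y in classes[:-1]:                  # all classes except the default yk
        while True:
            rule = learn_one_rule(remaining, y)
            if rule is None:                # stopping condition: no useful rule found
                break
            rule_list.append((rule, y))     # add r to the bottom of the list
            # remove training records covered by the new rule
            remaining = [r for r in remaining if not covers(rule, r)]
    rule_list.append(({}, classes[-1]))     # default rule { } => yk
    return rule_list
```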

  • Learn-One-Rule Function

    Objective: extract a rule that covers as many +ve examples as possible and few or no -ve examples in the training dataset.

    An exhaustive search is computationally expensive because of the exponential size of the search space.

    Instead, the function generates an initial rule and keeps growing (refining) it until a stopping criterion is met.

  • Example of Sequential Covering

    [Figure: (i) Original Data, (ii) Step 1]

  • Example of Sequential Covering

    [Figure: (iii) Step 2 (rule R1), (iv) Step 3 (rules R1 and R2)]

  • Aspects of Sequential Covering

    Rule Growing Strategy

    Instance Elimination

    Rule Evaluation

    Stopping Criterion

    Rule Pruning

  • Rule Growing

    Two common strategies:

    1. General-to-specific
    2. Specific-to-general

  • General-to-Specific

    The initial rule r: { } => y has poor quality, as it covers all examples in the training set.

    Conjuncts are subsequently added to improve the quality of the rule.
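    A sketch of this general-to-specific growing loop (one possible Learn-One-Rule). The use of accuracy over covered records as the quality measure is an assumption for illustration; the later slides discuss better measures such as Laplace, m-estimate and FOIL's gain.

```python
# Sketch of general-to-specific rule growing (Learn-One-Rule).
# Quality here is accuracy over covered records; this choice is illustrative.

def covers(rule, record):
    return all(record.get(a) == v for a, v in rule.items())

def quality(rule, records, target):
    covered = [r for r in records if covers(rule, r)]
    if not covered:
        return 0.0
    return sum(r["class"] == target for r in covered) / len(covered)

def learn_one_rule(records, target, candidate_conjuncts):
    rule = {}                                   # start from r: { } => y
    best_q = quality(rule, records, target)
    improved = True
    while improved:
        improved = False
        for attr, val in candidate_conjuncts:
            if attr in rule:
                continue
            candidate = dict(rule, **{attr: val})
            q = quality(candidate, records, target)
            if q > best_q:                      # greedily keep the best conjunct
                rule, best_q, improved = candidate, q, True
    return rule
```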

  • Specific-to-General

    [Figure (b) Specific-to-general: the specific rules
    (Refund=No, Status=Single, Income=85K) => Class=Yes and
    (Refund=No, Status=Single, Income=90K) => Class=Yes
    are generalized to (Refund=No, Status=Single) => Class=Yes]

  • Specific-to-General (example)

    One of the positive examples is chosen randomly as the initial seed:

    Body temp=warm-blooded, Skin cover=hair, gives birth=yes, aquatic creature=no, aerial creature=no, has legs=yes, hibernates=no => Mammals

    One of the conjuncts is removed so that the rule can cover more +ve examples, for example:

    Body temp=warm-blooded, Skin cover=hair, gives birth=yes, aquatic creature=no, aerial creature=no, has legs=yes => Mammals

    Skin cover=hair, gives birth=yes, aquatic creature=no, aerial creature=no, has legs=yes, hibernates=no => Mammals

  • Rule Evaluation

    Suppose a training set contains 60 +ve and 100 -ve examples. Consider two rules:

    R1: covers 50 +ve examples and 5 -ve examples

    R2: covers 2 +ve examples and 0 -ve examples

    The accuracy of R1 is 90.9% and of R2 is 100%. Still, R1 is better because of its coverage. Other measures (next slide) make this explicit.

  • Rule Evaluation

    Metrics: accuracy, likelihood ratio statistic, Laplace, m-estimate.

    Accuracy = n_c / n

    Laplace = (n_c + 1) / (n + k)

    M-estimate = (n_c + k*p) / (n + k),  or  (n_c + m*p) / (n + m)

    Likelihood ratio statistic R = 2 * sum_{i=1..k} f_i * log(f_i / e_i)
    (f_i: observed frequency of class i among the covered examples; e_i: its expected frequency)

    n : number of instances covered by the rule
    n_c : number of instances covered by the rule that are correctly classified
    k : number of classes
    p : prior probability of the positive class

  • FOIL's Information Gain (Rule Evaluation contd.)

    r: A => + covers p0 +ve examples and n0 -ve examples.

    Suppose we add a new conjunct B; the extended rule becomes

    r': A ^ B => +, which covers p1 +ve examples and n1 -ve examples.

    Then FOIL's information gain = p1 * [ log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) ]
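    A one-function sketch of this formula; the Q2 numbers from the next slide are used as a quick check (the result is roughly 87.7).

```python
import math

# FOIL's information gain for extending rule A => + (p0, n0) to A^B => + (p1, n1).
def foil_gain(p0, n0, p1, n1):
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Q2 on the next slide: R1 covers 350 +ve / 150 -ve, R2 covers 300 +ve / 50 -ve.
print(foil_gain(350, 150, 300, 50))   # ~87.65
```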

  • Q2. Consider two rules, R1: A => C and R2: A ^ B => C.

    Suppose R1 covers 350 +ve examples and 150 -ve examples, while R2 covers 300 +ve examples and 50 -ve examples. Compute FOIL's information gain for rule R2 with respect to R1.

    Q4. Page no. 317

  • Aspects of Sequential Covering Algorithm

    Rule Growing Strategy

    Rule Evaluation

    Stopping Criterion

    Rule Pruning

    Instance Elimination

  • Stopping Criterion and Rule Pruning

    Stopping criterion: compute the gain; if the gain is not significant, discard the new rule.

    Rule pruning: remove one of the conjuncts in the rule, compare the error rate on a validation set before and after pruning, and if the error improves, prune the conjunct.
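    A sketch of this conjunct-pruning step, reusing the rule-as-dict representation assumed in the earlier sketches; the exact stopping behaviour is an illustrative choice.

```python
# Sketch of conjunct pruning: drop a conjunct whenever doing so improves
# the error rate on a held-out validation set.

def covers(rule, record):
    return all(record.get(a) == v for a, v in rule.items())

def error_rate(rule, target, validation):
    covered = [r for r in validation if covers(rule, r)]
    if not covered:
        return 1.0
    return sum(r["class"] != target for r in covered) / len(covered)

def prune_rule(rule, target, validation):
    pruned = dict(rule)
    improved = True
    while improved and pruned:
        improved = False
        base_err = error_rate(pruned, target, validation)
        for attr in list(pruned):
            candidate = {a: v for a, v in pruned.items() if a != attr}
            if error_rate(candidate, target, validation) < base_err:
                pruned = candidate          # removing this conjunct helps
                improved = True
                break
    return pruned
```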

  • Aspects of Sequential Covering Algorithm

    Rule Growing Strategy

    Rule Evaluation

    Stopping Criterion

    Rule Pruning

    Instance Elimination

  • Instance Elimination

    Why do we need to eliminate instances? Otherwise, the next rule is identical to the previous rule.

    [Figure: class = + and class = - points with rules R1 and R3 marked]

    Why do we remove +ve and -ve instances?
    - To ensure that the next rule is different.
    - To prevent underestimating the accuracy of a rule.

  • Indirect Methods for Rule-Based Classifiers, and Instance-Based Classifiers

    Lecture 26/17-09-09

  • Indirect Methods (Generating a Rule Set from a Decision Tree)

    [Figure: decision tree over attributes P, Q and R corresponding to rules r1-r5]

    Rule set:

    r1: (P=No, Q=No) ==> -
    r2: (P=No, Q=Yes) ==> +
    r3: (P=Yes, R=No) ==> +
    r4: (P=Yes, R=Yes, Q=No) ==> -
    r5: (P=Yes, R=Yes, Q=Yes) ==> +

    Consider r2, r3 and r5: the class label is always predicted as + when Q=Yes.

    So we can use the simplified rules:

    r2: (Q=Yes) ==> +
    r3: (P=Yes, R=No) ==> +
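    A small sketch of the extraction step: one rule per root-to-leaf path. The nested-dict tree representation is an assumption; the tree itself follows the figure above.

```python
# Sketch: generate one classification rule per root-to-leaf path of a
# decision tree, here represented as nested dicts (an assumed representation).

tree = {
    "attr": "P",
    "No":  {"attr": "Q", "No": "-", "Yes": "+"},
    "Yes": {"attr": "R",
            "No": "+",
            "Yes": {"attr": "Q", "No": "-", "Yes": "+"}},
}

def extract_rules(node, conjuncts=()):
    if not isinstance(node, dict):            # leaf: emit (conjuncts) ==> class
        return [(conjuncts, node)]
    rules = []
    attr = node["attr"]
    for value, child in node.items():
        if value == "attr":
            continue
        rules += extract_rules(child, conjuncts + ((attr, value),))
    return rules

for conds, label in extract_rules(tree):
    print(", ".join(f"{a}={v}" for a, v in conds), "==>", label)
# Prints the five rules r1-r5 shown above.
```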

  • Classification Rules Extracted from a Decision Tree

    C4.5rules:

    (Give Birth=No, Live in Water=No, Can Fly=Yes) => Birds
    (Give Birth=No, Live in Water=Yes) => Fishes
    (Give Birth=Yes) => Mammals
    (Give Birth=No, Live in Water=No, Can Fly=No) => Reptiles
    ( ) => Amphibians

    [Figure: decision tree splitting on Give Birth?, Live in Water? (Yes/No/Sometimes) and Can Fly?, with leaves Mammals, Fishes, Amphibians, Birds, Reptiles]

  • Advantages of Rule-Based Classifiers

    As highly expressive as decision trees

    Easy to interpret

    Easy to generate

    Can classify new instances rapidly

    Performance comparable to decision trees

  • Instance-Based Classifiers

  • Eager Learners vs. Lazy Learners

    Eager learners

    Decision trees and rule-based classifiers are examples of eager learners.

    They are designed to learn a model that maps the input attributes to the class label as soon as training data becomes available.

    Lazy learners

    They delay the process of modeling the training data until they are provided with an unseen instance to be classified.

    Instance-based classifiers belong to this class. They memorize the entire training data and perform classification only when the attributes of a test instance match it completely.

  • Instance-Based Classifiers

    [Figure: a set of stored cases (Atr1 ... AtrN, Class) and an unseen case (Atr1 ... AtrN)]

    Store the training records.

    Use the training records to predict the class label of unseen cases.

  • Instance-Based Classifiers: Examples

    Rote-learner (classifier): memorizes the entire training data and performs classification only if the attributes of a record match one of the training examples exactly.

    Its drawback is that some test records may not be classified at all because they do not match any instance in the training data.

    SOLUTION?

    Nearest neighbor: uses the k closest points (nearest neighbors) to perform classification.

  • Nearest Neighbor Classifiers

    Basic idea: the intuition behind the nearest neighbor classifier is captured by the following example:

    "If it walks like a duck, quacks like a duck, and looks like a duck, then it's probably a duck."

    [Figure: compute the distance between the test record and the training records, then choose the k nearest records]

  • A nearest-neighbor classifier represents each instance as a d-dimensional data point in space, where d is the number of attributes.

  • Nearest-Neighbor Classifiers

    Requires three things:

    The set of stored records

    A distance metric to compute the distance between records

    The value of k, the number of nearest neighbors to retrieve

    To classify an unknown record:

    Compute the distance to the other training records

    Identify the k nearest neighbors

    Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)

    [Figure: unknown record among the stored records]

  • Definition of Nearest Neighbor

    [Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor]

    The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.

  • Nearest Neighbor Classification

    Compute the distance between two points, e.g. the Euclidean distance:

    d(p, q) = sqrt( sum_i (p_i - q_i)^2 )

    Determine the class from the nearest-neighbor list: take the majority vote of the class labels among the k nearest neighbors.
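    A compact sketch of this procedure (Euclidean distance plus majority vote); the function names and the tiny dataset are illustrative.

```python
import math
from collections import Counter

# Sketch of k-nearest-neighbor classification with Euclidean distance
# and majority voting.

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(train, test_point, k):
    """train: list of (point, label) pairs; returns the majority label
    among the k training points closest to test_point."""
    neighbors = sorted(train, key=lambda pl: euclidean(pl[0], test_point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"), ((3.0, 3.5), "-"), ((3.2, 3.0), "-")]
print(knn_classify(train, (1.1, 1.0), k=3))   # "+"
```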

  • Nearest Neighbor Classification: Choosing the Value of k

    If k is too small, the classifier is sensitive to noise points.

    If k is too large, the neighborhood may include points from other classes.

    [Figure: k-nearest-neighbor classification with a large k]

  • Nearest Neighbor Classification: Scaling Issues

    Attributes may have to be scaled to prevent the distance measure from being dominated by one of the attributes.

    Example:

    height of a person may vary from 1.5 m to 1.8 m

    weight of a person may vary from 90 lb to 300 lb

    income of a person may vary from $10K to $1M
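    One common fix is to normalize each attribute to [0, 1] before computing distances. The choice of min-max scaling below is an assumption; the slide only notes that scaling is needed.

```python
# Sketch: min-max scaling of each attribute to [0, 1] so that no single
# attribute (e.g. income) dominates the Euclidean distance.

def min_max_scale(points):
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    return [tuple((p[d] - lo[d]) / (hi[d] - lo[d]) for d in range(dims))
            for p in points]

# height (m), weight (lb), income ($)
people = [(1.5, 90, 10_000), (1.8, 300, 1_000_000), (1.7, 160, 55_000)]
print(min_max_scale(people))
```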

  • Nearest Neighbor Classification

    k-NN classifiers are lazy learners: they do not build models explicitly, unlike eager learners such as decision tree induction and rule-based systems.

    Classifying unknown records is relatively expensive.

  • Algorithm (k-NN)

    1: Let k be the number of nearest neighbors and D be the set of training examples.
    2: for each test example z = (x', y') do
    3:   Compute d(x', x), the distance between z and every example (x, y) in D.
    4:   Select Dz, the subset of D containing the k training examples closest to z.
    5:   y' = argmax_v sum over (xi, yi) in Dz of I(v = yi)
    6: end for

  • Majority Voting

    Once the nearest-neighbor list is obtained, the test sample is classified based on the majority class of its nearest neighbors:

    y' = argmax_v sum over (xi, yi) in Dz of I(v = yi)

    where v is a class label, yi is the class label of one of the nearest neighbors, and I(.) is an indicator function that returns 1 if its argument is true and 0 otherwise.

  • In the majority voting approach, every neighbor has the same impact on the classification (refer to the slide 15 figure).

    This makes the classification algorithm sensitive to the choice of k.

    To reduce this impact of k, we assign a weight to each nearest neighbor xi based on its distance: wi = 1 / d(x', xi)^2.

  • Distance-Weighted Voting

    As a result of weighting by distance, the training examples located far away from z have a weaker impact on the classification.

    Using the distance-weighted voting scheme, the class label can be determined as:

    y' = argmax_v sum over (xi, yi) in Dz of wi * I(v = yi)
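    A sketch of this variant of the earlier k-NN example; the small epsilon guarding against a zero distance is an added safeguard, not part of the slides.

```python
import math
from collections import defaultdict

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Sketch of distance-weighted k-NN voting: each neighbor votes with
# weight wi = 1 / d(x', xi)^2.
def weighted_knn_classify(train, test_point, k, eps=1e-9):
    neighbors = sorted(train, key=lambda pl: euclidean(pl[0], test_point))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        # eps avoids division by zero when a neighbor coincides with the test point
        votes[label] += 1.0 / (euclidean(point, test_point) ** 2 + eps)
    return max(votes, key=votes.get)

train = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"), ((3.0, 3.5), "-"), ((3.2, 3.0), "-")]
print(weighted_knn_classify(train, (1.1, 1.0), k=3))   # "+"
```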

  • Characteristics

    1. NN classification is a part of instance-based learning.

    2. Lazy learners like NN classifiers do not need model building.

    3. NN classifiers make their predictions based on local information, whereas decision trees and rule-based classifiers attempt to find a global model that fits the entire input space.

    4. Appropriate proximity measures play a significant role in NN classifiers.