
Learning Fuzzy Association Rules and Associative Classification Rules

Jianchao Han
Computer Science Department
California State University Dominguez Hills

July 19, 2006, WCCI 2006

Agenda

Introduction
Traditional Association Rules
Positive and Negative Fuzzy Association Rules
An Illustrative Example
Positive and Negative Fuzzy Associative Classification Rules
Implementation Algorithms
Conclusion

Introduction

Association: a relationship between data items

Sales data association
- If a set of items A occurs in a sales transaction, then another set of items B is likely to occur in the same transaction

Limitations
- Data are described by binary attribute values
- Only positive associations are pursued

Solutions
- Fuzzy attribute values
- Negative associations

Traditional Association Rules

Basket data
- $I = \{I_1, I_2, \dots, I_m\}$, a set of possible items
- $D = \{t_1, t_2, \dots, t_n\}$, a database of transactions
- Each $t \in D$ is represented as a binary vector, with $t[I_k] = 1$ if $t$ contains $I_k$ and $t[I_k] = 0$ if $t$ does not contain $I_k$

Support of an itemset
- For any $X \subset I$, $t$ satisfies $X$ if $t[I_k] = 1$ for all $I_k \in X$
- The support of $X$ in $D$ is defined as $\mathrm{Supp}(X) = |\{t \in D \mid t \text{ satisfies } X\}|$, that is, the number of transactions that satisfy $X$

Traditional Association Rules

Itemset (binary) association rules
- For any $X, Y \subset I$ with $X \cap Y = \emptyset$, $X \rightarrow Y$ is an association rule if its support and confidence meet the given thresholds
- The support of the rule, $\mathrm{Supp}(X \rightarrow Y)$, is the probability of occurrence of $X \cup Y$ in $D$:

$$\mathrm{Supp}(X \rightarrow Y) = \frac{\mathrm{Supp}(X \cup Y)}{|D|}$$

- The confidence of the rule, $\mathrm{Conf}(X \rightarrow Y)$, is the conditional probability of $Y$ given $X$:

$$\mathrm{Conf}(X \rightarrow Y) = \frac{\mathrm{Supp}(X \cup Y)}{\mathrm{Supp}(X)}$$

Mining association rules
- Look for all possible associations $X \rightarrow Y$ such that $\mathrm{Supp}(X \rightarrow Y) \ge \alpha$ and $\mathrm{Conf}(X \rightarrow Y) \ge \beta$, where $\alpha$ and $\beta$ are given thresholds
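The support and confidence definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the basket contents and the function names (`support_count`, `rule_support`, `rule_confidence`) are invented for the example.

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]


def support_count(itemset: Itemset, transactions: Sequence[Itemset]) -> int:
    """Supp(X) as a count: transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)


def rule_support(x: Itemset, y: Itemset, transactions: Sequence[Itemset]) -> float:
    """Supp(X -> Y) = Supp(X u Y) / |D|."""
    return support_count(x | y, transactions) / len(transactions)


def rule_confidence(x: Itemset, y: Itemset, transactions: Sequence[Itemset]) -> float:
    """Conf(X -> Y) = Supp(X u Y) / Supp(X); 0.0 if X never occurs."""
    denom = support_count(x, transactions)
    return support_count(x | y, transactions) / denom if denom else 0.0


# Hypothetical basket data, invented for illustration only.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
bread, milk = frozenset({"bread"}), frozenset({"milk"})
print(rule_support(bread, milk, baskets))     # 2 of the 4 baskets hold both items
print(rule_confidence(bread, milk, baskets))  # 2 of the 3 bread baskets hold milk
```

Representing each transaction as a set rather than a binary vector is only a convenience; `itemset <= t` plays the role of "t satisfies X".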

Association Rules Mining Algorithm

Two steps
- Discover all frequent itemsets, that is, itemsets with support $\ge \alpha$
- Generate association rules
  - Partition each frequent itemset into two parts, X and Y
  - Test $\mathrm{Conf}(X \rightarrow Y)$

Level-wise algorithm
- Observation: if X is a frequent itemset, all of its subsets are frequent as well
- Test all 1-item itemsets
- Test all 2-item itemsets that are supersets of frequent 1-item itemsets
- Repeat until no new frequent itemsets are found
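The level-wise search can be sketched as follows. This is an Apriori-style illustration under stated assumptions: the slides do not give pseudocode, so the join and prune details, the function name `frequent_itemsets`, and the sample baskets are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search; min_support is a fraction of |D|."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    level = [frozenset({i}) for i in items]  # candidate 1-item itemsets
    frequent = {}
    while level:
        size = len(level[0])
        current = {}
        for cand in level:
            supp = sum(1 for t in transactions if cand <= t) / n
            if supp >= min_support:
                current[cand] = supp
        frequent.update(current)
        # Join: unions of two frequent itemsets that are exactly one item larger.
        joined = {a | b for a, b in combinations(current, 2) if len(a | b) == size + 1}
        # Prune: every subset of a frequent itemset must itself be frequent.
        level = [c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, size))]
    return frequent


# Hypothetical basket data, invented for illustration only.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
for itemset, supp in frequent_itemsets(baskets, 0.5).items():
    print(sorted(itemset), supp)
```

The prune step is where the observation about subsets pays off: {bread, milk, butter} is never counted because {milk, butter} is already known to be infrequent.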

Fuzzy Association Rules

Binary values are extended to the interval [0, 1]
Example: item Tomato belongs to Vegetable to some degree, say 0.7
Itemset $A = \{A_1, A_2, \dots, A_l\} \subset I$, where each $A_i$ is a fuzzy subset of $I$

In the formulas below, $t(x) \in [0, 1]$ is the degree to which transaction $t$ contains item $x$.

Support of an itemset $A$ is defined as

$$\mathrm{Supp}(A) = \sum_{t \in D} \prod_{i=1}^{l} t(A_i)$$

Support of a rule $A \rightarrow B$ is

$$\mathrm{Supp}(A \rightarrow B) = \frac{\sum_{t \in D} \prod_{x \in A \cup B} t(x)}{|D|}$$

Confidence of a rule $A \rightarrow B$ is

$$\mathrm{Conf}(A \rightarrow B) = \frac{\sum_{t \in D} \prod_{x \in A \cup B} t(x)}{\sum_{t \in D} \prod_{x \in A} t(x)}$$
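A minimal sketch of the fuzzy measures, assuming the membership degrees within a transaction are combined by product (the reading that reproduces the supports in the worked example later in the deck). The two-transaction database and the function names are invented for illustration.

```python
from math import prod


def fuzzy_count(itemset, db):
    """Sum over transactions of the product of the items' membership degrees."""
    return sum(prod(t[x] for x in itemset) for t in db)


def fuzzy_support(a, b, db):
    """Supp(A -> B): fuzzy count of A u B divided by |D|."""
    return fuzzy_count(set(a) | set(b), db) / len(db)


def fuzzy_confidence(a, b, db):
    """Conf(A -> B): fuzzy count of A u B over fuzzy count of A."""
    denom = fuzzy_count(a, db)
    return fuzzy_count(set(a) | set(b), db) / denom if denom else 0.0


# Hypothetical database: each transaction maps an item to a degree in [0, 1].
db = [
    {"tomato": 0.7, "basil": 1.0},
    {"tomato": 0.2, "basil": 0.5},
]
print(fuzzy_support({"tomato"}, {"basil"}, db))
print(fuzzy_confidence({"tomato"}, {"basil"}, db))
```

With all degrees restricted to 0 or 1, `fuzzy_count` reduces to the ordinary transaction count, so the binary definitions are a special case.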

Positive vs. Negative Association Rules

Positive association rules
- Like $A \rightarrow B$

Negative association rules
- Like $\neg A \rightarrow B$, $\neg A \rightarrow \neg B$, $A \rightarrow \neg B$

Different rule-interest measures exist for negative association rules, e.g.
- A negative example of $A \rightarrow B$ is a positive example of $B \rightarrow A$
- $A \rightarrow \neg B$, if
  - $A \cup B$ is infrequent
  - $A \cup \neg B$ is frequent
  - $\mathrm{Supp}(A \cup \neg B) - \mathrm{Supp}(A)\,\mathrm{Supp}(\neg B) \ge \alpha$
  - $\mathrm{Supp}(A \cup \neg B) / \mathrm{Supp}(A) \ge \beta$

Fuzzy Positive Association Rules

Simple fuzzy extension to traditional association rules

$A \rightarrow B$ is a fuzzy positive association rule if

1) $A \cap B = \emptyset$

2) $\mathrm{Supp}(A \rightarrow B) = \dfrac{\sum_{t \in D} \prod_{x \in A} t(x) \prod_{y \in B} t(y)}{|D|} \ge \alpha$

3) $\mathrm{Conf}(A \rightarrow B) = \dfrac{\sum_{t \in D} \prod_{x \in A} t(x) \prod_{y \in B} t(y)}{\sum_{t \in D} \prod_{x \in A} t(x)} \ge \beta$

Fuzzy Negative Association Rules

$A \rightarrow \neg B$ is a negative association rule if

1) $A \cap B = \emptyset$

2) $\mathrm{Supp}(A) \ge \alpha$

3) $\mathrm{Supp}(B) \ge \alpha$

4) $\mathrm{Supp}(A \rightarrow B) < \alpha$

5) $\mathrm{Supp}(A \rightarrow \neg B) = \dfrac{\sum_{t \in D} \prod_{x \in A} t(x) \prod_{y \in B} (1 - t(y))}{|D|} \ge \alpha$

6) $\mathrm{Conf}(A \rightarrow \neg B) = \dfrac{\sum_{t \in D} \prod_{x \in A} t(x) \prod_{y \in B} (1 - t(y))}{\sum_{t \in D} \prod_{x \in A} t(x)} \ge \beta$

Fuzzy Negative Association Rules

$\neg A \rightarrow B$ is a negative association rule if

1) $A \cap B = \emptyset$

2) $\mathrm{Supp}(A) \ge \alpha$

3) $\mathrm{Supp}(B) \ge \alpha$

4) $\mathrm{Supp}(A \rightarrow B) < \alpha$

5) $\mathrm{Supp}(\neg A \rightarrow B) = \dfrac{\sum_{t \in D} \prod_{x \in A} (1 - t(x)) \prod_{y \in B} t(y)}{|D|} \ge \alpha$

6) $\mathrm{Conf}(\neg A \rightarrow B) = \dfrac{\sum_{t \in D} \prod_{x \in A} (1 - t(x)) \prod_{y \in B} t(y)}{\sum_{t \in D} \prod_{x \in A} (1 - t(x))} \ge \beta$

Fuzzy Negative Association Rules

$\neg A \rightarrow \neg B$ is a negative association rule if

1) $A \cap B = \emptyset$

2) $\mathrm{Supp}(A) \ge \alpha$

3) $\mathrm{Supp}(B) \ge \alpha$

4) $\mathrm{Supp}(A \rightarrow B) < \alpha$

5) $\mathrm{Supp}(\neg A \rightarrow \neg B) = \dfrac{\sum_{t \in D} \prod_{x \in A} (1 - t(x)) \prod_{y \in B} (1 - t(y))}{|D|} \ge \alpha$

6) $\mathrm{Conf}(\neg A \rightarrow \neg B) = \dfrac{\sum_{t \in D} \prod_{x \in A} (1 - t(x)) \prod_{y \in B} (1 - t(y))}{\sum_{t \in D} \prod_{x \in A} (1 - t(x))} \ge \beta$
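When the negated side is a single item, the negative supports follow from positive ones by expanding the factor $(1 - t(\cdot))$, so no extra pass over the data is needed. The shorthand $U(S) = \sum_{t \in D} \prod_{x \in S} t(x)$ is introduced here only for this derivation and is not notation from the slides; the identities assume the product form of the support.

$$
\begin{aligned}
B = \{y\}: \quad & \sum_{t \in D} \prod_{x \in A} t(x)\,\bigl(1 - t(y)\bigr) = U(A) - U(A \cup B) \\
A = \{x\}: \quad & \sum_{t \in D} \bigl(1 - t(x)\bigr) \prod_{y \in B} t(y) = U(B) - U(A \cup B) \\
A = \{x\},\ B = \{y\}: \quad & \sum_{t \in D} \bigl(1 - t(x)\bigr)\bigl(1 - t(y)\bigr) = |D| - U(A) - U(B) + U(A \cup B)
\end{aligned}
$$

Dividing each line by $|D|$ gives the supports of $A \rightarrow \neg B$, $\neg A \rightarrow B$ and $\neg A \rightarrow \neg B$. For larger negated sets the same expansion continues by inclusion-exclusion over the subsets of the negated side.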

Algorithm for Mining both Positive and Negative Fuzzy Rules

Two steps
- Generating all frequent and infrequent itemsets
- Extracting fuzzy association rules

Positive rules are extracted from the frequent itemsets
Negative rules are extracted from the infrequent itemsets

An Example

Transaction database

Trans.  i1   i2   i3   i4   i5   i6
t1      1.0  0.7  0.2  0.0  1.0  1.0
t2      0.8  0.0  0.6  0.8  0.4  0.2
t3      0.5  0.8  0.0  0.8  0.8  0.0
t4      0.7  0.2  1.0  0.9  1.0  0.8
t5      0.4  0.4  0.0  0.6  0.8  0.9
t6      0.8  0.0  0.1  1.0  0.1  0.8
t7      0.9  0.9  0.8  0.2  1.0  1.0
t8      0.6  0.1  0.1  0.8  0.7  0.8

Frequent vs. infrequent itemsets, with support threshold 40%

1-itemsets          2-itemsets          3-itemsets
itemset  support    itemset  support    itemset     support
i1       5.7/8      i1, i4   3.37/8     i1, i4, i5  1.99/8
i2       3.1/8      i1, i5   4.14/8     i1, i5, i6  3.21/8
i3       2.8/8      i1, i6   4.10/8
i4       5.1/8      i4, i5   3.20/8
i5       5.8/8      i4, i6   3.06/8
i6       5.5/8      i5, i6   4.24/8
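The itemset supports above can be reproduced with a short level-wise search over the fuzzy database. This is a sketch, assuming product-based supports and an Apriori-style join and prune; the function names are hypothetical. Exact fractions are used because {i4, i5} sits exactly on the 40% threshold, where floating-point rounding could tip the comparison either way.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

ITEMS = ("i1", "i2", "i3", "i4", "i5", "i6")
ROWS = (
    (1.0, 0.7, 0.2, 0.0, 1.0, 1.0),
    (0.8, 0.0, 0.6, 0.8, 0.4, 0.2),
    (0.5, 0.8, 0.0, 0.8, 0.8, 0.0),
    (0.7, 0.2, 1.0, 0.9, 1.0, 0.8),
    (0.4, 0.4, 0.0, 0.6, 0.8, 0.9),
    (0.8, 0.0, 0.1, 1.0, 0.1, 0.8),
    (0.9, 0.9, 0.8, 0.2, 1.0, 1.0),
    (0.6, 0.1, 0.1, 0.8, 0.7, 0.8),
)
# Convert the one-decimal degrees to exact fractions.
DB = [{i: Fraction(str(v)) for i, v in zip(ITEMS, row)} for row in ROWS]


def count(itemset):
    """Numerator of the support: sum over D of the product of degrees."""
    return sum(prod(t[x] for x in itemset) for t in DB)


def mine(min_support):
    """Split tested itemsets into frequent and infrequent (pass a Fraction)."""
    threshold = min_support * len(DB)
    frequent, infrequent = {}, {}
    level = [frozenset({i}) for i in ITEMS]
    while level:
        size = len(level[0])
        current = {}
        for cand in level:
            c = count(cand)
            (current if c >= threshold else infrequent)[cand] = c
        frequent.update(current)
        joined = {a | b for a, b in combinations(current, 2) if len(a | b) == size + 1}
        level = [c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, size))]
    return frequent, infrequent


frequent, infrequent = mine(Fraction(2, 5))
for name, table in (("frequent", frequent), ("infrequent", infrequent)):
    for itemset in sorted(table, key=lambda s: (len(s), sorted(s))):
        print(name, sorted(itemset), f"{float(table[itemset]):.3f}/8")
```

Under these assumptions the search tests exactly the itemsets listed in the table: the only 3-item candidates that survive pruning are {i1, i4, i5} and {i1, i5, i6}.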

An Example: Positive Fuzzy Association Rules

itemset       association     support   confidence
i1, i4        i1 → i4         3.37/8    59.1%
              i4 → i1                   66.1%
i1, i5        i1 → i5         4.14/8    72.6%
              i5 → i1                   71.4%
i1, i6        i1 → i6         4.10/8    71.9%
              i6 → i1                   74.5%
i4, i5        i4 → i5         3.20/8    62.7%
              i5 → i4                   55.2%
i5, i6        i5 → i6         4.24/8    73.1%
              i6 → i5                   77.1%
i1, i5, i6    i1, i5 → i6     3.21/8    77.6%
              i1, i6 → i5               78.3%
              i5, i6 → i1               75.8%
              i1 → i5, i6               56.4%
              i5 → i1, i6               55.4%
              i6 → i1, i5               58.4%

Support threshold: 40%, confidence threshold: 75%
Support threshold: 50%, confidence threshold: 70%
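The rule-generation step behind this table can be sketched as below: each frequent itemset is split into every left-hand and right-hand side, and the confidence is the ratio of two fuzzy counts. The function names are hypothetical, the list of frequent itemsets is copied from the table rather than mined, and the small tolerance on the support test is an assumption added to absorb floating-point rounding.

```python
from itertools import combinations
from math import prod

ITEMS = ("i1", "i2", "i3", "i4", "i5", "i6")
ROWS = (
    (1.0, 0.7, 0.2, 0.0, 1.0, 1.0),
    (0.8, 0.0, 0.6, 0.8, 0.4, 0.2),
    (0.5, 0.8, 0.0, 0.8, 0.8, 0.0),
    (0.7, 0.2, 1.0, 0.9, 1.0, 0.8),
    (0.4, 0.4, 0.0, 0.6, 0.8, 0.9),
    (0.8, 0.0, 0.1, 1.0, 0.1, 0.8),
    (0.9, 0.9, 0.8, 0.2, 1.0, 1.0),
    (0.6, 0.1, 0.1, 0.8, 0.7, 0.8),
)
DB = [dict(zip(ITEMS, row)) for row in ROWS]

# Frequent itemsets with at least two items, as listed in the table.
FREQUENT = [
    ("i1", "i4"), ("i1", "i5"), ("i1", "i6"),
    ("i4", "i5"), ("i5", "i6"), ("i1", "i5", "i6"),
]


def count(itemset):
    """Sum over D of the product of membership degrees."""
    return sum(prod(t[x] for x in itemset) for t in DB)


def rules_from(itemset):
    """Yield (lhs, rhs, support, confidence) for every split of the itemset."""
    items = sorted(itemset)
    whole = count(items)
    for r in range(1, len(items)):
        for lhs in combinations(items, r):
            rhs = tuple(i for i in items if i not in lhs)
            yield lhs, rhs, whole / len(DB), whole / count(lhs)


def confident_rules(itemsets, min_support, min_conf, eps=1e-9):
    """Keep the rules that meet both thresholds (eps absorbs float rounding)."""
    return [rule for s in itemsets for rule in rules_from(s)
            if rule[2] >= min_support - eps and rule[3] >= min_conf - eps]


for lhs, rhs, supp, conf in confident_rules(FREQUENT, 0.40, 0.75):
    print(", ".join(lhs), "->", ", ".join(rhs), f"{supp * len(DB):.2f}/8", f"{conf:.1%}")
```

Calling `confident_rules(FREQUENT, 0.50, 0.70)` applies the second pair of thresholds listed above.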

An Example: Negative Fuzzy Association Rules

Support threshold: 25%
Confidence threshold: 70%

Here ¬{x, y} negates every item of the set, as in the support formulas.

itemset       association          support    confidence
i4, i6        i4 → ¬i6             2.04/8     35.8%
              ¬i6 → i4                        73.0%
              i6 → ¬i4             2.44/8     44.4%
              ¬i4 → i6                        84.1%
              ¬i4 → ¬i6            0.46/8     15.9%
              ¬i6 → ¬i4                       18.4%
i1, i4, i5    i1, i4 → ¬i5         1.376/8    40.8%
              ¬i5 → i1, i4                    62.5%
              i1, i5 → ¬i4         2.146/8    51.8%
              ¬i4 → i1, i5                    74.0%
              i4, i5 → ¬i1         1.206/8    37.6%
              ¬i1 → i4, i5                    52.4%
              i1 → ¬{i4, i5}       0.184/8    3.20%
              ¬{i4, i5} → i1                  61.3%
              i4 → ¬{i1, i5}       0.524/8    10.3%
              ¬{i1, i5} → i4                  81.9%
              i5 → ¬{i1, i4}       0.454/8    7.80%
              ¬{i1, i4} → i5                  79.6%
              ¬i1 → ¬{i4, i5}      0.116/8    5.00%
              ¬{i4, i5} → ¬i1                 38.7%
              ¬i4 → ¬{i1, i5}                 4.00%
              ¬{i1, i5} → ¬i4                 18.1%
              ¬i5 → ¬{i1, i4}                 5.30%
              ¬{i1, i4} → ¬i5                 20.4%
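The negative supports and confidences can be recomputed directly from the transaction database. The sketch below assumes that a negated itemset contributes the product of the complements $1 - t(y)$ of its items; the helper names `count` and `measure` are hypothetical. Only the two measures are computed, not the frequency conditions on $A$, $B$ and $A \rightarrow B$.

```python
from math import prod

ITEMS = ("i1", "i2", "i3", "i4", "i5", "i6")
ROWS = (
    (1.0, 0.7, 0.2, 0.0, 1.0, 1.0),
    (0.8, 0.0, 0.6, 0.8, 0.4, 0.2),
    (0.5, 0.8, 0.0, 0.8, 0.8, 0.0),
    (0.7, 0.2, 1.0, 0.9, 1.0, 0.8),
    (0.4, 0.4, 0.0, 0.6, 0.8, 0.9),
    (0.8, 0.0, 0.1, 1.0, 0.1, 0.8),
    (0.9, 0.9, 0.8, 0.2, 1.0, 1.0),
    (0.6, 0.1, 0.1, 0.8, 0.7, 0.8),
)
DB = [dict(zip(ITEMS, row)) for row in ROWS]


def count(pos=(), neg=()):
    """Sum over D of prod t(x) for x in pos times prod (1 - t(y)) for y in neg."""
    return sum(prod(t[x] for x in pos) * prod(1 - t[y] for y in neg) for t in DB)


def measure(lhs, rhs, neg_lhs=False, neg_rhs=False):
    """Return (support, confidence) of a rule whose sides may be negated."""
    lhs, rhs = tuple(lhs), tuple(rhs)
    pos = (() if neg_lhs else lhs) + (() if neg_rhs else rhs)
    neg = (lhs if neg_lhs else ()) + (rhs if neg_rhs else ())
    joint = count(pos, neg)
    lhs_count = count((), lhs) if neg_lhs else count(lhs)
    return joint / len(DB), (joint / lhs_count if lhs_count else 0.0)


# i1, i5 -> not i4: the table lists 2.146/8 and 51.8%.
supp, conf = measure(("i1", "i5"), ("i4",), neg_rhs=True)
print(f"{supp * len(DB):.3f}/8", f"{conf:.1%}")

# not i4 -> i1, i5: the table lists 74.0%.
supp, conf = measure(("i4",), ("i1", "i5"), neg_lhs=True)
print(f"{supp * len(DB):.3f}/8", f"{conf:.1%}")
```

Under these definitions, `measure(("i4",), ("i6",), neg_rhs=True)` gives a confidence of 2.04/5.1 = 40.0% for i4 → ¬i6, which differs from the 35.8% listed in the first row, so that row may have been computed with a different normalization. The other rows checked here agree with the table to the printed precision.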

Associative Classification Rules

Associative classification rules are a special subset of association rules whose right-hand side is restricted to the class labels.
In classification, data attributes are partitioned into two categories: condition attributes and decision attributes.
For simplicity, decision attributes are converted into decision attribute-value pairs that serve as class labels.
Thus, class labels are also items in the database, but separate from condition items.

Two Constraints

The left-hand side of a classification rule must be a frequent itemset of condition attributes, or the negation of an infrequent conditional itemset
The class labels that appear on the right-hand side of classification rules must also be frequent 1-itemsets

Positive Fuzzy Associative Classification Rules

Let $A \subseteq I$ be an itemset, and $c \in C$ be a class label. The relationship $A \rightarrow c$ is a positive fuzzy associative classification rule if the following conditions hold:

1) $A \cup \{c\}$ is a frequent itemset in $D$: $\mathrm{Supp}(A \cup \{c\}) / |D| \ge \mathit{minsupp}$
2) $A \rightarrow c$ is confident: $\mathrm{Conf}(A \rightarrow c) = \mathrm{Supp}(A \cup \{c\}) / \mathrm{Supp}(A) \ge \mathit{minconf}$

Negative Fuzzy Associative Classification Rules

We only consider the format $\neg A \rightarrow c$, where
- $A$ is a frequent itemset,
- $\{c\}$ is a frequent class label,
- $A \cup \{c\}$ is infrequent

$\neg A \rightarrow c$ is a negative fuzzy associative classification rule if

1) $\mathrm{Supp}(A) \ge \mathit{minsupp}$;
2) $\mathrm{Supp}(\{c\}) \ge \mathit{minsupp}$;
3) $\mathrm{Supp}(A \cup \{c\}) / |D| < \mathit{minsupp}$;
4) $\mathrm{Supp}(\neg A \cup \{c\}) / |D| \ge \mathit{minsupp}$;
5) $\mathrm{Conf}(\neg A \rightarrow c) = \mathrm{Supp}(\neg A \cup \{c\}) / \mathrm{Supp}(\neg A) \ge \mathit{minconf}$.

Learning Algorithm

Step 1: Find the set of frequent conditional itemsets for associative classification rules
Step 2: Induce both positive and negative fuzzy associative classification rules
- Add each frequent class label $c$ to each frequent itemset $X$
  - If $X \cup \{c\}$ is still frequent, then test whether $X \rightarrow c$ is a positive fuzzy association rule
  - If $X \cup \{c\}$ is infrequent, then test whether $\neg X \rightarrow c$ is a negative fuzzy association rule

- A frequent itemset $Y$ is partitioned into two subsets $A$ and $B$, and the associations $A \neg B \rightarrow c$ and $\neg A B \rightarrow c$ are tested against the support threshold and confidence threshold.
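The first part of Step 2 can be sketched as follows. This is an illustration under several assumptions, not the author's implementation: supports are normalized by |D| everywhere and combined by product, `minsupp` is positive, frequent conditional itemsets are found by brute-force enumeration rather than level-wise search, and the partition of a frequent itemset into A and B is not covered. The four-record database, the item names and the function names are invented.

```python
from itertools import combinations
from math import prod


def supp(db, pos=(), neg=()):
    """Normalized fuzzy support: mean of prod t(x) * prod (1 - t(y))."""
    return sum(prod(t[x] for x in pos) * prod(1 - t[y] for y in neg)
               for t in db) / len(db)


def learn_rules(db, cond_items, class_labels, minsupp, minconf):
    """Return ("pos", X, c) for X -> c and ("neg", X, c) for not-X -> c."""
    # Step 1 (brute force): frequent itemsets over the condition items.
    frequent = [x for r in range(1, len(cond_items) + 1)
                for x in combinations(cond_items, r) if supp(db, x) >= minsupp]
    labels = [c for c in class_labels if supp(db, (c,)) >= minsupp]
    rules = []
    # Step 2: add each frequent class label to each frequent itemset.
    for x in frequent:
        for c in labels:
            joint = supp(db, x + (c,))
            if joint >= minsupp:
                # X u {c} is frequent: test the positive rule X -> c.
                if joint / supp(db, x) >= minconf:
                    rules.append(("pos", x, c))
            else:
                # X u {c} is infrequent: test the negative rule not-X -> c.
                neg_joint = supp(db, (c,), x)
                neg_x = supp(db, (), x)
                if neg_joint >= minsupp and neg_x > 0 and neg_joint / neg_x >= minconf:
                    rules.append(("neg", x, c))
    return rules


# Hypothetical records: condition items a, b and crisp class labels yes, no.
db = [
    {"a": 0.9, "b": 0.2, "yes": 1.0, "no": 0.0},
    {"a": 0.8, "b": 0.1, "yes": 1.0, "no": 0.0},
    {"a": 0.1, "b": 0.9, "yes": 0.0, "no": 1.0},
    {"a": 0.2, "b": 0.8, "yes": 0.0, "no": 1.0},
]
for kind, x, c in learn_rules(db, ("a", "b"), ("yes", "no"), 0.3, 0.7):
    print(("not " if kind == "neg" else "") + ", ".join(x), "->", c)
```

Class labels are treated as ordinary items with their own membership degrees, which matches the earlier remark that class labels are items in the database kept separate from condition items.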

Conclusion

Traditional association rules
Fuzzy extensions and negative rules
Fuzzy associative classification rules
An example
Algorithms