2006/12/06 Chen Yi-Chun 1 Mining Positive and Negative Association Rules: An Approach for Confined Rules Maria-Luiza Antonie, Osmar R. Zaiane PKDD2004



2006/12/06 Chen Yi-Chun 1

Mining Positive and Negative Association Rules: An Approach for Confined Rules

Maria-Luiza Antonie, Osmar R. Zaiane

PKDD2004


Outline

• Motivation
  – Negative association rule
  – Correlation coefficient

• Our algorithm

• Others’ algorithms

• Comparison

• Conclusion


Positive vs Negative

• Positive association rules
  – The typical association rule
• Negative association rules
  – Identify products that conflict with each other
    • “Customers that buy Coke do not buy Pepsi”
  – Identify products that complement each other
    • “If Coke is sold out, buy Pepsi instead”


Example

• “non-organic → organic” has 20% support and 25% confidence (20/80)
• “organic → non-organic” has 20% support and 50% confidence (20/40)

                 organic   ¬organic   Σrow
  non-organic      20         60       80
  ¬non-organic     20          0       20
  Σcol             40         60      100
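As a rough check of the figures on this slide, the following Python sketch recomputes support and confidence from the 2×2 table. The names `counts` and `support_and_confidence` are illustrative assumptions, not taken from the paper.

```python
# Basket counts from the organic / non-organic table (100 baskets in total).
# Key: (contains non-organic, contains organic) -> number of baskets.
counts = {
    (True, True): 20,
    (True, False): 60,
    (False, True): 20,
    (False, False): 0,
}
total = sum(counts.values())


def support_and_confidence(antecedent_index):
    """Support and confidence of the rule antecedent -> other item.

    antecedent_index is 0 for non-organic -> organic
    and 1 for organic -> non-organic.
    """
    both = counts[(True, True)]
    # Baskets in which the antecedent item is present.
    antecedent = sum(n for key, n in counts.items() if key[antecedent_index])
    return both / total, both / antecedent


sup1, conf1 = support_and_confidence(0)  # non-organic -> organic
sup2, conf2 = support_and_confidence(1)  # organic -> non-organic
print(f"non-organic -> organic: support={sup1:.0%}, confidence={conf1:.0%}")
print(f"organic -> non-organic: support={sup2:.0%}, confidence={conf2:.0%}")
```

Both rules share the same support because support only counts baskets that contain both items; the confidences differ because the antecedents have different totals (80 versus 40).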


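Correlation Coefficient

• Correlation coefficient: $\rho = \dfrac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$
  • $\rho = 0$: X and Y are uncorrelated (for binary variables, independent).
  • $\rho = +1$: X and Y are perfectly positively correlated.
  • $\rho = -1$: X and Y are perfectly negatively correlated.
• Let X and Y be two binary variables, with contingency table:

            Y       ¬Y      Σrow
  X         f11     f10     f1+
  ¬X        f01     f00     f0+
  Σcol      f+1     f+0     N

  – Correlation coefficient: $\phi = \dfrac{f_{11} f_{00} - f_{10} f_{01}}{\sqrt{f_{+0}\, f_{+1}\, f_{1+}\, f_{0+}}}$
  – Transform: $\phi = \dfrac{N f_{11} - f_{1+} f_{+1}}{\sqrt{f_{1+}(N - f_{1+})\, f_{+1}(N - f_{+1})}}$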
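A minimal Python sketch of the two forms of the binary correlation coefficient on this slide: the cell-product form and the transformed form that needs only N, f11, f1+ and f+1. The function names and the sample counts are assumptions made for illustration.

```python
import math


def phi_from_cells(f11, f10, f01, f00):
    """Phi coefficient from the four cells of a 2x2 contingency table."""
    row1, row0 = f11 + f10, f01 + f00  # f1+, f0+
    col1, col0 = f11 + f01, f10 + f00  # f+1, f+0
    return (f11 * f00 - f10 * f01) / math.sqrt(col0 * col1 * row1 * row0)


def phi_from_marginals(n, f11, row1, col1):
    """Same coefficient using only N, f11, f1+ and f+1 (the transformed form)."""
    return (n * f11 - row1 * col1) / math.sqrt(
        row1 * (n - row1) * col1 * (n - col1)
    )


# Hypothetical counts chosen only to exercise both forms.
a = phi_from_cells(f11=30, f10=10, f01=20, f00=40)
b = phi_from_marginals(n=100, f11=30, row1=40, col1=50)
print(round(a, 4), round(b, 4))
```

The two forms agree because f11·f00 − f10·f01 = N·f11 − f1+·f+1 once f10, f01 and f00 are written in terms of the marginals. Both functions divide by zero when an item occurs in every transaction or in none, so such items would need to be filtered out first.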


Summary

• The correlation coefficient between these two items is -0.61
  – They are negatively correlated.
  – So the rule “organic → non-organic” is misleading
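Substituting the counts of the organic example into the binary correlation coefficient reproduces the quoted value. Here X = non-organic and Y = organic, so f11 = 20, f10 = 60, f01 = 20 and f00 = 0 (this assignment of X and Y is an assumption; swapping them gives the same result).

```latex
\phi = \frac{f_{11} f_{00} - f_{10} f_{01}}{\sqrt{f_{+0}\, f_{+1}\, f_{1+}\, f_{0+}}}
     = \frac{20 \cdot 0 - 60 \cdot 20}{\sqrt{60 \cdot 40 \cdot 80 \cdot 20}}
     = \frac{-1200}{\sqrt{3\,840\,000}}
     \approx -0.61
```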


Negative Association Rules

• A generalized negative association rule is a rule containing a negation of an item
  – e.g.: A ∧ ¬B ∧ ¬C ∧ D → E ∧ ¬F
• Confined negative association rules
  – X → ¬Y, ¬X → Y, ¬X → ¬Y
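The supports needed for the confined rule forms can be derived from sup(X), sup(Y) and sup(X ∪ Y) by inclusion-exclusion, without counting negated items directly. The sketch below illustrates this; the function names and the sample supports are assumptions, not part of the paper.

```python
def negated_supports(sup_x, sup_y, sup_xy):
    """Supports of the itemsets behind the three confined rule forms.

    Inputs are relative supports in [0, 1]: sup(X), sup(Y) and sup(X ∪ Y).
    The results follow from inclusion-exclusion on transaction counts.
    """
    return {
        "X and not Y": sup_x - sup_xy,
        "not X and Y": sup_y - sup_xy,
        "not X and not Y": 1 - sup_x - sup_y + sup_xy,
    }


def confidence(sup_antecedent, sup_both):
    """conf(antecedent -> consequent) = sup(both) / sup(antecedent)."""
    return sup_both / sup_antecedent


# Hypothetical supports, used only for illustration.
sups = negated_supports(sup_x=0.5, sup_y=0.3, sup_xy=0.0)
conf_x_not_y = confidence(0.5, sups["X and not Y"])      # X -> not Y
conf_not_x_y = confidence(1 - 0.5, sups["not X and Y"])  # not X -> Y
print(sups, conf_x_not_y, conf_not_x_y)
```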


Correlation Coefficient (cont.)

• Cohen discusses the strength of the correlation coefficient
  – a correlation of 0.5 is large
  – a correlation of 0.3 is moderate
  – a correlation of 0.1 is small
• We then introduce an automatic progressive thresholding process
  – This eliminates the need for manually adjusted thresholds.
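A small sketch of how these cut-offs could be turned into labels. Treating 0.5, 0.3 and 0.1 as lower bounds of bands on the magnitude is an assumption, as is the "negligible" label for anything smaller.

```python
def correlation_strength(rho):
    """Label |rho| using the 0.5 / 0.3 / 0.1 cut-offs quoted above."""
    magnitude = abs(rho)
    if magnitude >= 0.5:
        return "large"
    if magnitude >= 0.3:
        return "moderate"
    if magnitude >= 0.1:
        return "small"
    return "negligible"  # assumed label; the slide does not name this range


for value in (-0.61, 0.35, 0.1, 0.05):
    print(value, correlation_strength(value))
```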


Our Algorithm

Automatic progressive thresholding process:
1. We start by setting our correlation threshold to 0.5.
2. If no strongly correlated rules are found, the threshold slides progressively to 0.4, then 0.3,
3. until some rules with moderate correlations are found.
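The thresholding steps above can be sketched as a simple loop. This is an illustration under stated assumptions: the correlations are taken as already computed, an itemset qualifies when the magnitude of its correlation reaches the threshold, and the function name and sample data are hypothetical.

```python
def mine_with_progressive_threshold(correlations, thresholds=(0.5, 0.4, 0.3)):
    """Return (threshold, itemsets) for the first threshold that yields rules.

    correlations maps an itemset name to its correlation coefficient.
    An itemset qualifies when |correlation| >= threshold.
    """
    for threshold in thresholds:
        found = {
            name: rho
            for name, rho in correlations.items()
            if abs(rho) >= threshold
        }
        if found:
            return threshold, found
    return None, {}  # nothing reached even the lowest threshold


# Hypothetical correlations with no strong (>= 0.5) pair.
example = {"PQ": 0.42, "PR": -0.35, "QR": 0.10}
threshold, found = mine_with_progressive_threshold(example)
print(threshold, found)
```

Stopping at the first threshold that produces any rule keeps the strongest available correlations and avoids flooding the result with weaker ones.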


Others’ Algorithms

• [WZZ02]
  – Sets a parameter “mininterest”
  – If $|\mathrm{sup}(A \cup B) - \mathrm{sup}(A)\,\mathrm{sup}(B)| \ge \mathit{mininterest}$
    • then A → B is of interest.
• [THC02] SRM (substitution rule mining)
  – Only discusses one form of negative rule: X → ¬Y
  – Uses the chi-square value and the support value to find concrete items
  – Then uses the correlation coefficient to find their rules
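A minimal sketch of the interest test attributed to [WZZ02] on this slide. Comparing the magnitude of the difference against mininterest is an assumption, and the function names and sample supports are hypothetical; the full method in [WZZ02] involves further conditions not shown here.

```python
def interest(sup_a, sup_b, sup_ab):
    """Signed difference between the observed joint support and the
    joint support expected if A and B were independent."""
    return sup_ab - sup_a * sup_b


def is_of_interest(sup_a, sup_b, sup_ab, mininterest):
    """True when the magnitude of the difference reaches mininterest."""
    return abs(interest(sup_a, sup_b, sup_ab)) >= mininterest


# Hypothetical supports: expected joint support is 0.4 * 0.5 = 0.2.
print(interest(0.4, 0.5, 0.3), is_of_interest(0.4, 0.5, 0.3, 0.07))
print(interest(0.4, 0.5, 0.22), is_of_interest(0.4, 0.5, 0.22, 0.07))
```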


Example

TID   Items    Items with negations        Equivalent bit vector (ABCDEF)
1     A,C,D    A, ¬B, C, D, ¬E, ¬F         101100
2     B,C      ¬A, B, C, ¬D, ¬E, ¬F        011000
3     C        ¬A, ¬B, C, ¬D, ¬E, ¬F       001000
4     A,B,F    A, B, ¬C, ¬D, ¬E, F         110001
5     A,C,D    A, ¬B, C, D, ¬E, ¬F         101100
6     E        ¬A, ¬B, ¬C, ¬D, E, ¬F       000010
7     B,F      ¬A, B, ¬C, ¬D, ¬E, F        010001
8     B,C,F    ¬A, B, C, ¬D, ¬E, F         011001
9     A,B,E    A, B, ¬C, ¬D, E, ¬F         110010
10    A,D      A, ¬B, ¬C, D, ¬E, ¬F        100100
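The encoding in this table can be reproduced with a few lines of Python. The helper names below are assumptions made for illustration; the transactions are the ten from the slide.

```python
ITEMS = "ABCDEF"

TRANSACTIONS = {
    1: "ACD", 2: "BC", 3: "C", 4: "ABF", 5: "ACD",
    6: "E", 7: "BF", 8: "BCF", 9: "ABE", 10: "AD",
}


def to_bit_vector(transaction, items=ITEMS):
    """One character per item: '1' if the item is present, '0' if absent."""
    return "".join("1" if item in transaction else "0" for item in items)


def with_negations(transaction, items=ITEMS):
    """Spell out absent items explicitly, e.g. 'A, ¬B, C, D, ¬E, ¬F'."""
    return ", ".join(
        item if item in transaction else "¬" + item for item in items
    )


for tid, transaction in TRANSACTIONS.items():
    print(tid, with_negations(transaction), to_bit_vector(transaction))
```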


min. sup. = 0.2
correlation coefficient = 0.5
min. interest = 0.07


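Itemsets found by each method (Our = our algorithm, Int. = interest [WZZ02], SRM = [THC02]):

Type       Size         Itemset   Found by
Positive   2-itemsets   AD        Our, Int., SRM
Positive   2-itemsets   BF        Our, Int., SRM
Positive   3-itemsets   ACD       two of the three methods
Negative   2-itemsets   BD        Our, Int., SRM
Negative   2-itemsets   CE        Our, Int.
Negative   2-itemsets   DF        Int.
Negative   3-itemsets   ABC       two of the three methods
Negative   3-itemsets   ABD       two of the three methods
Negative   3-itemsets   BCD       one of the three methods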

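1. Item counts: A: 5, B: 5, C: 5, D: 3, E: 2, F: 3
2. Find the candidates: AB, AC, AD, BC, BF, CD
3. Compute the correlation of each candidate. If $\rho \ge \rho_{min}$ and $s \ge s_{min}$, then X → Y is a positive rule.
4. Example (A and D):

            A     ¬A    Σrow
  D         3     0     3
  ¬D        2     5     7
  Σcol      5     5     10

  $\rho = \dfrac{30 - 15}{\sqrt{3 \cdot 7 \cdot 5 \cdot 5}} \approx 0.65$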
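The positive-rule steps can be sketched on the ten transactions as follows. This is a simplified illustration restricted to pairs of single items, with hypothetical helper names; it is not the paper's full algorithm.

```python
import math
from itertools import combinations

TRANSACTIONS = ["ACD", "BC", "C", "ABF", "ACD", "E", "BF", "BCF", "ABE", "AD"]
N = len(TRANSACTIONS)
MIN_SUP = 0.2
MIN_CORR = 0.5


def count(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(all(item in t for item in itemset) for t in TRANSACTIONS)


def phi(x, y):
    """Correlation of two single items, using N, f11, f1+ and f+1."""
    fx, fy, fxy = count(x), count(y), count(x + y)
    return (N * fxy - fx * fy) / math.sqrt(fx * (N - fx) * fy * (N - fy))


items = sorted({item for t in TRANSACTIONS for item in t})
# Step 2: pairs whose joint support reaches the minimum support.
candidates = [
    a + b for a, b in combinations(items, 2) if count(a + b) / N >= MIN_SUP
]
# Step 3: keep the candidates whose correlation reaches the threshold.
positive = {
    c: phi(c[0], c[1]) for c in candidates if phi(c[0], c[1]) >= MIN_CORR
}

print(candidates)
print({name: round(value, 2) for name, value in positive.items()})
```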

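1. If $\rho(X, Y) \le -\rho_{min}$ and the support requirement (min. sup.) is met,
2. a negative rule is generated:
3. X → ¬Y or ¬X → Y.
4. Example (B and D):

            B     ¬B    Σrow
  D         0     3     3
  ¬D        5     2     7
  Σcol      5     5     10

  $\rho = \dfrac{0 - 15}{\sqrt{3 \cdot 7 \cdot 5 \cdot 5}} = \dfrac{-15}{22.9} \approx -0.65$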

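The itemset DF has an interest of -0.09, whose magnitude exceeds the minimum interest of 0.07, but a correlation of only -0.43, which does not reach the 0.5 threshold.

In SRM the correlation coefficient must be strictly greater than the minimum value, whereas our method only requires it to be greater than or equal to the minimum value, and the correlation of CE is exactly 0.5 in magnitude.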
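The differences between the three methods on negative 2-itemsets can be checked with a short sketch. It is deliberately simplified: `strict` mimics only the strict inequality attributed to SRM here (not its chi-square step), `by_interest` applies only the magnitude test on the interest value, and all helper names are assumptions.

```python
import math
from itertools import combinations

TRANSACTIONS = ["ACD", "BC", "C", "ABF", "ACD", "E", "BF", "BCF", "ABE", "AD"]
N = len(TRANSACTIONS)
MIN_CORR = 0.5
MIN_INTEREST = 0.07


def count(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(all(item in t for item in itemset) for t in TRANSACTIONS)


def phi(x, y):
    """Correlation of two single items, using N, f11, f1+ and f+1."""
    fx, fy, fxy = count(x), count(y), count(x + y)
    return (N * fxy - fx * fy) / math.sqrt(fx * (N - fx) * fy * (N - fy))


def interest(x, y):
    """Signed interest: sup(XY) - sup(X) * sup(Y)."""
    return count(x + y) / N - (count(x) / N) * (count(y) / N)


pairs = [a + b for a, b in combinations("ABCDEF", 2)]
# Negatively associated pairs under each simplified criterion.
ours = [p for p in pairs if phi(p[0], p[1]) <= -MIN_CORR]
strict = [p for p in pairs if phi(p[0], p[1]) < -MIN_CORR]
by_interest = [
    p for p in pairs
    if interest(p[0], p[1]) < 0 and abs(interest(p[0], p[1])) >= MIN_INTEREST
]

print("correlation, >= threshold:", ours)
print("correlation, >  threshold:", strict)
print("interest:", by_interest)
```

CE sits exactly on the correlation threshold, so it is the pair that separates the inclusive test from the strict one, while DF passes only the interest test.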


Experimental Results


Conclusion

• Too many association rules are generated, and they are not always useful for marketing.