2006/12/06 Chen Yi-Chun 1
Mining Positive and Negative Association Rules: An Approach for Confined Rules
Maria-Luiza Antonie, Osmar R. Zaiane
PKDD 2004
Outline
• Motivation
  – Negative association rules
  – Correlation coefficient
• Our algorithm
• Others' algorithms
• Comparison
• Conclusion
Positive vs Negative
• Positive association rules
  – Typical association rules
• Negative association rules
  – Identify products that conflict with each other
    • "Customers that buy Coke do not buy Pepsi"
  – Identify products that complement each other
    • "If Coke is sold out, buy Pepsi instead"
Example
• "non-organic → organic" has 20% support and 25% confidence (20/80)
• "organic → non-organic" has 20% support and 50% confidence (20/40)

(rows: non-organic, columns: organic)
                organic   ¬organic   row total
non-organic        20        60          80
¬non-organic       20         0          20
column total       40        60         100
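The support and confidence figures above can be checked directly from the table counts. A minimal Python sketch; the constant and function names are illustrative, not taken from the paper:

```python
# Counts read off the organic / non-organic table above (N = 100 transactions).
N = 100
BOTH = 20          # transactions with non-organic AND organic items
NON_ORGANIC = 80   # row total: transactions with non-organic items
ORGANIC = 40       # column total: transactions with organic items


def support(count_xy: int, n: int) -> float:
    """Fraction of all transactions that contain both X and Y."""
    return count_xy / n


def confidence(count_xy: int, count_x: int) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    return count_xy / count_x


# non-organic -> organic
print(support(BOTH, N), confidence(BOTH, NON_ORGANIC))  # 0.2 0.25
# organic -> non-organic
print(support(BOTH, N), confidence(BOTH, ORGANIC))      # 0.2 0.5
```

Both rules share the same support; only the denominator of the confidence changes.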
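Correlation Coefficient
• Correlation coefficient: $\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$, where $\mathrm{Cov}(X,Y) = E[XY] - E[X]\,E[Y]$
  – $\rho = 0$: X and Y are uncorrelated (for two binary variables this is the same as independent).
  – $\rho = +1$: X and Y are perfectly positively correlated.
  – $\rho = -1$: X and Y are perfectly negatively correlated.
• Let X and Y be two binary variables, summarized by a 2×2 contingency table (rows: X, columns: Y; N transactions):

          Y      ¬Y
   X     f11    f10    f1+
   ¬X    f01    f00    f0+
         f+1    f+0    N

  – Correlation coefficient:
    $$\rho = \frac{f_{11} f_{00} - f_{10} f_{01}}{\sqrt{f_{1+}\, f_{0+}\, f_{+0}\, f_{+1}}}$$
  – Transform: since $f_{0+} = N - f_{1+}$, $f_{+0} = N - f_{+1}$ and $f_{11} f_{00} - f_{10} f_{01} = N f_{11} - f_{1+} f_{+1}$,
    $$\rho = \frac{N f_{11} - f_{1+} f_{+1}}{\sqrt{f_{1+}\,(N - f_{1+})\, f_{+1}\,(N - f_{+1})}}$$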
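Both forms of the coefficient can be checked numerically. The sketch below uses hypothetical helper names, assumes no row or column of the table is empty (the denominator would then be zero), and applies both forms to the organic example:

```python
import math


def phi(f11: int, f10: int, f01: int, f00: int) -> float:
    """Correlation coefficient of two binary variables from the 2x2 counts
    (f11 = X and Y, f10 = X and not Y, f01 = not X and Y, f00 = neither)."""
    f1p, f0p = f11 + f10, f01 + f00   # row totals f1+, f0+
    fp1, fp0 = f11 + f01, f10 + f00   # column totals f+1, f+0
    return (f11 * f00 - f10 * f01) / math.sqrt(f1p * f0p * fp0 * fp1)


def phi_from_margins(f11: int, f1p: int, fp1: int, n: int) -> float:
    """Equivalent form (N f11 - f1+ f+1) / sqrt(f1+ (N - f1+) f+1 (N - f+1))."""
    return (n * f11 - f1p * fp1) / math.sqrt(f1p * (n - f1p) * fp1 * (n - fp1))


# Organic example: X = non-organic (rows), Y = organic (columns).
r1 = phi(20, 60, 20, 0)
r2 = phi_from_margins(20, 80, 40, 100)
print(round(r1, 2), round(r2, 2))  # -0.61 -0.61
```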
Summary
• The correlation coefficient between these two items is -0.61
  – They are negatively correlated.
  – So the rule "organic → non-organic" is misleading, despite its 50% confidence.
Negative Association Rules
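• A generalized negative association rule is a rule containing a negation of an item
  – e.g. $A \wedge \neg B \wedge \neg C \wedge D \rightarrow E \wedge \neg F$
• Confined negative association rules: $\neg X \rightarrow Y$, $X \rightarrow \neg Y$, $\neg X \rightarrow \neg Y$
  – The entire antecedent or consequent is either a conjunction of negated items or a conjunction of non-negated items.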
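To make the three confined forms concrete, the sketch below computes the support and confidence of each form for the items B and D, using the ten transactions of the example later in this talk. It assumes that a negated side means none of its items occurs in the transaction; the function names are hypothetical:

```python
from typing import FrozenSet, List, Tuple

# Ten transactions (TID 1..10), each a set of items.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset(t.split(","))
    for t in ["A,C,D", "B,C", "C", "A,B,F", "A,C,D",
              "E", "B,F", "B,C,F", "A,B,E", "A,D"]
]


def holds(t: FrozenSet[str], items: FrozenSet[str], negated: bool) -> bool:
    """Non-negated side: every item occurs in t. Negated side (assumption):
    none of the items occurs in t, i.e. a conjunction of absences."""
    return items.isdisjoint(t) if negated else items <= t


def rule_stats(x: FrozenSet[str], neg_x: bool,
               y: FrozenSet[str], neg_y: bool) -> Tuple[float, float]:
    """Support and confidence of the confined rule [not] X -> [not] Y."""
    n_x = sum(holds(t, x, neg_x) for t in TRANSACTIONS)
    n_xy = sum(holds(t, x, neg_x) and holds(t, y, neg_y) for t in TRANSACTIONS)
    return n_xy / len(TRANSACTIONS), (n_xy / n_x if n_x else 0.0)


B, D = frozenset({"B"}), frozenset({"D"})
print(rule_stats(B, False, D, True))   # B -> not D:     (0.5, 1.0)
print(rule_stats(B, True, D, False))   # not B -> D:     (0.3, 0.6)
print(rule_stats(B, True, D, True))    # not B -> not D: (0.2, 0.4)
```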
Correlation Coefficient (cont.)
• Cohen discusses the strength of the correlation coefficient:
  – 0.5 is large
  – 0.3 is moderate
  – 0.1 is small
• We then introduce an automatic progressive thresholding process
  – This eliminates the need for manually adjusted thresholds.
Our Algo.
Automatic progressive thresholding process:
1. We start by setting our correlation threshold to 0.5.
2. If no strongly correlated rules are found, the threshold slides progressively to 0.4, then 0.3,
3. until some rules are found with moderate correlations.
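The thresholding loop above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it takes precomputed correlations for hypothetical pairs, counts both positive and negative correlations as strong when |ρ| reaches the threshold, and assumes an empty result if nothing qualifies even at 0.3 (the slides do not say what happens then):

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def progressive_threshold(
    correlations: Dict[Pair, float],
    thresholds: Sequence[float] = (0.5, 0.4, 0.3),
) -> Tuple[float, List[Pair]]:
    """Try each correlation threshold in turn and stop at the first one for
    which some pair is strongly correlated (|rho| >= threshold). If nothing
    qualifies even at the last threshold, return that threshold with an
    empty list (assumed behaviour)."""
    for t in thresholds:
        found = [pair for pair, rho in correlations.items() if abs(rho) >= t]
        if found:
            return t, found
    return thresholds[-1], []


# Hypothetical correlations: nothing reaches 0.5 or 0.4, one pair reaches 0.3.
corr = {("X1", "Y1"): 0.35, ("X2", "Y2"): -0.12, ("X3", "Y3"): 0.28}
print(progressive_threshold(corr))  # (0.3, [('X1', 'Y1')])
```

Passing the thresholds as a sequence keeps Cohen's "large" value (0.5) as the starting point while letting the slide's 0.4 and 0.3 steps be changed without touching the loop.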
Others’ algo.
• [WZZ02]
  – Set a parameter "mininterest"
  – Interest: $|\mathrm{sup}(A \cup B) - \mathrm{sup}(A)\,\mathrm{sup}(B)|$; $A \cup B$ is kept if its interest ≥ mininterest.
• [THC02] SRM (substitution rule mining)
  – Only discusses one kind of negative rule: $X \rightarrow \neg Y$
  – Uses the $\chi^2$ value and the support value to find concrete itemsets
  – Then uses the correlation coefficient to derive the rules
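A minimal sketch of the interest test as summarized above, assuming supports are given as fractions of all transactions. The function names are hypothetical and this is not the [WZZ02] implementation. The example values are the supports of D, F and DF from the ten-transaction example:

```python
def interest(sup_xy: float, sup_x: float, sup_y: float) -> float:
    """Signed deviation of sup(X u Y) from what independence would predict."""
    return sup_xy - sup_x * sup_y


def is_of_interest(sup_xy: float, sup_x: float, sup_y: float,
                   min_interest: float) -> bool:
    """X u Y is kept when the magnitude of its interest reaches min_interest."""
    return abs(interest(sup_xy, sup_x, sup_y)) >= min_interest


# sup(D) = 0.3, sup(F) = 0.3, sup(DF) = 0.0
print(round(interest(0.0, 0.3, 0.3), 2))    # -0.09
print(is_of_interest(0.0, 0.3, 0.3, 0.07))  # True
```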
Example
TID  Items   Items with negations               Equivalent bit vector (A–F)
1    A,C,D   A, ¬B, C, D, ¬E, ¬F                101100
2    B,C     ¬A, B, C, ¬D, ¬E, ¬F               011000
3    C       ¬A, ¬B, C, ¬D, ¬E, ¬F              001000
4    A,B,F   A, B, ¬C, ¬D, ¬E, F                110001
5    A,C,D   A, ¬B, C, D, ¬E, ¬F                101100
6    E       ¬A, ¬B, ¬C, ¬D, E, ¬F              000010
7    B,F     ¬A, B, ¬C, ¬D, ¬E, F               010001
8    B,C,F   ¬A, B, C, ¬D, ¬E, F                011001
9    A,B,E   A, B, ¬C, ¬D, E, ¬F                110010
10   A,D     A, ¬B, ¬C, D, ¬E, ¬F               100100
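The negated form and the bit vector of each transaction can be derived mechanically. A minimal sketch, assuming the fixed item order A–F used in the table; the names are illustrative:

```python
ITEMS = "ABCDEF"  # fixed item order used for the bit vector
TRANSACTIONS = ["A,C,D", "B,C", "C", "A,B,F", "A,C,D",
                "E", "B,F", "B,C,F", "A,B,E", "A,D"]


def bit_vector(transaction: str) -> str:
    """One bit per item in ITEMS: 1 if present, 0 if absent."""
    present = set(transaction.split(","))
    return "".join("1" if item in present else "0" for item in ITEMS)


def with_negations(transaction: str) -> str:
    """List every item, prefixing absent ones with the negation sign."""
    present = set(transaction.split(","))
    return ", ".join(item if item in present else "¬" + item for item in ITEMS)


for tid, t in enumerate(TRANSACTIONS, start=1):
    print(tid, with_negations(t), bit_vector(t))

# Item counts (step 1 of the worked example)
counts = {item: sum(item in t.split(",") for t in TRANSACTIONS) for item in ITEMS}
print(counts)  # {'A': 5, 'B': 5, 'C': 5, 'D': 3, 'E': 2, 'F': 3}
```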
min. sup. = 0.2, correlation coefficient threshold = 0.5, min. interest = 0.07
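Itemsets reported (Our = our algorithm, Int. = interest-based [WZZ02], SRM = [THC02]):
• Negatively correlated 2-itemsets: BD (Our, Int., SRM); CE (Our, Int.); DF (Int.)
• Positively correlated 2-itemsets: AD (Our, Int., SRM); BF (Our, Int., SRM)
• 3-itemsets: ACD, ABC and ABD are each reported by two of the three approaches; BCD by one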
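Positive rules:
1. Item counts: A: 5, B: 5, C: 5, D: 3, E: 2, F: 3
2. Find the candidates (2-itemsets with support ≥ min. sup.): AB, AC, AD, BC, BF, CD
3. Compute the correlation of each candidate; if $\rho(X,Y) \ge \rho_{min}$ and $\mathrm{sup}(X \cup Y) \ge$ min. sup., then $X \rightarrow Y$ is a positive rule.
4. Example (AD):

          A    ¬A
   D      3     0     3
   ¬D     2     5     7
          5     5    10

   $\rho(A,D) = \frac{10 \cdot 3 - 3 \cdot 5}{\sqrt{3 \cdot 7 \cdot 5 \cdot 5}} = \frac{15}{\sqrt{525}} \approx 0.65 \ge 0.5$, so AD yields a positive rule.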
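The positive-rule steps above can be reproduced on the ten transactions. This sketch is limited to 2-itemsets and checks only rho(X,Y) >= 0.5 together with the support condition; it does not evaluate confidence, and its helper names are hypothetical:

```python
import math
from itertools import combinations

# The ten transactions of the example, one set of items per TID.
TRANSACTIONS = [set(t.split(",")) for t in
                ["A,C,D", "B,C", "C", "A,B,F", "A,C,D",
                 "E", "B,F", "B,C,F", "A,B,E", "A,D"]]
N = len(TRANSACTIONS)
MIN_SUP = 0.2
RHO_MIN = 0.5


def count(*items: str) -> int:
    """Number of transactions containing all the given items."""
    return sum(all(i in t for i in items) for t in TRANSACTIONS)


def rho(x: str, y: str) -> float:
    """(N f11 - f1+ f+1) / sqrt(f1+ (N - f1+) f+1 (N - f+1)); assumes neither
    item occurs in all or in none of the transactions."""
    f11, f1p, fp1 = count(x, y), count(x), count(y)
    return (N * f11 - f1p * fp1) / math.sqrt(f1p * (N - f1p) * fp1 * (N - fp1))


# Step 1: frequent items; step 2: frequent 2-itemsets as candidates.
items = sorted(i for i in {i for t in TRANSACTIONS for i in t}
               if count(i) / N >= MIN_SUP)
candidates = [(x, y) for x, y in combinations(items, 2)
              if count(x, y) / N >= MIN_SUP]
# Step 3: keep candidates whose correlation reaches the threshold.
positive = [(x, y) for x, y in candidates if rho(x, y) >= RHO_MIN]

print(candidates)  # [('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'F'), ('C', 'D')]
print(positive)    # [('A', 'D'), ('B', 'F')]
print(round(rho("A", "D"), 2))  # 0.65
```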
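Negative rules:
1. If X and Y are each frequent (support ≥ min. sup.) and $\rho(X,Y) \le -\rho_{min}$,
2. then negative rules are generated: $X \rightarrow \neg Y$ or $\neg X \rightarrow Y$.
3. This applies even when $\mathrm{sup}(X \cup Y) <$ min. sup.
4. Example (BD):

          B    ¬B
   D      0     3     3
   ¬D     5     2     7
          5     5    10

   $\rho(B,D) = \frac{10 \cdot 0 - 3 \cdot 5}{\sqrt{3 \cdot 7 \cdot 5 \cdot 5}} = \frac{-15}{\sqrt{525}} \approx -0.65 \le -0.5$, so negative rules such as $B \rightarrow \neg D$ are generated.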
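The itemset DF has interest $\mathrm{sup}(DF) - \mathrm{sup}(D)\,\mathrm{sup}(F) = 0 - 0.3 \cdot 0.3 = -0.09$, which passes min. interest = 0.07 in magnitude, but its correlation is only $-9/21 \approx -0.43$, so our algorithm does not report it.
SRM misses CE because it requires the correlation coefficient to be strictly greater than the minimum value (in magnitude), whereas our method only requires it to be greater than or equal to the minimum value, and the correlation of CE is exactly $-0.5$.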
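The differences for BD, CE and DF come down to how each approach compares its measure with the threshold. The sketch below is limited to pairs of frequent items whose union is infrequent (an assumption for illustration) and applies our inclusive test, a strict test in the style of SRM, and the interest test. It mimics only these comparisons, not the full [WZZ02] or SRM procedures, and its helper names are hypothetical:

```python
import math
from itertools import combinations

TRANSACTIONS = [set(t.split(",")) for t in
                ["A,C,D", "B,C", "C", "A,B,F", "A,C,D",
                 "E", "B,F", "B,C,F", "A,B,E", "A,D"]]
N = len(TRANSACTIONS)
MIN_SUP, RHO_MIN, MIN_INTEREST = 0.2, 0.5, 0.07


def count(*items: str) -> int:
    """Number of transactions containing all the given items."""
    return sum(all(i in t for i in items) for t in TRANSACTIONS)


def rho(x: str, y: str) -> float:
    """Correlation coefficient of items x and y (margin form)."""
    f11, f1p, fp1 = count(x, y), count(x), count(y)
    return (N * f11 - f1p * fp1) / math.sqrt(f1p * (N - f1p) * fp1 * (N - fp1))


def interest(x: str, y: str) -> float:
    """Signed interest sup(xy) - sup(x) sup(y)."""
    return count(x, y) / N - (count(x) / N) * (count(y) / N)


items = sorted(i for i in {i for t in TRANSACTIONS for i in t}
               if count(i) / N >= MIN_SUP)
# Frequent items whose pair is infrequent: candidates for negative rules.
pairs = [(x, y) for x, y in combinations(items, 2) if count(x, y) / N < MIN_SUP]

ours = [p for p in pairs if rho(*p) <= -RHO_MIN]   # reaching the threshold suffices
strict = [p for p in pairs if rho(*p) < -RHO_MIN]  # must strictly exceed it
by_interest = [p for p in pairs if abs(interest(*p)) >= MIN_INTEREST]

print(ours)         # [('B', 'D'), ('C', 'E')]
print(strict)       # [('B', 'D')]
print(by_interest)  # [('B', 'D'), ('C', 'E'), ('D', 'F')]
print(round(rho("D", "F"), 2))  # -0.43
```

The three lists match the 2-itemset entries in the comparison above: CE sits exactly on the boundary, and DF passes the interest test but not the correlation test.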
Experimental Results
Conclusion
• Too many association rules are generated, and they are not always useful for marketing.