Chapter 7: Association Rule Mining in Web Databases
Outline: Introduction; Association Rule Mining; Multilevel Association Rule Mining; Quantitative Association Rule Mining; Correlation Analysis; Summary.
Introduction (1): A single shopping cart tells us about one customer's purchasing behavior, but once a large volume of shopping-cart data has accumulated, we can analyze the purchasing habits of customers as a whole.
-
(1)IBM PC ViewSonic
-
(2)80%
-
[Table 7-1: transaction database containing 10 transactions]
-
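(1) A set of items X is called an itemset. A transaction T contains X when every item of X appears in T. The support of X is the fraction of transactions in the database that contain X.
-
(2) The number of transactions that contain X is the support count of X; dividing it by the total number of transactions gives the support of X. In Table 7-1, item 2 appears in 5 transactions, so its support is 5/10 = 0.5; the itemset {2,5} appears in 3 transactions, so its support is 3/10 = 0.3. An association rule is written X → Y [support, confidence], where X and Y are itemsets with X ∩ Y = ∅, and the support of the rule is the support of X ∪ Y.
-
(3) The confidence of X → Y is the support of X ∪ Y divided by the support of X: confidence(X → Y) = support(X ∪ Y) / support(X).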
-
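(4) The user supplies a minimum support and a minimum confidence; the minimum support can equivalently be given as a minimum support count. Take Table 7-1 with a minimum support of 0.2 and a minimum confidence of 0.5. For the rule {1,3} → {5}, the itemset {1,3,5} has a support count of 2, so the rule's support is 0.2; the support of {1,3} is 0.3, so the confidence is 0.2/0.3 = 0.67. Both thresholds are met.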
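The definitions of support count, support and confidence can be sketched in a few lines of Python. The transactions below are hypothetical and are not the contents of Table 7-1; the function names are likewise chosen only for illustration.

```python
# Hypothetical transactions for illustration only (NOT Table 7-1).
transactions = [
    {1, 3, 5},
    {1, 3},
    {2, 5},
    {1, 2, 3, 5},
    {2, 4},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(x, y, transactions):
    """confidence(X -> Y) = support(X u Y) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)


rule_support = support({1, 3, 5}, transactions)        # support of {1,3} -> {5}
rule_confidence = confidence({1, 3}, {5}, transactions)
```

With this toy data the rule {1,3} → {5} is checked against the two thresholds in the same way as the worked example: first the support of {1,3,5}, then that support divided by the support of {1,3}.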
-
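(5) An itemset whose support reaches the minimum support is a large itemset. Rules are generated from each large itemset Z by splitting it into X and Y with X ∪ Y = Z, keeping the rules X → Y whose confidence reaches the minimum confidence.
-
(6) Take Table 7-1 with a minimum support of 0.2 and a minimum confidence of 0.7. The large itemset {1,3} yields two candidate rules, {1} → {3} and {3} → {1}. The confidence of {1} → {3} is 0.3/0.4 = 0.75, and the confidence of {3} → {1} is 0.3/0.5 = 0.6, so only {1} → {3} is kept.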
-
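Apriori algorithm: an itemset containing k items is a k-itemset, and Lk denotes the set of large k-itemsets. Apriori first finds the large 1-itemsets L1, then uses L1 to find L2, L2 to find L3, and so on until no further large itemsets are found.
-
Apriori property: every nonempty subset of a large itemset is also large. If {A,B} is large, then {A} and {B} are large, because every transaction that contains {A,B} also contains {A} and {B}. Conversely, if {A} is not large, {A,B} cannot be large.
-
Apriori uses the Apriori property to reduce the number of candidate itemsets. Each pass consists of two steps: join and prune.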
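-
Join step: large (k-1)-itemsets are joined to produce the candidate k-itemsets Ck. Let X1 and X2 be (k-1)-itemsets and let Xi[j] denote the j-th item of Xi. X1 and X2 can be joined when their first k-2 items are identical and X1[k-1] < X2[k-1].
-
Example 7-1: Let X1 and X2 be 3-itemsets with X1 = {1,3,5} and X2 = {1,3,6}. Since X1[1] = X2[1] = 1, X1[2] = X2[2] = 3 and X1[3] < X2[3], the two itemsets can be joined.
-
Prune step: Ck is a superset of Lk, so some members of Ck may not be large. By the Apriori property, an itemset X in Ck can be large only if every (k-1)-subset of X is in Lk-1. If any (k-1)-subset of X is not a large (k-1)-itemset, the k-itemset X is removed from Ck.
-
Example 7-2: Let X1 and X2 be large 3-itemsets with X1 = {1,3,5} and X2 = {1,3,6}. Joining X1 and X2 gives the 4-itemset {1,3,5,6}. By the Apriori property, we check the 3-item subsets of {1,3,5,6}: {1,3,5}, {1,3,6}, {1,5,6} and {3,5,6}. {1,3,5} and {1,3,6} are already large 3-itemsets. If {1,5,6} and {3,5,6} are also large 3-itemsets, {1,3,5,6} is kept as a candidate 4-itemset. If {1,5,6} or {3,5,6} is not a large 3-itemset, {1,3,5,6} cannot be a large 4-itemset and is pruned.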
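The join and prune steps can be sketched as a single Python function. This is a minimal illustration that assumes itemsets are stored as sorted tuples; the name `candidate_gen` mirrors the slides' Candidate_gen, but the body is a reconstruction, not the author's code.

```python
from itertools import combinations


def candidate_gen(large_prev):
    """Build candidate k-itemsets from a set of large (k-1)-itemsets.

    `large_prev` is a set of sorted tuples that all have length k-1.
    """
    candidates = set()
    for x1 in large_prev:
        for x2 in large_prev:
            # Join: identical first k-2 items and X1[k-1] < X2[k-1].
            if x1[:-1] == x2[:-1] and x1[-1] < x2[-1]:
                c = x1 + (x2[-1],)
                # Prune: every (k-1)-subset of c must itself be large.
                if all(s in large_prev for s in combinations(c, len(c) - 1)):
                    candidates.add(c)
    return candidates


# Case 1: all four 3-subsets of {1,3,5,6} are large, so it survives.
kept = candidate_gen({(1, 3, 5), (1, 3, 6), (1, 5, 6), (3, 5, 6)})
# Case 2: {3,5,6} is missing, so {1,3,5,6} is pruned.
pruned = candidate_gen({(1, 3, 5), (1, 3, 6), (1, 5, 6)})
```

The two calls correspond to the two outcomes described in Example 7-2.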
-
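Apriori algorithm
L1 = {large 1-itemsets};
for (k = 2; Lk-1 ≠ ∅; k++) do begin
    Ck = Candidate_gen(Lk-1)
    for each transaction t in the database
        for each candidate c in Ck that is contained in t, increase the count of c by 1
    Lk = {c in Ck whose count reaches the minimum support count}
end
return L = ∪k Lk
-
Candidate_gen procedure
for each itemset X1 in Lk-1  /* X1[1], X1[2], ..., X1[k-1] are the k-1 items of X1 */
    for each itemset X2 in Lk-1  /* X2[1], X2[2], ..., X2[k-1] are the k-1 items of X2 */
        if (X1[1] = X2[1]) and (X1[2] = X2[2]) and ... and (X1[k-2] = X2[k-2]) and (X1[k-1] < X2[k-1]) then
            join X1 and X2 into a candidate k-itemset, and keep it in Ck only if all of its (k-1)-subsets are in Lk-1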
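For readers who want to run the level-wise loop, here is a minimal Python sketch of the algorithm above. It assumes items are comparable values (integers here), uses a minimum support count rather than a fraction, and runs on hypothetical transactions that are not Table 7-1.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for all large itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # L1: count single items and keep those reaching min_count.
    counts = {}
    for t in transactions:
        for item in t:
            counts[(item,)] = counts.get((item,), 0) + 1
    large = {c: n for c, n in counts.items() if n >= min_count}
    all_large = dict(large)

    while large:
        prev = set(large)
        # Candidate generation: join, then prune with the Apriori property.
        candidates = set()
        for x1 in prev:
            for x2 in prev:
                if x1[:-1] == x2[:-1] and x1[-1] < x2[-1]:
                    c = x1 + (x2[-1],)
                    if all(s in prev for s in combinations(c, len(c) - 1)):
                        candidates.add(c)
        # One scan of the transactions to count every candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        large = {c: n for c, n in counts.items() if n >= min_count}
        all_large.update(large)
    return all_large


# Hypothetical data, for illustration only.
demo = [{1, 3, 5}, {1, 3}, {2, 5}, {1, 2, 3, 5}, {2, 4}]
result = apriori(demo, min_count=2)
```

The loop stops as soon as a pass produces no large itemsets, which matches the `Lk-1 ≠ ∅` condition in the pseudocode.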
-
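Example 7-3 (1): Apply Apriori to Table 7-1 with a minimum support count of 3. Scan the transactions to count each candidate 1-itemset in C1.
-
Example 7-3 (2): The candidates that reach the minimum support count form L1.
-
Example 7-3 (3): Generate the candidate 2-itemsets C2 from L1.
-
Example 7-3 (4): Scan the transactions to count each candidate 2-itemset.
-
Example 7-3 (5): The candidates that reach the minimum support count form L2.
-
Example 7-3 (6): Generate the candidate 3-itemsets C3 from L2.
No candidate reaches the minimum support count, so there are no large 3-itemsets.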
-
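Example 7-3 (7): The large itemsets are {{1},{2},{3},{4},{5},{6},{1,3},{1,5},{2,5},{3,5}}. With a minimum confidence of 0.7, the candidate rules have the following confidences:
{1} → {3}: 3/4 = 0.75
{3} → {1}: 3/5 = 0.6
{1} → {5}: 3/4 = 0.75
{5} → {1}: 3/6 = 0.5
{2} → {5}: 3/5 = 0.6
{5} → {2}: 3/6 = 0.5
{3} → {5}: 3/5 = 0.6
{5} → {3}: 3/6 = 0.5
Only {1} → {3} and {1} → {5} reach the minimum confidence.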
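The rule-generation step of Example 7-3 can be checked with a short Python sketch. The support counts below are the ones implied by the fractions in the example (for instance 3/4 for {1} → {3}); the counts of {4} and {6} do not appear in any fraction and are left out. The function name `generate_rules` is an assumption made for illustration.

```python
from itertools import combinations

# Support counts read off the confidence fractions of Example 7-3.
support_count = {
    frozenset({1}): 4, frozenset({2}): 5, frozenset({3}): 5, frozenset({5}): 6,
    frozenset({1, 3}): 3, frozenset({1, 5}): 3,
    frozenset({2, 5}): 3, frozenset({3, 5}): 3,
}


def generate_rules(support_count, min_conf):
    """Return sorted (X, Y, confidence) triples with confidence >= min_conf."""
    rules = []
    for z, z_count in support_count.items():
        if len(z) < 2:
            continue  # a rule needs a nonempty X and a nonempty Y
        for r in range(1, len(z)):
            for x in combinations(sorted(z), r):
                x = frozenset(x)
                conf = z_count / support_count[x]
                if conf >= min_conf:
                    rules.append((tuple(sorted(x)), tuple(sorted(z - x)), conf))
    return sorted(rules)


rules = generate_rules(support_count, min_conf=0.7)
```

Dividing support counts gives the same value as dividing supports, since both are divided by the same number of transactions.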
-
80%PC70%IBM PCViewSonic (lower concept level)
-
7-5 IBM COMPAQ ASUS HP IBM Acer IBM Acer Toshiba
-
7-14 CRT LCD 17 19 15 17
-
7-15 A4 A3+
-
(lower) (higher) HP ViewSonic=0.01 =0.95
-
ViewSonic=0.7=0.9 (multilevel association rules)
-
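Multilevel association rules are mined top-down: first find the large itemsets at concept level 1 (level-1), then at level 2 (level-2), and so on, applying Apriori at each level.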
-
(1)ixxi-1x
-
(2)ix i-1 1-x [=0.2]
()
()
= 0.25
-
(1) [=0.2]
[=0.12]
[=0.08]
= 0.3
= 0.06
-
(2)k-ik-X i-1 k-k-X {,LCD}[ = 0.2]
{,15LCD}[= 0.12]
{,17LCD}[= 0.02]
{,15LCD}[= 0.03]
{,17LCD}[= 0.03]
= 0.15 = 0.03
-
IBM 1122 1 1 1 2 2 3 2 4 IBM
-
7-2
-
7-3
-
7-4T[1]
-
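Notation: T is the transaction database and T[1] is the encoded transaction table. L[j,k] is the set of large k-itemsets at level j, LL[j] is the set of all large itemsets at level j, and minsup[j] is the minimum support at level j.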
-
7-4 1 2 3 4 5 T[1](7-4)7-3 1600 7-2IBM 1111
-
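(1)
1 for (j = 1; L[j,1] ≠ ∅ and j ≤ number of levels; j++) do begin /* process level j */
2   if j = 1 then {
3     L[j,1] = Large_item_gen(T[1], j) /* scan T[1] for the level-1 large 1-itemsets */
4     T[2] = Filtered_table(T[1], L[1,1]) /* filter T[1] using L[1,1] */
5   }
6   else L[j,1] = Large_item_gen(T[2], j)
-
(2)
7   for (k = 2; L[j,k-1] ≠ ∅; k++) do begin /* find the level-j large k-itemsets */
8     Ck = Candidate_gen(L[j,k-1])
9     for each transaction t in T[2]
10      for each candidate c in Ck that is contained in t, increase the count of c by 1
11    L[j,k] = the candidates in Ck whose count reaches minsup[j] /* level-j large k-itemsets */
12  end
13  output LL[j] = ∪k L[j,k] /* all large itemsets at level j */
14 end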
-
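(3) In line 3, Large_item_gen(T[1], j) scans T[1] to find the level-j large 1-itemsets L[j,1]; at level 1, Large_item_gen(T[1], 1) produces the level-1 large 1-itemsets L[1,1]. In line 6, for j > 1, Large_item_gen(T[2], j) produces the large 1-itemsets L[j,1], examining only items whose parent is in L[j-1,1]. For example, if 11** is a level-2 large 1-itemset, its level-3 children (111*) and (112*) are examined.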
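A possible reading of Large_item_gen in Python is sketched below. It assumes items are encoded as fixed-length digit strings such as "1212", that the level-j ancestor is obtained by masking the trailing digits with "*", and that supports are given as counts. The helper `generalize`, the `parent_large` parameter and the sample table are assumptions made for illustration, not part of the slides.

```python
from collections import Counter


def generalize(code, level):
    """Keep the first `level` digits of an encoded item; mask the rest with '*'."""
    return code[:level] + "*" * (len(code) - level)


def large_item_gen(table, level, min_count, parent_large=None):
    """Level-`level` large 1-itemsets of `table` (Tid -> list of item codes).

    When `parent_large` is given, only items whose parent at level-1 is in
    that set are counted.
    """
    counts = Counter()
    for items in table.values():
        # The set comprehension counts each ancestor once per transaction.
        for g in {generalize(i, level) for i in items}:
            if parent_large is None or generalize(g, level - 1) in parent_large:
                counts[g] += 1
    return {g: n for g, n in counts.items() if n >= min_count}


# Hypothetical encoded table, for illustration only.
table = {
    1: ["1212", "2112"],
    2: ["1211", "2112"],
    3: ["1111", "4111"],
}
level1 = large_item_gen(table, 1, min_count=2)
level2 = large_item_gen(table, 2, min_count=2, parent_large=set(level1))
```

Because 4*** is not large at level 1 in this toy table, its child 41** is never counted at level 2.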
-
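(4) In line 4, Filtered_table(T[1], L[1,1]) uses L[1,1] to filter T[1]. For each transaction t, every item whose level-1 ancestor is not in L[1,1] is removed from t; if t becomes empty, t is removed from T[1]. The table returned by Filtered_table(T[1], L[1,1]) is T[2].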
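The filtering step can be sketched as follows, again assuming digit-string item codes with "*" masking. The function names follow the slides, but the bodies and the sample table are hypothetical.

```python
def generalize(code, level):
    """Keep the first `level` digits of an encoded item; mask the rest with '*'."""
    return code[:level] + "*" * (len(code) - level)


def filtered_table(t1, large_level1):
    """Drop items whose level-1 ancestor is not large, then drop empty transactions."""
    t2 = {}
    for tid, items in t1.items():
        kept = [i for i in items if generalize(i, 1) in large_level1]
        if kept:  # a transaction left with no items is removed entirely
            t2[tid] = kept
    return t2


# Hypothetical T[1] and L[1,1], for illustration only.
t1 = {
    1: ["1212", "2112", "3214"],
    2: ["3214"],
    3: ["1111", "4111"],
}
t2 = filtered_table(t1, {"1***", "2***", "4***"})
```

Working on the smaller T[2] at the deeper levels avoids re-reading items that can never appear in a large itemset.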
-
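Example 7-5: Take Table 7-4 with a level-1 minimum support count of 4. Scanning T[1] gives the level-1 large 1-itemsets L[1,1]. For instance, {4***} appears in transactions 2, 3, 5 and 8, so its support count is 4 and it is a level-1 large 1-itemset. Filtered_table then uses L[1,1] to filter T[1] and produce T[2]; for example, item 3214 is removed from transaction 2 of T[1].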
-
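Example 7-6 (1): The level-1 large 1-itemsets L[1,1] obtained in Example 7-5.
-
Example 7-6 (2): T[1] is filtered to give T[2].
-
Example 7-6 (3): Candidate_gen generates the level-1 candidate 2-itemsets from L[1,1]: C2 = {{1***,2***}, {1***,4***}, {2***,4***}}. With a level-1 minimum support count of 4, {2***,4***} has a support count of only 3, so L[1,2] = {{1***,2***}, {1***,4***}}. From L[1,2], C3 = {{1***,2***,4***}}; the support count of {1***,2***,4***} is 3, so L[1,3] = ∅.
-
Example 7-6 (4): The level-1 large 2-itemsets L[1,2].
-
Example 7-6 (5): At level 2 the minimum support count is 2. Scanning T[2] gives the level-2 large 1-itemsets L[2,1]; for instance, {41**} appears in transactions 2, 3 and 8, so its support count of 3 exceeds 2 and it is a level-2 large 1-itemset. Candidate_gen generates from L[2,1] the candidates C2 = {{11**,12**}, {11**,21**}, {11**,22**}, {11**,41**}, {12**,21**}, {12**,22**}, {12**,41**}, {21**,22**}, {21**,41**}, {22**,41**}}. With a minimum support count of 2, L[2,2] = {{11**,41**}, {12**,21**}, {12**,22**}}. From L[2,2], C3 = {{12**,21**,22**}}; the support count of {12**,21**,22**} is 0, so L[2,3] = ∅.
-
Example 7-6 (6): The level-2 large itemsets L[2,1] and L[2,2].
-
Example 7-6 (7): At level 3 the minimum support count is 3. Scanning T[2] gives the level-3 large 1-itemsets L[3,1]. Candidate_gen generates from L[3,1] the candidates C2 = {{111*,121*}, {111*,211*}, {111*,411*}, {121*,211*}, {121*,411*}, {211*,411*}}. With a minimum support count of 3, L[3,2] = {{121*,211*}}. L[3,2] cannot produce any candidate 3-itemset, so L[3,3] = ∅.
-
Example 7-6 (8): The level-3 large itemsets L[3,1] and L[3,2].
-
Example 7-6 (9): At level 4 the minimum support count is 2. Scanning T[2] gives the level-4 large 1-itemsets L[4,1]. Candidate_gen generates C2 from L[4,1]; with a minimum support count of 2, L[4,2] = {{1212,2112}}. L[4,2] cannot produce any candidate 3-itemset, so L[4,3] = ∅.
-
Example 7-6 (10): The level-4 large itemsets L[4,1] and L[4,2].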
-
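Example 7-7 (1): Use the large itemsets of Example 7-6, with minimum confidences of 0.8, 0.7, 0.7 and 0.6 for levels 1, 2, 3 and 4. Level 1:
{1***} → {2***}: 7/8 = 0.875
{2***} → {1***}: 7/7 = 1
{1***} → {4***}: 4/8 = 0.5
{4***} → {1***}: 4/4 = 1
Rules 1, 2 and 4 reach the level-1 minimum confidence.
-
Example 7-7 (2): Level 2:
{11**} → {41**}: 2/3 = 0.67
{41**} → {11**}: 2/3 = 0.67
{12**} → {21**}: 3/5 = 0.6
{21**} → {12**}: 3/4 = 0.75
{12**} → {22**}: 2/5 = 0.4
{22**} → {12**}: 2/3 = 0.67
Only rule 4 reaches the level-2 minimum confidence.
-
Example 7-7 (3): Level 3:
{121*} → {211*}: 3/3 = 1
{211*} → {121*}: 3/4 = 0.75
Rules 1 and 2 reach the level-3 minimum confidence.
Level 4:
{1212} → {2112}: 2/2 = 1
{2112} → {1212}: 2/3 = 0.67
Rules 1 and 2 reach the level-4 minimum confidence.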
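The per-level filtering in Example 7-7 can be reproduced with the fractions given in the example. The sketch below only re-checks those numbers; the data layout (antecedent, consequent, numerator, denominator) is an assumption made for illustration.

```python
# Minimum confidence per level, as stated in Example 7-7.
min_conf = {1: 0.8, 2: 0.7, 3: 0.7, 4: 0.6}

# (X, Y, count of X u Y, count of X), taken from the fractions in the example.
rules = {
    1: [("1***", "2***", 7, 8), ("2***", "1***", 7, 7),
        ("1***", "4***", 4, 8), ("4***", "1***", 4, 4)],
    2: [("11**", "41**", 2, 3), ("41**", "11**", 2, 3),
        ("12**", "21**", 3, 5), ("21**", "12**", 3, 4),
        ("12**", "22**", 2, 5), ("22**", "12**", 2, 3)],
    3: [("121*", "211*", 3, 3), ("211*", "121*", 3, 4)],
    4: [("1212", "2112", 2, 2), ("2112", "1212", 2, 3)],
}

# Keep a rule when its confidence reaches the threshold of its own level.
kept = {
    level: [(x, y) for x, y, num, den in level_rules
            if num / den >= min_conf[level]]
    for level, level_rules in rules.items()
}
```

Note that 2/3 is rejected at level 2 (threshold 0.7) but accepted at level 4 (threshold 0.6), which shows why the thresholds are set per level.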
-
7-7 (4) 1 (7-5) 2 (7-14) 4 (7-15) =0.875 =1 =1 CRT =0.75 17CRT = 1 17CRT = 0.75 IBM 17CRT = 1 17CRT IBM = 0.67
-
(1) 40% (quantitative association rule)
-
(2) (intervals)
-
q_ (q_item) q_ i qq_ q_ (q_itemset) q_ x q_x
-
q_ q_ q_
-
(1)i q_ , , ... , , ... q_
-
(2) T s 1 2 3 4
-
7-8{,,,,}iq_q_5030100204050010%[1][2..3][4..5]123q_ ( )
-
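(1) Let X be a q_itemset. A transaction t contains X when every q_item of X appears in t. A q_itemset whose support reaches the minimum support is a large q_itemset, and a q_itemset made up of k q_items is a k-q_itemset.
-
(2) A quantitative association rule is written X → Y [support, confidence], where X and Y are q_itemsets. Rules are generated from each large q_itemset Z by splitting it into X and Y with X ∪ Y = Z.
-
Algorithm for generating large q_itemsets: LqiTid (large q_itemset generation using Tids)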
-
7-6DB
-
7-6DB37-17DBDB7-7
-
7-17 ABCDEFG
-
7-7DB
-
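Large q_itemset generation (1): TS({x}) is the set of transaction identifiers (Tids) of the transactions in DB that contain the q_item x. TS({x1,x2}) is the set of Tids of the transactions that contain both x1 and x2, and it is obtained by intersecting TS({x1}) and TS({x2}): TS({x1,x2}) = TS({x1}) ∩ TS({x2}). For example, if TS({x1}) = {5,12,14} and TS({x2}) = {1,4,5,8}, then TS({x1,x2}) = {5}.
-
Large q_itemset generation (2): Let x1, x2, ..., xk be q_items. TS({x1,x2,...,xk}) is the set of Tids of the transactions that contain the q_itemset {x1,x2,...,xk}, and SP({x1,x2,...,xk}) is its support, the number of elements of TS({x1,x2,...,xk}): SP({x1,x2,...,xk}) = Card(TS({x1,x2,...,xk})) = Card(TS({x1}) ∩ TS({x2}) ∩ ... ∩ TS({xk})), where Card(S) is the number of elements in the set S.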
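The TS and SP definitions map directly onto Python sets. The sketch below uses the two Tid sets from the example; the labels "x1" and "x2" stand in for the q_items, whose names did not survive in this transcript, and the function names are chosen for illustration.

```python
def tidset(q_itemset, ts):
    """TS of a q_itemset: intersection of the Tid sets of its q_items.

    `q_itemset` must be nonempty; `ts` maps each q_item to its set of Tids.
    """
    return set.intersection(*(ts[x] for x in q_itemset))


def sp(q_itemset, ts):
    """SP of a q_itemset: Card(TS), the number of Tids in the intersection."""
    return len(tidset(q_itemset, ts))


# Tid sets from the example; "x1" and "x2" are placeholder q_item labels.
ts = {
    "x1": {5, 12, 14},
    "x2": {1, 4, 5, 8},
}
both = tidset(["x1", "x2"], ts)
```

Once the Tid sets of the single q_items are known, the support of any larger q_itemset follows from intersections alone, without rescanning DB.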
-
7-8q_ 7-7DBq_
-
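Large q_itemset generation (3): LqiTid decides whether a q_itemset is large from its SP value: if SP({x1,x2,...,xk}) reaches the minimum support, {x1,x2,...,xk} is a large k-q_itemset. As in Apriori, a Candidate_gen step joins large (k-1)-q_itemsets into candidate k-q_itemsets, from which the large k-q_itemsets are selected.
-
Large q_itemset generation (4): x[1], x[2], ..., x[k-1] denote the k-1 q_items of a (k-1)-q_itemset x, and item(x[j]) denotes the item of the q_item x[j]. The q_items of {x[1],x[2],...,x[k-1]} are kept in order of their items, starting from item(x[1]).
-
LqiTid in outline: first compute TS and SP for every 1-q_itemset; the q_itemsets whose SP reaches the minimum support are the large 1-q_itemsets. Then, for each k, join the large (k-1)-q_itemsets into the candidates Ck and compute TS and SP for each candidate to find the large k-q_itemsets.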
-
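LqiTid algorithm
for each q_item x, compute TS({x}) and SP({x})  /* one scan of the database */
L1 = {x | x is a q_item and SP({x}) ≥ minimum support}  /* large 1-q_itemsets */
for (k = 2; |Lk-1| > 1; k++) do begin  /* find the large k-q_itemsets */
    generate the candidate k-q_itemsets Ck from Lk-1
    for each q_itemset c in Ck do begin  /* c is the join of two (k-1)-q_itemsets S1 and S2 */
        TS(c) = TS(S1) ∩ TS(S2); SP(c) = Card(TS(c))
        if SP(c) ≥ minimum support then
            Lk = Lk ∪ {c}
    end
end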
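A compact Python sketch of LqiTid follows. It assumes each q_item is a tuple of an item and a quantity interval, that supports are counts, and that two q_items of the same item are never joined (a transaction holds one quantity per item); that last rule is an assumption made here, not something the pseudocode states. The database is hypothetical.

```python
def lqitid(db, min_support):
    """Return {q_itemset (sorted tuple of q_items): SP} for all large q_itemsets.

    `db` maps Tid -> set of q_items; a q_item is (item, (low, high)).
    """
    # One scan of DB: TS of every single q_item.
    ts = {}
    for tid, q_items in db.items():
        for x in q_items:
            ts.setdefault((x,), set()).add(tid)
    large = {c: tids for c, tids in ts.items() if len(tids) >= min_support}
    result = dict(large)

    # Further levels need at least two large q_itemsets to join.
    while len(large) > 1:
        prev = sorted(large)
        nxt = {}
        for i, s1 in enumerate(prev):
            for s2 in prev[i + 1:]:
                # Join: same first k-2 q_items, last items in increasing order.
                if s1[:-1] == s2[:-1] and s1[-1][0] < s2[-1][0]:
                    tids = large[s1] & large[s2]   # TS(c) = TS(S1) n TS(S2)
                    if len(tids) >= min_support:   # SP(c) = Card(TS(c))
                        nxt[s1 + (s2[-1],)] = tids
        large = nxt
        result.update(large)
    return {c: len(tids) for c, tids in result.items()}


# Hypothetical database, for illustration only.
A12, A34 = ("A", (1, 2)), ("A", (3, 4))
B11, C34 = ("B", (1, 1)), ("C", (3, 4))
db = {1: {A12, B11}, 2: {A12, B11, C34}, 3: {A34, C34}, 4: {B11, C34}}
large_q = lqitid(db, min_support=2)
```

Only the first pass reads the database; every later support comes from intersecting Tid sets that are already in memory.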
-
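Example 7-10: With a minimum support of 2, the 1-q_itemsets of 7-8 give the large 1-q_itemsets shown in 7-9. Joining the q_itemsets of 7-9 gives C2, and its large members are the 2-q_itemsets shown in 7-10. Joining the q_itemsets of 7-10 gives C3, and its large members are the 3-q_itemsets shown in 7-11. No 4-q_itemsets arise, so C4 = L4 = ∅.
-
7-9: large 1-q_itemsets
-
7-10: large 2-q_itemsets
-
7-11: large 3-q_itemsets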
-
7-117-100.657-17
{} {} =2/3=0.67 {} {} =3/3=1 {} {C,[1..2]} =2/3=0.67 {} {} =2/2=1 {,} {} =2/3=0.67 {,} {} =2/2=1 {,} {} =2/2=1 {,} {} =2/2=1 {,} {} =2/2=1 {,} {} =2/3=0.67
-
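(1) Association rules that pass the support and confidence thresholds can still be misleading. Suppose that of 10000 transactions, 6000 contain item A, 7500 contain item B, and 4000 contain both. With a minimum support of 30% and a minimum confidence of 60%, the rule A → B [support = 40%, confidence = 67%] qualifies. Yet B appears in 75% of all transactions, which is higher than 67%, so buying A actually makes buying B less likely.
-
(2) If P(A ∩ B) = P(A) P(B), then A and B are independent; otherwise A and B are dependent and correlated. The correlation between A and B is defined as correlation(A, B) = P(A ∩ B) / (P(A) P(B)).
-
(3) If correlation < 1, A and B are negatively correlated: the occurrence of A makes B less likely. If correlation > 1, A and B are positively correlated: the occurrence of A makes B more likely. If correlation = 1, A and B are independent.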
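The correlation measure can be evaluated for the counts given in (1). This is a small sketch; the function name and argument names are chosen here for illustration.

```python
def correlation(n_total, n_a, n_b, n_ab):
    """correlation(A, B) = P(A n B) / (P(A) * P(B)), computed from counts."""
    p_a = n_a / n_total
    p_b = n_b / n_total
    p_ab = n_ab / n_total
    return p_ab / (p_a * p_b)


# Counts from the example: 10000 transactions, 6000 with A, 7500 with B, 4000 with both.
corr = correlation(10000, 6000, 7500, 4000)
negatively_correlated = corr < 1
```

Here the value is 0.4 / (0.6 × 0.75), which is below 1, so the two items are negatively correlated even though the rule passes both the support and confidence thresholds.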
-
Apriori (hash) (cache)