An Efficient Algorithm for Incremental Mining of Association Rules
1
An Efficient Algorithm for Incremental Mining of Association Rules
Chin-Chen Chang, Yu-Chiang Li, Jung-San Lee
RIDE-SDMA’05
Speaker: 董原賓, Advisor: 柯佳伶
2
Introduction
Previous incremental mining algorithms: FUP (Fast Update Algorithm), FUP2, and the negative-border approach
※ They all have to rescan the original database
Problem: publication-like databases
e.g., publication databases and web log records, where the original database is normally much larger than the incremental database
Solution: NFUP (New Fast Update Algorithm)
3
Definition
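$DB$: the original database; $db$: the set of newly added transactions
$DB^+ = DB \cup db$: the updated database
$n$, $P_1, \dots, P_n$: $db$ is divided into $n$ partitions, $db = P_1 \cup P_2 \cup \dots \cup P_{n-1} \cup P_n$
$db_{m,n} = P_m \cup P_{m+1} \cup \dots \cup P_{n-1} \cup P_n$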
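A minimal Python sketch of the window notation, assuming each partition is stored as a list of transactions (frozensets of items) and keeping the slide's 1-based partition numbers; the name `db_window` is hypothetical. Because the partitions are disjoint, the union $db_{m,n}$ is simply their concatenation.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def db_window(partitions: List[List[Transaction]], m: int, n: int) -> List[Transaction]:
    """Return db_{m,n} = P_m ∪ P_{m+1} ∪ ... ∪ P_n.

    partitions[0] holds P_1, so the slide's 1-based numbering is kept.
    """
    if not 1 <= m <= n <= len(partitions):
        raise ValueError("need 1 <= m <= n <= number of partitions")
    window: List[Transaction] = []
    for part in partitions[m - 1:n]:  # P_m .. P_n
        window.extend(part)
    return window
```

With this layout, `db_window(parts, 1, n)` is the whole of db, and `db_window(parts, m, m)` is $db_{m,m}$, i.e. $P_m$ alone.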
4
Definition
α set: frequent itemsets in $DB^+$
β set: frequent in $db_{m,n}$ ($m \le n$) but infrequent in $db_{m-1,n}$
γ set: frequent in $db_{m,m}$ (that is, in $P_m$ alone) but infrequent in $db_{m+1,n}$
X.count: occurrence count
X.start: partition number at which X becomes frequent
X.type: one of the three types α, β, and γ
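A minimal Python sketch of this bookkeeping, with one record per tracked itemset. The names `SetType`, `ItemsetInfo` and `Table` are hypothetical, and X.type is stored as `kind` so that it does not shadow Python's built-in `type`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet


class SetType(Enum):
    ALPHA = "α"  # frequent itemsets in DB+ (slide definition)
    BETA = "β"   # frequent in db_{m,n}, infrequent in db_{m-1,n}
    GAMMA = "γ"  # frequent in P_m alone, infrequent in db_{m+1,n}


@dataclass
class ItemsetInfo:
    count: int     # X.count: occurrence count
    start: int     # X.start: partition number at which X becomes frequent
    kind: SetType  # X.type


# One record per tracked itemset, keyed by the itemset itself
Table = Dict[FrozenSet[str], ItemsetInfo]
```

For instance, the {C} row of the P2 example (start 2, count 3) would be stored as `ItemsetInfo(count=3, start=2, kind=SetType.ALPHA)` under the key `frozenset("C")`.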
5
FUP (Fast Update Algorithm)
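In case 2 (frequent in DB but infrequent in db), the itemset's new count is easily calculated by adding its count in db to the count stored for DB; in case 3 (infrequent in DB but frequent in db), FUP needs to rescan the original database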
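FUP's case analysis can be sketched as follows. The case numbering is an assumption, since the table itself is only in the slide image; it follows the usual split into frequent/infrequent in DB × frequent/infrequent in db. The function name and the convention that `old_count` is `None` for itemsets that were infrequent in DB (FUP keeps counts only for DB's frequent itemsets) are illustrative.

```python
from typing import Optional


def fup_status(new_count: int, new_size: int,
               old_count: Optional[int], old_size: int, s: float) -> str:
    """Decide an itemset's status in DB+ the way FUP's four cases do.

    new_count/new_size refer to db, old_count/old_size to DB. old_count is
    None when the itemset was infrequent in DB, so no count was stored.
    """
    frequent_in_db = new_count >= s * new_size
    if old_count is not None:             # frequent in DB
        if frequent_in_db:
            return "frequent"             # case 1: frequent in both parts
        total = old_count + new_count     # case 2: stored count is enough
        return "frequent" if total >= s * (old_size + new_size) else "infrequent"
    if frequent_in_db:
        return "rescan DB"                # case 3: count in DB is unknown
    return "infrequent"                   # case 4: infrequent in both parts
```

Case 3 is the only branch that cannot be answered from stored counts, which is exactly the rescan NFUP sets out to avoid.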
6
NFUP (New Fast Update Algorithm)
A backward method that only requires scanning the incremental database
A frequent itemset in the incremental database is also important, even if it is infrequent in the updated database
The incremental database (db) is partitioned by time interval
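A sketch of partitioning db by time interval, assuming each new transaction carries a timestamp and that all intervals have the same fixed length; `partition_by_interval` and the `(timestamp, items)` record layout are hypothetical. P1 is the oldest window and Pn the latest, matching the backward scan.

```python
from datetime import datetime, timedelta
from typing import FrozenSet, List, Tuple

Record = Tuple[datetime, FrozenSet[str]]  # (timestamp, items), assumed layout


def partition_by_interval(db: List[Record], start: datetime,
                          interval: timedelta) -> List[List[FrozenSet[str]]]:
    """Split db into consecutive windows P_1 .. P_n, each `interval` long.

    P_1 is the oldest window and P_n the latest; empty windows are kept so
    that partition numbers stay aligned with time.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    partitions: List[List[FrozenSet[str]]] = []
    for ts, items in sorted(db, key=lambda rec: rec[0]):
        if ts < start:
            raise ValueError("transaction is older than the start of db")
        idx = (ts - start) // interval  # 0-based window index (an int)
        while len(partitions) <= idx:
            partitions.append([])
        partitions[idx].append(items)
    return partitions
```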
7
NFUP
The set of frequent itemsets of DB is known in advance
NFUP scans the partitions backward: the last partition is scanned first
Within each partition, the processing follows that of Apriori
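The per-partition passes rely on Apriori's candidate generation (apriori-gen: join, then prune). The sketch below is textbook Apriori over frozensets of orderable items, not NFUP-specific code.

```python
from itertools import combinations
from typing import FrozenSet, Set


def apriori_gen(frequent_k: Set[FrozenSet[str]]) -> Set[FrozenSet[str]]:
    """Generate (k+1)-itemset candidates from the frequent k-itemsets."""
    candidates: Set[FrozenSet[str]] = set()
    sorted_sets = [tuple(sorted(x)) for x in frequent_k]
    for a in sorted_sets:
        for b in sorted_sets:
            # Join step: same first k-1 items, and a's last item < b's last item
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                cand = frozenset(a + (b[-1],))
                # Prune step: every k-subset of the candidate must be frequent
                if all(frozenset(sub) in frequent_k
                       for sub in combinations(sorted(cand), len(a))):
                    candidates.add(cand)
    return candidates
```

Applied to the frequent 1-itemsets {A}, {B}, {C}, {F}, it returns the six 2-itemset candidates {AB}, {AC}, {AF}, {BC}, {BF}, {CF} that appear in the P2 example.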
8
NFUP
9
Scan from $P_n$ back to $P_1$ and find the α, β, and γ itemsets in db
After $P_1$ is scanned, the occurrence counts are accumulated with those of the frequent itemsets of DB
10
The latest partition is scanned first: initialize the variables and accumulate the occurrence counts
Still frequent in $P_m$: accumulate the count
Still frequent in $db_{m,n}$: accumulate the count
Only frequent in $db_{m+1,n}$: remove the itemset from the α set and add it into the β set
Belongs to no set and is frequent in $P_m$: check whether $P_m$ is the latest partition (yes: α set; no: γ set)
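A simplified Python sketch of these update rules for one partition $P_m$. It implements only the transitions exercised in the worked example: α itemsets are kept while they stay frequent in $db_{m,n}$ and are otherwise moved to β, and untracked itemsets that are frequent in $P_m$ enter α (latest partition) or γ. How existing β and γ entries evolve in later partitions is not spelled out in the text, so the sketch leaves them unchanged. `scan_partition`, `Entry`, and the assumption that an α itemset missing from `counts_pm` occurs 0 times in $P_m$ are all hypothetical; moving `start` back to m for surviving α itemsets follows the example's tables.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class Entry:
    kind: str   # X.type: "alpha", "beta" or "gamma"
    start: int  # X.start
    count: int  # X.count


def scan_partition(table: Dict[FrozenSet[str], Entry],
                   counts_pm: Dict[FrozenSet[str], int],
                   m: int, n: int, size_pm: int, size_window: int,
                   s: float) -> None:
    """Update `table` in place after scanning P_m (simplified sketch).

    counts_pm maps candidate itemsets to their counts in P_m; an alpha itemset
    missing from it is assumed to occur 0 times in P_m.
    size_pm is |P_m| and size_window is |db_{m,n}|.
    """
    # Rules for alpha itemsets: stay alpha while frequent in db_{m,n}
    for itemset, entry in table.items():
        if entry.kind != "alpha":
            continue  # beta/gamma entries are left unchanged (assumption)
        c = counts_pm.get(itemset, 0)
        if entry.count + c >= s * size_window:
            entry.count += c   # still frequent in db_{m,n}: accumulate count
            entry.start = m    # as in the worked example's tables
        else:
            entry.kind = "beta"  # only frequent in db_{m+1,n}; keep start/count

    # Itemsets in no set: track them if frequent in P_m alone
    for itemset, c in counts_pm.items():
        if itemset not in table and c >= s * size_pm:
            kind = "alpha" if m == n else "gamma"  # latest partition -> alpha
            table[itemset] = Entry(kind, m, c)
```

Feeding it the P2 counts (m = n = 2, |P2| = |db_{2,2}| = 3) and then the P1 counts (m = 1, |db_{1,2}| = 6) with s = 0.5 reproduces the final α, β and γ tables of the example.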
11
Example
Scan P2: 1-itemsets {A: 2} {B: 2} {C: 3} {D: 1} {E: 1} {F: 2}
Min sup = 50%; |P2| = 3, so the minimum count in P2 is 3 × 0.5 = 1.5
For each itemset, check whether it belongs to the α set; otherwise, if it belongs to no set and its count is ≥ 1.5, check whether P2 is the latest partition (yes: α set; no: γ set)
Run Apriori-gen and scan P2: 2-itemsets {AB: 2} {AC: 2} {AF: 1} {BC: 2} {BF: 1} {CF: 2}, checked in the same way
Scan P2: 3-itemsets {ABC: 2}
α set   start  count
{A}     2      2
{B}     2      2
{C}     2      3
{F}     2      2
{AB}    2      2
{AC}    2      2
{BC}    2      2
{CF}    2      2
{ABC}   2      2
β set and γ set: empty
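The P2 step of the example, replayed in Python from the counts on the slide: because P2 is the latest partition, every itemset reaching the minimum count enters the α set with start 2. Itemsets are written as strings of single-letter items for brevity.

```python
# Replay of the P2 scan (values from the slide; s = 50%, |P2| = 3)
s, p2_size = 0.5, 3
min_count = s * p2_size  # 3 x 0.5 = 1.5

# Occurrence counts in P2, level by level (1-, 2- and 3-itemsets)
levels = [
    {"A": 2, "B": 2, "C": 3, "D": 1, "E": 1, "F": 2},
    {"AB": 2, "AC": 2, "AF": 1, "BC": 2, "BF": 1, "CF": 2},
    {"ABC": 2},
]

# P2 is the latest partition, so frequent itemsets go straight to the alpha
# set as (start, count) pairs; nothing can enter beta or gamma yet.
alpha = {
    itemset: (2, count)
    for level in levels
    for itemset, count in level.items()
    if count >= min_count
}

for itemset, (start, count) in alpha.items():
    print(f"{{{itemset}}} start={start} count={count}")
```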
12
Example
Scan P1: 1-itemsets {A: 1} {B: 3} {C: 2} {D: 1} {E: 3} {F: 0}
Min sup = 50%; |P1| = 3, so the minimum count in P1 is 3 × 0.5 = 1.5
Run Apriori-gen and scan P1: 2-itemsets {AB: 1} {AC: 0} {BC: 2} {BE: 3} {CE: 2}
For each itemset, check whether it belongs to the α set. If yes, accumulate its count; if the accumulated count is below $s \times |db_{m,n}| = 0.5 \times 6 = 3$, move the itemset to the β set. Otherwise, if it belongs to no set and its count is ≥ 1.5, check whether P1 is the latest partition (yes: α set; no: γ set)
Updates to the table built from P2:
{A}: start 1, count 3
{B}: start 1, count 5
{C}: start 1, count 5
{AB}: start 1, count 3
{BC}: start 1, count 4
{F}, {AC}, {CF}, {ABC}: moved to the β set (start 2, count 2)
{E}: added to the γ set (start 1, count 3)
{BE}: added to the γ set (start 1, count 3)
{CE}: added to the γ set (start 1, count 2)
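The arithmetic behind these updates, itemset by itemset. The P1 counts of {CF} and {ABC} are not listed, but both must be 0 because {F} and {AC} never occur in P1.

$$
\begin{aligned}
&s \times |P_1| = 0.5 \times 3 = 1.5, \qquad s \times |db_{1,2}| = 0.5 \times 6 = 3\\
&\{A\}: 2 + 1 = 3 \ge 3 \;\Rightarrow\; \alpha\ (\text{start } 1,\ \text{count } 3)\\
&\{B\}: 2 + 3 = 5 \ge 3 \;\Rightarrow\; \alpha\ (\text{start } 1,\ \text{count } 5)\\
&\{C\}: 3 + 2 = 5 \ge 3 \;\Rightarrow\; \alpha\ (\text{start } 1,\ \text{count } 5)\\
&\{AB\}: 2 + 1 = 3 \ge 3 \;\Rightarrow\; \alpha\ (\text{start } 1,\ \text{count } 3)\\
&\{BC\}: 2 + 2 = 4 \ge 3 \;\Rightarrow\; \alpha\ (\text{start } 1,\ \text{count } 4)\\
&\{F\}, \{AC\}, \{CF\}, \{ABC\}: 2 + 0 = 2 < 3 \;\Rightarrow\; \beta\ (\text{start } 2,\ \text{count } 2)\\
&\{E\}: 3,\ \{BE\}: 3,\ \{CE\}: 2\ \ge 1.5 \;\Rightarrow\; \gamma\ (\text{start } 1)
\end{aligned}
$$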
13
Example
α set   start  count
{A}     1      3
{B}     1      5
{C}     1      5
{AB}    1      3
{BC}    1      4
γ set   start  count
{E}     1      3
{BE}    1      3
{CE}    1      2
β set   start  count
{F}     2      2
{AC}    2      2
{CF}    2      2
{ABC}   2      2
14
Experiment
Intel Pentium IV 1.5 GHz CPU, 640 MB main memory
Microsoft Windows 2000 Professional
Synthetic datasets:
15
Experiment
16
Experiment
17
Experiment