M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Inverted Matrix: Efficient Discovery of Frequent Items
in Large Datasets in the Context of Interactive Mining
Mohammad El-Hajj Osmar R. Zaïane
KDD 2003
Department of Computing ScienceUniversity of Alberta, Canada
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Outline
• Introduction
• Pre-Processing Phase
• Mining Phase
Transactional Layouts
Building COFI-trees
Mining COFI-trees• Experimental Studies
• Conclusion and Future work
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Association rule mining is crucial in many applications and plays an essential role in many important mining tasks.
Association Rule Mining
Frequent Itemset Mining Association Rules Generation
1 2FIM
IntroductionPre-processingMining PhaseExperimentsConclusion
Antecedent Consequent Body Head
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Expensive candidacy generation step
OR
Huge Memory based Data structures
Challenges for FIM
3. Non interactive mining
1. High memory dependency
2. Repetitive tasks, (I/O) readings (Superfluous Processing)
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
T#
T1 A G D C B
T2 B C H E D
T3 B D E A M
T4 C E F A N
T5 A B N O P
T6 A C Q R G
T7 A C H I G
T8 L E F K B
T9 A F M N O
T10 C F P J R
T11 A D B H I
T12 D E B K L
T13 M D C G O
T14 C F P Q J
T15 B D E F I
T16 J E B A D
T17 A K E F C
T18 C D L B A
Items T#
T1 A G D C B
T2 B C G E D
T3 B D E A G
T4 C E F A G
T5 A B G G G
T6 A C G G G
T7 A C G G G
T8 G E F G B
T9 A F G G G
T10 C F G G G
T11 A D B G G
T12 D E B G G
T13 G D C G G
T14 C F G G G
T15 B D E F G
T16 G E B A D
T17 A G E F C
T18 C D G B A
ItemsT#
T1 A D C B
T2 B C E D
T3 B D E A
T4 C E F A
T5 A B
T6 A C
T7 A C
T8 E F B
T9 A F
T10 C F
T11 A D B
T12 D E B
T13 D C
T14 C F
T15 B D E F
T16 E B A D
T17 A E F C
T18 C D B A
Items
Support > 4
Challenges for FIM
3. Non interactive mining
1. High memory dependency
2. Repetitive tasks, (I/O) readings (Superfluous Processing)
IntroductionPre-processingMining PhaseExperimentsConclusion
Frequent 1-itemsets {A, B, C, D, E, F}Non frequent items {G, H, I, J, K, L, M, N, O, P, Q, R}
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Support > 9
T#
T1 A G G C B
T2 B C G G G
T3 B G G A G
T4 C G G A G
T5 A B G G G
T6 A C G G G
T7 A C G G G
T8 G G G G B
T9 A G G G G
T10 C G G G G
T11 A G B G G
T12 G G B G G
T13 G G C G G
T14 C G G G G
T15 B G G G G
T16 G G B A G
T17 A G G G C
T18 C G G B A
ItemsT#
T1 A C B
T2 B C
T3 B A
T4 C A
T5 A B
T6 A C
T7 A C
T8 B
T9 A
T10 C
T11 A B
T12 B
T13 C
T14 C
T15 B
T16 B A
T17 A C
T18 C B A
Items
Challenges for FIM
T#
T1 A G D C B
T2 B C H E D
T3 B D E A M
T4 C E F A N
T5 A B N O P
T6 A C Q R G
T7 A C H I G
T8 L E F K B
T9 A F M N O
T10 C F P J R
T11 A D B H I
T12 D E B K L
T13 M D C G O
T14 C F P Q J
T15 B D E F I
T16 J E B A D
T17 A K E F C
T18 C D L B A
Items
3. Non interactive mining
1. High memory dependency
2. Repetitive tasks, (I/O) readings (Superfluous Processing)
IntroductionPre-processingMining PhaseExperimentsConclusion
Frequent 1-itemsets {A, B, C}Non frequent items {D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R}
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Changing the support level means expensive steps (whole process is redone)
Knowledge
Data warehouse
Databases
Selection and Transformation
Data Mining
Evaluation and
Presentation
Patterns
Challenges for FIM
3. Non interactive mining
1. High memory dependency
2. Repetitive tasks, (I/O) readings (Superfluous Processing)
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
• New association Rule mining algorithm that has the following features
1. Low Memory Dependency 2. Remove Superfluous Processing3. Interactive Mining Ready
MotivationIntroductionPre-processingMining PhaseExperimentsConclusion
Without compromising scalability
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Transactional Layouts
• Horizontal Layout
Transaction ID
1 A G D C B
2 B C H E D
3 B D E A M
4 C E F A N
5 A B N O P
6 A C Q R G
7 A C H I G
8 L E F K B
9 A F M N O
10 C F P J R
11 A D B H I
12 D E B K L
13 M D C G O
14 C F P Q J
15 B D E F I
16 J E B A D
17 A K E F C
18 C D L B A
Items
Candidacy generation can be removed (FP-Growth)
Superfluous Processing
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Transaction ID
1 A G D C B
2 B C H E D
3 B D E A M
4 C E F A N
5 A B N O P
6 A C Q R G
7 A C H I G
8 L E F K B
9 A F M N O
10 C F P J R
11 A D B H I
12 D E B K L
13 M D C G O
14 C F P Q J
15 B D E F I
16 J E B A D
17 A K E F C
18 C D L B A
Items Items
A 1 3 4 5 6 7 9 11 16 17 18B 1 2 3 5 8 11 12 15 16 18C 1 2 4 6 7 10 13 14 17 18D 1 2 3 11 12 13 15 16 18E 2 3 4 8 12 15 16 17F 4 8 9 10 14 15 17G 1 6 7 13H 2 7 11I 7 11 15J 10 14 16K 8 12 17L 8 12 18M 3 9 13N 4 5 9O 5 9 13P 5 9 13Q 6 14R 6 10
Transaction ID
Candidacy generation is required
Minimize Superfluous Processing
Transactional Layouts
• Vertical Layout
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Suggested Layout
• Inverted Matrix Layout: Combines the horizontal and vertical layouts
Transaction ID
1 A G D C B
2 B C H E D
3 B D E A M
4 C E F A N
5 A B N O P
6 A C Q R G
7 A C H I G
8 L E F K B
9 A F M N O
10 C F P J R
11 A D B H I
12 D E B K L
13 M D C G O
14 C F P Q J
15 B D E F I
16 J E B A D
17 A K E F C
18 C D L B A
Items
2 I/O passes
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
1 2 3 4 5 6 7 8 9 10 11
1 R 22 Q 23 P 34 O 35 N 36 M 37 L 38 K 39 J 310 I 311 H 312 G 413 F 714 E 815 D 916 C 1017 B 1018 A 11
IndexTransactional Array
Loc
Transactional Layouts
• Inverted Matrix LayoutPass 1, generates sorted item list (based on frequency)
Transaction ID
1 A G D C B
2 B C H E D
3 B D E A M
4 C E F A N
5 A B N O P
6 A C Q R G
7 A C H I G
8 L E F K B
9 A F M N O
10 C F P J R
11 A D B H I
12 D E B K L
13 M D C G O
14 C F P Q J
15 B D E F I
16 J E B A D
17 A K E F C
18 C D L B A
Items
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
1 2 3 4 5 6 7 8 9 10 11
1 R 22 Q 23 P 34 O 35 N 36 M 37 L 38 K 39 J 310 I 311 H 312 G 4 (15,1)13 F 714 E 815 D 9 (16,1)16 C 10 (17,1)17 B 10 (18,1)18 A 11 (¤, ¤)
Loc IndexTransactional Array
Pass 2, Generate the transactional array of the IM
T#
T1 A G D C B
T2 B C H E D
T3 B D E A M
T4 C E F A N
T5 A B N O P
T6 A C Q R G
T7 A C H I G
T8 L E F K B
T9 A F M N O
T10 C F P J R
T11 A D B H I
T12 D E B K L
T13 M D C G O
T14 C F P Q J
T15 B D E F I
T16 J E B A D
T17 A K E F C
T18 C D L B A
Items
Transactional Layouts
• Inverted Matrix Layout
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
1 2 3 4 5 6 7 8 9 10 11
1 R 22 Q 23 P 34 O 35 N 36 M 37 L 38 K 39 J 310 I 311 H 3 (14,1)12 G 4 (15,1)13 F 714 E 8 (15,2)15 D 9 (16,1) (16,2)16 C 10 (17,1) (17,2)17 B 10 (18,1) (¤, ¤)18 A 11 (¤, ¤)
Loc IndexTransactional Array
T#
T1 A G D C B
T2 B C H E D
T3 B D E A M
T4 C E F A N
T5 A B N O P
T6 A C Q R G
T7 A C H I G
T8 L E F K B
T9 A F M N O
T10 C F P J R
T11 A D B H I
T12 D E B K L
T13 M D C G O
T14 C F P Q J
T15 B D E F I
T16 J E B A D
T17 A K E F C
T18 C D L B A
Items
Transactional Layouts
• Inverted Matrix Layout
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Transactional Layouts
• Inverted Matrix Layout
There is no minimum support involved in building the Inverted Matrix.
IntroductionPre-processingMining PhaseExperimentsConclusion
1 2 3 4 5 6 7 8 9 10 11
1 R 2 (2,1) (3,2)
2 Q 2 (12,2) (3,3)3 P 3 (4,1) (9,1) (9,2)4 O 3 (5,2) (5,3) (6,3)5 N 3 (13,1) (17,4) (6,2)6 M 3 (14,2) (13,3) (12,4)7 L 3 (8,1) (8,2) (15,9)8 K 3 (13,2) (14,5) (13,7)9 J 3 (13,4) (13,5) (14,7)10 I 3 (11,2) (11,3) (13,6)11 H 3 (14,1) (12,3) 15,4)12 G 4 (15,1) (16,4) (16,5) (15,6)13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (18,10) (17,10)17 B 10 (18,1) (18,2) (18,4) (18,8) (18,9) (18,11)18 A 11
LocTransactional Array
Index
(¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤)(¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤)
(¤, ¤) (¤, ¤) (¤, ¤)
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
1 2 3 4 5 6 7 8 9 10 11
1 R 2 (2,1) (3,2)2 Q 2 (12,2) (3,3)3 P 3 (4,1) (9,1) (9,2)4 O 3 (5,2) (5,3) (6,3)5 N 3 (13,1) (17,4) (6,2)6 M 3 (14,2) (13,3) (12,4)7 L 3 (8,1) (8,2) (15,9)8 K 3 (13,2) (14,5) (13,7)9 J 3 (13,4) (13,5) (14,7)10 I 3 (11,2) (11,3) (13,6)11 H 3 (14,1) (12,3) 15,4)12 G 4 (15,1) (16,4) (16,5) (15,6)
13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (☼,☼) (☼,☼) (☼,☼) (18,10) (17,10)17 B 10 (18,1) (☼,☼) (18,2) (18,4) (☼,☼) (18,8) (☼,☼) (☼,☼) (18,9) (18,11)18 A 11 (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼)
Loc IndexTransactional Array
Support > 4
Border Support
Transactional Layouts
• Inverted Matrix Layout
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
1 2 3 4 5 6 7 8 9 10 11
13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (☼,☼) (☼,☼) (☼,☼) (18,10) (17,10)17 B 10 (18,1) (☼,☼) (18,2) (18,4) (☼,☼) (18,8) (☼,☼) (☼,☼) (18,9) (18,11)18 A 11 (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼)
IndexTransactional Array
Loc
Transactional Layouts
• Inverted Matrix Layout
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
1 2 3 4 5 6 7 8 9 10 11
13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (☼,☼) (☼,☼) (☼,☼) (18,10) (17,10)17 B 10 (18,1) (☼,☼) (18,2) (18,4) (☼,☼) (18,8) (☼,☼) (☼,☼) (18,9) (18,11)18 A 11 (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼)
Loc IndexTransactional Array
T#
T1 A D C B
T2 B C E D
T3 B D E A
T4 C E F A
T5 A B
T6 A C
T7 A C
T8 E F B
T9 A F
T10 C F
T11 A D B
T12 D E B
T13 D C
T14 C F
T15 B D E F
T16 E B A D
T17 A E F C
T18 C D B A
Items
Transactional Layouts
• Inverted Matrix Layout
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Sub transactions generated from IM
F E C A
F E B
F A
F C
F C
F E D B
F E C A
Item FE D C B
E D B A
E C A
E B
E D B
E D B
E D B A
E C A
Item ED C B A
D C B
D B A
D B A
D B
D C
D B
D B A
D C B A
Item D
C B A
C B
C A
C A
C A
C A
C B A
Item C
B A
B A
B A
B A
B A
B A
Item B
IntroductionPre-processingMining PhaseExperimentsConclusion
Frequent sub-transaction with item F
Frequent sub-transaction with item D
Frequent sub-transaction with item E
Frequent sub-transaction with item B
Frequent sub-transaction with item C
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Co-Occurrences Frequent Item tree
F E C A
F E B
F A
F C
F C
F E D B
F E C A
Item F
F ( 1 0 )
E 1 E ( 1 0 )D 0
C 1
B 0 C ( 1 0 )A 1
A ( 1 0 )
Building F-COFI-tree
Frequency Count
Participation Count
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
F E C A
F E B
F A
F C
F C
F E D B
F E C A
Item F
F ( 2 0 )
E 2 E ( 2 0 )D 0
C 1
B 1 C ( 1 0 ) B ( 1 0 )A 1
A ( 1 0 )
Co-Occurrences
Frequent Item tree
Building F-COFI-tree
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
F E C A
F E B
F A
F C
F C
F E D B
F E C A
Item F
F ( 3 0 )
E 2 E ( 2 0 ) A ( 1 0 )D 0
C 1
B 1 C ( 1 0 ) B ( 1 0 )A 2
A ( 1 0 )
Co-Occurrences Frequent Item tree
Building F-COFI-tree
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
F E C A
F E B
F A
F C
F C
F E D B
F E C A
Item F
F ( 4 0 )
E 2 E ( 2 0 ) A ( 1 0 ) C ( 1 0 )D 0
C 2
B 1 C ( 1 0 ) B ( 1 0 )A 2
A ( 1 0 )
Co-Occurrences Frequent Item tree
Building F-COFI-tree
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
F E C A
F E B
F A
F C
F C
F E D B
F E C A
Item F
F ( 5 0 )
E 2 E ( 2 0 ) A ( 1 0 ) C ( 2 0 )D 0
C 3
B 1 C ( 1 0 ) B ( 1 0 )A 2
A ( 1 0 )
Co-Occurrences Frequent Item tree
Building F-COFI-tree
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
F E C A
F E B
F A
F C
F C
F E D B
F E C A
Item F
F ( 7 0 )
E 3 E ( 3 0 ) A ( 1 0 ) C ( 2 0 )D 1
C 3
B 2 C ( 1 0 ) B ( 1 0 ) D ( 1 0 )A 2
A ( 1 0 ) B ( 1 0 )
Co-Occurrences Frequent Item tree
Building F-COFI-tree
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
F ( 7 0 )
E 4 E ( 4 0 ) A ( 1 0 ) C ( 2 0 )D 1
C 4
B 2 C ( 2 0 ) B ( 1 0 ) D ( 1 0 )A 3
A ( 2 0 ) B ( 1 0 )
F E C A
F E B
F A
F C
F C
F E D B
F E C A
Item F
Co-Occurrences Frequent Item tree
Building F-COFI-tree
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
E D C B
E D B A
E C A
E B
E D B
E D B
E D B A
E C A
Item E
D C B A
D C B
D B A
D B A
D B
D C
D B
D B A
D C B A
Item D
E ( 8 0 )
D 5 D ( 5 0 ) C ( 2 0 ) B ( 1 0 )
C 3
B 6
A 4 C ( 1 0 ) B ( 4 0 ) A ( 2 0 )
B ( 1 0 ) A ( 2 0 )
D ( 9 0 )
C 4 C ( 4 0 ) B ( 5 0 )
B 8
A 5
B ( 3 0 ) A ( 3 0 )
A ( 2 0 )
C B A
C B
C A
C A
C A
C A
C B A
Item CC ( 7 0 )
B ( 3 0 ) A ( 4 0 )
B 3
A 6
A ( 2 0 )
B ( 6 0 )
A 6
A ( 6 0 )
B A
B A
B A
B A
B A
B A
Item B
Co-Occurrences Frequent Item tree
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
E ( 8 0 )
D 5 D ( 5 0 ) C ( 2 0 ) B ( 1 0 )
C 3
B 6
A 4 C ( 1 0 ) B ( 4 0 ) A ( 2 0 )
B ( 1 0 ) A ( 2 0 )
Mining COFI-trees
E-COFI-tree
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
E ( 8 0 ) E ( 8 1 )
E D B 1
D 5 D ( 5 0 ) C ( 2 0 ) B ( 1 0 ) D ( 5 1 ) E D 1
C 3 E B 1
B 6 E D B 1
A 3 C ( 1 0 ) B ( 4 0 ) A ( 2 0 ) C ( 1 1 )
B ( 1 0 ) A ( 2 0 ) B ( 1 1 )
E-COFI-treeSupport = Frequency count – Participation count
Mining COFI-treesIntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
E ( 8 1 ) E ( 8 5 ) E D B 4
E D 5
D 5 D ( 5 1 ) C ( 2 0 ) B ( 1 0 ) D ( 5 5 ) E B 5
C 3 E D B 5
B 6
A 3 C ( 1 1 ) B ( 4 0 ) A ( 2 0 ) B ( 4 4 )
B ( 1 1 ) A ( 2 0 )
Mining COFI-trees
E-COFI-tree
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
E B 1
E ( 8 5 ) E ( 8 6 )
E D 5
E B 6
D 5 D ( 5 5 ) C ( 2 0 ) B ( 1 0 ) B ( 1 1 ) E D B 5
C 3
B 6
A 3 C ( 1 1 ) B ( 4 4 ) A ( 2 0 )
B ( 1 1 ) A ( 2 0 )
Mining COFI-trees
E-COFI-tree
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
E ( 8 6 ) E ( 8 6 ) No change
D 5 D ( 5 5 ) C ( 2 0 ) B ( 1 1 ) D ( 5 5 )
C 3
B 6
A 3 C ( 1 1 ) B ( 4 4 ) A ( 2 0 )
B ( 1 1 ) A ( 2 0 ) E D 5
E B 6
E D B 5
Mining COFI-trees
E-COFI-tree
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
D ( 9 0 )
C 4 C ( 4 0 ) B ( 5 0 )
B 8
A 5
B ( 3 0 ) A ( 3 0 )
A ( 2 0 )
C ( 7 0 )
B ( 3 0 ) A ( 4 0 )
B 3
A 6
A ( 2 0 )
B ( 6 0 )
A 6
A ( 6 0 )
Mining COFI-trees
D-COFI-tree
DBA:5
DA:5
DB:8
C-COFI-tree
CA:6
B-COFI-tree
BA:6
IntroductionPre-processingMining PhaseExperimentsConclusion
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
0
2000
4000
6000
8000
10000
12000
0.0025 0.005 0.0075 0.01
Support %
Tim
e in
Sec
on
ds
Apriori
FP-Growth
Inverted Matrix
Experimental Studies
Time needed to mine 1M transactions with different support levels
IntroductionPre-processingMining PhaseExperimentsConclusion
Pentium 700Mhz with 256 MB of RAM
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Experimental Studies
Accumulated time needed to mine 1M transactions using 4 different
support levels
IntroductionPre-processingMining PhaseExperimentsConclusion
Pentium 700Mhz with 256 MB of RAM
Time needed in seconds to mine different transaction sizes
M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta
Canada
ACM SIGKDDAug. 2003 – Washington, DC
Conclusion and Future work
Updateable Inverted Matrix for native storage of transactions
Compressing the size of Inverted Matrix
IntroductionPre-processingMining PhaseExperimentsConclusion
Future work
New AR algorithm1. Low memory dependency2. No Superfluous processing3. Interactive mining ready4. scalable
Parallelizing the mining process as well as the construction of the Inverted Matrix