acm sigkdd aug. 2003 – washington, dc m. el-hajj and o. r. zaïane, 2003 database lab. university...

36
M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta ACM SIGKDD Aug. 2003 – Washington, DC Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining Mohammad El-Hajj Osmar R. Zaïane KDD 2003 Department of Computing Science University of Alberta, Canada

Post on 22-Dec-2015

214 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Inverted Matrix: Efficient Discovery of Frequent Items

in Large Datasets in the Context of Interactive Mining

Mohammad El-Hajj Osmar R. Zaïane

KDD 2003

Department of Computing ScienceUniversity of Alberta, Canada

Page 2: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Outline

• Introduction

• Pre-Processing Phase

• Mining Phase

Transactional Layouts

Building COFI-trees

Mining COFI-trees• Experimental Studies

• Conclusion and Future work

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 3: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Association rule mining is crucial in many applications and plays an essential role in many important mining tasks.

Association Rule Mining

Frequent Itemset Mining Association Rules Generation

1 2FIM

IntroductionPre-processingMining PhaseExperimentsConclusion

Antecedent Consequent Body Head

Page 4: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Expensive candidacy generation step

OR

Huge Memory based Data structures

Challenges for FIM

3. Non interactive mining

1. High memory dependency

2. Repetitive tasks, (I/O) readings (Superfluous Processing)

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 5: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

T#

T1 A G D C B

T2 B C H E D

T3 B D E A M

T4 C E F A N

T5 A B N O P

T6 A C Q R G

T7 A C H I G

T8 L E F K B

T9 A F M N O

T10 C F P J R

T11 A D B H I

T12 D E B K L

T13 M D C G O

T14 C F P Q J

T15 B D E F I

T16 J E B A D

T17 A K E F C

T18 C D L B A

Items T#

T1 A G D C B

T2 B C G E D

T3 B D E A G

T4 C E F A G

T5 A B G G G

T6 A C G G G

T7 A C G G G

T8 G E F G B

T9 A F G G G

T10 C F G G G

T11 A D B G G

T12 D E B G G

T13 G D C G G

T14 C F G G G

T15 B D E F G

T16 G E B A D

T17 A G E F C

T18 C D G B A

ItemsT#

T1 A D C B

T2 B C E D

T3 B D E A

T4 C E F A

T5 A B

T6 A C

T7 A C

T8 E F B

T9 A F

T10 C F

T11 A D B

T12 D E B

T13 D C

T14 C F

T15 B D E F

T16 E B A D

T17 A E F C

T18 C D B A

Items

Support > 4

Challenges for FIM

3. Non interactive mining

1. High memory dependency

2. Repetitive tasks, (I/O) readings (Superfluous Processing)

IntroductionPre-processingMining PhaseExperimentsConclusion

Frequent 1-itemsets {A, B, C, D, E, F}Non frequent items {G, H, I, J, K, L, M, N, O, P, Q, R}

Page 6: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Support > 9

T#

T1 A G G C B

T2 B C G G G

T3 B G G A G

T4 C G G A G

T5 A B G G G

T6 A C G G G

T7 A C G G G

T8 G G G G B

T9 A G G G G

T10 C G G G G

T11 A G B G G

T12 G G B G G

T13 G G C G G

T14 C G G G G

T15 B G G G G

T16 G G B A G

T17 A G G G C

T18 C G G B A

ItemsT#

T1 A C B

T2 B C

T3 B A

T4 C A

T5 A B

T6 A C

T7 A C

T8 B

T9 A

T10 C

T11 A B

T12 B

T13 C

T14 C

T15 B

T16 B A

T17 A C

T18 C B A

Items

Challenges for FIM

T#

T1 A G D C B

T2 B C H E D

T3 B D E A M

T4 C E F A N

T5 A B N O P

T6 A C Q R G

T7 A C H I G

T8 L E F K B

T9 A F M N O

T10 C F P J R

T11 A D B H I

T12 D E B K L

T13 M D C G O

T14 C F P Q J

T15 B D E F I

T16 J E B A D

T17 A K E F C

T18 C D L B A

Items

3. Non interactive mining

1. High memory dependency

2. Repetitive tasks, (I/O) readings (Superfluous Processing)

IntroductionPre-processingMining PhaseExperimentsConclusion

Frequent 1-itemsets {A, B, C}Non frequent items {D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R}

Page 7: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Changing the support level means expensive steps (whole process is redone)

Knowledge

Data warehouse

Databases

Selection and Transformation

Data Mining

Evaluation and

Presentation

Patterns

Challenges for FIM

3. Non interactive mining

1. High memory dependency

2. Repetitive tasks, (I/O) readings (Superfluous Processing)

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 8: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

• New association Rule mining algorithm that has the following features

1. Low Memory Dependency 2. Remove Superfluous Processing3. Interactive Mining Ready

MotivationIntroductionPre-processingMining PhaseExperimentsConclusion

Without compromising scalability

Page 9: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Transactional Layouts

• Horizontal Layout

Transaction ID

1 A G D C B

2 B C H E D

3 B D E A M

4 C E F A N

5 A B N O P

6 A C Q R G

7 A C H I G

8 L E F K B

9 A F M N O

10 C F P J R

11 A D B H I

12 D E B K L

13 M D C G O

14 C F P Q J

15 B D E F I

16 J E B A D

17 A K E F C

18 C D L B A

Items

Candidacy generation can be removed (FP-Growth)

Superfluous Processing

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 10: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Transaction ID

1 A G D C B

2 B C H E D

3 B D E A M

4 C E F A N

5 A B N O P

6 A C Q R G

7 A C H I G

8 L E F K B

9 A F M N O

10 C F P J R

11 A D B H I

12 D E B K L

13 M D C G O

14 C F P Q J

15 B D E F I

16 J E B A D

17 A K E F C

18 C D L B A

Items Items

A 1 3 4 5 6 7 9 11 16 17 18B 1 2 3 5 8 11 12 15 16 18C 1 2 4 6 7 10 13 14 17 18D 1 2 3 11 12 13 15 16 18E 2 3 4 8 12 15 16 17F 4 8 9 10 14 15 17G 1 6 7 13H 2 7 11I 7 11 15J 10 14 16K 8 12 17L 8 12 18M 3 9 13N 4 5 9O 5 9 13P 5 9 13Q 6 14R 6 10

Transaction ID

Candidacy generation is required

Minimize Superfluous Processing

Transactional Layouts

• Vertical Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 11: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Suggested Layout

• Inverted Matrix Layout: Combines the horizontal and vertical layouts

Transaction ID

1 A G D C B

2 B C H E D

3 B D E A M

4 C E F A N

5 A B N O P

6 A C Q R G

7 A C H I G

8 L E F K B

9 A F M N O

10 C F P J R

11 A D B H I

12 D E B K L

13 M D C G O

14 C F P Q J

15 B D E F I

16 J E B A D

17 A K E F C

18 C D L B A

Items

2 I/O passes

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 12: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

1 R 22 Q 23 P 34 O 35 N 36 M 37 L 38 K 39 J 310 I 311 H 312 G 413 F 714 E 815 D 916 C 1017 B 1018 A 11

IndexTransactional Array

Loc

Transactional Layouts

• Inverted Matrix LayoutPass 1, generates sorted item list (based on frequency)

Transaction ID

1 A G D C B

2 B C H E D

3 B D E A M

4 C E F A N

5 A B N O P

6 A C Q R G

7 A C H I G

8 L E F K B

9 A F M N O

10 C F P J R

11 A D B H I

12 D E B K L

13 M D C G O

14 C F P Q J

15 B D E F I

16 J E B A D

17 A K E F C

18 C D L B A

Items

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 13: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

1 R 22 Q 23 P 34 O 35 N 36 M 37 L 38 K 39 J 310 I 311 H 312 G 4 (15,1)13 F 714 E 815 D 9 (16,1)16 C 10 (17,1)17 B 10 (18,1)18 A 11 (¤, ¤)

Loc IndexTransactional Array

Pass 2, Generate the transactional array of the IM

T#

T1 A G D C B

T2 B C H E D

T3 B D E A M

T4 C E F A N

T5 A B N O P

T6 A C Q R G

T7 A C H I G

T8 L E F K B

T9 A F M N O

T10 C F P J R

T11 A D B H I

T12 D E B K L

T13 M D C G O

T14 C F P Q J

T15 B D E F I

T16 J E B A D

T17 A K E F C

T18 C D L B A

Items

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 14: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

1 R 22 Q 23 P 34 O 35 N 36 M 37 L 38 K 39 J 310 I 311 H 3 (14,1)12 G 4 (15,1)13 F 714 E 8 (15,2)15 D 9 (16,1) (16,2)16 C 10 (17,1) (17,2)17 B 10 (18,1) (¤, ¤)18 A 11 (¤, ¤)

Loc IndexTransactional Array

T#

T1 A G D C B

T2 B C H E D

T3 B D E A M

T4 C E F A N

T5 A B N O P

T6 A C Q R G

T7 A C H I G

T8 L E F K B

T9 A F M N O

T10 C F P J R

T11 A D B H I

T12 D E B K L

T13 M D C G O

T14 C F P Q J

T15 B D E F I

T16 J E B A D

T17 A K E F C

T18 C D L B A

Items

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 15: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Transactional Layouts

• Inverted Matrix Layout

There is no minimum support involved in building the Inverted Matrix.

IntroductionPre-processingMining PhaseExperimentsConclusion

1 2 3 4 5 6 7 8 9 10 11

1 R 2 (2,1) (3,2)

2 Q 2 (12,2) (3,3)3 P 3 (4,1) (9,1) (9,2)4 O 3 (5,2) (5,3) (6,3)5 N 3 (13,1) (17,4) (6,2)6 M 3 (14,2) (13,3) (12,4)7 L 3 (8,1) (8,2) (15,9)8 K 3 (13,2) (14,5) (13,7)9 J 3 (13,4) (13,5) (14,7)10 I 3 (11,2) (11,3) (13,6)11 H 3 (14,1) (12,3) 15,4)12 G 4 (15,1) (16,4) (16,5) (15,6)13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (18,10) (17,10)17 B 10 (18,1) (18,2) (18,4) (18,8) (18,9) (18,11)18 A 11

LocTransactional Array

Index

(¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤)(¤, ¤) (¤, ¤) (¤, ¤) (¤, ¤)

(¤, ¤) (¤, ¤) (¤, ¤)

Page 16: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

1 R 2 (2,1) (3,2)2 Q 2 (12,2) (3,3)3 P 3 (4,1) (9,1) (9,2)4 O 3 (5,2) (5,3) (6,3)5 N 3 (13,1) (17,4) (6,2)6 M 3 (14,2) (13,3) (12,4)7 L 3 (8,1) (8,2) (15,9)8 K 3 (13,2) (14,5) (13,7)9 J 3 (13,4) (13,5) (14,7)10 I 3 (11,2) (11,3) (13,6)11 H 3 (14,1) (12,3) 15,4)12 G 4 (15,1) (16,4) (16,5) (15,6)

13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (☼,☼) (☼,☼) (☼,☼) (18,10) (17,10)17 B 10 (18,1) (☼,☼) (18,2) (18,4) (☼,☼) (18,8) (☼,☼) (☼,☼) (18,9) (18,11)18 A 11 (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼)

Loc IndexTransactional Array

Support > 4

Border Support

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 17: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (☼,☼) (☼,☼) (☼,☼) (18,10) (17,10)17 B 10 (18,1) (☼,☼) (18,2) (18,4) (☼,☼) (18,8) (☼,☼) (☼,☼) (18,9) (18,11)18 A 11 (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼)

IndexTransactional Array

Loc

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 18: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

1 2 3 4 5 6 7 8 9 10 11

13 F 7 (14,3) (14,4) (18,7) (16,6) (16,8) (14,6) (14,8)14 E 8 (15,2) (15,3) (16,3) (17,5) (15,5) (15,7) (15,8) (16,9)15 D 9 (16,1) (16,2) (17,2) (17,6) (17,7) (16,7) (17,8) (17,9) (16,10)16 C 10 (17,1) (17,2) (18,3) (18,5) (18,6) (☼,☼) (☼,☼) (☼,☼) (18,10) (17,10)17 B 10 (18,1) (☼,☼) (18,2) (18,4) (☼,☼) (18,8) (☼,☼) (☼,☼) (18,9) (18,11)18 A 11 (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼) (☼,☼)

Loc IndexTransactional Array

T#

T1 A D C B

T2 B C E D

T3 B D E A

T4 C E F A

T5 A B

T6 A C

T7 A C

T8 E F B

T9 A F

T10 C F

T11 A D B

T12 D E B

T13 D C

T14 C F

T15 B D E F

T16 E B A D

T17 A E F C

T18 C D B A

Items

Transactional Layouts

• Inverted Matrix Layout

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 19: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Sub transactions generated from IM

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item FE D C B

E D B A

E C A

E B

E D B

E D B

E D B A

E C A

Item ED C B A

D C B

D B A

D B A

D B

D C

D B

D B A

D C B A

Item D

C B A

C B

C A

C A

C A

C A

C B A

Item C

B A

B A

B A

B A

B A

B A

Item B

IntroductionPre-processingMining PhaseExperimentsConclusion

Frequent sub-transaction with item F

Frequent sub-transaction with item D

Frequent sub-transaction with item E

Frequent sub-transaction with item B

Frequent sub-transaction with item C

Page 20: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Co-Occurrences Frequent Item tree

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 1 0 )

E 1 E ( 1 0 )D 0

C 1

B 0 C ( 1 0 )A 1

A ( 1 0 )

Building F-COFI-tree

Frequency Count

Participation Count

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 21: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 2 0 )

E 2 E ( 2 0 )D 0

C 1

B 1 C ( 1 0 ) B ( 1 0 )A 1

A ( 1 0 )

Co-Occurrences

Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 22: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 3 0 )

E 2 E ( 2 0 ) A ( 1 0 )D 0

C 1

B 1 C ( 1 0 ) B ( 1 0 )A 2

A ( 1 0 )

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 23: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 4 0 )

E 2 E ( 2 0 ) A ( 1 0 ) C ( 1 0 )D 0

C 2

B 1 C ( 1 0 ) B ( 1 0 )A 2

A ( 1 0 )

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 24: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 5 0 )

E 2 E ( 2 0 ) A ( 1 0 ) C ( 2 0 )D 0

C 3

B 1 C ( 1 0 ) B ( 1 0 )A 2

A ( 1 0 )

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 25: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

F ( 7 0 )

E 3 E ( 3 0 ) A ( 1 0 ) C ( 2 0 )D 1

C 3

B 2 C ( 1 0 ) B ( 1 0 ) D ( 1 0 )A 2

A ( 1 0 ) B ( 1 0 )

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 26: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

F ( 7 0 )

E 4 E ( 4 0 ) A ( 1 0 ) C ( 2 0 )D 1

C 4

B 2 C ( 2 0 ) B ( 1 0 ) D ( 1 0 )A 3

A ( 2 0 ) B ( 1 0 )

F E C A

F E B

F A

F C

F C

F E D B

F E C A

Item F

Co-Occurrences Frequent Item tree

Building F-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 27: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E D C B

E D B A

E C A

E B

E D B

E D B

E D B A

E C A

Item E

D C B A

D C B

D B A

D B A

D B

D C

D B

D B A

D C B A

Item D

E ( 8 0 )

D 5 D ( 5 0 ) C ( 2 0 ) B ( 1 0 )

C 3

B 6

A 4 C ( 1 0 ) B ( 4 0 ) A ( 2 0 )

B ( 1 0 ) A ( 2 0 )

D ( 9 0 )

C 4 C ( 4 0 ) B ( 5 0 )

B 8

A 5

B ( 3 0 ) A ( 3 0 )

A ( 2 0 )

C B A

C B

C A

C A

C A

C A

C B A

Item CC ( 7 0 )

B ( 3 0 ) A ( 4 0 )

B 3

A 6

A ( 2 0 )

B ( 6 0 )

A 6

A ( 6 0 )

B A

B A

B A

B A

B A

B A

Item B

Co-Occurrences Frequent Item tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 28: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E ( 8 0 )

D 5 D ( 5 0 ) C ( 2 0 ) B ( 1 0 )

C 3

B 6

A 4 C ( 1 0 ) B ( 4 0 ) A ( 2 0 )

B ( 1 0 ) A ( 2 0 )

Mining COFI-trees

E-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 29: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

 

E ( 8 0 ) E ( 8 1 )

E D B 1

D 5 D ( 5 0 ) C ( 2 0 ) B ( 1 0 ) D ( 5 1 ) E D 1

C 3 E B 1

B 6 E D B 1

A 3 C ( 1 0 ) B ( 4 0 ) A ( 2 0 ) C ( 1 1 )

B ( 1 0 ) A ( 2 0 ) B ( 1 1 )

E-COFI-treeSupport = Frequency count – Participation count

Mining COFI-treesIntroductionPre-processingMining PhaseExperimentsConclusion

Page 30: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E ( 8 1 ) E ( 8 5 ) E D B 4

E D 5

D 5 D ( 5 1 ) C ( 2 0 ) B ( 1 0 ) D ( 5 5 ) E B 5

C 3 E D B 5

B 6

A 3 C ( 1 1 ) B ( 4 0 ) A ( 2 0 ) B ( 4 4 )

B ( 1 1 ) A ( 2 0 )

Mining COFI-trees

E-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 31: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E B 1

E ( 8 5 ) E ( 8 6 )

E D 5

E B 6

D 5 D ( 5 5 ) C ( 2 0 ) B ( 1 0 ) B ( 1 1 ) E D B 5

C 3

B 6

A 3 C ( 1 1 ) B ( 4 4 ) A ( 2 0 )

B ( 1 1 ) A ( 2 0 )

Mining COFI-trees

E-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 32: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

E ( 8 6 ) E ( 8 6 ) No change

D 5 D ( 5 5 ) C ( 2 0 ) B ( 1 1 ) D ( 5 5 )

C 3

B 6

A 3 C ( 1 1 ) B ( 4 4 ) A ( 2 0 )

B ( 1 1 ) A ( 2 0 ) E D 5

E B 6

E D B 5

Mining COFI-trees

E-COFI-tree

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 33: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

D ( 9 0 )

C 4 C ( 4 0 ) B ( 5 0 )

B 8

A 5

B ( 3 0 ) A ( 3 0 )

A ( 2 0 )

C ( 7 0 )

B ( 3 0 ) A ( 4 0 )

B 3

A 6

A ( 2 0 )

B ( 6 0 )

A 6

A ( 6 0 )

Mining COFI-trees

D-COFI-tree

DBA:5

DA:5

DB:8

C-COFI-tree

CA:6

B-COFI-tree

BA:6

IntroductionPre-processingMining PhaseExperimentsConclusion

Page 34: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

0

2000

4000

6000

8000

10000

12000

0.0025 0.005 0.0075 0.01

Support %

Tim

e in

Sec

on

ds

Apriori

FP-Growth

Inverted Matrix

Experimental Studies

Time needed to mine 1M transactions with different support levels

IntroductionPre-processingMining PhaseExperimentsConclusion

Pentium 700Mhz with 256 MB of RAM

Page 35: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Experimental Studies

Accumulated time needed to mine 1M transactions using 4 different

support levels

IntroductionPre-processingMining PhaseExperimentsConclusion

Pentium 700Mhz with 256 MB of RAM

Time needed in seconds to mine different transaction sizes

Page 36: ACM SIGKDD Aug. 2003 – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery

M. El-Hajj and O. R. Zaïane, 2003Database Lab. University of Alberta

Canada

ACM SIGKDDAug. 2003 – Washington, DC

Conclusion and Future work

Updateable Inverted Matrix for native storage of transactions

Compressing the size of Inverted Matrix

IntroductionPre-processingMining PhaseExperimentsConclusion

Future work

New AR algorithm1. Low memory dependency2. No Superfluous processing3. Interactive mining ready4. scalable

Parallelizing the mining process as well as the construction of the Inverted Matrix