Lecture 14 - Advanced Topics in Association Rules Mining

Uploaded by: albert-orriols-puig
Posted on: 24-Jan-2015
1,730 views · Category: Education · 2 downloads


TRANSCRIPT

Page 1: Lecture14 - Advanced topics in association rules

Introduction to Machine Learning

Lecture 14: Advanced Topics in Association Rules Mining

Albert Orriols i Puig (aorriols@salle.url.edu)

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull

Page 2: Lecture14 - Advanced topics in association rules

Recap of Lecture 13

Ideas come from market basket analysis (MBA).

Let’s go shopping!

Customer 1: milk, eggs, sugar, bread
Customer 2: milk, eggs, cereal, bread
Customer 3: eggs, sugar

What do my customers buy? Which products are bought together?

Aim: find associations and correlations between the different items that customers place in their shopping baskets.

Slide 2 · Artificial Intelligence – Machine Learning
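The recap above can be made concrete with a few lines of code. The following is a minimal sketch, not the lecture's code: it computes support and confidence over the three example baskets from the slide, with illustrative helper names.

```python
# Minimal sketch (not from the lecture): support and confidence
# computed over the three example baskets from the slide.
baskets = [
    {"milk", "eggs", "sugar", "bread"},   # Customer 1
    {"milk", "eggs", "cereal", "bread"},  # Customer 2
    {"eggs", "sugar"},                    # Customer 3
]

def support(itemset):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """support(A U B) / support(A) for the rule A -> B."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"milk", "bread"}))       # 2 of 3 baskets contain both
print(confidence({"milk"}, {"bread"}))  # every milk buyer also bought bread
```

Rules like "milk -> bread" with high confidence are exactly the associations MBA looks for.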

Page 3: Lecture14 - Advanced topics in association rules

Recap of Lecture 13

Database TDB:

Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

1st scan (C1 -> L1, min support = 2):
C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
L1: {A}:2, {B}:3, {C}:3, {E}:3

2nd scan (C2 -> L2):
C2: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

3rd scan (C3 -> L3):
C3: {B,C,E}
L3: {B,C,E}:2
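The trace above follows directly from the Apriori loop. Below is a compact sketch, an illustrative reimplementation rather than the lecture's code, that reproduces the levels on the same TDB with min support 2.

```python
# Illustrative Apriori sketch over the slide's TDB (min support = 2).
from itertools import combinations

tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
min_sup = 2

def frequent(candidates):
    """Keep the candidates whose support in tdb reaches min_sup."""
    return {c for c in candidates if sum(c <= t for t in tdb) >= min_sup}

# C1 -> L1
levels = [frequent({frozenset({i}) for t in tdb for i in t})]

# L(k-1) -> Ck -> Lk until no frequent itemset survives
k = 2
while levels[-1]:
    prev = levels[-1]
    ck = {a | b for a in prev for b in prev if len(a | b) == k}
    # Apriori pruning: every (k-1)-subset of a candidate must be frequent
    ck = {c for c in ck
          if all(frozenset(s) in prev for s in combinations(c, k - 1))}
    levels.append(frequent(ck))
    k += 1

all_frequent = [fs for lv in levels for fs in lv]
print(sorted(sorted(fs) for fs in all_frequent))
```

On this database it yields exactly the slide's result: L1 = {A}, {B}, {C}, {E}; L2 = {A,C}, {B,C}, {B,E}, {C,E}; L3 = {B,C,E}.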

Page 4: Lecture14 - Advanced topics in association rules

Recap of Lecture 13: Challenges

Apriori scans the database multiple times.

Most often, there is a high number of candidates.

Support counting for candidates can be time-expensive.

Several methods try to improve these points by:
Reducing the number of scans of the database
Shrinking the number of candidates
Counting the support of candidates more efficiently

Page 5: Lecture14 - Advanced topics in association rules

Today’s Agenda

Starting a journey through some advanced topics in ARM:

Mining frequent patterns without candidate generation

Multiple-level AR

Sequential pattern mining

Quantitative association rules

Mining class association rules

Beyond support & confidence

Applications

Page 6: Lecture14 - Advanced topics in association rules

Revisiting Candidate Generation

Remember Apriori? Use the previous frequent (k-1)-itemsets to generate the candidate k-itemsets.

Count the itemsets' support by scanning the database.

Bottleneck in the process: candidate generation. Suppose 100 items:

First level of the tree: 100 nodes
Second level of the tree: C(100, 2) = 4,950 nodes
In general, number of candidate k-itemsets: C(100, k)
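The counts above are binomial coefficients and can be checked directly; this is a quick sanity check, not part of the lecture:

```python
# Sanity check of the candidate counts for 100 items.
import math

print(math.comb(100, 2))  # 4950 possible 2-itemsets
print(math.comb(100, 3))  # 161700 possible 3-itemsets

# Summing over all k gives every non-empty itemset: 2^100 - 1
total = sum(math.comb(100, k) for k in range(1, 101))
print(total == 2**100 - 1)
```

The exponential growth of this candidate space is exactly why avoiding candidate generation matters.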

Page 7: Lecture14 - Advanced topics in association rules

Can We Avoid Candidate Generation?

Build an auxiliary structure that gathers statistics about the itemsets in order to avoid candidate generation:

Use an FP-tree

Avoid multiple scans of the data

Divide-and-conquer methodology

Avoid candidate generation

Outline of the process:
Generate an FP-tree
Mine the FP-tree

Page 8: Lecture14 - Advanced topics in association rules

Building the FP-Tree

TID  Items              Sorted FIS
1    {F,A,C,D,G,I,M,P}  {F,C,A,M,P}
2    {A,B,C,F,L,M,O}    {F,C,A,B,M}
3    {B,F,H,J,O}        {F,B}
4    {B,C,K,S,P}        {C,B,P}
5    {A,F,C,E,L,P,M,N}  {F,C,A,M,P}

Scan the DB for the first time and identify the frequent items. They are: <(F:4), (C:4), (A:3), (B:3), (M:3), (P:3)>.

We sort the items of each transaction according to this frequency order in the last column.

Page 9: Lecture14 - Advanced topics in association rules

Building the FP-Tree

Scan the DB a second time to build the tree. After reading TID 1 ({F,C,A,M,P}), the tree is the single path:

root -> F:1 -> C:1 -> A:1 -> M:1 -> P:1

Page 10: Lecture14 - Advanced topics in association rules

Building the FP-Tree

After reading TID 2 ({F,C,A,B,M}), the shared prefix F, C, A is reused and its counts increase; a new branch hangs from A:

root -> F:2 -> C:2 -> A:2, with two branches under A: M:1 -> P:1 and B:1 -> M:1

Page 11: Lecture14 - Advanced topics in association rules

Building the FP-Tree

After reading TID 3 ({F,B}), F's count rises to 3 and a new child B:1 is added directly under F.

Page 12: Lecture14 - Advanced topics in association rules

Building the FP-Tree

After reading TID 4 ({C,B,P}), nothing is shared with the F branch, so a new path C:1 -> B:1 -> P:1 is added under the root.

Page 13: Lecture14 - Advanced topics in association rules

Building the FP-Tree

After reading TID 5 ({F,C,A,M,P}), the counts along the first path increase, giving root -> F:4 -> C:3 -> A:3 -> M:2 -> P:2.

Page 14: Lecture14 - Advanced topics in association rules

Building the FP-Tree

The final tree:

root -> F:4 -> C:3 -> A:3 -> M:2 -> P:2
                      A:3 -> B:1 -> M:1
        F:4 -> B:1
root -> C:1 -> B:1 -> P:1

Build an index (a header table with one entry per frequent item: F, C, A, B, M, P) to access the nodes quickly and traverse the tree via node-links.
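The two-scan construction just shown can be sketched in a few lines. This is an illustrative implementation, not the lecture's code; the Node class, header-table layout, and variable names are assumptions.

```python
# Illustrative two-scan FP-tree construction for the slide's five
# transactions (a sketch; Node layout and names are assumptions).
from collections import defaultdict

transactions = [
    {"F", "A", "C", "D", "G", "I", "M", "P"},
    {"A", "B", "C", "F", "L", "M", "O"},
    {"B", "F", "H", "J", "O"},
    {"B", "C", "K", "S", "P"},
    {"A", "F", "C", "E", "L", "P", "M", "N"},
]

# 1st scan: count item frequencies (min support = 3 keeps F, C, A, B, M, P)
counts = defaultdict(int)
for t in transactions:
    for item in t:
        counts[item] += 1

# Frequent items in descending frequency; the F/C tie is broken as on the slide
order = ["F", "C", "A", "B", "M", "P"]

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

root = Node(None, None)
header = defaultdict(list)  # header table: item -> node-links into the tree

# 2nd scan: insert each transaction along its sorted frequent items
for t in transactions:
    node = root
    for item in (i for i in order if i in t):
        if item not in node.children:
            node.children[item] = Node(item, node)
            header[item].append(node.children[item])
        node = node.children[item]
        node.count += 1

# Node-link totals reproduce the first-scan counts
print({i: sum(n.count for n in header[i]) for i in order})
# {'F': 4, 'C': 4, 'A': 3, 'B': 3, 'M': 3, 'P': 3}
```

Following the node-links for P visits exactly the two P nodes of the slide's tree (P:2 and P:1).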

Page 15: Lecture14 - Advanced topics in association rules

Mining the FP-Tree

Properties used to mine the FP-tree:

Node-link property: all possible itemsets in which the frequent item a is included can be found by following a's node-links.

Example: P has a support of 3, and there are two paths in the FP-tree ending in a P node:

1. {F,C,A,M,P} (with P:2)
2. {C,B,P} (with P:1)

Page 16: Lecture14 - Advanced topics in association rules

Mining the FP-Tree

Prefix-path property: to calculate the frequent patterns for a node a in a path P, only the prefix subpath of node a in P needs to be accumulated, and the frequency count of every node in that prefix path should carry the same count as node a.

Example for item M along the leftmost path:

Node M is involved in (F:4, C:3, A:3, M:2, P:2).
Take the prefix of the path until M: (F:4, C:3, A:3).
Adjust the counts to M's count, 2: (F:2, C:2, A:2).

So F, C, and A co-occur with M along this path.

Page 17: Lecture14 - Advanced topics in association rules

Mining the FP-Tree

Fragment growth: let α be an itemset in DB, B be α's conditional pattern base, and β be an itemset in B. Then the support of α U β is equivalent to the support of β in B.

For M, the conditional pattern base contains the prefix paths (F:2, C:2, A:2) and (F:1, C:1, A:1, B:1). Summing over both paths, F, C, and A each appear with count 3, so patterns such as {F,C,A,M}, {F,C,M}, … are frequent with support 3.
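The node-link, prefix-path, and fragment-growth steps can be combined into one sketch that extracts conditional pattern bases from the tree. Again this is an illustrative implementation (the tree building is repeated from before; all names are assumptions), not the lecture's code.

```python
# Illustrative extraction of conditional pattern bases from the FP-tree.
from collections import defaultdict

transactions = [
    {"F", "A", "C", "D", "G", "I", "M", "P"},
    {"A", "B", "C", "F", "L", "M", "O"},
    {"B", "F", "H", "J", "O"},
    {"B", "C", "K", "S", "P"},
    {"A", "F", "C", "E", "L", "P", "M", "N"},
]
order = ["F", "C", "A", "B", "M", "P"]  # frequent items, descending frequency

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

root, header = Node(None, None), defaultdict(list)
for t in transactions:
    node = root
    for item in (i for i in order if i in t):
        if item not in node.children:
            node.children[item] = Node(item, node)
            header[item].append(node.children[item])
        node = node.children[item]
        node.count += 1

def conditional_pattern_base(item):
    """Prefix path of every node-link of `item`, weighted by that node's count."""
    base = []
    for node in header[item]:
        path, cur = set(), node.parent
        while cur.item is not None:       # walk up to the root
            path.add(cur.item)
            cur = cur.parent
        if path:
            base.append((frozenset(path), node.count))
    return base

print(conditional_pattern_base("M"))  # paths {F,C,A}:2 and {F,C,A,B}:1
print(conditional_pattern_base("P"))  # paths {F,C,A,M}:2 and {C,B}:1

# Fragment growth: the support of {F,C,A} together with M is the
# total count of {F,C,A} inside M's conditional pattern base
sup_fca_m = sum(c for p, c in conditional_pattern_base("M")
                if {"F", "C", "A"} <= p)
print(sup_fca_m)  # 3
```

The printed bases match the slides' prefix paths for M and P, and the final sum illustrates the fragment-growth property.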

Page 18: Lecture14 - Advanced topics in association rules

Is FP-growth Faster than Apriori?

As the support threshold goes down, the number of itemsets increases dramatically; FP-growth does not need to generate these candidates and test them.

Page 19: Lecture14 - Advanced topics in association rules

Is FP-growth Faster than Apriori?

Both FP-growth and Apriori scale linearly with the number of transactions, but FP-growth is more efficient.

Page 20: Lecture14 - Advanced topics in association rules

Next Class

Advanced topics in association rule mining


Page 21: Lecture14 - Advanced topics in association rules

Introduction to Machine Learning

Lecture 14: Advanced Topics in Association Rules Mining

Albert Orriols i Puig (aorriols@salle.url.edu)

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull