© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1
Association Rule Mining
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
Market-Basket transactions
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Example of Association Rules
{Diaper} → {Beer}, {Milk, Bread} → {Eggs, Coke}, {Beer, Bread} → {Milk}
Implication means co-occurrence, not causality!
Definition: Frequent Itemset
Itemset
– A collection of one or more items
– Example: {Milk, Bread, Diaper}
– k-itemset: an itemset that contains k items
Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2
Support (s)
– Fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5
Frequent Itemset
– An itemset whose support is greater than or equal to a minsup threshold
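The definitions above map directly onto set operations. The following is a minimal Python sketch, assuming each transaction is stored as a set of item names; the helper names `support_count`, `support`, and `is_frequent` are illustrative, not from the slides.

```python
from typing import List, Set

# Market-basket transactions from the table above (TIDs 1 to 5).
TRANSACTIONS: List[Set[str]] = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset: Set[str], transactions: List[Set[str]]) -> int:
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """s(X): fraction of transactions that contain X."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset: Set[str], transactions: List[Set[str]], minsup: float) -> bool:
    """Frequent itemset: support greater than or equal to minsup."""
    return support(itemset, transactions) >= minsup


x = {"Milk", "Bread", "Diaper"}
print(support_count(x, TRANSACTIONS))  # 2
print(support(x, TRANSACTIONS))        # 0.4
```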
Definition: Association Rule
Association Rule
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}
Rule Evaluation Metrics
– Support (s): fraction of transactions that contain both X and Y
– Confidence (c): measures how often items in Y appear in transactions that contain X
Example: {Milk, Diaper} → {Beer}
$$s = \frac{\sigma(\{\text{Milk}, \text{Diaper}, \text{Beer}\})}{|T|} = \frac{2}{5} = 0.4$$
$$c = \frac{\sigma(\{\text{Milk}, \text{Diaper}, \text{Beer}\})}{\sigma(\{\text{Milk}, \text{Diaper}\})} = \frac{2}{3} = 0.67$$
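The two metrics differ only in their denominator. A minimal sketch of computing both for a rule, assuming transactions are Python sets; `rule_metrics` is a hypothetical helper name.

```python
from typing import List, Set, Tuple

TRANSACTIONS: List[Set[str]] = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset: Set[str], transactions: List[Set[str]]) -> int:
    return sum(1 for t in transactions if itemset <= t)


def rule_metrics(lhs: Set[str], rhs: Set[str], transactions: List[Set[str]]) -> Tuple[float, float]:
    """Return (support, confidence) of the rule lhs -> rhs.

    Assumes lhs occurs in at least one transaction; otherwise confidence is undefined.
    """
    both = support_count(lhs | rhs, transactions)  # sigma(X ∪ Y)
    s = both / len(transactions)                    # sigma(X ∪ Y) / |T|
    c = both / support_count(lhs, transactions)     # sigma(X ∪ Y) / sigma(X)
    return s, c


s, c = rule_metrics({"Milk", "Diaper"}, {"Beer"}, TRANSACTIONS)
print(f"s={s:.2f}, c={c:.2f}")  # s=0.40, c=0.67
```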
Association Rule Mining Task
Given a set of transactions T, the goal of association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold
Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf thresholds
Computationally prohibitive!
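A sketch of the brute-force approach makes the cost visible: every itemset with at least two items is split into every possible X → Y, and support and confidence are computed by scanning the data for each rule. This is an illustrative Python sketch, assuming the market-basket table above; `brute_force_rules` is a hypothetical name. With d = 6 items it already examines several hundred rules.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

TRANSACTIONS: List[Set[str]] = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def support_count(itemset: FrozenSet[str], transactions: List[Set[str]]) -> int:
    return sum(1 for t in transactions if itemset <= t)


def brute_force_rules(transactions: List[Set[str]], minsup: float, minconf: float) -> List[Rule]:
    """List every rule X -> Y, compute s and c by scanning, keep those passing both thresholds."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    rules: List[Rule] = []
    for k in range(2, len(items) + 1):          # a rule needs at least 2 items
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            both = support_count(itemset, transactions)
            for r in range(1, k):               # every non-empty, proper LHS
                for lhs_items in combinations(combo, r):
                    lhs = frozenset(lhs_items)
                    lhs_count = support_count(lhs, transactions)
                    if lhs_count == 0:
                        continue                # X never occurs, so neither does X ∪ Y
                    s, c = both / n, both / lhs_count
                    if s >= minsup and c >= minconf:
                        rules.append((lhs, itemset - lhs, s, c))
    return rules


rules = brute_force_rules(TRANSACTIONS, minsup=0.4, minconf=0.6)
for lhs, rhs, s, c in rules:
    print(sorted(lhs), "->", sorted(rhs), f"s={s:.1f}, c={c:.2f}")
```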
Mining Association Rules
Example of Rules:
{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)
Observations:
• All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but can have different confidence
• Thus, we may decouple the support and confidence requirements
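The observations can be reproduced by enumerating the binary partitions of {Milk, Diaper, Beer}: the numerator σ(X ∪ Y) is the same for every rule, so support is identical, while confidence depends on σ(X). A minimal sketch; `rules_from_itemset` is an illustrative name.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

TRANSACTIONS: List[Set[str]] = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset: FrozenSet[str], transactions: List[Set[str]]) -> int:
    return sum(1 for t in transactions if itemset <= t)


def rules_from_itemset(
    itemset: FrozenSet[str], transactions: List[Set[str]]
) -> List[Tuple[FrozenSet[str], FrozenSet[str], float, float]]:
    """All binary partitions X -> itemset - X with (support, confidence)."""
    both = support_count(itemset, transactions)  # shared by every partition
    s = both / len(transactions)                 # hence identical support
    out = []
    for r in range(1, len(itemset)):
        for lhs_items in combinations(sorted(itemset), r):
            lhs = frozenset(lhs_items)
            c = both / support_count(lhs, transactions)
            out.append((lhs, itemset - lhs, s, c))
    return out


for lhs, rhs, s, c in rules_from_itemset(frozenset({"Milk", "Diaper", "Beer"}), TRANSACTIONS):
    print(sorted(lhs), "->", sorted(rhs), f"s={s:.1f}, c={c:.2f}")
```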
Mining Association Rules
Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
2. Rule Generation
– Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
Frequent itemset generation is still computationally expensive
Frequent Itemset Generation
(Figure: itemset lattice over the items A, B, C, D, E)
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE
Given d items, there are 2^d possible candidate itemsets
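The 2^d count follows from choosing each of the d items to be in or out of an itemset. A short sketch that builds the lattice level by level with `itertools.combinations`; `itemset_lattice` is an illustrative name, and level 0 is the null itemset.

```python
from itertools import combinations
from typing import List, Tuple


def itemset_lattice(items: List[str]) -> List[List[Tuple[str, ...]]]:
    """Return the lattice level by level: level k holds all k-itemsets (level 0 is null)."""
    return [list(combinations(items, k)) for k in range(len(items) + 1)]


levels = itemset_lattice(["A", "B", "C", "D", "E"])
for k, level in enumerate(levels):
    print(k, ["".join(s) or "null" for s in level])
print(sum(len(level) for level in levels))  # 32
```

The level sizes are the binomial coefficients C(5, k), which sum to 2^5.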
Frequent Itemset Generation
Brute-force approach:
– Each itemset in the lattice is a candidate frequent itemset
– Count the support of each candidate by scanning the database
– Match each transaction against every candidate
– Complexity ~ O(NMw) => Expensive since M = 2^d !!!
(Figure: the N transactions of the market-basket table are matched against a list of M candidates; w is the maximum transaction width.)
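A minimal sketch of brute-force frequent itemset generation on the market-basket data, assuming a minimum support count of 3: all M = 2^d − 1 non-empty itemsets are candidates, and each of the N transactions is matched against every one of them. `brute_force_frequent` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

TRANSACTIONS: List[Set[str]] = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def brute_force_frequent(transactions: List[Set[str]], minsup_count: int) -> Dict[FrozenSet[str], int]:
    items = sorted(set().union(*transactions))
    # M = 2^d - 1 candidates: every non-empty itemset in the lattice.
    candidates = [frozenset(c) for k in range(1, len(items) + 1) for c in combinations(items, k)]
    counts = {c: 0 for c in candidates}
    # N transactions, each matched against all M candidates (subset test costs at most O(w)).
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= minsup_count}


frequent = brute_force_frequent(TRANSACTIONS, minsup_count=3)
for itemset, n in frequent.items():
    print(sorted(itemset), n)
```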
Frequent Itemset Generation Strategies
Reduce the number of candidates (M)
– Complete search: M = 2^d
– Use pruning techniques to reduce M
Reduce the number of transactions (N)
– Reduce the size of N as the size of the itemset increases
– Used by DHP and vertical-based mining algorithms
Reduce the number of comparisons (NM)
– Use efficient data structures to store the candidates or transactions
– No need to match every candidate against every transaction
Reducing Number of Candidates
Apriori principle:
– If an itemset is frequent, then all of its subsets must also be frequent
Apriori principle holds due to the following property of the support measure:
– Support of an itemset never exceeds the support of its subsets
– This is known as the anti-monotone property of support
$$\forall X, Y : (X \subseteq Y) \Rightarrow s(X) \ge s(Y)$$
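The anti-monotone property holds because every transaction containing Y also contains any X ⊆ Y. A small exhaustive check over the whole lattice of the market-basket data illustrates it; this is a sketch, and `is_anti_monotone` is an illustrative name.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

TRANSACTIONS: List[Set[str]] = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def is_anti_monotone(transactions: List[Set[str]]) -> bool:
    """Check s(X) >= s(Y) for every pair of non-empty itemsets with X ⊆ Y."""
    items = sorted(set().union(*transactions))
    lattice = [frozenset(c) for k in range(1, len(items) + 1) for c in combinations(items, k)]
    s = {x: support(x, transactions) for x in lattice}
    return all(s[x] >= s[y] for y in lattice for x in lattice if x <= y)


print(is_anti_monotone(TRANSACTIONS))  # True
```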
Illustrating Apriori Principle
(Figure: the itemset lattice over items A to E, from null down to ABCDE. One itemset is marked "Found to be Infrequent", and all of its supersets are marked "Pruned supersets".)
Illustrating Apriori Principle
Minimum Support = 3
Items (1-itemsets)
Item Count
Bread 4
Coke 2
Milk 4
Beer 3
Diaper 4
Eggs 1
Pairs (2-itemsets)
(No need to generate candidates involving Coke or Eggs)
Itemset Count
{Bread,Milk} 3
{Bread,Beer} 2
{Bread,Diaper} 3
{Milk,Beer} 2
{Milk,Diaper} 3
{Beer,Diaper} 3
Triplets (3-itemsets)
Itemset Count
{Bread,Milk,Diaper} 2
If every subset is considered, 6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41
With support-based pruning, 6 + 6 + 1 = 13
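The candidate counts above can be reproduced by running the level-wise search with support-based pruning and recording how many candidates are counted per level. A minimal sketch, assuming a simple join (any two frequent k-itemsets whose union has k+1 items) followed by the subset check; `candidates_per_level` is a hypothetical name.

```python
from itertools import combinations
from math import comb
from typing import FrozenSet, List, Set

TRANSACTIONS: List[Set[str]] = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset: FrozenSet[str], transactions: List[Set[str]]) -> int:
    return sum(1 for t in transactions if itemset <= t)


def candidates_per_level(transactions: List[Set[str]], minsup_count: int) -> List[int]:
    """Number of candidates whose support is counted at each level, with pruning."""
    items = sorted(set().union(*transactions))
    candidates = {frozenset([i]) for i in items}
    sizes: List[int] = []
    k = 1
    while candidates:
        sizes.append(len(candidates))
        frequent = {c for c in candidates if support_count(c, transactions) >= minsup_count}
        # Next level: unions of two frequent k-itemsets with k+1 items whose
        # k-subsets are all frequent (Apriori pruning).
        candidates = {
            a | b
            for a in frequent
            for b in frequent
            if len(a | b) == k + 1
            and all(frozenset(s) in frequent for s in combinations(a | b, k))
        }
        k += 1
    return sizes


pruned = candidates_per_level(TRANSACTIONS, minsup_count=3)
unpruned = sum(comb(6, k) for k in (1, 2, 3))
print(pruned, sum(pruned), unpruned)  # [6, 6, 1] 13 41
```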
Frequent Itemset Mining
2 strategies:
– Breadth-first: Apriori
  Exploit monotonicity to the maximum
– Depth-first strategy: Eclat
  Prune the database
  Do not fully exploit monotonicity
Apriori
Example database, minsup = 2:
TID Items
1 B, C
2 B, C
3 A, C, D
4 A, B, C, D
5 B, D
Candidates of size 1 (A, B, C, D), with support counts updated as each transaction is scanned:
start: A 0, B 0, C 0, D 0
after TID 1: A 0, B 1, C 1, D 0
after TID 2: A 0, B 2, C 2, D 0
after TID 3: A 1, B 2, C 3, D 1
after TID 4: A 2, B 3, C 4, D 2
after TID 5: A 2, B 4, C 4, D 3
All four items are frequent.
Candidates of size 2, generated from the frequent items:
AB 1, AC 2, AD 2, BC 3, BD 2, CD 2
AB is infrequent; AC, AD, BC, BD and CD are frequent.
Candidates of size 3, generated from the frequent pairs:
ACD 2, BCD 1
(ABC and ABD are not candidates because their subset AB is infrequent.) ACD is frequent; BCD is not.
Apriori Algorithm
Method:
– Let k=1
– Generate frequent itemsets of length 1
– Repeat until no new frequent itemsets are identified
  – Generate length (k+1) candidate itemsets from length k frequent itemsets
  – Prune candidate itemsets containing subsets of length k that are infrequent
  – Count the support of each candidate by scanning the DB
  – Eliminate candidates that are infrequent, leaving only those that are frequent
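The method above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes transactions are Python sets and uses a simple join (any two frequent k-itemsets whose union has k+1 items) followed by the subset-based pruning step. On the example database with minsup = 2 it should find the itemsets from the trace: A, B, C, D, AC, AD, BC, BD, CD, ACD.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

DB: List[Set[str]] = [
    {"B", "C"},
    {"B", "C"},
    {"A", "C", "D"},
    {"A", "B", "C", "D"},
    {"B", "D"},
]


def apriori(transactions: List[Set[str]], minsup: int) -> Dict[FrozenSet[str], int]:
    """Breadth-first Apriori; returns every frequent itemset with its support count."""
    # k = 1: count single items and keep the frequent ones.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= minsup}
    result = dict(frequent)
    k = 1
    while frequent:
        prev = list(frequent)
        # Generate (k+1)-candidates from pairs of frequent k-itemsets, then prune
        # any candidate that has an infrequent k-subset.
        candidates = {
            prev[i] | prev[j]
            for i in range(len(prev))
            for j in range(i + 1, len(prev))
            if len(prev[i] | prev[j]) == k + 1
            and all(frozenset(sub) in frequent for sub in combinations(prev[i] | prev[j], k))
        }
        # Count the support of each remaining candidate by scanning the DB.
        cand_counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        # Eliminate infrequent candidates.
        frequent = {c: n for c, n in cand_counts.items() if n >= minsup}
        result.update(frequent)
        k += 1
    return result


res = apriori(DB, minsup=2)
for itemset, n in sorted(res.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), n)
```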
Depth-First Algorithms
Goal: find all frequent itemsets in the example database (minsup = 2)
TID Items
1 B, C
2 B, C
3 A, C, D
4 A, B, C, D
5 B, D
Split the problem on item D:
– Find all frequent itemsets with D: mine the transactions that contain D, with D removed (3: A, C; 4: A, B, C; 5: B), which gives A, B, C, AC; then add D again, giving D, AD, BD, CD, ACD
– Find all frequent itemsets without D: mine the database with D removed (1: B, C; 2: B, C; 3: A, C; 4: A, B, C; 5: B), which gives A, B, C, AC, BC
– The union of both results is the answer: A, B, C, AC, BC, D, AD, BD, CD, ACD
Depth-First Algorithm
Items are ordered A, B, C, D. DB[X] denotes the transactions that contain X, keeping only the items that come before the first item of X.
DB (minsup = 2): 1 B, C; 2 B, C; 3 A, C, D; 4 A, B, C, D; 5 B, D
Counts in DB: A: 2, B: 4, C: 4, D: 3
DB[D]: 3 A, C; 4 A, B, C; 5 B
– Counts: A: 2, B: 2, C: 2
– DB[CD]: 3 A; 4 A, B. Counts: A: 2, B: 1, so AC: 2 within DB[D]
– DB[BD]: 4 A. Counts: A: 1 (infrequent)
– Frequent itemsets with D: AD: 2, BD: 2, CD: 2, ACD: 2
DB[C]: 1 B; 2 B; 3 A; 4 A, B
– Counts: A: 2, B: 3
– DB[BC]: 1 (empty); 2 (empty); 4 A. Counts: A: 1 (infrequent)
– Frequent itemsets with C (and without D): AC: 2, BC: 3
DB[B]: 1 (empty); 2 (empty); 4 A; 5 (empty). Counts: A: 1 (infrequent)
Final set of frequent itemsets: A: 2, B: 4, C: 4, D: 3, AD: 2, BD: 2, CD: 2, ACD: 2, AC: 2, BC: 3
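The trace above can be sketched as a recursive function over projected databases. This is an illustrative Python sketch, assuming the item order A < B < C < D and the DB[X] definition used in the trace; `depth_first` is a hypothetical name. It should reproduce the same ten frequent itemsets as the Apriori run.

```python
from typing import Dict, List, Optional, Set, Tuple

DB: List[Set[str]] = [
    {"B", "C"},
    {"B", "C"},
    {"A", "C", "D"},
    {"A", "B", "C", "D"},
    {"B", "D"},
]


def depth_first(db: List[Set[str]], minsup: int, suffix: Tuple[str, ...] = (),
                out: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Mine frequent itemsets depth-first using projected databases DB[X]."""
    if out is None:
        out = {}
    counts: Dict[str, int] = {}
    for t in db:                                  # one scan of the (projected) DB
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    for item in sorted(counts, reverse=True):     # D first, then C, B, A
        if counts[item] < minsup:
            continue                              # no superset can be frequent
        itemset = (item,) + suffix                # item precedes every suffix item
        out["".join(itemset)] = counts[item]
        # DB[item + suffix]: transactions containing item, keeping only earlier items.
        projected = [{i for i in t if i < item} for t in db if item in t]
        depth_first(projected, minsup, itemset, out)
    return out


print(depth_first(DB, minsup=2))
```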
ECLAT
For each item, store a list of transaction ids (tids)
Horizontal Data Layout
TID Items
1 A, B, E
2 B, C, D
3 C, E
4 A, C, D
5 A, B, C, D
6 A, E
7 A, B
8 A, B, C
9 A, C, D
10 B
Vertical Data Layout (one TID-list per item)
A: 1, 4, 5, 6, 7, 8, 9
B: 1, 2, 5, 7, 8, 10
C: 2, 3, 4, 5, 8, 9
D: 2, 4, 5, 9
E: 1, 3, 6
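Converting the horizontal layout into the vertical one takes a single pass over the transactions. A minimal sketch, assuming transactions are keyed by integer TID; `HORIZONTAL` and `to_vertical` are illustrative names.

```python
from typing import Dict, List, Set

HORIZONTAL: Dict[int, Set[str]] = {
    1: {"A", "B", "E"}, 2: {"B", "C", "D"}, 3: {"C", "E"}, 4: {"A", "C", "D"},
    5: {"A", "B", "C", "D"}, 6: {"A", "E"}, 7: {"A", "B"}, 8: {"A", "B", "C"},
    9: {"A", "C", "D"}, 10: {"B"},
}


def to_vertical(horizontal: Dict[int, Set[str]]) -> Dict[str, List[int]]:
    """Build a TID-list for every item: the sorted ids of transactions containing it."""
    vertical: Dict[str, List[int]] = {}
    for tid in sorted(horizontal):                # visiting TIDs in order keeps lists sorted
        for item in horizontal[tid]:
            vertical.setdefault(item, []).append(tid)
    return dict(sorted(vertical.items()))


for item, tids in to_vertical(HORIZONTAL).items():
    print(item, tids)
```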
ECLAT
Determine the support of any k-itemset by intersecting the tid-lists of two of its (k-1)-subsets.
– Depth-first traversal of the search lattice
– Advantage: very fast support counting
– Disadvantage: intermediate tid-lists may become too large for memory
Example:
A: 1, 4, 5, 6, 7, 8, 9
B: 1, 2, 5, 7, 8, 10
AB (intersection of A and B): 1, 5, 7, 8
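A minimal Eclat sketch, assuming tid-lists are stored as Python sets and using the vertical layout of the 10-transaction example (with C including TID 5, since TID 5 contains C). Each recursive call extends a common prefix, and the tid-list of prefix + item + other is obtained by intersecting the tid-lists of prefix + item and prefix + other, both (k-1)-subsets of the new itemset. `VERTICAL`, `eclat` and `run_eclat` are illustrative names.

```python
from typing import Dict, List, Set, Tuple

VERTICAL: Dict[str, Set[int]] = {
    "A": {1, 4, 5, 6, 7, 8, 9},
    "B": {1, 2, 5, 7, 8, 10},
    "C": {2, 3, 4, 5, 8, 9},
    "D": {2, 4, 5, 9},
    "E": {1, 3, 6},
}


def eclat(prefix: str, items: List[Tuple[str, Set[int]]], minsup: int, out: Dict[str, int]) -> None:
    """Depth-first Eclat over (item, tid-list) pairs that all share `prefix`."""
    for i, (item, tids) in enumerate(items):
        itemset = prefix + item
        out[itemset] = len(tids)               # support = length of the tid-list
        extensions = []
        for other, other_tids in items[i + 1:]:
            common = tids & other_tids         # tid-list of itemset + other
            if len(common) >= minsup:
                extensions.append((other, common))
        if extensions:
            eclat(itemset, extensions, minsup, out)


def run_eclat(vertical: Dict[str, Set[int]], minsup: int) -> Dict[str, int]:
    start = [(item, tids) for item, tids in sorted(vertical.items()) if len(tids) >= minsup]
    out: Dict[str, int] = {}
    eclat("", start, minsup, out)
    return out


print(sorted(VERTICAL["A"] & VERTICAL["B"]))  # [1, 5, 7, 8]
print(run_eclat(VERTICAL, minsup=3))
```

Intermediate tid-lists such as `common` are what may grow too large for memory on big databases.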
Rule Generation
Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L – f satisfies the minimum confidence requirement
– If {A,B,C,D} is a frequent itemset, the candidate rules are:
ABC → D, ABD → C, ACD → B, BCD → A,
A → BCD, B → ACD, C → ABD, D → ABC,
AB → CD, AC → BD, AD → BC, BC → AD, BD → AC, CD → AB
If |L| = k, then there are 2^k – 2 candidate association rules (ignoring L → ∅ and ∅ → L)
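The 2^k – 2 count comes from choosing any non-empty proper subset of L as the left-hand side. A short sketch that lists the candidate rules for {A,B,C,D}; `candidate_rules` is an illustrative name.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple


def candidate_rules(L: FrozenSet[str]) -> List[Tuple[str, str]]:
    """Every rule f -> L - f for a non-empty proper subset f of L."""
    items = sorted(L)
    rules = []
    for r in range(len(items) - 1, 0, -1):    # LHS sizes k-1 down to 1
        for lhs in combinations(items, r):
            rhs = [i for i in items if i not in lhs]
            rules.append(("".join(lhs), "".join(rhs)))
    return rules


rules = candidate_rules(frozenset("ABCD"))
print(len(rules))  # 14
for lhs, rhs in rules:
    print(lhs, "->", rhs)
```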
Rule Generation
How to efficiently generate rules from frequent itemsets?
– In general, confidence does not have an anti-monotone property: c(ABC → D) can be larger or smaller than c(AB → D)
– But confidence of rules generated from the same itemset has an anti-monotone property
– e.g., L = {A,B,C,D}: c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD), because all three rules share the numerator σ(ABCD) while σ(ABC) ≤ σ(AB) ≤ σ(A)
– Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule
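Both claims can be checked numerically on the market-basket data, using illustrative rules chosen for this sketch (not from the slides). Across different itemsets, c({Milk, Diaper} → {Beer}) lies between c({Milk} → {Beer}) and c({Diaper} → {Beer}); within the single itemset {Milk, Diaper, Beer}, moving an item to the RHS does not raise confidence. `confidence` is a hypothetical helper.

```python
from typing import List, Set

TRANSACTIONS: List[Set[str]] = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset: Set[str], transactions: List[Set[str]]) -> int:
    return sum(1 for t in transactions if itemset <= t)


def confidence(lhs: Set[str], rhs: Set[str], transactions: List[Set[str]]) -> float:
    """c(X -> Y) = sigma(X ∪ Y) / sigma(X)."""
    return support_count(lhs | rhs, transactions) / support_count(lhs, transactions)


# Rules from different itemsets: a larger LHS may raise or lower confidence.
c_md = confidence({"Milk", "Diaper"}, {"Beer"}, TRANSACTIONS)  # 2/3
c_m = confidence({"Milk"}, {"Beer"}, TRANSACTIONS)             # 2/4
c_d = confidence({"Diaper"}, {"Beer"}, TRANSACTIONS)           # 3/4
print(c_m < c_md < c_d)  # True

# Rules from the same itemset L = {Milk, Diaper, Beer}: moving an item to the RHS keeps
# the numerator sigma(L) and cannot decrease the denominator sigma(X).
c_two_lhs = confidence({"Milk", "Diaper"}, {"Beer"}, TRANSACTIONS)  # 2/3
c_one_lhs = confidence({"Milk"}, {"Diaper", "Beer"}, TRANSACTIONS)  # 2/4
print(c_two_lhs >= c_one_lhs)  # True
```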
Rule Generation for Apriori Algorithm
Lattice of rules for ABCD:
ABCD => { }
BCD => A, ACD => B, ABD => C, ABC => D
BC => AD, BD => AC, CD => AB, AD => BC, AC => BD, AB => CD
D => ABC, C => ABD, B => ACD, A => BCD
(Figure: one rule in the lattice is marked "Low Confidence Rule", and the rules below it are marked "Pruned Rules".)
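The lattice pruning can be sketched as level-wise rule generation from one frequent itemset: rules with an m-item RHS are tested first, and an (m+1)-item RHS is tried only if all of its m-item subsets produced high-confidence rules. This is an illustrative Python sketch on the market-basket data with assumed minconf values; `gen_rules` is a hypothetical name, and it assumes L is frequent (σ(L) > 0).

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

TRANSACTIONS: List[Set[str]] = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

RuleT = Tuple[FrozenSet[str], FrozenSet[str], float]


def support_count(itemset: FrozenSet[str], transactions: List[Set[str]]) -> int:
    return sum(1 for t in transactions if itemset <= t)


def gen_rules(L: FrozenSet[str], transactions: List[Set[str]], minconf: float) -> List[RuleT]:
    """Generate high-confidence rules from one frequent itemset L, level by level."""
    sigma_L = support_count(L, transactions)
    rules: List[RuleT] = []
    consequents = [frozenset([i]) for i in sorted(L)]   # 1-item RHSs first
    m = 1
    while consequents and m < len(L):                   # the LHS must stay non-empty
        passed = []
        for rhs in consequents:
            lhs = L - rhs
            conf = sigma_L / support_count(lhs, transactions)
            if conf >= minconf:
                rules.append((lhs, rhs, conf))
                passed.append(rhs)
        # Merge surviving RHSs; keep an (m+1)-item RHS only if all its m-subsets survived.
        survived = set(passed)
        merged = {a | b for a, b in combinations(passed, 2) if len(a | b) == m + 1}
        consequents = [
            rhs for rhs in sorted(merged, key=sorted)
            if all(frozenset(s) in survived for s in combinations(rhs, m))
        ]
        m += 1
    return rules


L = frozenset({"Milk", "Diaper", "Beer"})
for lhs, rhs, conf in gen_rules(L, TRANSACTIONS, minconf=0.6):
    print(sorted(lhs), "=>", sorted(rhs), f"c={conf:.2f}")
```

With minconf = 0.7, only {Beer, Milk} => {Diaper} survives the first level, so no 2-item RHS is ever tested.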