association rule mining
DESCRIPTION
Association Rule Mining. COMP 790-90 Seminar BCB 713 Module Spring 2011. Outline. What is association rule mining? Methods for association rule mining Extensions of association rule. What Is Association Rule Mining?. - PowerPoint PPT PresentationTRANSCRIPT
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Association Rule Mining
COMP 790-90 Seminar
BCB 713 Module
Spring 2011
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
2
Outline
What is association rule mining?
Methods for association rule mining
Extensions of association rule
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
3
What Is Association Rule Mining?
Frequent patterns: patterns (set of items, sequence, etc.) that occur frequently in a database [AIS93]
Frequent pattern mining: finding regularities in data
What products were often purchased together? Beer and diapers?!
What are the subsequent purchases after buying a car?
Can we automatically profile customers?
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
4
Why Essential?
Foundation for many data mining tasksAssociation rules, correlation, causality, sequential patterns, structural patterns, spatial and multimedia patterns, associative classification, cluster analysis, iceberg cube, …
Broad applicationsBasket data analysis, cross-marketing, catalog design, sale campaign analysis, web log (click stream) analysis, …
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
5
Basics
Itemset: a set of itemsE.g., acm={a, c, m}
Support of itemsetsSup(acm)=3
Given min_sup=3, acm is a frequent patternFrequent pattern mining: find all frequent patterns in a database
TID Items bought
100 f, a, c, d, g, I, m, p
200 a, b, c, f, l,m, o
300 b, f, h, j, o
400 b, c, k, s, p
500 a, f, c, e, l, p, m, n
Transaction database TDB
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
6
Frequent Pattern Mining: A Road Map
Boolean vs. quantitative associations age(x, “30..39”) ^ income(x, “42..48K”) buys(x, “car”) [1%, 75%]
Single dimension vs. multiple dimensional associations
Single level vs. multiple-level analysisWhat brands of beers are associated with what brands of diapers?
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
7
Extensions & Applications
Correlation, causality analysis & mining interesting rules
Maxpatterns and frequent closed itemsets
Constraint-based mining
Sequential patterns
Periodic patterns
Structural Patterns
Computing iceberg cubes
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
8
Frequent Pattern Mining Methods
Apriori and its variations/improvements Mining frequent-patterns without candidate generationMining max-patterns and closed itemsetsMining multi-dimensional, multi-level frequent patterns with flexible support constraintsInterestingness: correlation and causality
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
9
Apriori: Candidate Generation-and-test
Any subset of a frequent itemset must be also frequent — an anti-monotone property
A transaction containing {beer, diaper, nuts} also contains {beer, diaper}
{beer, diaper, nuts} is frequent {beer, diaper} must also be frequent
No superset of any infrequent itemset should be generated or tested
Many item combinations can be pruned
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
10
Apriori-based Mining
Generate length (k+1) candidate itemsets from length k frequent itemsets, and
Test the candidates against DB
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
11
Apriori Algorithm
A level-wise, candidate-generation-and-test approach (Agrawal & Srikant 1994)
TID Items10 a, c, d20 b, c, e30 a, b, c, e40 b, e
Min_sup=2
Itemset Supa 2b 3c 3d 1e 3
Data base D 1-candidates
Scan D
Itemset Supa 2b 3c 3e 3
Freq 1-itemsetsItemset
abacaebcbece
2-candidates
Itemset Supab 1ac 2ae 1bc 2be 3ce 2
Counting
Scan D
Itemset Supac 2bc 2be 3ce 2
Freq 2-itemsetsItemset
bce
3-candidates
Itemset Supbce 2
Freq 3-itemsets
Scan D
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
12
The Apriori Algorithm
Ck: Candidate itemset of size k
Lk : frequent itemset of size k
L1 = {frequent items};
for (k = 1; Lk !=; k++) doCk+1 = candidates generated from Lk;for each transaction t in database do
increment the count of all candidates in Ck+1 that are contained in t
Lk+1 = candidates in Ck+1 with min_support
return k Lk;
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
13
Important Details of Apriori
How to generate candidates?Step 1: self-joining Lk
Step 2: pruning
How to count supports of candidates?
COMP 790-090 Data Mining: Concepts, Algorithms, and Applications
14
How to Generate Candidates?
Suppose the items in Lk-1 are listed in an orderStep 1: self-join Lk-1 INSERT INTO Ck
SELECT p.item1, p.item2, …, p.itemk-1, q.itemk-1
FROM Lk-1 p, Lk-1 qWHERE p.item1=q.item1, …, p.itemk-2=q.itemk-2, p.itemk-1 <
q.itemk-1
Step 2: pruningFor each itemset c in Ck do
For each (k-1)-subsets s of c do if (s is not in Lk-1) then delete c from Ck