Data Mining Techniques Association Rule
What Is Association Mining?
• Association Rule Mining
  – Finding frequent patterns, associations, correlations, or causal structures among itemsets in transaction databases, relational databases, and other information repositories
• Applications
  – Market basket analysis (marketing strategy: which items to put on sale at reduced prices), cross-marketing, catalog design, shelf-space layout design, etc.
• Examples
  – Rule form: Body → Head [Support, Confidence]
  – buys(x, “Computer”) → buys(x, “Software”) [2%, 60%]
  – major(x, “CS”) ∧ takes(x, “DB”) → grade(x, “A”) [1%, 75%]
Market Basket Analysis
Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
Rule Measures: Support and Confidence
• Let minimum support = 50% and minimum confidence = 50%; then we have
  – A → C [50%, 66.6%]
  – C → A [50%, 100%]

Transaction ID   Items Bought
1000             A, B, C
2000             A, C
3000             A, D
4000             B, E, F
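The numbers on this slide can be checked with a short Python sketch. It stores each transaction as a set of items, computes support as the fraction of transactions that contain every item of the rule, and confidence as support(A ∪ C) / support(A). The helper names `support`, `confidence`, and `is_interesting` are illustrative only, not from any library.

```python
from fractions import Fraction

# Transactions from the slide (Transaction ID -> items bought)
TRANSACTIONS = {
    1000: {"A", "B", "C"},
    2000: {"A", "C"},
    3000: {"A", "D"},
    4000: {"B", "E", "F"},
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return Fraction(hits, len(transactions))


def confidence(body, head, transactions):
    """support(body ∪ head) / support(body), i.e. P(head | body)."""
    return support(body | head, transactions) / support(body, transactions)


def is_interesting(body, head, transactions, min_sup, min_conf):
    """A rule passes only if it meets both the minimum support and minimum confidence."""
    return (support(body | head, transactions) >= min_sup
            and confidence(body, head, transactions) >= min_conf)


for body, head in [({"A"}, {"C"}), ({"C"}, {"A"})]:
    s = support(body | head, TRANSACTIONS)
    c = confidence(body, head, TRANSACTIONS)
    ok = is_interesting(body, head, TRANSACTIONS, Fraction(1, 2), Fraction(1, 2))
    print(f"{sorted(body)} -> {sorted(head)}: support={float(s):.1%}, "
          f"confidence={float(c):.1%}, passes thresholds={ok}")
```

Using `Fraction` keeps the confidence of A → C exactly 2/3; printed to one decimal place it appears as 66.7%, which the slide truncates to 66.6%.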
Support & Confidence
Association Rule: Basic Concepts
• Given
  – (1) a database of transactions
  – (2) each transaction is a list of items (purchased by a customer in a visit)
• Find all rules that correlate the presence of one set of items with that of another set of items
• Find all the rules A → B with minimum confidence and support
  – support, s: P(A ∪ B), the probability that a transaction contains every item in A ∪ B
  – confidence, c: P(B | A), the conditional probability that a transaction containing A also contains B
Terminologies
• Item
  – I1, I2, I3, …
  – A, B, C, …
• Itemset
  – {I1}, {I1, I7}, {I2, I3, I5}, …
  – {A}, {A, G}, {B, C, E}, …
• 1-Itemset
  – {I1}, {I2}, {A}, …
• 2-Itemset
  – {I1, I7}, {I3, I5}, {A, G}, …
• K-Itemset
  – An itemset whose length is K
• Frequent (Large) K-Itemset
  – A K-itemset that satisfies a minimum support threshold
• Association Rule
  – A rule that satisfies both a minimum support threshold and a minimum confidence threshold
Analysis
• The number of candidate itemsets grows exponentially with the number of distinct items: d items give 2^d − 1 non-empty itemsets, of which C(d, k) have cardinality k
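To make the growth concrete, this small Python sketch counts itemsets with the standard-library `math.comb`: C(d, k) itemsets of cardinality k, and 2^d − 1 non-empty itemsets in total. The function names are illustrative.

```python
import math


def itemsets_of_size(d, k):
    """Number of k-itemsets that can be formed from d distinct items."""
    return math.comb(d, k)


def total_itemsets(d):
    """Number of non-empty itemsets over d items: sum of C(d, k) for k = 1..d."""
    return sum(math.comb(d, k) for k in range(1, d + 1))


for d in (4, 10, 20, 100):
    print(f"d={d:>3}: 2-itemsets={itemsets_of_size(d, 2):,}  "
          f"all itemsets={total_itemsets(d):,}")
```

Even at d = 100 the total exceeds 10^30, which is why counting every candidate is infeasible and Apriori prunes candidates before scanning the database.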
Fast Algorithms for Mining Association Rules
Mining Association Rules: Apriori Principle
• For rule A → C:
  – support = support({A, C}) = 50%
  – confidence = support({A, C}) / support({A}) = 66.6%
• The Apriori principle:
  – Any subset of a frequent itemset must be frequent

Transaction ID   Items Bought
1000             A, B, C
2000             A, C
3000             A, D
4000             B, E, F

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%

Min. support 50%
Min. confidence 50%
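The frequent-itemset table can be reproduced by brute force on this tiny database, which also lets us check the Apriori principle directly. This Python sketch enumerates every possible itemset (feasible only because there are six items), so it is a reference check rather than how Apriori itself searches.

```python
from itertools import combinations

TRANSACTIONS = [
    {"A", "B", "C"},  # 1000
    {"A", "C"},       # 2000
    {"A", "D"},       # 3000
    {"B", "E", "F"},  # 4000
]
MIN_SUPPORT = 0.5


def brute_force_frequent(transactions, min_support):
    """Check every possible itemset; keep those meeting min_support."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            sup = sum(itemset <= t for t in transactions) / len(transactions)
            if sup >= min_support:
                frequent[itemset] = sup
    return frequent


frequent = brute_force_frequent(TRANSACTIONS, MIN_SUPPORT)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{sup:.0%}")

# Apriori principle: every non-empty proper subset of a frequent itemset is frequent.
for itemset in frequent:
    for r in range(1, len(itemset)):
        for sub in combinations(itemset, r):
            assert frozenset(sub) in frequent
```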
Mining Frequent Itemsets: the Key Step
• Find the frequent itemsets: the sets of items that have minimum support
  – A subset of a frequent itemset must also be a frequent itemset
    • e.g., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets
  – Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
• Use the frequent itemsets to generate association rules
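The second step, turning frequent itemsets into rules, can be sketched in Python as follows. It takes the frequent itemsets and supports from the four-transaction example, splits each frequent itemset L into every non-empty proper subset S, and keeps S → L − S when support(L) / support(S) meets the minimum confidence. `generate_rules` is a hypothetical helper name.

```python
from itertools import combinations

# Frequent itemsets and supports from the example (min. support 50%)
FREQUENT = {
    frozenset({"A"}): 0.75,
    frozenset({"B"}): 0.50,
    frozenset({"C"}): 0.50,
    frozenset({"A", "C"}): 0.50,
}


def generate_rules(frequent, min_conf):
    """For each frequent itemset L (|L| >= 2) and each non-empty proper subset S,
    emit S -> L - S when support(L) / support(S) >= min_conf."""
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for body in map(frozenset, combinations(sorted(itemset), r)):
                # By the Apriori principle, `body` is frequent, so its support is known.
                conf = sup / frequent[body]
                if conf >= min_conf:
                    rules.append((body, itemset - body, sup, conf))
    return rules


for body, head, sup, conf in generate_rules(FREQUENT, min_conf=0.5):
    print(f"{sorted(body)} -> {sorted(head)} [{sup:.0%}, {conf:.1%}]")
```

No extra database scan is needed here: every support the rule step uses was already counted while finding the frequent itemsets.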
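Example
Database D (one transaction per row):
  Items
  1 3 4
  2 3 5
  1 2 3 5
  2 5

Scan D and count C1:
  Itemset   Count
  {1}       2
  {2}       3
  {3}       3
  {4}       1
  {5}       3
Generate L1: {1}, {2}, {3}, {5}

Generate C2 from L1: {1, 2}, {1, 3}, {1, 5}, {2, 3}, {2, 5}, {3, 5}
Scan D and count C2:
  Itemset   Count
  {1, 2}    1
  {1, 3}    2
  {1, 5}    1
  {2, 3}    2
  {2, 5}    3
  {3, 5}    2
Generate L2: {1, 3}, {2, 3}, {2, 5}, {3, 5}

Generate C3 from L2: {2, 3, 5}
Scan D and count C3:
  Itemset     Count
  {2, 3, 5}   2
Generate L3: {2, 3, 5}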
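The level-wise loop traced above (scan D to count Ck, keep the itemsets that reach the minimum support count to get Lk, generate the next candidates from Lk) can be sketched in Python. The slide does not state the threshold; a minimum support count of 2 (50% of four transactions) is assumed here because it reproduces L1 to L3. For brevity the join unions any two frequent (k−1)-itemsets whose union has k items instead of using the ordered prefix join; after pruning this yields the same candidates, only with more work.

```python
from itertools import combinations

# Database D from the example: one set of items per transaction.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
MIN_SUPPORT_COUNT = 2  # assumption: inferred from L1 dropping {4} (count 1)


def count_candidates(candidates, transactions):
    """One scan of the database: count how many transactions contain each candidate."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def generate_candidates(frequent_prev, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
    that has an infrequent (k-1)-subset (the Apriori principle)."""
    joined = {a | b for a in frequent_prev for b in frequent_prev if len(a | b) == k}
    return {c for c in joined
            if all(frozenset(s) in frequent_prev for s in combinations(c, k - 1))}


def apriori(transactions, min_count):
    """Return {k: {itemset: count}} for every level k that has frequent itemsets."""
    items = set().union(*transactions)
    candidates = {frozenset({i}) for i in items}  # C1
    levels, k = {}, 1
    while candidates:
        counts = count_candidates(candidates, transactions)  # scan D, count Ck
        frequent = {c: n for c, n in counts.items() if n >= min_count}  # Lk
        if not frequent:
            break
        levels[k] = frequent
        k += 1
        candidates = generate_candidates(set(frequent), k)  # generate Ck for the next level
    return levels


for k, freq in apriori(D, MIN_SUPPORT_COUNT).items():
    print(f"L{k}:", sorted((sorted(s), n) for s, n in freq.items()))
```

At level 3 the join also proposes {1, 2, 3} and {1, 3, 5}, but pruning removes them because {1, 2} and {1, 5} are not in L2, leaving C3 = {2, 3, 5} as on the slide.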
Example of Generating Candidates
• L3={abc, abd, acd, ace, bcd}
• Self-joining: L3*L3
– abcd from abc and abd
– acde from acd and ace
• Pruning:
– acde is removed because ade is not in L3
• C4={abcd}
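The self-join and prune steps can be written directly in Python. This sketch assumes each frequent (k−1)-itemset is a tuple with its items in sorted order, which is what makes the shared-prefix test valid; `apriori_gen` is a hypothetical helper name.

```python
from itertools import combinations


def apriori_gen(frequent_prev):
    """Candidate generation from frequent (k-1)-itemsets given as sorted tuples.

    Join: combine two itemsets that agree on their first k-2 items.
    Prune: drop a candidate if any of its (k-1)-subsets is not frequent.
    """
    prev = sorted(set(frequent_prev))
    prev_set = set(prev)
    candidates = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            if p[:-1] == q[:-1]:  # same (k-2)-prefix; p[-1] < q[-1] because prev is sorted
                joined = p + (q[-1],)
                # combinations() of a sorted tuple yields sorted tuples, matching prev_set
                if all(sub in prev_set for sub in combinations(joined, len(joined) - 1)):
                    candidates.append(joined)
    return candidates


L3 = [tuple(s) for s in ["abc", "abd", "acd", "ace", "bcd"]]
C4 = apriori_gen(L3)
print(["".join(c) for c in C4])
```

The join produces abcd and acde; acde fails the prune because its subset ade is not in L3, so only abcd survives.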
Example
Apriori Algorithm
Exercise 4
min-sup = 20%, min-conf = 80%
Demo: IBM Intelligent Miner
Demo Database
Multi-Dimensional Association
• Single-Dimensional (Intra-Dimension) Rules: a single dimension (predicate) with multiple occurrences
  – buys(X, “milk”) → buys(X, “bread”)
• Multi-Dimensional Rules: ≥ 2 dimensions
  – Inter-dimension association rules (no repeated predicates)
    age(X, “19-25”) ∧ occupation(X, “student”) → buys(X, “coke”)
  – Hybrid-dimension association rules (repeated predicates)
    age(X, “19-25”) ∧ buys(X, “popcorn”) → buys(X, “coke”)
• Categorical (Nominal) Attributes
  – finite number of possible values, no ordering among values
• Quantitative Attributes
  – numeric, implicit ordering among values
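One common way to mine multi-dimensional rules is to turn each relational tuple into a set of (predicate, value) items, discretizing quantitative attributes into ranges first, so that the support counting used for single-dimensional rules applies unchanged. The Python sketch below uses hypothetical bin boundaries and a made-up record for illustration; `to_predicate_items` is an illustrative helper, not part of any library.

```python
def to_predicate_items(record, bins):
    """Turn one relational tuple into a set of (predicate, value) items.

    Quantitative attributes listed in `bins` are mapped to a labeled range first
    (bins are half-open: lo <= value < hi); categorical attributes are used as-is.
    Multi-valued attributes such as `buys` contribute one item per value, which
    is what allows repeated predicates in hybrid-dimension rules.
    """
    items = set()
    for attr, value in record.items():
        if attr in bins:  # quantitative: replace the number by its range label
            value = next(label for lo, hi, label in bins[attr] if lo <= value < hi)
        if isinstance(value, (list, tuple, set)):
            items.update((attr, v) for v in value)
        else:
            items.add((attr, value))
    return frozenset(items)


# Hypothetical age ranges and record, for illustration only.
AGE_BINS = {"age": [(0, 19, "0-18"), (19, 26, "19-25"), (26, float("inf"), "26+")]}
person = {"age": 22, "occupation": "student", "buys": ["popcorn", "coke"]}
items = to_predicate_items(person, AGE_BINS)
print(sorted(items))

# The tuple supports the body of age(X, "19-25") ∧ occupation(X, "student") → buys(X, "coke")
print({("age", "19-25"), ("occupation", "student")} <= items)
```

Fixed, equal-width style bins are the simplest choice; the ranges themselves strongly affect which quantitative rules reach minimum support.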
Exercise 5
min-sup = 20%, min-conf = 80%
Research Topics
• Quantitative Association Rules
  – buys(bread, 5) → buys(milk, 3)
• Weighted Association Rules
• High Utility Association Rules
• Non-redundant Association Rules
• Constrained Association Rules Mining
• Multi-dimensional Association Rules
• Generalized Association Rules
• Negative Association Rules
• Incremental Mining Association Rules
• Data Stream Association Rule Mining
• Interactive Mining Association Rules