Mining Quantitative Association Rules in Large Relational Databases
Ramakrishnan Srikant
Rakesh Agrawal
ACM SIGMOD Conference on Management of Data, 1996
March 21, 2013
(Slides modified from Sasi Sekhar Kunta’s version.)
Presented by: Sepehr Amir-Mohammadian
2
Outline
• Association Rules and Quantitative Association Rules
• Formal Study of Quantitative Association Analysis
• Partitioning Quantitative Attributes
• Identifying the Interesting Rules
• Candidate Generation
• Concluding Remarks
• Q&A
4
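Association Rules
• Itemsets $X$ and $Y$, $X \cap Y = \emptyset$
• Rule $X \Rightarrow Y$
• Support: fraction of transactions that contain $X \cup Y$
• Confidence: fraction of the transactions containing $X$ that also contain $Y$
• Find rules that have at least MinSup support and MinConf confidence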
5
Boolean Association Rules
TID  A  B  C  D
100  1  1  0  1
200  0  1  1  1
300  1  1  1  0
400  0  0  1  0

TID  Items
100  A B D
200  B C D
300  A B C
400  C
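A minimal Python sketch of support and confidence over the four transactions in the table above. The helper names `support` and `confidence` are illustrative and do not come from the slides.

```python
# Transactions from the Boolean Association Rules table.
transactions = {
    100: {"A", "B", "D"},
    200: {"B", "C", "D"},
    300: {"A", "B", "C"},
    400: {"C"},
}

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(lhs, rhs):
    """Fraction of the transactions containing `lhs` that also contain `rhs`."""
    return support(set(lhs) | set(rhs)) / support(lhs)

print(support({"A", "B"}))       # 0.5 (transactions 100 and 300)
print(confidence({"A"}, {"B"}))  # 1.0 (every transaction with A also has B)
```

A rule such as A ⇒ B would then be reported only if both numbers reach the chosen MinSup and MinConf.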
6
Quantitative Association Rules
RecordID  Age  Married  NumCars
100       23   No       1
200       25   Yes      1
300       29   No       0
400       34   Yes      2
500       38   Yes      2
7
Mapping to Boolean Association Rules
• Use <attribute, value> as a new boolean attribute in place of a categorical attribute
• Use <attribute, value> as a new boolean attribute in place of a quantitative attribute with a small domain
• Use <attribute, interval> as a new boolean attribute in place of a quantitative attribute with a large domain
RecordID  Age: 20..29  Age: 30..39  Married: Yes  Married: No  NumCars: 0  NumCars: 1
100       1            0            0             1            0           1
200       1            0            1             0            0           1
300       1            0            0             1            1           0
400       0            1            1             0            0           0
500       0            1            1             0            0           0
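The mapping can be sketched in Python as below. This is a sketch under assumptions: the function name `to_boolean` is hypothetical, and the `NumCars: 2` column is added here so that records 400 and 500 keep their NumCars value (the table on the slide shows only the `NumCars: 0` and `NumCars: 1` columns).

```python
records = [
    {"RecordID": 100, "Age": 23, "Married": "No", "NumCars": 1},
    {"RecordID": 200, "Age": 25, "Married": "Yes", "NumCars": 1},
    {"RecordID": 300, "Age": 29, "Married": "No", "NumCars": 0},
    {"RecordID": 400, "Age": 34, "Married": "Yes", "NumCars": 2},
    {"RecordID": 500, "Age": 38, "Married": "Yes", "NumCars": 2},
]

AGE_INTERVALS = [(20, 29), (30, 39)]  # large domain: one column per interval
MARRIED_VALUES = ("Yes", "No")        # categorical: one column per value
NUMCARS_VALUES = (0, 1, 2)            # small domain: one column per value (2 is an assumption)

def to_boolean(record):
    """Return the boolean columns for one record, in table order."""
    row = {}
    for lo, hi in AGE_INTERVALS:
        row[f"Age: {lo}..{hi}"] = int(lo <= record["Age"] <= hi)
    for value in MARRIED_VALUES:
        row[f"Married: {value}"] = int(record["Married"] == value)
    for value in NUMCARS_VALUES:
        row[f"NumCars: {value}"] = int(record["NumCars"] == value)
    return row

for record in records:
    print(record["RecordID"], list(to_boolean(record).values()))
```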
8
Problems
• “MinSup”: if the number of intervals is large, the support of any single interval can be low, so rules over that interval may fall below MinSup
• “MinConf”: information is lost when values are partitioned into intervals; the fewer (and wider) the intervals, the more likely a rule’s confidence falls below MinConf
RecordID  Age  Married  NumCars
100       23   No       1
200       25   Yes      1
300       29   No       0
400       34   Yes      2
500       38   Yes      2
9
Solution
• Consider all combinations of adjacent values/intervals in quantitative attributes
  – Solves the “MinSup” problem
• Increase the number of values/intervals, without encountering the “MinSup” problem
  – Reduces information loss
• New problems:
  – Execution time: maximum support threshold, MaxSup
  – Many rules: interestingness of rules
10
Steps of Proposed Approach
1. Determine the number of partitions for each quantitative attribute
2. Map values/ranges to consecutive integer values such that the order is preserved
3. Find the support of each value of the attributes, and combine adjacent values while their combined support is less than MaxSup. Find the frequent itemsets, whose support is at least MinSup
4. Use the frequent itemsets to generate association rules
5. Prune uninteresting rules
11
Example
• Step 0: Initial set of records
RecordID  Age  Married  NumCars
100       23   No       1
200       25   Yes      1
300       29   No       0
400       34   Yes      2
500       38   Yes      2
12
Example – Cont.
• Step 1: Determine the partitions for each quantitative attribute
Intervals for Age
20 .. 24
25 .. 29
30 .. 34
35 .. 39
RecordID Age Married NumCars
100 20 .. 24 No 1
200 25 .. 29 Yes 1
300 25 .. 29 No 0
400 30 .. 34 Yes 2
500 35 .. 39 Yes 2
13
Example – Cont.
• Step 2: Mapping intervals/values to consecutive integers
Intervals for Age   Integers
20 .. 24            1
25 .. 29            2
30 .. 34            3
35 .. 39            4

Values for Married  Integers
Yes                 1
No                  2
14
Example – Cont.
• Step 2: Mapping intervals/values to consecutive integers
RecordID  Age  Married  NumCars
100       1    2        1
200       2    1        1
300       2    2        0
400       3    1        2
500       4    1        2
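Steps 1 and 2 can be sketched in a few lines of Python. The interval boundaries and the Yes/No codes are taken from the slides; the function names `age_code` and `encode` are illustrative.

```python
AGE_INTERVALS = [(20, 24), (25, 29), (30, 34), (35, 39)]  # Step 1 partitions
MARRIED_CODES = {"Yes": 1, "No": 2}                       # Step 2 value codes

def age_code(age):
    """Map an age to the 1-based index of its interval, preserving order."""
    for code, (lo, hi) in enumerate(AGE_INTERVALS, start=1):
        if lo <= age <= hi:
            return code
    raise ValueError(f"age {age} is outside every interval")

def encode(record):
    """Encode one (RecordID, Age, Married, NumCars) tuple as integers."""
    record_id, age, married, num_cars = record
    return (record_id, age_code(age), MARRIED_CODES[married], num_cars)

records = [
    (100, 23, "No", 1),
    (200, 25, "Yes", 1),
    (300, 29, "No", 0),
    (400, 34, "Yes", 2),
    (500, 38, "Yes", 2),
]
encoded = [encode(r) for r in records]
print(encoded)
```

NumCars has a small domain, so its values are used as they are.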
15
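Example – Cont.
• Step 3: Extracting frequent (large) itemsets
  – A sample of the frequent itemsets, with MinSup = 0.4
[Table: Itemset, Support. The itemset column did not survive extraction; the supports listed are 3, 2, 3, 2, 3, 2 records.]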
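For a single quantitative attribute, Step 3 can be sketched as follows. This is one possible reading, not the authors' implementation: thresholds are given as record counts, a range that crosses MaxSup is kept but not extended further, and the name `frequent_ranges` is hypothetical.

```python
from collections import Counter

def frequent_ranges(codes, min_count, max_count):
    """Supports of all ranges (lo, hi) of adjacent integer codes.

    A range is reported when it covers at least `min_count` records.
    Extension to the right stops once a range exceeds `max_count`.
    """
    counts = Counter(codes)
    lowest, highest = min(codes), max(codes)
    frequent = {}
    for lo in range(lowest, highest + 1):
        total = 0
        for hi in range(lo, highest + 1):
            total += counts[hi]  # Counter yields 0 for absent codes
            if total >= min_count:
                frequent[(lo, hi)] = total
            if total > max_count:
                break
    return frequent

# Encoded Age column of the five example records (1 = 20..24, ..., 4 = 35..39).
age_codes = [1, 2, 2, 3, 4]
# MinSup = 0.4 of 5 records is 2 records; a MaxSup of 3 records is an assumed value.
ranges = frequent_ranges(age_codes, min_count=2, max_count=3)
print(ranges[(1, 2)])  # 3 records with Age in 20..29
print(ranges[(3, 4)])  # 2 records with Age in 30..39
```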
16
Example – Cont.
• Step 4: Rule generation
  – A sample of the generated rules, with MinConf = 0.5
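Rule generation from one frequent itemset can be sketched as below, using the five example records with Married encoded as Yes = 1, No = 2. The item representation `(attribute, lo, hi)` and the function names are assumptions made for this sketch, and the chosen itemset is only an illustration.

```python
from itertools import combinations

records = [
    {"Age": 23, "Married": 2, "NumCars": 1},
    {"Age": 25, "Married": 1, "NumCars": 1},
    {"Age": 29, "Married": 2, "NumCars": 0},
    {"Age": 34, "Married": 1, "NumCars": 2},
    {"Age": 38, "Married": 1, "NumCars": 2},
]

def support(itemset):
    """Fraction of records whose values fall inside every item's range."""
    hits = sum(all(lo <= r[attr] <= hi for attr, lo, hi in itemset) for r in records)
    return hits / len(records)

def rules_from(itemset, min_conf):
    """All rules lhs => rhs that split `itemset` and reach `min_conf`.

    Assumes `itemset` is frequent, so every left-hand side has support > 0.
    """
    items = sorted(itemset)
    whole = support(items)
    rules = []
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            rhs = tuple(i for i in items if i not in lhs)
            conf = whole / support(lhs)
            if conf >= min_conf:
                rules.append((lhs, rhs, conf))
    return rules

itemset = [("Age", 30, 39), ("Married", 1, 1), ("NumCars", 2, 2)]
rules = rules_from(itemset, min_conf=0.5)
for lhs, rhs, conf in rules:
    print(lhs, "=>", rhs, round(conf, 2))
```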
18
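Formal Study of Quantitative Association Analysis
• $\mathcal{I} = \{i_1, i_2, \dots, i_m\}$: set of attributes
• $\mathcal{P}$: set of positive integers
• $\mathcal{I}_V = \mathcal{I} \times \mathcal{P}$; $\langle x, v \rangle \in \mathcal{I}_V$ denotes that attribute $x$ has value $v$
• $\mathcal{I}_R = \{\langle x, l, u \rangle \in \mathcal{I} \times \mathcal{P} \times \mathcal{P} \mid l \le u \text{ if } x \text{ is quantitative}, \; l = u \text{ if } x \text{ is categorical}\}$: set of items
• For any $X \subseteq \mathcal{I}_R$, $\mathrm{attributes}(X) = \{x \mid \langle x, l, u \rangle \in X\}$
• $\mathcal{D}$: set of records
• $R \subseteq \mathcal{I}_V$: a record, such that its attributes are distinct
• A record $R$ supports itemset $X \subseteq \mathcal{I}_R$ if for every $\langle x, l, u \rangle \in X$ there is $\langle x, q \rangle \in R$ with $l \le q \le u$
• $X \Rightarrow Y$: a quantitative association rule, where
  – $X \subset \mathcal{I}_R$, $Y \subset \mathcal{I}_R$, $\mathrm{attributes}(X) \cap \mathrm{attributes}(Y) = \emptyset$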
19
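Formal Definition of Quantitative Association Analysis – Cont.
• $X \Rightarrow Y$ holds in the record set $\mathcal{D}$ with support $s$, if $s\%$ of the records in $\mathcal{D}$ support $X \cup Y$.
• $X \Rightarrow Y$ holds in $\mathcal{D}$ with confidence $c$, if $c\%$ of the records in $\mathcal{D}$ that support $X$ also support $Y$.
• $\Pr(X)$: probability that all items in $X$ are supported by a given record
• $\hat{X}$ is a generalization of $X$ (and $X$ a specialization of $\hat{X}$) if $\mathrm{attributes}(X) = \mathrm{attributes}(\hat{X})$ and, for every attribute $x$, $\langle x, l, u \rangle \in X$ and $\langle x, l', u' \rangle \in \hat{X}$ imply $l' \le l \le u \le u'$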
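The two relations defined on this slide can be sketched in Python. The sketch assumes the usual reading from the paper: an item is a triple `(attribute, lo, hi)`, a record supports an itemset when each of its values lies inside the matching range, and a generalization keeps the same attributes with ranges that contain the original ones. The function names are illustrative.

```python
def supports(record, itemset):
    """True if `record` (attribute -> value) satisfies every item's range."""
    return all(
        attr in record and lo <= record[attr] <= hi
        for attr, lo, hi in itemset
    )

def is_generalization(general, specific):
    """True if `general` has the same attributes as `specific` and each of
    its ranges contains the corresponding range of `specific`."""
    g = {attr: (lo, hi) for attr, lo, hi in general}
    s = {attr: (lo, hi) for attr, lo, hi in specific}
    if g.keys() != s.keys():
        return False
    return all(g[a][0] <= s[a][0] and s[a][1] <= g[a][1] for a in s)

# Record 200 of the running example after integer mapping.
record = {"Age": 2, "Married": 1, "NumCars": 1}
print(supports(record, [("Age", 1, 2), ("Married", 1, 1)]))   # True
print(is_generalization([("Age", 1, 4)], [("Age", 2, 3)]))    # True
print(is_generalization([("Age", 2, 3)], [("Age", 1, 4)]))    # False
```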
21
Partitioning Quantitative Attributes
• A measure of partial completeness: information lost in partitioning
  – $\mathcal{R}$: set of rules obtained before partitioning
  – $\mathcal{R}'$: set of rules obtained after partitioning
  – Partial completeness measures the distance between a rule in $\mathcal{R}$ and its closest generalization in $\mathcal{R}'$
  – The distance is defined by the ratio of their supports
• Gives the best approach for obtaining the minimal number of partitions
22
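Partial Completeness
• $\mathcal{C}$: the set of frequent itemsets
• For any $K \ge 1$, $\mathcal{P}$ is $K$-complete w.r.t. $\mathcal{C}$ if
  – $\mathcal{P} \subseteq \mathcal{C}$, and $X \in \mathcal{P}$, $X' \subseteq X$ imply $X' \in \mathcal{P}$
  – for every $X \in \mathcal{C}$ there is a generalization $\hat{X} \in \mathcal{P}$ with $\mathrm{support}(\hat{X}) \le K \cdot \mathrm{support}(X)$, and for every $Y \subseteq X$ there is a generalization $\hat{Y} \subseteq \hat{X}$ with $\mathrm{support}(\hat{Y}) \le K \cdot \mathrm{support}(Y)$
• The smaller $K$ is, the less information is lost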
23
Example – K-Completeness
• Consider the following set of frequent itemsets:
• Then, itemsets 2, 3, 5, 7 form a 1.5-complete set.
• But itemsets 3, 5, 7 do not form a 1.5-complete set.
[Table: Number, Itemset, Support. The itemset column did not survive extraction.]
Number  Support
1       5%
2       6%
3       8%
4       5%
5       6%
6       4%
7       5%
24
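Confidence of Rules Generated from K-Complete Set
• If $\mathcal{P}$ is a $K$-complete set w.r.t. $\mathcal{C}$, then any rule $A \Rightarrow B$ obtained from $\mathcal{C}$ has a generalization $\hat{A} \Rightarrow \hat{B}$ from $\mathcal{P}$, such that $\mathrm{conf}(\hat{A} \Rightarrow \hat{B})$ is bounded by
  $\frac{1}{K} \cdot \mathrm{conf}(A \Rightarrow B) \le \mathrm{conf}(\hat{A} \Rightarrow \hat{B}) \le K \cdot \mathrm{conf}(A \Rightarrow B)$
• In the previous example: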
25
K-Completeness for a Single Attribute
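• Consider $x$ as a quantitative attribute, partitioned into base intervals.
• Suppose that the support for each base interval is less than $\text{MinSup} \times \frac{K-1}{2}$ (or the interval consists of a single value).
• Let $\mathcal{P}$ be the set of all combinations of base intervals that have minimum support.
• Then, $\mathcal{P}$ is $K$-complete w.r.t. the set of all ranges over $x$ with minimum support.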
26
K-Completeness for a Group of Attributes
• Consider a set of $n$ quantitative attributes, partitioned into base intervals.
• Suppose that the support for each base interval is less than $\text{MinSup} \times \frac{K-1}{2n}$ (or the interval consists of a single value).
• Let $\mathcal{P}$ be the set of all frequent itemsets over the partitioned attributes.
• Then, $\mathcal{P}$ is $K$-complete w.r.t. the set of all frequent itemsets obtained without partitioning.
27
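Equi-Depth Partitioning
• Equi-depth partitioning: split the attribute so that every base interval has the same support
• Suppose that the number of intervals is given.
• Then, equi-depth partitioning minimizes the maximum support of a base interval, and so minimizes $K$.
• Suppose that $K$ is given and $K > 1$.
• Then, equi-depth partitioning with support $\text{MinSup} \times \frac{K-1}{2n}$ in each base interval results in the minimum number of intervals:
  $\text{Number of intervals} = \frac{2n}{\text{MinSup} \times (K-1)}$, where $n$ is the number of quantitative attributes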
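A small Python sketch of both ideas on this slide, assuming the interval-count formula from the paper, $2n / (\text{MinSup} \times (K-1))$. The function names are hypothetical, and the simple splitter below cuts by position in the sorted list, so equal values may land in neighbouring intervals.

```python
import math

def num_intervals(n_attrs, min_sup, k):
    """Number of base intervals for partial completeness level k (> 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return math.ceil(2 * n_attrs / (min_sup * (k - 1)))

def equi_depth(values, n_intervals):
    """Split values into n_intervals groups of near-equal size.

    Returns the (lowest, highest) value of each non-empty group.
    """
    ordered = sorted(values)
    n = len(ordered)
    bounds = []
    for i in range(n_intervals):
        start = i * n // n_intervals
        end = (i + 1) * n // n_intervals
        if start < end:
            bounds.append((ordered[start], ordered[end - 1]))
    return bounds

print(num_intervals(n_attrs=1, min_sup=0.25, k=1.5))  # 16
print(equi_depth([23, 25, 29, 34, 38], 2))            # [(23, 25), (29, 38)]
```

A lower K (less information loss) or a lower MinSup both push the number of intervals up.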
29
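Identify Interesting Rules
• Combining intervals results in many rules
• For example, suppose a quarter of people in age group 20..30 are in the age group 20..25
  – $\langle \text{Age}: 20..30 \rangle \Rightarrow \langle \text{Cars}: 1..2 \rangle$, with 8% sup, 70% conf
  – $\langle \text{Age}: 20..25 \rangle \Rightarrow \langle \text{Cars}: 1..2 \rangle$, with 2% sup, 70% conf
  – The second rule doesn’t give any additional information, and is less general than the first rule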
30
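Expected Value of Support and Confidence
• Interest: judge a rule by comparing its support and confidence against their expected values
• Let $\hat{Z}$ be a generalization of $Z$
• Let $Z = \{\langle z_1, l_1, u_1 \rangle, \dots, \langle z_n, l_n, u_n \rangle\}$, $\hat{Z} = \{\langle z_1, l'_1, u'_1 \rangle, \dots, \langle z_n, l'_n, u'_n \rangle\}$
• The expected value of $\Pr(Z)$ based on $\Pr(\hat{Z})$ would be
  $E_{\Pr(\hat{Z})}[\Pr(Z)] = \frac{\Pr(\langle z_1, l_1, u_1 \rangle)}{\Pr(\langle z_1, l'_1, u'_1 \rangle)} \times \cdots \times \frac{\Pr(\langle z_n, l_n, u_n \rangle)}{\Pr(\langle z_n, l'_n, u'_n \rangle)} \times \Pr(\hat{Z})$
• Similarly, the expected value of the confidence for the rule $X \Rightarrow Y$ according to its generalization $\hat{X} \Rightarrow \hat{Y}$ would be
  $E_{\Pr(\hat{Y} \mid \hat{X})}[\Pr(Y \mid X)] = \frac{\Pr(\langle y_1, l_1, u_1 \rangle)}{\Pr(\langle y_1, l'_1, u'_1 \rangle)} \times \cdots \times \frac{\Pr(\langle y_n, l_n, u_n \rangle)}{\Pr(\langle y_n, l'_n, u'_n \rangle)} \times \Pr(\hat{Y} \mid \hat{X})$
  where $Y = \{\langle y_1, l_1, u_1 \rangle, \dots, \langle y_n, l_n, u_n \rangle\}$, $\hat{Y} = \{\langle y_1, l'_1, u'_1 \rangle, \dots, \langle y_n, l'_n, u'_n \rangle\}$.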
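The expected-value computation can be sketched as a product of per-item ratios times the value observed for the generalization. The function name `expected_value` is hypothetical, and the item probabilities 0.05 and 0.20 are made-up numbers chosen only so that one age group is a quarter of the other, as in the age-group example.

```python
from math import prod

def expected_value(general_value, item_probs, general_item_probs):
    """Expected support (or confidence) of a specialization.

    general_value:      support or confidence observed for the generalization
    item_probs:         Pr of each item in the specialized itemset
    general_item_probs: Pr of the corresponding generalized items
    """
    ratios = [p / p_hat for p, p_hat in zip(item_probs, general_item_probs)]
    return prod(ratios) * general_value

# Support: the age item shrinks to a quarter, so 8% is expected to become 2%.
exp_sup = expected_value(0.08, item_probs=[0.05], general_item_probs=[0.20])
# Confidence: the consequent is unchanged, so no ratios apply and 70% is expected.
exp_conf = expected_value(0.70, item_probs=[], general_item_probs=[])
print(exp_sup, exp_conf)
```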
31
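Interest Measure
• Itemset $X$ is $R$-interesting w.r.t. its generalization $\hat{X}$, if
  – $\mathrm{support}(X) \ge R \times E_{\Pr(\hat{X})}[\Pr(X)]$, and
  – For any specialization $X'$ with minimum support, $X - X'$ is $R$-interesting w.r.t. $\hat{X}$
• Rule $X \Rightarrow Y$ is $R$-interesting w.r.t. its generalization $\hat{X} \Rightarrow \hat{Y}$ if
  – its support is at least $R$ times the expected support based on $\hat{X} \Rightarrow \hat{Y}$, or
  – its confidence is at least $R$ times the expected confidence based on $\hat{X} \Rightarrow \hat{Y}$
  – Moreover, the itemset $X \cup Y$ is $R$-interesting w.r.t. $\hat{X} \cup \hat{Y}$.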
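A simplified Python sketch of the rule-level test. It checks only the support-or-confidence condition and leaves out the requirement that the itemset itself be interesting; the function name and the interest level R = 1.1 are assumptions for illustration.

```python
def passes_rule_test(sup, conf, expected_sup, expected_conf, r):
    """True if support or confidence is at least r times its expected value."""
    return sup >= r * expected_sup or conf >= r * expected_conf

# Narrow age-group rule: 2% support and 70% confidence, exactly as expected
# from its generalization, so it adds nothing at interest level 1.1.
print(passes_rule_test(0.02, 0.70, expected_sup=0.02, expected_conf=0.70, r=1.1))  # False

# The same rule with 5% support would stand out against the expected 2%.
print(passes_rule_test(0.05, 0.70, expected_sup=0.02, expected_conf=0.70, r=1.1))  # True
```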
32
Example of Interest
34
Candidate Generation
• Given $L_{k-1}$, the set of all frequent $(k-1)$-itemsets, generate $C_k$, the set of candidate $k$-itemsets
• The process has three parts:
  – Join Phase
  – Subset Prune Phase
  – Interest Prune Phase
35
Join Phase
• $L_{k-1}$ is joined with itself
• Example, $L_{k-1}$:
• Result of self-join, $C_k$:
36
Subset Prune Phase
• Make sure any $(k-1)$-subset of a candidate is in $L_{k-1}$.
• Example, $L_{k-1}$:
• Result of self-join, $C_k$:
• Delete the first itemset in $C_k$ since one of its $(k-1)$-subsets is not in $L_{k-1}$.
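The join and subset prune phases can be sketched in Python as below. The sketch assumes the join condition described in the paper (two itemsets share all but their last item, and the last items are on different attributes); the attribute order in `ATTR_RANK`, the function names, and the sample itemsets are illustrative choices, not the slide's data.

```python
from itertools import combinations

# Items are (attribute, lo, hi); itemsets are tuples ordered by ATTR_RANK.
ATTR_RANK = {"Married": 0, "Age": 1, "NumCars": 2}

def join(frequent):
    """Self-join: merge itemsets that share every item except the last."""
    candidates = []
    for a in frequent:
        for b in frequent:
            same_prefix = a[:-1] == b[:-1]
            if same_prefix and ATTR_RANK[a[-1][0]] < ATTR_RANK[b[-1][0]]:
                candidates.append(a + (b[-1],))
    return candidates

def subset_prune(candidates, frequent):
    """Keep candidates whose (k-1)-subsets are all frequent."""
    known = set(frequent)
    return [
        c for c in candidates
        if all(sub in known for sub in combinations(c, len(c) - 1))
    ]

M = ("Married", 1, 1)
A1 = ("Age", 20, 24)
A2 = ("Age", 20, 29)
N = ("NumCars", 0, 1)
frequent_2 = [(M, A1), (M, A2), (M, N), (A2, N)]

candidates_3 = join(frequent_2)
print(candidates_3)                           # (M, A1, N) and (M, A2, N)
print(subset_prune(candidates_3, frequent_2)) # only (M, A2, N) survives
```

Here the first candidate is dropped because its subset `(A1, N)` is not among the frequent 2-itemsets.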
37
Interest Prune Phase
• Given user-specified interest level $R$
• Delete any itemset that contains an item with support greater than $1/R$
• It is guaranteed that such itemsets cannot be $R$-interesting w.r.t. their generalizations
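A minimal sketch of the interest prune, assuming item supports are available as fractions in a dictionary. The function name, the sample supports, and R = 2 are made up for illustration.

```python
def interest_prune(candidates, item_support, r):
    """Drop every candidate that contains an item with support above 1/r."""
    threshold = 1.0 / r
    return [
        c for c in candidates
        if all(item_support[item] <= threshold for item in c)
    ]

wide_age = ("Age", 20, 39)    # covers most records
narrow_age = ("Age", 30, 39)
cars = ("NumCars", 2, 2)
item_support = {wide_age: 0.9, narrow_age: 0.4, cars: 0.4}

candidates = [(wide_age, cars), (narrow_age, cars)]
# With R = 2 the threshold is 0.5, so the candidate with the wide item goes.
print(interest_prune(candidates, item_support, r=2))
```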
39
Concluding Remarks
• Introduced the problem of mining quantitative association rules
• Dealt with quantitative attributes by finely partitioning the values and combining adjacent partitions as necessary
• Introduced partial completeness to quantify the information lost and to help decide the partitions
• Gave an interest measure to identify interesting rules
• Extended candidate generation with join, subset prune, and interest prune phases
40
Outline
• Association Rules and Quantitative Association Rules
• Formal Study of Quantitative Association Analysis
• Partitioning Quantitative Attributes
• Identifying the Interesting Rules
• Extending the Apriori Algorithm
• Concluding Remarks
• Q&A
41
Exam Questions
1. What are the two problems with mapping quantitative associations to boolean associations?
   Answer: Slide No. 8
2. Give the general steps to be followed in order to mine quantitative association rules.
   Answer: Slide No. 10
3. If P is a K-complete set w.r.t. the set of all frequent itemsets, what constraint should the minimum confidence satisfy when generating rules from P, in order to guarantee that a close rule will be generated?
   Answer: It should be $1/K$ times the desired level of confidence. Slide No. 24.
42
Thank you. Questions?