business intelligence
1
BUSINESS INTELLIGENCE & DATA MINING
Association Rules: Support, Confidence, Lift, Conviction
Geo S. Mariyan (Master of Computer Science)
University of Mumbai.
2
Association Rules: Introduction
- Association rules are created by analyzing data for frequent if/then patterns and using the criteria support and confidence to identify the most important relationships.
- The purchase of one product when another product is purchased represents an association rule.
- Association rules are used in retail stores for marketing, advertising, floor placement, and inventory control.
3
Support is an indication of how frequently the items appear in the database. Confidence indicates how often the if/then statement has been found to be true.
In data mining, association rules are useful for analyzing and predicting customer behavior.
They play an important part in shopping basket data analysis, product clustering, catalog design and store layout.
Example: "If a customer buys a dozen eggs, he is 80% likely to also purchase milk."
4
Frequent Item Set
The most common approach to finding association rules is to break up the problem into two parts:
1) Find large itemsets.
2) Generate rules from frequent itemsets.
An itemset is any subset of the set of all items. An itemset is called frequent (or large) if its support is no less than a given absolute minimal support threshold, that is, if its number of occurrences reaches the threshold. We use the notation L to indicate the complete set of large itemsets and l to indicate a specific large itemset.
The original motivation for searching for frequent sets came from the need to analyse so-called supermarket transaction data, that is, to examine customer behaviour in terms of the purchased products.
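The first part of the approach, finding the large itemsets, can be sketched by brute-force enumeration. This is only an illustration, not the Apriori algorithm or any method given on the slides: the transaction data, the function name frequent_itemsets and the threshold min_count are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical supermarket transactions, invented for illustration only.
transactions = [
    {"apple", "beer", "rice", "chicken"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"milk", "beer", "rice", "chicken"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "pear"},
]

def frequent_itemsets(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least min_count transactions."""
    items = sorted(set().union(*transactions))
    large = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Count the transactions that contain every item of the candidate.
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_count:
                large[itemset] = count
                found_any = True
        if not found_any:
            # Every subset of a frequent itemset is frequent, so if no itemset
            # of this size is frequent, no larger one can be.
            break
    return large

L = frequent_itemsets(transactions, min_count=3)
for l, count in L.items():
    print(sorted(l), count)
```

Enumerating every candidate is exponential in the number of items, so this only works for toy data; practical algorithms prune the candidate space.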
5
6
7
Association Rule Notation
8
Association Rule Definitions
9
Association Rule Example:
10
Algorithm to Generate Association Rules
In this algorithm we use a function support, which returns the support for the input itemset.
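The algorithm itself is not reproduced in this transcript. One common formulation, sketched here as an assumption rather than as the slide's exact algorithm, takes each large itemset l, tries every non-empty proper subset x as the left-hand side, and keeps the rule x => (l - x) when support(l) / support(x) reaches a minimum confidence. The data, the names generate_rules and min_confidence, and the list of large itemsets are hypothetical.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"apple", "beer", "rice", "chicken"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"milk", "beer", "rice", "chicken"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "pear"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def generate_rules(large_itemsets, transactions, min_confidence):
    """Return (lhs, rhs, confidence) for each rule meeting min_confidence."""
    rules = []
    for l in large_itemsets:
        # Every non-empty proper subset of l is a candidate left-hand side.
        for size in range(1, len(l)):
            for subset in combinations(sorted(l), size):
                x = frozenset(subset)
                confidence = support(l, transactions) / support(x, transactions)
                if confidence >= min_confidence:
                    rules.append((x, l - x, confidence))
    return rules

# Large itemsets are assumed to have been found in a previous step.
large = [frozenset({"apple", "beer"}), frozenset({"beer", "rice"})]
rules = generate_rules(large, transactions, min_confidence=0.7)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

With these assumed inputs the rules apple => beer and rice => beer pass the threshold, while their reversed forms do not, which shows that confidence is not symmetric.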
11
Example
Table 1.1 Sample Data to Illustrate Association Rules
A database in which an association rule is to be found is viewed as a set of tuples, where each tuple contains a set of items. For example, a tuple could be {Bread, Jelly, PeanutButter}, which consists of three items. Table 1.1 is used throughout this topic to illustrate different algorithms. Here, there are five transactions and five items.
12
Support (s)
13
Support of All Sets of Items
Support: This says how popular an itemset is, as measured by the proportion of transactions in which the itemset appears. In Table 1, the support of {apple} is 4 out of 8, or 50%. Itemsets can also contain multiple items. For instance, the support of {apple, beer, rice} is 2 out of 8, or 25%.
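A minimal sketch of the support calculation follows. Table 1 is not reproduced in this transcript, so the eight transactions are hypothetical, chosen only to agree with the two figures quoted above; the function name support is likewise an assumption.

```python
# Hypothetical transactions, chosen to be consistent with the quoted figures
# (support of {apple} = 4/8, support of {apple, beer, rice} = 2/8).
transactions = [
    {"apple", "beer", "rice", "chicken"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"milk", "beer", "rice", "chicken"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "pear"},
]

def support(itemset, transactions):
    """Proportion of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

print(support({"apple"}, transactions))                  # 0.5
print(support({"apple", "beer", "rice"}, transactions))  # 0.25
```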
14
Confidence
Every association rule has a support and a confidence.
An association rule is of the form: X => Y
X => Y: if someone buys X, he also buys Y
The confidence is the conditional probability that, given X is present in a transaction, Y will also be present.
By definition: Confidence(X => Y) = support(X, Y) / support(X)
15
Support and Confidence for Some Association Rules
This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}.
This is measured by the proportion of transactions with item X, in which item Y also appears.
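The confidence formula, support(X, Y) / support(X), can be sketched as below. The transactions and the function names are hypothetical and serve only to make the ratio concrete.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"apple", "beer", "rice", "chicken"},
    {"apple", "beer", "rice"},
    {"apple", "beer"},
    {"apple", "pear"},
    {"milk", "beer", "rice", "chicken"},
    {"milk", "beer", "rice"},
    {"milk", "beer"},
    {"milk", "pear"},
]

def support(itemset, transactions):
    """Proportion of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y, transactions):
    """Confidence of the rule X => Y: support(X and Y together) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)

# Of the 4 transactions with apple, 3 also contain beer.
print(confidence({"apple"}, {"beer"}, transactions))  # 0.75
# Of the 6 transactions with beer, 3 also contain apple.
print(confidence({"beer"}, {"apple"}, transactions))  # 0.5
```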
16
Lift
17
This says how likely item Y is purchased when item X is purchased, while controlling for how popular item Y is. In Table 1, the lift of {apple -> beer} is 1, which implies no association between the items.
A lift value greater than 1 means that item Y is likely to be bought if item X is bought, while a value less than 1 means that item Y is unlikely to be bought if item X is bought.
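Lift is usually computed as confidence(X -> Y) divided by support(Y), which is the same as support(X, Y) / (support(X) * support(Y)). A small sketch under that definition follows; the function name and the support values passed in are hypothetical (only the 50% support of {apple} and the resulting lift of 1 appear in the text).

```python
def lift(support_xy, support_x, support_y):
    """Lift of X -> Y: confidence(X -> Y) divided by support(Y)."""
    confidence = support_xy / support_x
    return confidence / support_y

# Assumed supports: apple in 4 of 8 transactions, beer in 6 of 8, both in 3 of 8.
print(lift(support_xy=3 / 8, support_x=4 / 8, support_y=6 / 8))  # 1.0

# Assumed supports for a positively associated pair: lift is greater than 1.
print(lift(support_xy=4 / 8, support_x=4 / 8, support_y=6 / 8) > 1)  # True
```

Dividing by support(Y) is what "controlling for how popular item Y is" amounts to: a very popular Y yields high confidence for almost any X, and lift removes that effect.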
18
Conviction
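Conviction is commonly defined as (1 - support(Y)) / (1 - confidence(X => Y)). It compares how often X would occur without Y if the two were independent against how often the rule actually fails. A minimal sketch under that definition follows; the function name and the support values are hypothetical.

```python
import math

def conviction(support_xy, support_x, support_y):
    """Conviction of X => Y: (1 - support(Y)) / (1 - confidence(X => Y))."""
    confidence = support_xy / support_x
    if confidence == 1.0:
        # The rule never fails, so conviction is unbounded.
        return math.inf
    return (1.0 - support_y) / (1.0 - confidence)

# Assumed supports: X in 4 of 8 transactions, Y in 6 of 8, both in 3 of 8.
print(conviction(support_xy=3 / 8, support_x=4 / 8, support_y=6 / 8))  # 1.0

# Assumed supports where Y appears in every transaction that contains X.
print(conviction(support_xy=4 / 8, support_x=4 / 8, support_y=6 / 8))  # inf
```

A conviction of 1 indicates that X and Y are unrelated, and larger values indicate that the rule fails less often than independence would predict.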
19
20
21
Confidence or Strength:
The confidence or strength (α) for an association rule X => Y is the ratio of the number of transactions that contain X U Y to the number of transactions that contain X.
The selection of association rules is based on these two values, support and confidence, as described in the definition of the association rule problem.
Confidence measures the strength of the rule, whereas support measures how often it should occur in the database.
22
23
24
25