association rule mining in data mining

Association Rule Mining Ayesha Ali

Upload: ayesha-ali

Post on 14-Apr-2017

TRANSCRIPT

Page 1: Association Rule Mining in Data Mining

Association Rule Mining

Ayesha Ali

Page 2: Association Rule Mining in Data Mining

Association Analysis

• Discovery of Association Rules
– showing attribute-value conditions that occur frequently together in a set of data, e.g. market basket
– Given a set of data, find rules that will predict the occurrence of a data item based on the occurrences of other items in the data

• A rule has the form body ⇒ head
– buys(Omar, “milk”) ⇒ buys(Omar, “sugar”)

Page 3: Association Rule Mining in Data Mining

Association Analysis

Page 4: Association Rule Mining in Data Mining

Association Analysis

Location  Business Type
1  Barber, Bakery, Convenience Store, Meat Shop, Fast Food
2  Bakery, Bookstore, Petrol Pump, Convenience Store, Library, Fast Food
3  Carpenter, Electrician, Barber, Hardware Store
4  Bakery, Vegetable Market, Flower Shop, Sweets Shop, Meat Shop
5  Convenience Store, Hospital, Pharmacy, Sports Shop, Gym, Fast Food
6  Internet Café, Gym, Games Shop, Sports Shop, Fast Food, Bakery

Association Rule: X ⇒ Y; e.g. (Fast Food, Bakery) ⇒ (Convenience Store)

Support S: fraction of locations that contain both X and Y = P(X ∪ Y)
S(Fast Food, Bakery, Convenience Store) = 2/6 = 0.33

Confidence C: how often the items in Y appear in locations that contain X = P(Y | X) = P(X ∪ Y) / P(X)
C[(Fast Food, Bakery) ⇒ (Convenience Store)] = 0.33 / 0.50 = 0.67
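The two measures can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the names `LOCATIONS`, `support` and `confidence` are chosen here for illustration, and each location is treated as one transaction (a set of business types).

```python
# Location table from the slide; each location is one "transaction".
# "Sports Shop" at location 6 is an assumed spelling of the slide's entry.
LOCATIONS = [
    {"Barber", "Bakery", "Convenience Store", "Meat Shop", "Fast Food"},
    {"Bakery", "Bookstore", "Petrol Pump", "Convenience Store", "Library", "Fast Food"},
    {"Carpenter", "Electrician", "Barber", "Hardware Store"},
    {"Bakery", "Vegetable Market", "Flower Shop", "Sweets Shop", "Meat Shop"},
    {"Convenience Store", "Hospital", "Pharmacy", "Sports Shop", "Gym", "Fast Food"},
    {"Internet Café", "Gym", "Games Shop", "Sports Shop", "Fast Food", "Bakery"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Estimate of P(Y | X): support(X ∪ Y) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


x = {"Fast Food", "Bakery"}
y = {"Convenience Store"}
print(round(support(x | y, LOCATIONS), 2))    # 0.33
print(round(confidence(x, y, LOCATIONS), 2))  # 0.67
```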

Page 5: Association Rule Mining in Data Mining

Association Analysis

• Given a set of transactions T, the goal of association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold

• Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf thresholds

⇒ Computationally prohibitive!
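To get a feel for "prohibitive", the sketch below counts how many rules X ⇒ Y (X and Y non-empty and disjoint) exist over d items. The closed form 3^d − 2^(d+1) + 1 is a standard counting result rather than something stated on the slides, and the function names are illustrative; the brute-force count is only there to check the formula on small d.

```python
from itertools import combinations


def count_rules_brute_force(d):
    """Enumerate every rule X => Y with X, Y non-empty and disjoint over d items."""
    total = 0
    for k in range(2, d + 1):                      # size of the itemset X ∪ Y
        for itemset in combinations(range(d), k):
            for r in range(1, k):                  # size of the rule body X
                for _body in combinations(itemset, r):
                    total += 1                     # head Y is the rest of the itemset
    return total


def count_rules_closed_form(d):
    """Closed-form count of the same rules: 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1


for d in (3, 4, 6):
    assert count_rules_brute_force(d) == count_rules_closed_form(d)

print(count_rules_closed_form(6))    # 602
print(count_rules_closed_form(20))   # 3484687250
```

Even with only 20 distinct items, there are roughly 3.5 billion candidate rules before any support or confidence is computed.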

Page 6: Association Rule Mining in Data Mining

Association Analysis

Location  Business Type
1  Barber, Bakery, Convenience Store, Meat Shop, Fast Food
2  Bakery, Bookstore, Petrol Pump, Convenience Store, Library, Fast Food
3  Carpenter, Electrician, Barber, Hardware Store, Meat Shop
4  Bakery, Vegetable Market, Flower Shop, Sweets Shop, Meat Shop
5  Convenience Store, Hospital, Pharmacy, Sports Shop, Gym, Fast Food
6  Internet Café, Gym, Sweets Shop, Sports Shop, Fast Food, Bakery

Association Rules:
(Fast Food, Bakery) ⇒ (Convenience Store)  Support S: 0.33  Confidence C: 0.67
(Convenience Store, Bakery) ⇒ (Fast Food)  Support S: 0.33  Confidence C: 1.00
(Fast Food, Convenience Store) ⇒ (Bakery)  Support S: 0.33  Confidence C: 0.67
(Convenience Store) ⇒ (Fast Food, Bakery)  Support S: 0.33  Confidence C: 0.67
(Fast Food) ⇒ (Convenience Store, Bakery)  Support S: 0.33  Confidence C: 0.50
(Bakery) ⇒ (Fast Food, Convenience Store)  Support S: 0.33  Confidence C: 0.50

Page 7: Association Rule Mining in Data Mining

Association Analysis

Association Rules:
(Fast Food, Bakery) ⇒ (Convenience Store)  Support S: 0.33  Confidence C: 0.67
(Convenience Store, Bakery) ⇒ (Fast Food)  Support S: 0.33  Confidence C: 1.00
(Fast Food, Convenience Store) ⇒ (Bakery)  Support S: 0.33  Confidence C: 0.67
(Convenience Store) ⇒ (Fast Food, Bakery)  Support S: 0.33  Confidence C: 0.67
(Fast Food) ⇒ (Convenience Store, Bakery)  Support S: 0.33  Confidence C: 0.50
(Bakery) ⇒ (Fast Food, Convenience Store)  Support S: 0.33  Confidence C: 0.50

Observations

• The rules above are binary partitions of the same itemset (Fast Food, Bakery, Convenience Store)
• They have identical support but different confidence
• The support and confidence thresholds may be different
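A short sketch of how the binary partitions of one itemset can be enumerated, with the confidence of each rule computed directly from the six-location table. The helper names (`support_count`, `rules_from_itemset`) are hypothetical, and only the three business types involved in the rules are kept per location to keep the listing short.

```python
from itertools import combinations

# Per location, only the items relevant to the example itemset are listed.
LOCATIONS = [
    {"Bakery", "Convenience Store", "Fast Food"},   # location 1
    {"Bakery", "Convenience Store", "Fast Food"},   # location 2
    set(),                                          # location 3
    {"Bakery"},                                     # location 4
    {"Convenience Store", "Fast Food"},             # location 5
    {"Fast Food", "Bakery"},                        # location 6
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions)


def rules_from_itemset(itemset, transactions):
    """Yield (body, head, confidence) for every binary partition of `itemset`."""
    itemset = frozenset(itemset)
    whole = support_count(itemset, transactions)
    for r in range(1, len(itemset)):
        for body in combinations(sorted(itemset), r):
            body = frozenset(body)
            yield body, itemset - body, whole / support_count(body, transactions)


for body, head, conf in rules_from_itemset(
    {"Fast Food", "Bakery", "Convenience Store"}, LOCATIONS
):
    print(sorted(body), "=>", sorted(head), round(conf, 2))
```

Every rule shares the same numerator (the support count of the whole itemset), which is why the support is identical across rules while the confidence varies with the body.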

Page 8: Association Rule Mining in Data Mining

Mining Association Rules

• Two-step approach:

Step 1. Frequent Itemset Generation
Generate all itemsets whose support ≥ minsup

Step 2. Rule Generation
Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Note: Frequent itemset generation is still computationally expensive

Page 9: Association Rule Mining in Data Mining

Mining Association Rules

• Frequent Item Generation

Lattice Graph of possible item sets

Page 10: Association Rule Mining in Data Mining

Mining Association Rules

• Brute-force approach:
– Each node in the lattice graph is a candidate frequent itemset
– Count the support of each candidate by scanning the database
– N = 6 (number of locations)
– w = (Barber, Bakery, Convenience Store, Meat Shop, Fast Food, Bookstore, Petrol Pump, Library, Carpenter, Electrician, Hardware Store, Vegetable Market, Flower Shop, Sweets Shop, Games Shop, Hospital, Pharmacy, Sports Shop, Gym, Internet Café) = 20 (number of unique items)
– M = 2^20 = 1048576 (number of candidate itemsets)
– Complexity ~ O(NMw)
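The size of the lattice can be illustrated on a handful of items and then scaled up arithmetically. This is a rough sketch: `lattice` is a helper name introduced here, and N·M·w is only the order-of-magnitude estimate from the bullet above, not a measured cost.

```python
from itertools import combinations


def lattice(items):
    """Yield every non-empty itemset that can be formed from `items`."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


small = ["Bakery", "Barber", "Convenience Store", "Fast Food"]
print(len(list(lattice(small))))   # 15, i.e. 2**4 - 1 non-empty itemsets

N, w = 6, 20       # locations and unique items, as on the slide
M = 2 ** w         # lattice nodes, counting the empty set
print(M)           # 1048576
print(N * M * w)   # 125829120 item comparisons in the worst case
```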

Page 11: Association Rule Mining in Data Mining

Mining Association Rules

W Unique Items in Item set

Page 12: Association Rule Mining in Data Mining

Mining Association Rules

• Frequent Itemset Generation
– Reduce the number of candidates (M)
– Reduce the number of transactions/locations (N)
– Reduce the number of comparisons (NM)
  • Use efficient data structures to store the candidates
  • No need to match every candidate against every transaction/location
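One way to avoid matching a candidate against every location is an inverted index from each item to the ids of the locations that contain it. Note that this is a different structure from whatever the slides had in mind (they do not name one); it is shown only as an assumed illustration of "fewer comparisons", and the names `build_tid_lists` and `support_count` are hypothetical.

```python
from collections import defaultdict

# Location id -> business types (table from the earlier slide).
# "Sports Shop" at location 6 is an assumed spelling of the slide's entry.
LOCATIONS = {
    1: {"Barber", "Bakery", "Convenience Store", "Meat Shop", "Fast Food"},
    2: {"Bakery", "Bookstore", "Petrol Pump", "Convenience Store", "Library", "Fast Food"},
    3: {"Carpenter", "Electrician", "Barber", "Hardware Store"},
    4: {"Bakery", "Vegetable Market", "Flower Shop", "Sweets Shop", "Meat Shop"},
    5: {"Convenience Store", "Hospital", "Pharmacy", "Sports Shop", "Gym", "Fast Food"},
    6: {"Internet Café", "Gym", "Games Shop", "Sports Shop", "Fast Food", "Bakery"},
}


def build_tid_lists(transactions):
    """Map each item to the set of location ids that contain it (one DB scan)."""
    index = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            index[item].add(tid)
    return index


def support_count(itemset, index):
    """Intersect the id sets of the items; expects a non-empty itemset."""
    tid_sets = [index[item] for item in itemset]
    return len(set.intersection(*tid_sets))


index = build_tid_lists(LOCATIONS)
print(sorted(index["Fast Food"]))                       # [1, 2, 5, 6]
print(support_count({"Fast Food", "Bakery"}, index))    # 3
```

After the single scan that builds the index, the support of any candidate is a set intersection over the items it contains, so locations that lack those items are never examined.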

Page 13: Association Rule Mining in Data Mining

Reducing the number of candidates

• Apriori principle:
– If an itemset is frequent, then all of its subsets must also be frequent

• Important Support property:
– Support of an itemset never exceeds the support of its subsets
– This is known as the anti-monotone property of support
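The anti-monotone property can be seen on a growing chain of itemsets from the location example: each added item can only keep or shrink the set of matching locations. A small sketch, with `support_count` as an illustrative helper and only the three relevant business types listed per location:

```python
# Per location, only the items used in the chain below are listed.
LOCATIONS = [
    {"Bakery", "Convenience Store", "Fast Food"},   # location 1
    {"Bakery", "Convenience Store", "Fast Food"},   # location 2
    set(),                                          # location 3
    {"Bakery"},                                     # location 4
    {"Convenience Store", "Fast Food"},             # location 5
    {"Fast Food", "Bakery"},                        # location 6
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions)


chain = [
    {"Bakery"},
    {"Bakery", "Fast Food"},
    {"Bakery", "Fast Food", "Convenience Store"},
]
counts = [support_count(s, LOCATIONS) for s in chain]
print(counts)  # [4, 3, 2]

# Support never increases as the itemset grows.
assert all(a >= b for a, b in zip(counts, counts[1:]))
```

Read in the other direction, this is what allows pruning: once an itemset falls below the threshold, none of its supersets needs to be counted.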

Page 14: Association Rule Mining in Data Mining

Reducing the number of candidates

Applying Apriori principle

Page 15: Association Rule Mining in Data Mining

Reducing the number of candidates

• w = 20 (number of unique items)
• All possible candidate sets:
– wC1 + wC2 + wC3 + … + wCw = 2^w − 1

• Minimum Occurrence Based Filtering

Set m = 2 and L = 1
While (L ≤ w) {
    Scan DB: List = occurrence frequency table of the candidate sets of length L
    If no candidate in List then Break
    Filter out all candidate sets with occurrence frequency < m
    L = L + 1
    Create new candidate sets of length L from List
}
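The loop above can be sketched in Python as follows. This is one possible reading of the pseudocode, not the author's implementation: the function name `apriori_counts` and the join-then-prune way of building the next candidates are assumptions, and the data is the six-location table with "Sports Shop" assumed at location 6.

```python
from itertools import combinations

LOCATIONS = [
    {"Barber", "Bakery", "Convenience Store", "Meat Shop", "Fast Food"},
    {"Bakery", "Bookstore", "Petrol Pump", "Convenience Store", "Library", "Fast Food"},
    {"Carpenter", "Electrician", "Barber", "Hardware Store"},
    {"Bakery", "Vegetable Market", "Flower Shop", "Sweets Shop", "Meat Shop"},
    {"Convenience Store", "Hospital", "Pharmacy", "Sports Shop", "Gym", "Fast Food"},
    {"Internet Café", "Gym", "Games Shop", "Sports Shop", "Fast Food", "Bakery"},
]


def apriori_counts(transactions, min_count=2):
    """Return {length: {itemset: count}} for itemsets seen at least min_count times."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent_by_length = {}
    length = 1
    while candidates:
        # Scan DB: occurrence frequency of every candidate of this length.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        # Filter: keep only candidates that occur at least min_count times.
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        if not frequent:
            break
        frequent_by_length[length] = frequent
        # Create candidates of the next length by joining frequent itemsets,
        # then prune any candidate that has an infrequent subset (Apriori principle).
        length += 1
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == length}
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, length - 1))
        ]
    return frequent_by_length


result = apriori_counts(LOCATIONS, min_count=2)
for length, itemsets in result.items():
    print(length, sorted((sorted(s), n) for s, n in itemsets.items()))
```

Only itemsets whose every subset survived the previous filter are ever counted, which is how the candidate count stays far below the 2^w nodes of the full lattice.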

Page 16: Association Rule Mining in Data Mining

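Reducing the number of candidates

Scan 1: occurrence count of each business type

Business Type  Count
Barber  2
Bakery  4
Bookstore  1
Carpenter  1
Convenience Store  3
Electrician  1
Fast Food  4
Flower Shop  1
Gym  2
Games Shop  1
Hardware Store  1
Hospital  1
Internet Café  1
Library  1
Meat Shop  2
Petrol Pump  1
Pharmacy  1
Sports Shop  2
Sweets Shop  1
Vegetable Market  1

Filter Minimum Occurrences: remove candidates with count < m (m = 2)

L1

Business Type  Count
Barber  2
Bakery  4
Convenience Store  3
Fast Food  4
Gym  2
Meat Shop  2
Sports Shop  2

Pairs of Two Items; 7C2 = 21

Business Type  Count
(Barber, Bakery)  1
(Barber, Convenience Store)  1
(Barber, Fast Food)  1
(Barber, Gym)  0
(Barber, Meat Shop)  1
(Barber, Sports Shop)  0
(Bakery, Convenience Store)  2
(Bakery, Fast Food)  3
(Bakery, Gym)  1
(Bakery, Meat Shop)  2
(Bakery, Sports Shop)  1
(Convenience Store, Fast Food)  3
(Convenience Store, Gym)  1
(Convenience Store, Meat Shop)  1
(Convenience Store, Sports Shop)  1
(Fast Food, Gym)  2
(Fast Food, Meat Shop)  1
(Fast Food, Sports Shop)  2
(Gym, Meat Shop)  0
(Gym, Sports Shop)  2
(Meat Shop, Sports Shop)  0

Filter Minimum Occurrences: remove candidates with count < m (m = 2)

L2

Business Type  Count
(Bakery, Convenience Store)  2
(Bakery, Fast Food)  3
(Bakery, Meat Shop)  2
(Convenience Store, Fast Food)  3
(Fast Food, Gym)  2
(Fast Food, Sports Shop)  2
(Gym, Sports Shop)  2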