Lecture 15 - Advanced Topics on Association Rules, Part II


Page 1: Lecture 15 - Advanced Topics on Association Rules, Part II

Introduction to Machine Learning

Lecture 15: Advanced Topics in Association Rules Mining

Albert Orriols i Puig
http://www.albertorriols.net
[email protected]

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull

Page 2: Lecture 15 - Advanced Topics on Association Rules, Part II

Recap of Lectures 13-14: The ideas come from market basket analysis (MBA)

Let's go shopping!

  Customer 1: Milk, eggs, sugar, bread
  Customer 2: Milk, eggs, cereal, bread
  Customer 3: Eggs, sugar

What do my customers buy? Which products are bought together?

Aim: Find associations and correlations between the different items that customers place in their shopping basket.

Page 3: Lecture 15 - Advanced Topics on Association Rules, Part II

Recap of Lectures 13-14: Apriori

Apriori will find all the associations with minimum support and confidence. However:
- It scans the database multiple times
- Most often, there is a high number of candidates
- Support counting for candidates can be time-expensive

FP-growth will obtain the same rules as Apriori, but it:
- Avoids candidate generation by building an FP-tree
- Counts the support of candidates more efficiently


Page 4: Lecture 15 - Advanced Topics on Association Rules, Part II

Today's Agenda

Continuing our journey through some advanced topics in ARM:

- Mining frequent patterns without candidate generation
- Multiple-level AR
- Sequential pattern mining
- Quantitative association rules
- Mining class association rules
- Beyond support & confidence
- Applications


Page 5: Lecture 15 - Advanced Topics on Association Rules, Part II

Acknowledgments: Part of this lecture is based on the work by Srikant and Agrawal.


Page 6: Lecture 15 - Advanced Topics on Association Rules, Part II

Why Multiple-Level AR? Aim: Find associations between items.

But wait!
- There are many different diapers: Dodot, Huggies, ...
- There are many different beers: Heineken, Desperados, Kingfisher, ... in bottle/can, ...

Which rule do you prefer?
- diapers ⇒ beer
- Dodot diapers M ⇒ Dam beer in can

Which will have greater support?


Page 7: Lecture 15 - Advanced Topics on Association Rules, Part II

Concept Hierarchy: Create is-a hierarchies

  Clothes
    Outwear
      Jackets
      Ski Pants
    Shirts

  Footwear
    Shoes
    Hiking Boots

Assume we found the rule: Outwear ⇒ Hiking Boots

Then:
- Jackets ⇒ Hiking Boots may not have minimum support
- Clothes ⇒ Hiking Boots may not have minimum confidence


Page 8: Lecture 15 - Advanced Topics on Association Rules, Part II

Concept Hierarchy: This means that
- Rules at lower levels may not have enough support to be part of any frequent itemset
- However, a rule at a lower level of the hierarchy which is overspecific may denote a strong association, e.g., Jackets ⇒ Hiking Boots

So, which rules do you want?
- Users are interested in generating rules that span different levels of the taxonomy
- Rules at lower levels may not have minimum support
- The taxonomy can be used to prune uninteresting or redundant rules
- Multiple taxonomies may be present; for example: category, price (cheap, expensive), "items-on-sale", etc.
- Multiple taxonomies may be modeled as a forest, or a DAG


Page 9: Lecture 15 - Advanced Topics on Association Rules, Part II

Notation

[Figure: a taxonomy tree. Each edge denotes an is_a relationship. For a node p with children c1 and c2, the nodes above p are its ancestors (marked with ^) and the nodes below it are its descendants; p is the parent of c1 and c2.]


Page 10: Lecture 15 - Advanced Topics on Association Rules, Part II

Notation: Formalizing the problem

- I = {i1, i2, ..., im} is the set of items
- T is a transaction, a set of items T ⊆ I
- D is the set of transactions
- T supports item x if x is in T or x is an ancestor of some item in T
- T supports X ⊆ I if it supports every item in X
- Generalized association rule: X ⇒ Y, where X ⊂ I, Y ⊂ I, X ∩ Y = ∅, and no item in Y is an ancestor of any item in X (a rule like jacket ⇒ clothes is essentially true)
- The rule X ⇒ Y has confidence c in D if c% of the transactions in D that support X also support Y
- The rule X ⇒ Y has support s in D if s% of the transactions in D support X ∪ Y

A small code sketch of these definitions follows.

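To make these definitions concrete, here is a minimal Python sketch (the helper names are mine, and the taxonomy is rebuilt from the lecture's Clothes/Footwear example; this is an illustration, not the paper's algorithm). It extends each transaction with the ancestors of its items and computes the support and confidence of a generalized rule on the example database of Page 12:

    # Toy taxonomy from the lecture's example: child -> parent (is_a edges).
    parent = {"jacket": "outwear", "ski pants": "outwear",
              "outwear": "clothes", "shirt": "clothes",
              "shoes": "footwear", "hiking boots": "footwear"}

    def ancestors(item):
        """Collect all ancestors of an item by walking up the taxonomy."""
        result = set()
        while item in parent:
            item = parent[item]
            result.add(item)
        return result

    def extend(transaction):
        """T' = T plus the ancestors of every item in T."""
        extended = set(transaction)
        for item in transaction:
            extended |= ancestors(item)
        return extended

    def support(itemset, transactions):
        """Fraction of transactions whose extension contains the itemset."""
        itemset = set(itemset)
        return sum(1 for t in transactions if itemset <= extend(t)) / len(transactions)

    def confidence(X, Y, transactions):
        """conf(X => Y) = supp(X u Y) / supp(X)."""
        return support(set(X) | set(Y), transactions) / support(X, transactions)

    # The six transactions of the example database on Page 12.
    D = [{"shirt"}, {"jacket", "hiking boots"}, {"ski pants", "hiking boots"},
         {"shoes"}, {"shoes"}, {"jacket"}]
    print(support({"outwear", "hiking boots"}, D))       # 0.333... (33%)
    print(confidence({"outwear"}, {"hiking boots"}, D))  # 0.666... (66.6%)

Note how the outputs match the Outwear ⇒ Hiking Boots row of the table on Page 12.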

Page 11: Lecture 15 - Advanced Topics on Association Rules, Part II

So, Let's Re-state the Problem

New aim: find all generalized association rules that have support and confidence greater than the user-specified minimum support (called minsup) and minimum confidence (called minconf), respectively.

  Clothes
    Outwear
      Jackets
      Ski Pants
    Shirts

  Footwear
    Shoes
    Hiking Boots

The antecedent and the consequent may contain items from any level of the hierarchy.

Do you see any potential problem? We can find many redundant rules!


Page 12: Lecture 15 - Advanced Topics on Association Rules, Part II

Mining the Example (minsup = 30%, minconf = 60%)

Database D:

  Transaction   Items Bought
  100           Shirt
  200           Jacket, Hiking Boots
  300           Ski Pants, Hiking Boots
  400           Shoes
  500           Shoes
  600           Jacket

Frequent itemsets:

  Itemset                    Support
  {Jacket}                   2
  {Outwear}                  3
  {Clothes}                  4
  {Shoes}                    2
  {Hiking Boots}             2
  {Footwear}                 4
  {Outwear, Hiking Boots}    2
  {Clothes, Hiking Boots}    2
  {Outwear, Footwear}        2
  {Clothes, Footwear}        2

Rules:

  Rule                       Support   Confidence
  Outwear ⇒ Hiking Boots     33%       66.6%
  Outwear ⇒ Footwear         33%       66.6%
  Hiking Boots ⇒ Outwear     33%       100%
  Hiking Boots ⇒ Clothes     33%       100%

Page 13: Lecture 15 - Advanced Topics on Association Rules, Part II

Mining the Example: Observation 1

If the set {x, y} has minimum support, so do {x^, y}, {x, y^}, and {x^, y^}.

E.g.: if {Jacket, Shoes} has minsup, then {Outwear, Shoes}, {Jacket, Footwear}, and {Outwear, Footwear} also have minimum support.


Page 14: Lecture 15 - Advanced Topics on Association Rules, Part II

Mining the Example: Observation 2

If the rule x ⇒ y has minimum support and confidence, then x ⇒ y^ is guaranteed to have both minsup and minconf.

E.g.: the rule Outwear ⇒ Hiking Boots has minsup and minconf, so the rule Outwear ⇒ Footwear also has both minsup and minconf.

However, while the rules x^ ⇒ y and x^ ⇒ y^ will have minsup, they may not have minconf.

E.g.: Clothes ⇒ Hiking Boots and Clothes ⇒ Footwear have minsup, but not minconf.


Page 15: Lecture 15 - Advanced Topics on Association Rules, Part II

Interesting Rules: So, in which rules are we interested?

Up to now, we were interested in how much the support of a rule exceeded the expected support based on the support of the antecedent and the consequent. But this does not consider the taxonomy: it gives poor pruning, and now we need to prune a lot!

Srikant and Agrawal proposed a different approach. Consider a taxonomy in which Milk has the children 2% Milk and Skim Milk, and the rules

  Milk ⇒ cereal        [s = 0.08, c = 0.70]
  Skim milk ⇒ cereal   [s = 0.02, c = 0.70]

So, do you think that the second rule is important? Maybe not! If, say, skim milk accounts for a quarter of milk sales, its support and confidence are exactly what the first rule already predicts, so the second rule adds no information.


Page 16: Lecture 15 - Advanced Topics on Association Rules, Part II

Interesting Rules

A rule X ⇒ Y is R-interesting w.r.t. an ancestor X^ ⇒ Y^ if:

  real supp(X ⇒ Y) > R · expected supp(X ⇒ Y) based on (X^ ⇒ Y^)

or

  real conf(X ⇒ Y) > R · expected conf(X ⇒ Y) based on (X^ ⇒ Y^)

Aim: Interesting rules are those whose support is more than R times the expected value, or whose confidence is more than R times the expected value, for some user-specified constant R.


Page 17: Lecture 15 - Advanced Topics on Association Rules, Part II

Interesting Rules: What's the expected value?

A method is defined to compute the expected value:

  E_{Z^}[Pr(Z)] = Pr(z1)/Pr(z1^) × ... × Pr(zj)/Pr(zj^) × Pr(Z^)

where Z^ = {z1^, ..., zj^, zj+1, ..., zn} is an ancestor of Z = {z1, ..., zn}, i.e., the first j items of Z are replaced by their ancestors. Go to the papers for the details.

Now, we aim at finding all generalized R-interesting association rules (R is a user-specified minimum interest called min-interest) that have support and confidence greater than minsup and minconf, respectively.

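As a quick numeric sketch of this criterion on the milk example: only the two supports (0.08 and 0.02) come from the slides; the item probabilities and the value of R below are assumed for illustration.

    # Assumed probabilities: skim milk makes up a quarter of milk sales.
    pr_milk, pr_skim = 0.40, 0.10
    R = 1.1  # user-specified min-interest (assumed value)

    supp_ancestor = 0.08   # real support of  milk => cereal  (from the slide)
    supp_rule = 0.02       # real support of  skim milk => cereal

    # Expected support given the ancestor rule:
    # E[Pr(Z)] = Pr(skim milk)/Pr(milk) * Pr(Z^) = 0.25 * 0.08 = 0.02
    expected = (pr_skim / pr_milk) * supp_ancestor

    print(supp_rule > R * expected)   # False: the rule is not R-interesting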

Page 18: Lecture 15 - Advanced Topics on Association Rules, Part II

Algorithms to Mine Generalized AR

Follow three steps:
1. Find all itemsets whose support is greater than minsup. These itemsets are called frequent itemsets.
2. Use the frequent itemsets to generate the desired rules: if ABCD and AB are frequent, then conf(AB ⇒ CD) = support(ABCD)/support(AB).
3. Prune all uninteresting rules from this set.

There are different algorithms for this purpose: Basic, Cumulate, and EstMerge. A sketch of step 2 follows.
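A minimal sketch of step 2, assuming the frequent itemsets and their support counts have already been found (by downward closure, every antecedent of a frequent itemset is itself in the table); the pruning of step 3 is omitted:

    from itertools import combinations

    def generate_rules(freq, minconf):
        """freq maps frozenset itemsets to their support counts.
        Returns (antecedent, consequent, confidence) triples."""
        rules = []
        for itemset, supp in freq.items():
            for r in range(1, len(itemset)):
                for ante in map(frozenset, combinations(itemset, r)):
                    conf = supp / freq[ante]   # support(ABCD) / support(AB)
                    if conf >= minconf:
                        rules.append((set(ante), set(itemset - ante), conf))
        return rules

    freq = {frozenset({"outwear"}): 3, frozenset({"hiking boots"}): 2,
            frozenset({"outwear", "hiking boots"}): 2}
    for a, c, conf in generate_rules(freq, minconf=0.6):
        print(a, "=>", c, round(conf, 2))
    # {'outwear'} => {'hiking boots'} 0.67
    # {'hiking boots'} => {'outwear'} 1.0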

Page 19: Lecture 15 - Advanced Topics on Association Rules, Part II

Basic Algorithm

Follow the steps. Is itemset X frequent? Does transaction T support X? (X contains items from different levels of the taxonomy; T contains only leaves.)

  T' = T + ancestors(T)

Answer: T supports X ↔ X ⊆ T'


Page 20: Lecture 15 - Advanced Topics on Association Rules, Part II

Details of the Basic Algorithm

[The slide annotates the algorithm's pseudocode as follows:]
- Count item occurrences
- Generate new candidate k-itemsets
- Add all ancestors of each item in t to t, removing any duplicates
- Find the support of all the candidates
- Take only those with support over minsup

A compact code sketch of this loop follows.

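Putting those annotations together, here is a compact sketch of the Basic loop: an Apriori-style level-wise search in which each transaction is extended with ancestors before support counting. It reuses extend() and the database D from the Page 10 sketch, and simplifies candidate generation to a plain join without the Apriori prune step:

    def basic_algorithm(transactions, minsup_count):
        """Find frequent itemsets over ancestor-extended transactions."""
        # Pass 1: count item occurrences over extended transactions.
        counts = {}
        for t in transactions:
            for item in extend(t):                 # T' = T + ancestors(T)
                key = frozenset([item])
                counts[key] = counts.get(key, 0) + 1
        frequent = {s: c for s, c in counts.items() if c >= minsup_count}
        result, k = dict(frequent), 2
        while frequent:
            # Generate candidate k-itemsets from the frequent (k-1)-itemsets.
            prev = list(frequent)
            candidates = {a | b for a in prev for b in prev if len(a | b) == k}
            counts = dict.fromkeys(candidates, 0)
            for t in transactions:
                t_ext = extend(t)
                for cand in candidates:            # find each candidate's support
                    if cand <= t_ext:
                        counts[cand] += 1
            # Take only those with support over minsup.
            frequent = {s: n for s, n in counts.items() if n >= minsup_count}
            result.update(frequent)
            k += 1
        return result

    # With the Page 12 database and minsup = 30% of 6 transactions (count 2).
    # Note: this also returns item+ancestor pairs such as {jacket, outwear};
    # Optimization 3 on Page 23 prunes those.
    print(basic_algorithm(D, 2))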

Page 21: Lecture 15 - Advanced Topics on Association Rules, Part II

Can You Optimize It? Optimization 1: Filtering the ancestors added to transactions

- We only need to add to a transaction t the ancestors that appear in one of the candidates.
- If the original item is not in any itemset, it can be dropped from the transaction.

Example (using the Clothes taxonomy above), with candidates {clothes, shoes}: transaction t = {Jacket, ...} can be replaced with {clothes, ...}.


Page 22: Lecture 15 - Advanced Topics on Association Rules, Part II

Can You Optimize It? Optimization 2: Pre-computing ancestors

- Rather than finding the ancestors of each item by traversing the taxonomy graph, we can pre-compute the ancestors for each item.
- At the same time, we can drop the ancestors that are not contained in any of the candidates.

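A small sketch of the pre-computed ancestor table, reusing the parent map from the Page 10 sketch; intersecting with the items that occur in candidates implements the dropping described above:

    def precompute_ancestors(parent, candidate_items):
        """Map item -> ancestors, keeping only ancestors in some candidate."""
        table = {}
        for item in set(parent) | set(parent.values()):
            anc, node = set(), item
            while node in parent:      # walk the taxonomy once per item
                node = parent[node]
                anc.add(node)
            table[item] = anc & candidate_items
        return table

    table = precompute_ancestors(parent, candidate_items={"clothes", "shoes"})
    print(table["jacket"])   # {'clothes'}: 'outwear' is filtered out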

Page 23: Lecture 15 - Advanced Topics on Association Rules, Part II

Can You Optimize It? Optimization 3: Prune itemsets containing an item and its ancestor

- If we have {Jacket} and {Outwear}, we will generate the candidate {Jacket, Outwear}, which is not interesting: s({Jacket}) = s({Jacket, Outwear}).
- Deleting {Jacket, Outwear} at k = 2 ensures it will not arise at k > 2 (because of the prune step of the candidate-generation method).
- Therefore, we only need to prune candidates containing an item and its ancestor at k = 2; in later steps no candidate will include an item together with its ancestor. A sketch of this prune follows.

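A sketch of this k = 2 prune, using an unfiltered ancestor table (i.e., precompute_ancestors from the previous sketch, called with all items as candidates):

    def prune_item_ancestor(candidates, anc_table):
        """Drop 2-itemsets that pair an item with one of its own ancestors."""
        kept = []
        for cand in candidates:
            a, b = tuple(cand)
            if b in anc_table.get(a, set()) or a in anc_table.get(b, set()):
                continue        # e.g. {jacket, outwear}: same support as {jacket}
            kept.append(cand)
        return kept

    all_items = set(parent) | set(parent.values())
    full_table = precompute_ancestors(parent, candidate_items=all_items)
    pairs = [frozenset({"jacket", "outwear"}), frozenset({"outwear", "hiking boots"})]
    print(prune_item_ancestor(pairs, full_table))   # only {outwear, hiking boots} survives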

Page 24: Lecture 15 - Advanced Topics on Association Rules, Part II

Summary

Importance of hierarchies in real-world applications.

How?
- Build a DAG
- Redefine the problem of ARM
- Get association rules

Don't take these ideas in isolation!
- They are applicable to all the advances we will see in the next classes
- Real-world problems usually require mixing many ideas


Page 25: Lecture 15 - Advanced Topics on Association Rules, Part II

Next Class

Advanced topics in association rule mining


Page 26: Lecture 15 - Advanced Topics on Association Rules, Part II

Introduction to Machine Learning

Lecture 15: Advanced Topics in Association Rules Mining

Albert Orriols i Puig
http://www.albertorriols.net
[email protected]

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull