Multi-Level Association Rules
TRANSCRIPT
Support for an itemset X in a transactional database D is defined as count(X) / |D|.
For an association rule X ⇒ Y, we can calculate:
support(X ⇒ Y) = support(X ∪ Y)
confidence(X ⇒ Y) = support(X ∪ Y) / support(X)
Support (S) and Confidence (C) can also be related to joint and conditional probabilities as follows:
support(X ⇒ Y) = P(X ∪ Y)
confidence(X ⇒ Y) = P(Y | X)
The number of association rules that can be derived from a dataset D is exponentially large. Interesting association rules are those whose support and confidence exceed the thresholds minSupp and minConf.
Frequent itemsets (also called large itemsets) are those itemsets whose support is greater
than minSupp. The apriori property (downward closure property) states that every subset of a
frequent itemset is also a frequent itemset.
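The definitions above can be checked on a toy transactional database. A minimal sketch in Python; the transactions are made up for illustration:

```python
# Toy transactional database D: each transaction is a set of items.
D = [
    {"computer", "printer"},
    {"computer", "printer", "scanner"},
    {"computer"},
    {"printer"},
    {"computer", "printer"},
]

def support(itemset):
    """support(X) = count(X) / |D| -- fraction of transactions containing X."""
    return sum(1 for t in D if itemset <= t) / len(D)

def confidence(X, Y):
    """confidence(X => Y) = support(X u Y) / support(X)."""
    return support(X | Y) / support(X)

s = support({"computer", "printer"})        # 3 of 5 transactions
c = confidence({"computer"}, {"printer"})   # 0.6 / 0.8
print(s, c)
```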
Multi Level Association Rules – Concepts:
o Rules generated by mining data at different levels of abstraction
o Mining at different levels is essential to support business decision making
o Data is massive and highly sparse at the primitive level
o Rules at a high concept level often amount to common-sense knowledge
o Rules at a low concept level may not always be interesting
Example:
o Items in task-relevant data are usually at the primitive level
o Primitive-level data items occur least frequently
buys(hp-laptop computer) ⇒ buys(canon-inkjet printer)
vs
buys(laptop computer) ⇒ buys(inkjet printer)
vs
buys(computer) ⇒ buys(printer)
o Support-Confidence framework
o Top-down strategy for accumulating counts
o Algorithms – Apriori and its variations
o Variations include:
o Uniform support for all levels
o Reduced support at lower levels
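A concept hierarchy is what makes multi-level mining possible: each primitive item can be generalized to its ancestor before counts are accumulated top-down. A small illustrative sketch; the hierarchy entries below are invented for this example:

```python
# Hypothetical concept hierarchy: child -> parent.
hierarchy = {
    "hp-laptop": "laptop",
    "dell-laptop": "laptop",
    "laptop": "computer",
    "canon-inkjet": "inkjet-printer",
    "inkjet-printer": "printer",
}

def generalize(item, levels=1):
    """Walk `levels` steps up the hierarchy (stopping at the root)."""
    for _ in range(levels):
        if item not in hierarchy:
            break
        item = hierarchy[item]
    return item

# A primitive-level transaction viewed one level up, then at the top level:
t = {"hp-laptop", "canon-inkjet"}
print({generalize(i, 1) for i in t})   # {'laptop', 'inkjet-printer'}
print({generalize(i, 2) for i in t})   # {'computer', 'printer'}
```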
Mining (UNIFORM SUPPORT):
o Same support threshold at all levels of abstraction
o Itemsets whose ancestors do not satisfy minimum support are not examined
o A higher support threshold loses interesting associations at lower abstraction levels
o A lower support threshold generates many uninteresting associations at higher abstraction levels
o Alternate search strategies
o Level-by-level independent:
Full-breadth search
No background knowledge is used for pruning
Leads to examining many infrequent items
o Level-cross filtering by single item:
An item at level i is examined only if its parent node at level i-1 is frequent
May miss frequent items at lower abstraction levels (due to reduced support)
o Level-cross filtering by k-itemset:
A k-itemset at level i is examined only if the corresponding k-itemset at level i-1 is frequent
May miss frequent k-itemsets at lower abstraction levels (due to reduced support)
o Controlled level-cross filtering by single item:
o A modified level-cross filtering by single item
o Sets a level passage threshold for each level
o Allows the inspection of lower abstractions even if the ancestor fails to satisfy the
min_sup threshold
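The examination test behind controlled level-cross filtering can be sketched as a single predicate. A hedged illustration in Python; the levels, supports, and thresholds are all made-up figures, not values from the source:

```python
min_sup = {1: 0.05, 2: 0.05, 3: 0.05}   # uniform minimum support per level
passage = {1: 0.03, 2: 0.03}            # level passage thresholds

# item -> (level, support); all figures invented for illustration
counts = {
    "computer":     (1, 0.10),
    "laptop":       (2, 0.06),   # frequent
    "desktop":      (2, 0.04),   # fails min_sup, passes passage threshold
    "tablet":       (2, 0.01),   # fails both
    "hp-laptop":    (3, 0.05),
    "dell-desktop": (3, 0.03),
    "ipad":         (3, 0.02),
}
parent = {"laptop": "computer", "desktop": "computer", "tablet": "computer",
          "hp-laptop": "laptop", "dell-desktop": "desktop", "ipad": "tablet"}

def examined(item):
    """Examine a node at level i if its parent at level i-1 is frequent,
    OR the parent at least passes the level passage threshold."""
    level, _ = counts[item]
    if level == 1:
        return True
    p_level, p_sup = counts[parent[item]]
    return p_sup >= min_sup[p_level] or p_sup >= passage[p_level]

print(examined("hp-laptop"))     # parent is frequent
print(examined("dell-desktop"))  # parent only passes the passage threshold
print(examined("ipad"))          # parent fails both -> pruned
```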
Computer ⇒ Printer
(at the same abstraction level)
Computer ⇒ InkJet Printer (cross-level association rule)
(at different abstraction levels)
Redundancy:
Laptop Computer ⇒ InkJet Printer
(support = 10%, confidence = 70%)
vs
HP Laptop Computer ⇒ InkJet Printer
(support = 5%, confidence = 68%)
o The second rule is redundant because of its ancestor relationship with the first: its support and confidence are close to the values expected from the ancestor rule
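One way to make this redundancy test concrete (following the idea that a descendant rule is uninteresting when its support is close to what the ancestor rule already predicts) is to compare the observed support against an expected support scaled by the descendant's share of the ancestor item. The 50% HP share and the tolerance below are assumptions for illustration:

```python
ancestor_support = 0.10   # Laptop Computer => InkJet Printer
share_hp = 0.5            # assumed: HP laptops are half of laptop sales
expected = ancestor_support * share_hp

descendant_support = 0.05 # HP Laptop Computer => InkJet Printer

def is_redundant(observed, expected, tol=0.25):
    """Redundant if observed support deviates from the expected support
    by less than `tol` (relative deviation)."""
    return abs(observed - expected) / expected < tol

print(is_redundant(descendant_support, expected))  # close to expected -> redundant
```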
Multi-Dimensional Association Rules – Concepts:
=> Rules involving more than one dimension or predicate
• buys(X, “IBM Laptop Computer”) ⇒ buys(X, “HP Inkjet Printer”)
(single-dimensional)
• age(X, “20..25”) ∧ occupation(X, “student”) ⇒ buys(X, “HP Inkjet Printer”)
(multi-dimensional – inter-dimension association rule)
• age(X, “20..25”) ∧ buys(X, “IBM Laptop Computer”) ⇒ buys(X, “HP Inkjet Printer”)
(multi-dimensional – hybrid-dimension association rule)
• Attributes can be categorical or quantitative
• Quantitative attributes are numeric and incorporate an implicit ordering or hierarchy (age, income, ...)
• Numeric attributes must be discretized
• Three different approaches to mining multi-dimensional association rules:
o Using static discretization of quantitative attributes
o Using dynamic discretization of quantitative attributes
o Using distance-based discretization with clustering
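Static discretization simply maps each numeric value into a predefined interval label before mining begins. A minimal sketch, with assumed bin edges:

```python
# Assumed, hypothetical bin edges for an "age" attribute.
bins = [(18, 25), (26, 35), (36, 50), (51, 120)]

def discretize_age(age):
    """Map a numeric age to a categorical interval label prior to mining."""
    for lo, hi in bins:
        if lo <= age <= hi:
            return f"age[{lo}..{hi}]"
    return "age[other]"

print(discretize_age(22))   # age[18..25]
print(discretize_age(40))   # age[36..50]
```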
Mining using Static Discretization:
• Discretization is static and occurs prior to mining
• Discretized attributes are treated as categorical
• Use the Apriori algorithm to find all frequent k-predicate sets
• Every subset of a frequent predicate set must also be frequent
• If, in a data cube, the 3-D cuboid (age, income, buys) is frequent, then (age, income), (age, buys), and (income, buys) must also be frequent
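The downward-closure check over predicate sets can be sketched directly. Assuming the cuboid example above, with a hypothetical collection of frequent predicate sets:

```python
from itertools import combinations

# Hypothetical frequent predicate sets (names follow the cube example).
frequent = {frozenset(s) for s in [
    ("age",), ("income",), ("buys",),
    ("age", "income"), ("age", "buys"), ("income", "buys"),
    ("age", "income", "buys"),
]}

def closure_holds(pred_set):
    """True if every proper subset of the predicate set is frequent."""
    return all(frozenset(c) in frequent
               for k in range(1, len(pred_set))
               for c in combinations(sorted(pred_set), k))

print(closure_holds(("age", "income", "buys")))
```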
Mining using Dynamic Discretization:
• Also known as mining quantitative association rules
• Numeric attributes are dynamically discretized
• Consider rules of the form
A_quan1 ∧ A_quan2 ⇒ A_cat
(2-D quantitative association rules)
age(X, ”20..25”) ∧ income(X, ”30K..40K”) ⇒ buys(X, ”Laptop Computer”)
• ARCS (Association Rule Clustering System) is one approach for mining quantitative association rules
• Two-step mining process:
o Perform clustering to find the intervals of the attributes involved
o Obtain association rules by searching for groups of clusters that occur together
• The resulting rules must satisfy:
o Clusters in the rule antecedent are strongly associated with clusters in the rule
consequent
o Clusters in the antecedent occur together
o Clusters in the consequent occur together
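The two-step ARCS process can be loosely sketched as binning two quantitative attributes into a 2-D grid and merging adjacent occupied cells into one cluster, which then maps to a rule's intervals. This is a rough illustration with invented data, not the actual ARCS algorithm:

```python
from collections import Counter

# (age_bin, income_bin) grid cells for buyers of "Laptop Computer";
# data is made up for illustration.
cells = Counter([(20, 30), (20, 30), (21, 30), (21, 31), (40, 80)])
dense = {c for c, n in cells.items() if n >= 1}  # low threshold: tiny toy data

# Greedily grow one cluster of adjacent dense cells from a seed cell.
seed = (20, 30)
cluster = {seed}
grew = True
while grew:
    grew = False
    for c in dense - cluster:
        if any(abs(c[0] - m[0]) <= 1 and abs(c[1] - m[1]) <= 1 for m in cluster):
            cluster.add(c)
            grew = True

# The cluster's bounding intervals become the rule antecedent.
ages = sorted(a for a, _ in cluster)
incomes = sorted(i for _, i in cluster)
rule = (f'age(X,"{ages[0]}..{ages[-1]}") and '
        f'income(X,"{incomes[0]}K..{incomes[-1]}K") => '
        f'buys(X,"Laptop Computer")')
print(rule)
```

The isolated cell (40, 80) is never merged, so it does not widen the rule's intervals; that is the point of clustering before rule generation.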