Mining Frequent Itemsets: Association Analysis
Mining Frequent Itemsets
Frequent patterns are patterns that appear frequently in a data set. For example, a set of items such as milk and bread that appear together frequently in a transaction data set is a frequent itemset.
Frequent pattern mining searches for recurring relationships in a given data set.
Market Basket Analysis
• Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional or relational data sets.
• A typical example of frequent itemset mining is market basket analysis. This process analyzes customer buying habits by finding associations between the items that customers place in their shopping baskets.
• Market basket analysis may be performed on the retail data of customer transactions at a store. The results can be used to plan marketing and advertising strategies, or in the design of a new catalog.
• The patterns can be represented in the form of association rules. For example, the information that customers who purchase computers also tend to buy antivirus software at the same time is represented by the association rule:
  computer ⇒ antivirus_software [support = 2%, confidence = 60%]
• Here a support of 2% means that 2% of all transactions contain both items, and a confidence of 60% means that 60% of the customers who bought a computer also bought the software.
Frequent Itemsets, Closed Itemsets, and Association Rules
• Let I = {I1, I2, I3, …, Im} be a set of items. Let D, the task-relevant data, be a set of database transactions, where each transaction T is a set of items such that T ⊆ I.
• support(A ⇒ B) = P(A ∪ B)
• confidence(A ⇒ B) = P(B|A)
• Here P(A ∪ B) is the probability that a transaction contains every item of A and every item of B (the union of the two itemsets), not the probability that it contains A or B.
• Rules that satisfy both a minimum support threshold and a minimum confidence threshold are said to be strong.
• Occurrence: The occurrence frequency of an itemset is the number of transactions that contain the itemset. This is also called the frequency, support count, or count of the itemset.
• Itemset support expressed as a probability is also referred to as relative support, whereas the occurrence frequency is called the absolute support.
• confidence(A ⇒ B) = P(B|A) = support(A ∪ B) / support(A) = support_count(A ∪ B) / support_count(A)
• Association rule mining can be viewed as a two-step process:
  1. Find all the frequent itemsets.
  2. Generate strong association rules from the frequent itemsets.
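The definitions above can be checked on a small example. The following is a minimal sketch, assuming a made-up list of transactions and hypothetical helper names (`support_count`, `support`, `confidence`, `is_strong`); none of these come from the slides.

```python
# Hypothetical toy transactions, invented for illustration.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "bread", "eggs"},
]

def support_count(itemset, transactions):
    """Absolute support: number of transactions containing every item."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Relative support: fraction of transactions containing the itemset."""
    return support_count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(A => B) = support_count(A u B) / support_count(A)."""
    a = set(antecedent)
    both = a | set(consequent)
    return support_count(both, transactions) / support_count(a, transactions)

def is_strong(antecedent, consequent, transactions, min_sup, min_conf):
    """A rule is strong if it meets both thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(both, transactions) >= min_sup
            and confidence(antecedent, consequent, transactions) >= min_conf)

print(support({"milk", "bread"}, transactions))       # 3 of 5 transactions
print(confidence({"milk"}, {"bread"}, transactions))  # 3 of the 4 milk transactions
```

With these transactions the rule milk ⇒ bread has support 0.6 and confidence 0.75, so it is strong for thresholds such as min_sup = 0.5 and min_conf = 0.7 but not for min_conf = 0.8.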
Closed Frequent Itemsets and Maximal Frequent Itemsets
• An itemset X is closed in a data set S if there exists no proper super-itemset Y such that Y has the same support count as X in S.
• An itemset X is a closed frequent itemset in S if X is both closed and frequent in S.
• An itemset X is a maximal frequent itemset in S if X is frequent and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in S.
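To make the difference between closed and maximal concrete, here is a small brute-force sketch. The transactions, the threshold `min_count`, and the helper names are assumptions made for illustration; enumerating every subset is only practical for tiny item sets.

```python
from itertools import combinations

# Hypothetical toy data and absolute support threshold.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "d"},
]
min_count = 2

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

# Enumerate every itemset and keep the frequent ones with their counts.
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        candidate = frozenset(combo)
        count = support_count(candidate)
        if count >= min_count:
            frequent[candidate] = count

def is_closed(x):
    # A superset with the same count as a frequent x is itself frequent,
    # so it is enough to look among the frequent itemsets.
    return not any(x < y and frequent[y] == frequent[x] for y in frequent)

def is_maximal(x):
    # No proper superset of x is frequent.
    return not any(x < y for y in frequent)

closed = {x for x in frequent if is_closed(x)}
maximal = {x for x in frequent if is_maximal(x)}
```

On this data the closed frequent itemsets are {a}, {a, b}, and {a, b, c}, while only {a, b, c} is maximal. Every maximal frequent itemset is also closed, but not the other way round.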
Frequent Pattern Mining
• Market basket analysis is just one form of frequent pattern mining.
• Frequent pattern mining can be classified in various ways, based on the following criteria:
  - The completeness of the patterns to be mined
  - The levels of abstraction involved in the rule set
  - The number of data dimensions involved in the rule
  - The types of values handled in the rule
  - The kinds of rules to be mined
  - The kinds of patterns to be mined
Efficient and Scalable Frequent Itemset Mining Methods
• Apriori is the basic algorithm for finding frequent itemsets.
• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules. The name of the algorithm is based on the fact that it uses prior knowledge of frequent itemset properties.
• Apriori employs an iterative approach known as a level-wise search.
Apriori Property
• All nonempty subsets of a frequent itemset must also be frequent.
• The Apriori property is based on the following observation: if an itemset I does not satisfy the minimum support threshold min_sup, then I is not frequent, that is, P(I) < min_sup. Adding an item to I cannot make the resulting itemset occur more often than I, so the larger itemset is not frequent either.
• The property is called antimonotone in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well.
Apriori Algorithm
• Input:
    I: itemset
    D: database of transactions
    s: support
• Output:
    L: large itemsets
Apriori algorithm:
    k = 0                        // k is used as the scan number
    L = ∅
    C1 = I                       // initial candidates are set to be the items
    repeat
        k = k + 1
        Lk = ∅
        for each Ii in Ck do
            ci = 0               // initial counts for each itemset are 0
        for each tj in D do
            for each Ii in Ck do
                if Ii ⊆ tj then
                    ci = ci + 1
        for each Ii in Ck do
            if ci >= (s × |D|) then
                Lk = Lk ∪ {Ii}
        L = L ∪ Lk
        Ck+1 = Apriori-Gen(Lk)
    until Ck+1 = ∅
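The pseudocode above translates fairly directly into Python. The sketch below is one possible rendering, not the authors' implementation. The slides do not spell out Apriori-Gen, so `apriori_gen` here assumes the usual join-and-prune step, and the four transactions are invented for illustration.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Assumed Apriori-Gen: join frequent (k-1)-itemsets into k-itemsets and
    drop any candidate with an infrequent (k-1)-subset (Apriori property)."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev_frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

def apriori(transactions, s):
    """Return {itemset: count} for every itemset with count >= s * |D|."""
    transactions = [frozenset(t) for t in transactions]
    threshold = s * len(transactions)
    # C1: every individual item is an initial candidate.
    candidates = {frozenset([i]) for t in transactions for i in t}
    large = {}
    k = 0
    while candidates:
        k += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:          # one scan of D per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= threshold}
        large.update(level)             # L = L u Lk
        candidates = apriori_gen(set(level), k + 1)
    return large

# Hypothetical database of four transactions, minimum support 50%.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(D, 0.5)
```

On this data the search stops after three scans: four frequent single items, four frequent pairs, and the single frequent triple {2, 3, 5}. The candidate {1, 2, 3} is never counted because its subset {1, 2} is infrequent, which is where the Apriori property saves work.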
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting
Mining Frequent Itemsets without Candidate Generation (FP-Growth)
FP-Growth
FP-Growth Algorithm
Mining Frequent Itemsets Using the Vertical Data Format
Mining Various Kinds of Association Rules
• Multilevel association rules: involve concepts at different levels of abstraction.
• Multidimensional association rules: involve more than one dimension or predicate.
• Quantitative association rules: involve numeric attributes that have an implicit ordering among values.
Mining Multilevel Association Rules
• Finding strong association rules at low or primitive levels of abstraction is difficult, because the data at those levels tend to be sparse.
• Data mining systems should provide capabilities for mining association rules at multiple levels of abstraction.
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (their ancestors).
A concept hierarchy for All Electronics
Multiple-Level Association Rules
• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.
• Multilevel association rules can be mined by using concept hierarchies under a support-confidence framework.
• A top-down strategy is employed, in which counts are accumulated for the calculation of frequent itemsets at each concept level.
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
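As a rough illustration of the top-down strategy and of the two threshold policies named in the headings above, the sketch below counts single items level by level. The two-level concept hierarchy, the transactions, and the threshold values are all hypothetical, and only 1-itemsets are counted to keep the example short.

```python
from collections import Counter

# Hypothetical two-level concept hierarchy: low-level item -> ancestor.
parent = {
    "laptop": "computer",
    "desktop": "computer",
    "inkjet printer": "printer",
    "laser printer": "printer",
}

# Hypothetical transactions at the lowest level of abstraction.
transactions = [
    {"laptop", "inkjet printer"},
    {"laptop", "laser printer"},
    {"desktop"},
    {"laptop"},
    {"desktop", "inkjet printer"},
]

# Assumed relative thresholds: level 1 is the ancestor level, level 2 the
# item level, which gets a reduced threshold.
min_sup = {1: 0.5, 2: 0.3}

def frequent_items(level_transactions, threshold):
    """Relative support of each item that meets the threshold."""
    n = len(level_transactions)
    counts = Counter(item for t in level_transactions for item in t)
    return {item: c / n for item, c in counts.items() if c / n >= threshold}

# Level 1: replace every item by its ancestor, then count.
level1 = [{parent[i] for i in t} for t in transactions]
frequent_l1 = frequent_items(level1, min_sup[1])

# Level 2: descend only into the children of frequent ancestors.
level2 = [{i for i in t if parent[i] in frequent_l1} for t in transactions]
frequent_l2 = frequent_items(level2, min_sup[2])
uniform_l2 = frequent_items(level2, min_sup[1])  # same threshold at both levels
```

With the reduced threshold, laptop, desktop, and inkjet printer are frequent at the lower level. With the uniform threshold of 0.5 only laptop survives, which shows how a single high threshold can hide associations among low-level items.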
Disadvantages of Mining Multilevel Association Rules
• Redundant rules are generated across multiple levels of abstraction because of the ancestor relationships among items.
• If a rule does not provide any new information, it should be removed.
• If a rule R1 is more general than a rule R2 (R1 is an ancestor of R2) and R2 adds no new information, there is no need to specify R2.
Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
Some association rules involve a single predicate, here the predicate buys:
buys(X, "digital camera") ⇒ buys(X, "HP printer")
Such rules are called single-dimensional or intradimensional association rules.
Association rules that involve two or more dimensions or predicates are called multidimensional association rules:
age(X, "20…29") ∧ buys(X, "laptop") ⇒ buys(X, "HP printer")
Two Basic Approaches
• Database attributes can be of two types: categorical and quantitative.
• Categorical attributes have a finite number of possible values, with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.
• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).
Mining Multidimensional Association Rules Using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.
Multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
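A minimal sketch of static discretization follows. It assumes a fixed set of age intervals that stands in for a predefined concept hierarchy, plus a few invented tuples; the names `AGE_INTERVALS`, `age_label`, and `to_predicates` are hypothetical. Each tuple becomes a set of (dimension, value) predicates that can then be counted like ordinary items.

```python
from collections import Counter

# Hypothetical intervals standing in for a predefined concept hierarchy.
AGE_INTERVALS = [(20, 29), (30, 39), (40, 49)]

def age_label(age):
    """Replace a numeric age by the label of the interval that contains it."""
    for low, high in AGE_INTERVALS:
        if low <= age <= high:
            return f"{low}...{high}"
    return "other"

# Hypothetical relational tuples.
rows = [
    {"age": 23, "buys": "laptop"},
    {"age": 27, "buys": "laptop"},
    {"age": 35, "buys": "HP printer"},
    {"age": 28, "buys": "HP printer"},
]

def to_predicates(row):
    """Turn one tuple into a set of (dimension, value) predicates."""
    return frozenset({("age", age_label(row["age"])), ("buys", row["buys"])})

# Discretize once, before any mining takes place, then count predicate sets.
predicate_sets = [to_predicates(r) for r in rows]
counts = Counter(predicate_sets)
```

Because the intervals are fixed before mining starts, the same labels are used whatever the data look like. That is the contrast with the dynamic discretization used for quantitative association rules.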
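Mining Quantitative Association Rules
• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the mining criteria.
• A common form has two quantitative attributes on the left-hand side and one categorical attribute on the right-hand side: A_quan1 ∧ A_quan2 ⇒ A_cat, where A_quan1 and A_quan2 are tests on quantitative attribute intervals and A_cat tests a categorical attribute.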
Association Rule Clustering System (ARCS)
• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.
• ARCS involves the following steps:
Steps Involved in ARCS
1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. This partitioning process is called binning, and the intervals are treated as bins. The common binning strategies are:
   • Equal-width binning
   • Equal-frequency binning
   • Clustering-based binning
2. Finding frequent predicate sets: Once the distribution of counts over the bins has been set up, each category can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.
3. Clustering the association rules: The strong association rules are then mapped onto a 2-D grid.
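The first two binning strategies from step 1 can be sketched in a few lines. The function names and the sample values are assumptions made for illustration; clustering-based binning is not shown because it depends on the choice of clustering method.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything falls into the first bin.
        return [0] * len(values)
    # The maximum value goes into the last bin rather than a (k+1)-th one.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding about the same number of
    values. Tied values may land in different bins in this simple version."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

# Hypothetical attribute values with one extreme outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]
width_bins = equal_width_bins(values, 3)      # [0, 0, 0, 0, 0, 0, 0, 0, 2]
freq_bins = equal_frequency_bins(values, 3)   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The outlier shows the trade-off. Equal-width binning puts almost every value into the first bin and leaves the middle bin empty, while equal-frequency binning keeps the bins balanced at the cost of intervals with very different widths.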
From Association Mining to Correlation Analysis
Even strong association rules can be misleading.
The support-confidence framework can be supplemented with additional measures based on statistical significance and correlation analysis.
Lift is a simple correlation measure.
The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B); otherwise, itemsets A and B are dependent and correlated as events.
lift(A, B) = P(A ∪ B) / (P(A) P(B))
If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.
If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.
If lift(A, B) = 1, A and B are independent and there is no correlation between them.
Equivalently, lift(A, B) = P(B|A) / P(B) = conf(A ⇒ B) / sup(B).
• Correlation analysis using lift:
Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.
Probability of purchasing a computer game = 60%
Probability of purchasing a video = 75%
Probability of purchasing both = 40%
lift(game, video) = 0.40 / (0.60 × 0.75) = 0.89
Because the lift is less than 1, the purchases of computer games and videos are negatively correlated.
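The arithmetic in this example is easy to reproduce. The short sketch below uses the three probabilities quoted on the slide; the function name `lift` and the variable names are only illustrative.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A u B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

# Probabilities taken from the slide.
p_game, p_video, p_both = 0.60, 0.75, 0.40

value = lift(p_game, p_video, p_both)

# Equivalent form: confidence(game => video) divided by P(video).
conf_game_video = p_both / p_game
value_alt = conf_game_video / p_video

print(round(value, 2))  # 0.89
```

Both forms give the same number, about 0.89, which is below 1 and therefore indicates a negative correlation.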
Examples
Correlation Analysis Using χ²
χ² = Σ (observed − expected)² / expected
   = (4000 − 4500)² / 4500 + (3500 − 3000)² / 3000 + (2000 − 1500)² / 1500 + (500 − 1000)² / 1000
   = 555.6
Because the χ² value is greater than 1 and the observed count for (game, video), 4000, is below its expected count of 4500, buying games and buying videos are negatively correlated.
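The χ² computation can be reproduced from the four observed counts. The contingency table itself did not survive in the transcript, so the assignment of counts to cells below is inferred from the probabilities and the expected values quoted on the slides; treat it as an assumption.

```python
# Observed counts; cell assignment inferred, not copied from the slides.
observed = [
    [4000, 3500],  # video:    game, no game
    [2000, 500],   # no video: game, no game
]

row_totals = [sum(row) for row in observed]        # video, no video
col_totals = [sum(col) for col in zip(*observed)]  # game, no game
n = sum(row_totals)

# Expected count of a cell under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)        # [[4500.0, 3000.0], [1500.0, 1000.0]]
print(round(chi2, 1))  # 555.6
```

The expected counts match the denominators in the formula above, and the (game, video) cell is observed less often than expected, which is what marks the correlation as negative.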
Other Correlation Measures
• all_confidence
• cosine
• all_conf(X) = sup(X) / max_item_sup(X), where max_item_sup(X) is the maximum support of any single item in X.
• cosine(A, B) = P(A ∪ B) / √(P(A) × P(B)) = sup(A ∪ B) / √(sup(A) × sup(B))
Comparison of Four Correlation Measures on Typical Data Sets
• A null-transaction is a transaction that does not contain any of the itemsets being examined.
• A measure is null-invariant if its value is free from the influence of null-transactions.
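The effect of null transactions can be seen by computing the measures from raw counts. The sketch below uses the game and video counts implied by the earlier example (6000, 7500, and 4000 out of 10,000 transactions). The function names and the figure of 90,000 extra null transactions are made up for illustration.

```python
from math import sqrt

def all_confidence(count_ab, count_a, count_b):
    """sup(X) / max_item_sup(X) for a two-item set X = {A, B}."""
    return count_ab / max(count_a, count_b)

def cosine(count_ab, count_a, count_b):
    """sup(A u B) / sqrt(sup(A) * sup(B)); the total n cancels out."""
    return count_ab / sqrt(count_a * count_b)

def lift(count_ab, count_a, count_b, n):
    """P(A u B) / (P(A) * P(B)); depends on the total n."""
    return (count_ab / n) / ((count_a / n) * (count_b / n))

game, video, both = 6000, 7500, 4000

# None of the added transactions contains game or video, so the three
# counts stay the same and only the total changes.
n_original = 10_000
n_with_nulls = n_original + 90_000

lift_before = lift(both, game, video, n_original)
lift_after = lift(both, game, video, n_with_nulls)
all_conf_value = all_confidence(both, game, video)
cosine_value = cosine(both, game, video)
```

Neither all_confidence nor cosine takes the total number of transactions as an input, so null transactions cannot change them. Lift goes from about 0.89 to about 8.89 on the same purchases once the null transactions are added, flipping from a negative to a positive correlation.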
Constraint-Based Association Mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.
• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.
• The constraints can include the following:
  - Knowledge type constraints
  - Data constraints
  - Dimension/level constraints
  - Interestingness constraints
  - Rule constraints
Metarule-Guided Mining of Association Rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.
• Metarule-guided mining example: suppose we want to find associations between customer traits and the items that customers buy, but are only interested in determining which pairs of customer traits promote the sale of office software.
Constraint Pushing
Rule Constraints
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain each of them with an example in DMQL.
Market Basket Analysis
bull Frequent itemset mining leads to the discovery of associations and correlations among items in largetransactional or relational data sets
bull A example of frequent item set mining is market basket analysisThis process analyzes customer buying habits by finding association between items that customers place in their shopping baskets
Market Basket Analysis
Market Basket Analysisbull Market Basket analysis may be performed on the retail
data of customer transaction at your storeThe results can be used to plan marketing and advertising strategies or in the design of a new catalog
bull The patterns can be represented in the form of association rulesFor example the information that customers who purchase computers also tend to buy anti virus software at the same time is represented by the association rule
bull Computer=gtantivirus_software[support=2confidence=60]
Frequent ItemsetClosed Itemset and Association Rules
bull Let I=I1I2I3helliphellipIm) be a set of items Let D be the task relevant data or be a set database transaction where each transaction T is a set of items such as T c I
bull support(A=gtB)=P(AUB)bull Confidence(A=gtB)=P(BA)bull Rules that satisfy the minimum support
threshold and minimum confidenece threshold are said to be strong
Frequent ItemsetClosed Itemset and Association Rules
bull Occurrence-The occurrence frequency of an itemset is the number of transaction that contain the itemsetThis is called as frequencysuppoert count or count of the itemset
bull Itemset support is also referred to as relative support where as the occurrence frequency is called as the absolute support
Frequent ItemsetClosed Itemset and Association Rules
bull Confidence(A=gtB)=P(BA)=support(AUB)bull support_count(A)
bull Association Rules can be viewed as a 2 step process-
bull Find all the frequent itemsetbull Generate strong association rules from the
frequent itemset
Closed Frequent Itemset amp Maximal Frequent Item set
bull An item set X is a closed frequent itemset in a set S if there exist no proper super-itemset Y such as Y has a same support count as X in S
bull An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S
bull An itemset X is a maximal frequent itemset in set S if X frequent and there exists no super-itemset Y such that XCY and Y is frequent in S
Frequent Pattern miningbull Market basket is just one form of frequent pattern miningbull Frequent pattern mining can be classified in various ways
based on the following criteriabull Based on completeness of patterns to be minedbull Based on level of abstraction involved in the rule setbull Based on the number of data dimension involved in the
rulebull Base on the types of values handled in the rulebull Based on kind of rules to be minedbull Based on the kind of patterns to be mined
Efficient and Scalable Frequent Itemset Mining method
bull Apriori is the basic algorithm for designing frequent itemset
bull Apriori is a seminal algorithm proposed by RAgrawal and RSrikant in 1994 for mining frequent itemset for Boolean association rulesThe name of the algorithm is based on the fact that the algorithm usest prior knowledge of frequent item set properties
bull Apriori employs an iterative approach known as level wise search
Apriori property
• All nonempty subsets of a frequent itemset must also be frequent
• The Apriori property is based on the following observation: if an itemset I does not satisfy the minimum support threshold min_sup, then I is not frequent, that is, P(I) < min_sup
• The property is called antimonotone in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well
Apriori Algorithm
Apriori Algo
• Input: I (itemsets), D (database of transactions), s (support)
• Output: L (large itemsets)
Apriori Algo (Contd…)
k = 0                          // k is used as the scan number
L = ∅
C1 = I                         // initial candidates are set to be the items
repeat
    k = k + 1
    Lk = ∅
    for each Ii in Ck do
        ci = 0                 // initial counts for each itemset are 0
    for each tj in D do
        for each Ii in Ck do
            if Ii ⊆ tj then
                ci = ci + 1
    for each Ii in Ck do
        if ci >= (s × |D|) then
            Lk = Lk ∪ {Ii}
    L = L ∪ Lk
    Ck+1 = Apriori-Gen(Lk)
until Ck+1 = ∅
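The pseudocode above can be turned into a short runnable sketch. This is one possible reading, not the slides' own implementation: `apriori_gen` is written here as a simple join-and-prune step, the initial candidates are taken from the items that occur in the database, and the four transactions are invented for illustration.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets into candidate k-itemsets, then prune any
    candidate that has an infrequent (k-1)-subset (the Apriori property)."""
    prev = list(prev_frequent)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            if len(union) == k and all(
                frozenset(sub) in prev_frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

def apriori(db, s):
    """Return {itemset: count} for every itemset with count >= s * |D|."""
    min_count = s * len(db)
    candidates = {frozenset([item]) for t in db for item in t}  # C1
    large = {}
    k = 1
    while candidates:
        # One scan of the database counts all candidates of size k.
        counts = {c: 0 for c in candidates}
        for t in db:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}  # Lk
        large.update(level)
        k += 1
        candidates = apriori_gen(set(level), k)  # Ck+1
    return large

# Hypothetical transactions.
db = [
    frozenset({"bread", "milk", "butter"}),
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "beer"}),
]
frequent = apriori(db, 0.5)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With s = 0.5 the candidate {bread, milk, butter} is never counted: its subset {milk, butter} is infrequent, so the prune step discards it before the third scan.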
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting
Mining Frequent Itemset without Candidate Generation(FP Growth)
FP Growth
FP Growth Algo
Mining Frequent Itemset using Vertical Format
Mining frequent Itemset using Vertical Data Format
Mining Various Kinds of Association Rules
• Multilevel association rules: involve concepts at different levels of abstraction
• Multidimensional association rules: involve more than one dimension or predicate
• Quantitative association rules: involve numeric attributes that have an implicit ordering among values
Mining Multilevel Association Rules
• Finding strong association rules at low or primitive levels of abstraction is difficult because the data at those levels are sparse
• A data mining system should provide capabilities for mining association rules at multiple levels of abstraction
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (ancestors)
A concept hierarchy for All Electronics
Multiple-level Association Rules
• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules
• Multilevel association rules can be mined by using concept hierarchies under a support-confidence framework
• A top-down strategy is used, where counts are accumulated for the calculation of frequent itemsets at each concept level
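The top-down strategy can be sketched for single items as follows. The two-level hierarchy, the transactions and the thresholds are all hypothetical, and the sketch counts only 1-itemsets to stay short. It contrasts a uniform minimum support with a reduced minimum support at the lower level.

```python
# Hypothetical concept hierarchy: low-level item -> higher-level ancestor.
hierarchy = {
    "IBM laptop": "laptop",
    "Dell laptop": "laptop",
    "HP printer": "printer",
    "Canon printer": "printer",
}

# Hypothetical transactions at the lowest level of abstraction.
db = [
    {"IBM laptop", "HP printer"},
    {"Dell laptop", "HP printer"},
    {"IBM laptop", "Canon printer"},
    {"Dell laptop"},
]

def generalize(transaction, hierarchy):
    """Replace each item by its ancestor (items without one are kept)."""
    return {hierarchy.get(item, item) for item in transaction}

def frequent_items(db, min_sup):
    """Return {item: relative support} for items meeting min_sup."""
    counts = {}
    for t in db:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    return {i: c / len(db) for i, c in counts.items() if c / len(db) >= min_sup}

# Level 1 (general concepts), then level 2 (specific items).
level1 = frequent_items([generalize(t, hierarchy) for t in db], min_sup=0.5)
level2_uniform = frequent_items(db, min_sup=0.5)    # same threshold as level 1
level2_reduced = frequent_items(db, min_sup=0.25)   # lower threshold at level 2

print(level1)
print(sorted(level2_uniform))
print(sorted(level2_reduced))
```

With the uniform threshold, "Canon printer" is missed at the lower level even though its ancestor "printer" is frequent; the reduced threshold recovers it.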
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
• Redundant rules may be generated across multiple levels of abstraction because of the ancestor relationships among items
Disadvantages of mining multilevel association rules
• If a rule does not provide any new information, it should be removed
• If a rule R1 is more general than a rule R2 and R2 adds no new information beyond R1, there is no need to specify R2
Mining Multidimensional association rules from Relational Database and DW
• Association rules that involve a single predicate (here, the predicate buys), such as
buys(X, "digital camera") => buys(X, "HP printer")
are called single-dimensional or intradimensional association rules
Mining Multidimensional association rules from Relational Database and DW
Mining multidimensional association rules:
• Association rules that involve two or more dimensions or predicates are called multidimensional association rules
age(X, "20…29") ∧ buys(X, "laptop") => buys(X, "HP printer")
Two Basic Approaches
• Database attributes can be of two types: categorical and quantitative
• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes
Two Basic Approaches
• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price)
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
• Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques, where numeric values are replaced by interval labels
• Multidimensional data are used to construct data cubes. Data cubes are well suited for mining multidimensional association rules
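As a minimal sketch of static discretization, the snippet below replaces a numeric age by an interval label before mining and turns a relational tuple into a set of (attribute, value) predicates. The cut points, labels and record layout are assumptions chosen for illustration.

```python
import bisect

# Hypothetical predefined cut points and interval labels for "age".
AGE_BOUNDS = [20, 30, 40]
AGE_LABELS = ["<20", "20...29", "30...39", "40+"]

def age_label(age):
    """Map a numeric age to the label of the interval it falls in."""
    return AGE_LABELS[bisect.bisect_right(AGE_BOUNDS, age)]

def to_predicates(record):
    """Turn one relational tuple into a set of (attribute, value) items,
    so that an ordinary frequent itemset miner can process it."""
    return {
        ("age", age_label(record["age"])),
        ("buys", record["buys"]),
    }

print(to_predicates({"age": 25, "buys": "laptop"}))
```

Because the intervals are fixed before mining starts, the same labels are used for every rule, which is what distinguishes this approach from dynamic discretization.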
Mining Quantitative Association Rules
• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the mining criteria
• Aquan1 ∧ Aquan2 => Acat, where Aquan1 and Aquan2 are tests on quantitative attribute intervals and Acat tests a categorical attribute
Association Rule Clustering System
• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition
• The following steps are involved in ARCS:
Steps involved in ARCS
1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:
   Equal-width binning
   Equal-frequency binning
   Clustering-based binning
Steps involved in ARCS (contd)
2. Finding frequent predicate sets: Once the binning is done, each category can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence
3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid
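The first two binning strategies from step 1 can be sketched as follows. The function names and the sample ages are assumptions; clustering-based binning is left out because it depends on the clustering method chosen.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 using intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Assign bin indexes so that each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

ages = [21, 22, 23, 24, 25, 60]  # hypothetical, with one outlier
print(equal_width_bins(ages, 3))
print(equal_frequency_bins(ages, 3))
```

The outlier shows the trade-off: equal-width binning leaves the middle bin empty and crowds five values into the first one, while equal-frequency binning puts two values in each bin at the cost of unequal interval widths.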
Steps involved in ARCS (contd…)
From Association Mining to Correlation Analysis
• Even strong association rules can be misleading
• The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis
• Lift is a simple correlation measure
• The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A) × P(B); otherwise A and B are dependent and correlated as events
From Association Mining to Correlation Analysis
• lift(A, B) = P(A ∪ B) / (P(A) × P(B))
• If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B
• If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B
• If lift(A, B) = 1, A and B are independent and there is no correlation between them
From Association Mining to Correlation Analysis
• Equivalently, lift(A, B) = P(B|A) / P(B) = conf(A => B) / sup(B)
• Correlation analysis using lift:
Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table
• Probability of purchasing a computer game = 0.60
• Probability of purchasing a video = 0.75
• Probability of purchasing both = 0.40
• lift(game, video) = 0.40 / (0.60 × 0.75) = 0.89
• Because the lift is less than 1, the purchases of games and videos are negatively correlated
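The lift calculation for the game and video example can be checked with a short sketch. The function names and the tolerance used to test for independence are assumptions; the three probabilities come from the example.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A ∪ B) / (P(A) × P(B))."""
    return p_ab / (p_a * p_b)

def interpret(value, tol=1e-9):
    """Classify a lift value; tol guards against floating-point noise."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"

p_game, p_video, p_both = 0.60, 0.75, 0.40
value = lift(p_game, p_video, p_both)
print(round(value, 2), interpret(value))

# The same number via conf(game => video) / sup(video).
conf_game_video = p_both / p_game
print(round(conf_game_video / p_video, 2))
```

Both forms give about 0.89, matching the equivalence lift(A, B) = conf(A => B) / sup(B).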
Examples
Correlation analysis using χ2
• χ2 = Σ (observed - expected)^2 / expected
•    = (4000 - 4500)^2 / 4500 + (3500 - 3000)^2 / 3000 + (2000 - 1500)^2 / 1500 + (500 - 1000)^2 / 1000
•    = 555.6
• Because χ2 is greater than 1 and the observed count of the (game, video) cell, 4000, is less than its expected count, 4500, buying games and buying videos are negatively correlated
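The χ2 value can be recomputed from the observed counts alone, since each expected count is (row total × column total) / grand total. The layout of the 2×2 table below (rows for game and no game, columns for video and no video) is inferred from the observed and expected pairs in the sum and should be read as an assumption.

```python
# Observed counts; rows = game / no game, columns = video / no video (assumed layout).
observed = [
    [4000, 2000],
    [3500, 500],
]

def chi_square(observed):
    """Return (chi2, expected) for a contingency table of observed counts."""
    total = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    chi2 = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, obs in enumerate(row):
            e = row_totals[i] * col_totals[j] / total  # count expected under independence
            expected_row.append(e)
            chi2 += (obs - e) ** 2 / e
        expected.append(expected_row)
    return chi2, expected

chi2, expected = chi_square(observed)
print(expected)
print(round(chi2, 1))
```

The expected counts come out as 4500, 1500, 3000 and 1000, the same four values that appear in the denominators of the sum.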
Other correlation measures
• All_confidence
• Cosine
• all_conf(X) = sup(X) / max_item_sup(X), where max_item_sup(X) is the maximum support of the single items in X
• cosine(A, B) = P(A ∪ B) / sqrt(P(A) × P(B)) = sup(A ∪ B) / sqrt(sup(A) × sup(B))
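Both measures are easy to compute once the supports are known. The sketch below reuses the probabilities of the game and video example (0.60, 0.75 and 0.40); the function signatures are assumptions made for illustration.

```python
from math import sqrt

def all_confidence(sup_itemset, item_sups):
    """all_conf(X) = sup(X) / max_item_sup(X)."""
    return sup_itemset / max(item_sups)

def cosine(sup_a, sup_b, sup_ab):
    """cosine(A, B) = sup(A ∪ B) / sqrt(sup(A) × sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)

sup_game, sup_video, sup_both = 0.60, 0.75, 0.40
print(round(all_confidence(sup_both, [sup_game, sup_video]), 3))
print(round(cosine(sup_game, sup_video, sup_both), 3))
```

Both values lie between 0 and 1, and neither formula involves the total number of transactions.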
Comparison of four correlation measures on typical data set
• A null transaction is a transaction that does not contain any of the itemsets being examined
• A measure is null-invariant if its value is free from the influence of null transactions
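A quick experiment illustrates null-invariance: adding null transactions changes lift but leaves cosine untouched. The counts below are made up for illustration.

```python
from math import sqrt

def lift(n_a, n_b, n_ab, n_total):
    """Lift from absolute counts; depends on the total number of transactions."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))

def cosine(n_a, n_b, n_ab):
    """Cosine from absolute counts; the total number of transactions cancels out."""
    return n_ab / sqrt(n_a * n_b)

# Hypothetical counts: 1000 transactions contain A, 1000 contain B, 100 contain both.
n_a, n_b, n_ab = 1000, 1000, 100
n_without_nulls = n_a + n_b - n_ab          # every transaction contains A or B
n_with_nulls = n_without_nulls + 100_000    # add 100,000 null transactions

print(lift(n_a, n_b, n_ab, n_without_nulls), lift(n_a, n_b, n_ab, n_with_nulls))
print(cosine(n_a, n_b, n_ab))
```

Lift moves from below 1 to above 1 purely because of the null transactions, so its verdict flips from negative to positive correlation, while cosine stays the same.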
Constraint-Based Association Mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users
• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining
Constraint-Based Association Mining
• The constraints can include the following:
• Knowledge type constraints
• Data constraints
• Dimension/level constraints
• Interestingness constraints
• Rule constraints
Metarule-Guided Mining of Association Rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining
• Metarule-guided mining example: finding associations between customer traits and the items that customers buy, where the user is only interested in determining which pairs of customer traits promote the sale of office software
Constraint Pushing
Rule Constraint
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain all of them with example of DMQL
Market Basket Analysis
Market Basket Analysisbull Market Basket analysis may be performed on the retail
data of customer transaction at your storeThe results can be used to plan marketing and advertising strategies or in the design of a new catalog
bull The patterns can be represented in the form of association rulesFor example the information that customers who purchase computers also tend to buy anti virus software at the same time is represented by the association rule
bull Computer=gtantivirus_software[support=2confidence=60]
Frequent ItemsetClosed Itemset and Association Rules
bull Let I=I1I2I3helliphellipIm) be a set of items Let D be the task relevant data or be a set database transaction where each transaction T is a set of items such as T c I
bull support(A=gtB)=P(AUB)bull Confidence(A=gtB)=P(BA)bull Rules that satisfy the minimum support
threshold and minimum confidenece threshold are said to be strong
Frequent ItemsetClosed Itemset and Association Rules
bull Occurrence-The occurrence frequency of an itemset is the number of transaction that contain the itemsetThis is called as frequencysuppoert count or count of the itemset
bull Itemset support is also referred to as relative support where as the occurrence frequency is called as the absolute support
Frequent ItemsetClosed Itemset and Association Rules
bull Confidence(A=gtB)=P(BA)=support(AUB)bull support_count(A)
bull Association Rules can be viewed as a 2 step process-
bull Find all the frequent itemsetbull Generate strong association rules from the
frequent itemset
Closed Frequent Itemset amp Maximal Frequent Item set
bull An item set X is a closed frequent itemset in a set S if there exist no proper super-itemset Y such as Y has a same support count as X in S
bull An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S
bull An itemset X is a maximal frequent itemset in set S if X frequent and there exists no super-itemset Y such that XCY and Y is frequent in S
Frequent Pattern miningbull Market basket is just one form of frequent pattern miningbull Frequent pattern mining can be classified in various ways
based on the following criteriabull Based on completeness of patterns to be minedbull Based on level of abstraction involved in the rule setbull Based on the number of data dimension involved in the
rulebull Base on the types of values handled in the rulebull Based on kind of rules to be minedbull Based on the kind of patterns to be mined
Efficient and Scalable Frequent Itemset Mining method
bull Apriori is the basic algorithm for designing frequent itemset
bull Apriori is a seminal algorithm proposed by RAgrawal and RSrikant in 1994 for mining frequent itemset for Boolean association rulesThe name of the algorithm is based on the fact that the algorithm usest prior knowledge of frequent item set properties
bull Apriori employs an iterative approach known as level wise search
Apriori property
bull All non empty subsets of a frequent itemset must also be frequentThe Apriori property is based on the fact that if the itemset doesnot satisfy the minimum support threshold min_sup then I is not frequent that
is P(I)lt min_supThe property is called antimonotone in the sense
that if the a set cannot pass the test all of its superset will fail the same test as well
Apriori Algorithm
Apriori Algo
bull Input-I ItemsetD Database of transactionS Support
bull OutputL Large itemset
Apriori Algo(Contdhellip)
Apriori algorrithmK=0 k is used as the scan numberL=nullC1=I intial candidate are set to be the itemsRepeatK=k+1Lk=nullFor each Ii in Ck do
Apriori algo(contd)Ci=0 Intial counts for each itemset are 0For each tj in D do For each Ii in Ck do if Ii C tj then Ci=Ci+1 For each Ii C Ck do
if ci gt= (s|D|) do
LK=LK UII
L=L U Lk
CK+1=Apriori-Gen(LK)
Until Ck+1= null
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
bull Hash based techniquebull Transaction reductionbull Partitioningbull Samplingbull Dynamic Item set Counting
Mining Frequent Itemset without Candidate Generation(FP Growth)
FP Growth
FP Growth Algo
Mining Frequent Itemset using Vertical Format
Mining frequent Itemset using Vertical Data Format
Mining Various Kind of Association Rules
Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit
ordering among values
Mining Multi Level Association Rules
bull Finding strong association rule at low or primitive level of abstraction is very difficult
bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction
bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor
A concept hierarchy for All Electronics
Multiple level Association Rules
bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules
bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework
bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items
Disadvantages of mining multilevel association rules
Disadvantages of mining multilevel association rules
bull If a rule doesnrsquot provide any new information then it should be removed
bull A rule R1 is more generalized than rule R2so need to specify rule R2
Mining Multidimensional association rules from Relational Database and DW
Association rules that imply a single predicate that is the predicate buys
Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra
dimensional association rules
Mining Multidimensional association rules from Relational Database and DW
Mining multidimensional database association rules-
Associations rules that involves two or more dimensions or predicate are called multidimensional association rules
Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)
Two Basic Approaches
bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of
possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes
Two Basic Approaches
Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques where numeric values are replaced by interval labels
Multidimensional data are used to construct data cubeData cube are well suited for mining multidimensional association rules
Mining Quantitative Association Rules
bull Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria
bull A(quant1) ^ A(quant2) =gt Acat
Association Rule Clustering System
bull Association Rule Clustering System maps pairs of quantitative attributes on to a 2-D grid for tuples statisying a given categorical attribute condition
bull The following steps are involved in ARCS-
Steps involved in ARCS
1 Binning- Quantitative attributes can have a very wide range of values defining their domain
The partioning process is called binning The intervals are considered as binsThe common binning strategies are-
Equal width binning Equal Frequency Binning Clustering based binning
Steps involved in ARCS(contd)
2 Finding frequent predicate sets-Once the distribution takes place later each category can be scanned to find most frquent itemset that statisfy minimum support and minimum confidence
3Clustering the association rules-the strong association rules are then mapped to 2-D grid
Steps involved in ARCS(contdhellip)
Steps involved in ARCS(contdhellip)
From Association Mining to Correlation Analysis
Even strong association rules can be misleadingSupport Confidence framework can be
supplemented by additional measure based on statistical significance and correlational analysis
Lift is a simple correlation measureThe occurrence of itemset A is independent of the
occurrence of itemset B if P(AUB)=P(A)P(B) otherwise A and B are correlated and dependent as event
From Association Mining to Correlation Analysis
lift(AB)= P(AUB)P(A)P(B)If lift(AB)lt1 then the occurrence of A is
negatively correlated with occurrence of BIf Lift(AB) gt1 then the occurrence of A is
positively correlated with occurrence of BIf Lift(AB)=1 then A and B are independent and
no correlation
From Association Mining to Correlation Analysis
P(BA)P(B) or con f(A=gtB)sup(B)
bull Correlation Analysis using lift-
Let game refer to tranasction that donot contains games And Video refer to transaction that donot contain videosThe transaction can be summarized in the contingency table
Probabiltiy of purchasing a computer game=60Probability of purchsing a video=75Probability of purchasing both=40By rule=406075=89So as by rule it is less than 1 so it is negatively correlated
Examples
Correlation analysis using X2bull c2 = Sbull (observed - expected)2expectedbull =bull (400010485764500)2bull 4500bull +bull (350010485763000)2bull 3000bull +bull (200010485761500)2bull 1500bull +bull (50010485761000)2bull 1000bull = 5556bull So it is negatively correlated
Other correlation measure
bull All_confidencebull Cosine
bull all_conf(X)=sup(X)mx_item_sup(x)
bull Cos(AB)=P(AUB)sqrt(P(A)P(B)=sup(AUB)sqrtsup(A)sup(B)
Comparison of four correlation measures on typical data set
bull A null transaction is a transaction that does not contain any of the itemsets being examined
bull A measure is null-variant if its value is free from the influence of null transaction
Constraint Based Association mining
bull A data mining process may uncover thousands of rules from a given datamost of which end up being unrelated or uninteresting to the users
bull A good heuristic is to have the users specify such intuition or expectation as constraints to confine the search spaceThis strategy is known as constraint based mining
Constraint Based Association mining
bull The constraint can include the following-Knowledge type constraintData constraintDimensionlevel constraintInterestingness constraintRule constraint
metarule-Guided Mining of association rules
bull Metarules allows users to specify the syntatic form of rules that they are interested in mining
bull Metarule-guided mining-Finding association between customer traits and
the items that customers buyOnly interested in determining which pairs of customer traits promote the sale of office software
Constraint Pushing
Rule Constraint
bull Antimonotonicbull Monotonicbull Convertiblebull Inconvertible
Explain all of them with example of DMQL
- Mining Frequent itemset
- Mining Frequent Item set
- Market Basket Analysis
- Market Basket Analysis (2)
- Market Basket Analysis (3)
- Frequent ItemsetClosed Itemset and Association Rules
- Frequent ItemsetClosed Itemset and Association Rules (2)
- Frequent ItemsetClosed Itemset and Association Rules (3)
- Closed Frequent Itemset amp Maximal Frequent Item set
- Frequent Pattern mining
- Efficient and Scalable Frequent Itemset Mining method
- Apriori property
- Apriori Algorithm
- Slide 14
- Apriori Algo
- Apriori Algo(Contdhellip)
- Apriori algo(contd)
- Generating Association Rules from Frequent Itemsets
- Improving the Efficiency of Apriori
- Mining Frequent Itemset without Candidate Generation(FP Growth)
- FP Growth
- FP Growth Algo
- Mining Frequent Itemset using Vertical Format
- Mining frequent Itemset using Vertical Data Format
- Slide 25
- Slide 26
- Mining Various Kind of Association Rules
- Mining Multi Level Association Rules
- A concept hierarchy for All Electronics
- Multiple level Association Rules
- Using uniform minimum support for all levels
- Using reduced minimum support at lower levels
- Disadvantages of mining multilevel association rules
- Disadvantages of mining multilevel association rules (2)
- Disadvantages of mining multilevel association rules (3)
- Mining Multidimensional association rules from Relational Datab
- Mining Multidimensional association rules from Relational Datab (2)
- Two Basic Approaches
- Two Basic Approaches (2)
- Mining Multidimensional Association Rules using Static Discreti
- Slide 41
- Mining Quantitative Association Rules
- Association Rule Clustering System
- Steps involved in ARCS
- Steps involved in ARCS(contd)
- Steps involved in ARCS(contdhellip)
- Steps involved in ARCS(contdhellip) (2)
- From Association Mining to Correlation Analysis
- From Association Mining to Correlation Analysis (2)
- From Association Mining to Correlation Analysis (3)
- Examples
- Correlation analysis using X2
- Other correlation measure
- Comparison of four correlation measures on typical data set
- Constraint Based Association mining
- Constraint Based Association mining (2)
- metarule-Guided Mining of association rules
- Constraint Pushing
- Rule Constraint
-
Market Basket Analysisbull Market Basket analysis may be performed on the retail
data of customer transaction at your storeThe results can be used to plan marketing and advertising strategies or in the design of a new catalog
bull The patterns can be represented in the form of association rulesFor example the information that customers who purchase computers also tend to buy anti virus software at the same time is represented by the association rule
bull Computer=gtantivirus_software[support=2confidence=60]
Frequent ItemsetClosed Itemset and Association Rules
bull Let I=I1I2I3helliphellipIm) be a set of items Let D be the task relevant data or be a set database transaction where each transaction T is a set of items such as T c I
bull support(A=gtB)=P(AUB)bull Confidence(A=gtB)=P(BA)bull Rules that satisfy the minimum support
threshold and minimum confidenece threshold are said to be strong
Frequent ItemsetClosed Itemset and Association Rules
bull Occurrence-The occurrence frequency of an itemset is the number of transaction that contain the itemsetThis is called as frequencysuppoert count or count of the itemset
bull Itemset support is also referred to as relative support where as the occurrence frequency is called as the absolute support
Frequent ItemsetClosed Itemset and Association Rules
bull Confidence(A=gtB)=P(BA)=support(AUB)bull support_count(A)
bull Association Rules can be viewed as a 2 step process-
bull Find all the frequent itemsetbull Generate strong association rules from the
frequent itemset
Closed Frequent Itemset amp Maximal Frequent Item set
bull An item set X is a closed frequent itemset in a set S if there exist no proper super-itemset Y such as Y has a same support count as X in S
bull An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S
bull An itemset X is a maximal frequent itemset in set S if X frequent and there exists no super-itemset Y such that XCY and Y is frequent in S
Frequent Pattern miningbull Market basket is just one form of frequent pattern miningbull Frequent pattern mining can be classified in various ways
based on the following criteriabull Based on completeness of patterns to be minedbull Based on level of abstraction involved in the rule setbull Based on the number of data dimension involved in the
rulebull Base on the types of values handled in the rulebull Based on kind of rules to be minedbull Based on the kind of patterns to be mined
Efficient and Scalable Frequent Itemset Mining method
bull Apriori is the basic algorithm for designing frequent itemset
bull Apriori is a seminal algorithm proposed by RAgrawal and RSrikant in 1994 for mining frequent itemset for Boolean association rulesThe name of the algorithm is based on the fact that the algorithm usest prior knowledge of frequent item set properties
bull Apriori employs an iterative approach known as level wise search
Apriori property
bull All non empty subsets of a frequent itemset must also be frequentThe Apriori property is based on the fact that if the itemset doesnot satisfy the minimum support threshold min_sup then I is not frequent that
is P(I)lt min_supThe property is called antimonotone in the sense
that if the a set cannot pass the test all of its superset will fail the same test as well
Apriori Algorithm
Apriori Algo
bull Input-I ItemsetD Database of transactionS Support
bull OutputL Large itemset
Apriori Algo(Contdhellip)
Apriori algorrithmK=0 k is used as the scan numberL=nullC1=I intial candidate are set to be the itemsRepeatK=k+1Lk=nullFor each Ii in Ck do
Apriori algo(contd)Ci=0 Intial counts for each itemset are 0For each tj in D do For each Ii in Ck do if Ii C tj then Ci=Ci+1 For each Ii C Ck do
if ci gt= (s|D|) do
LK=LK UII
L=L U Lk
CK+1=Apriori-Gen(LK)
Until Ck+1= null
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
bull Hash based techniquebull Transaction reductionbull Partitioningbull Samplingbull Dynamic Item set Counting
Mining Frequent Itemset without Candidate Generation(FP Growth)
FP Growth
FP Growth Algo
Mining Frequent Itemset using Vertical Format
Mining frequent Itemset using Vertical Data Format
Mining Various Kind of Association Rules
Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit
ordering among values
Mining Multi Level Association Rules
bull Finding strong association rule at low or primitive level of abstraction is very difficult
bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction
bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor
A concept hierarchy for All Electronics
Multiple level Association Rules
bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules
bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework
bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items
Disadvantages of mining multilevel association rules
Disadvantages of mining multilevel association rules
bull If a rule doesnrsquot provide any new information then it should be removed
bull A rule R1 is more generalized than rule R2so need to specify rule R2
Mining Multidimensional association rules from Relational Database and DW
Association rules that imply a single predicate that is the predicate buys
Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra
dimensional association rules
Mining Multidimensional association rules from Relational Database and DW
Mining multidimensional database association rules-
Associations rules that involves two or more dimensions or predicate are called multidimensional association rules
Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)
Two Basic Approaches
bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of
possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes
Two Basic Approaches
Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques where numeric values are replaced by interval labels
Multidimensional data are used to construct data cubeData cube are well suited for mining multidimensional association rules
Mining Quantitative Association Rules
bull Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria
bull A(quant1) ^ A(quant2) =gt Acat
Association Rule Clustering System
bull Association Rule Clustering System maps pairs of quantitative attributes on to a 2-D grid for tuples statisying a given categorical attribute condition
bull The following steps are involved in ARCS-
Steps involved in ARCS
1 Binning- Quantitative attributes can have a very wide range of values defining their domain
The partioning process is called binning The intervals are considered as binsThe common binning strategies are-
Equal width binning Equal Frequency Binning Clustering based binning
Steps involved in ARCS(contd)
2. Finding frequent predicate sets: Once the values have been distributed into bins, each category can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.
3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
From Association Mining to Correlation Analysis
Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.
Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B); otherwise, A and B are dependent and correlated as events.
From Association Mining to Correlation Analysis
lift(A, B) = P(A ∪ B) / (P(A)P(B))
If lift(A, B) < 1, then the occurrence of A is negatively correlated with the occurrence of B.
If lift(A, B) > 1, then the occurrence of A is positively correlated with the occurrence of B.
If lift(A, B) = 1, then A and B are independent and there is no correlation between them.
From Association Mining to Correlation Analysis
Lift can equivalently be written as P(B|A) / P(B), or conf(A => B) / sup(B).
• Correlation analysis using lift:
Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.
Probability of purchasing a computer game = 0.60. Probability of purchasing a video = 0.75. Probability of purchasing both = 0.40. By the rule, lift = 0.40 / (0.60 × 0.75) = 0.89. Because the lift is less than 1, the occurrences of game and video are negatively correlated.
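The lift calculation for the game and video example can be checked with a few lines of Python. The function name is just an illustrative choice; the three probabilities are the ones quoted above.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A and B occur together) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

game_video_lift = lift(p_a=0.60, p_b=0.75, p_ab=0.40)

if game_video_lift < 1:
    verdict = "negatively correlated"
elif game_video_lift > 1:
    verdict = "positively correlated"
else:
    verdict = "independent"

print(f"lift = {game_video_lift:.2f}: {verdict}")
```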
Examples
Correlation analysis using χ²
• χ² = Σ (observed − expected)² / expected
• = (4000 − 4500)² / 4500 + (3500 − 3000)² / 3000 + (2000 − 1500)² / 1500 + (500 − 1000)² / 1000
• = 555.6
• Because the χ² value is greater than 1 and the observed count for (game, video), 4000, is less than the expected count of 4500, buying games and buying videos are negatively correlated
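The χ² value can be reproduced from a 2 × 2 contingency table. The contingency table itself is not present in the transcript, so the assignment of the four observed counts to cells below is inferred from the probabilities quoted for the lift example (60% game, 75% video, 40% both) and should be read as an assumption.

```python
# Observed counts; the cell assignment is inferred, not copied from the missing table.
observed = {
    ("game", "video"): 4000,
    ("not game", "video"): 3500,
    ("game", "not video"): 2000,
    ("not game", "not video"): 500,
}

n = sum(observed.values())
row_totals, col_totals = {}, {}
for (row, col), count in observed.items():
    row_totals[row] = row_totals.get(row, 0) + count
    col_totals[col] = col_totals.get(col, 0) + count

# Expected count of a cell under independence: row total * column total / n.
expected = {(row, col): row_totals[row] * col_totals[col] / n for (row, col) in observed}

chi2 = sum((observed[cell] - expected[cell]) ** 2 / expected[cell] for cell in observed)
print(f"chi-square = {chi2:.1f}")
```

With these counts the expected values come out as 4500, 3000, 1500 and 1000, matching the terms of the sum above.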
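Other correlation measures
• all_confidence
• cosine
• all_conf(X) = sup(X) / max_item_sup(X), where max_item_sup(X) is the maximum single-item support among the items in X
• cosine(A, B) = P(A ∪ B) / sqrt(P(A) × P(B)) = sup(A ∪ B) / sqrt(sup(A) × sup(B))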
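A small sketch of both measures, applied to the game and video supports from the lift example (0.60, 0.75 and 0.40 for both). The function names are illustrative.

```python
from math import sqrt

def all_confidence(sup_itemset, item_sups):
    """all_conf(X) = sup(X) / (largest support of any single item in X)."""
    return sup_itemset / max(item_sups)

def cosine(sup_a, sup_b, sup_ab):
    """cosine(A, B) = sup(A and B together) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)

game_video_all_conf = all_confidence(0.40, [0.60, 0.75])
game_video_cosine = cosine(0.60, 0.75, 0.40)
print(f"all_conf = {game_video_all_conf:.3f}, cosine = {game_video_cosine:.3f}")
```

Neither formula uses the total number of transactions, only the supports of the itemsets involved.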
Comparison of four correlation measures on typical data set
• A null transaction is a transaction that does not contain any of the itemsets being examined
• A measure is null-invariant if its value is free from the influence of null transactions
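To see what being free from the influence of null transactions means in practice, the sketch below adds null transactions to a data set and recomputes lift and cosine from raw counts. All counts are hypothetical.

```python
from math import sqrt

def lift_from_counts(n_a, n_b, n_ab, n_total):
    """Lift depends on the total number of transactions, including null ones."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))

def cosine_from_counts(n_a, n_b, n_ab):
    """Cosine uses only the counts of A, B and both; n_total cancels out."""
    return n_ab / sqrt(n_a * n_b)

# Hypothetical counts: 1000 transactions contain A, 1000 contain B, 100 contain both.
n_a, n_b, n_ab = 1000, 1000, 100
n_a_or_b = n_a + n_b - n_ab  # transactions containing at least one of A, B

lift_few_nulls = lift_from_counts(n_a, n_b, n_ab, n_a_or_b + 100)
lift_many_nulls = lift_from_counts(n_a, n_b, n_ab, n_a_or_b + 100_000)
cosine_value = cosine_from_counts(n_a, n_b, n_ab)

print(f"lift with 100 null transactions:     {lift_few_nulls:.2f}")
print(f"lift with 100,000 null transactions: {lift_many_nulls:.2f}")
print(f"cosine in both cases:                {cosine_value:.2f}")
```

Here lift moves from below 1 to above 1 purely because null transactions were added, while cosine stays the same.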
Constraint Based Association mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users
• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.
Constraint Based Association mining
• The constraints can include the following:
• Knowledge type constraints
• Data constraints
• Dimension/level constraints
• Interestingness constraints
• Rule constraints
Metarule-Guided Mining of association rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining
• Metarule-guided mining example: finding associations between customer traits and the items that customers buy, where the user is only interested in determining which pairs of customer traits promote the sale of office software
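One way to picture a metarule is as a template that candidate rules must match. The sketch below assumes the template "two different customer-trait predicates => buys(X, "office software")"; the rule encoding and the example rules are hypothetical and only illustrate the filtering idea.

```python
# Each rule is (antecedent predicates, consequent predicate); all values are illustrative.
rules = [
    ((("age", "30...39"), ("income", "41K...60K")), ("buys", "office software")),
    ((("age", "20...29"),), ("buys", "office software")),
    ((("occupation", "student"), ("income", "0...20K")), ("buys", "laptop")),
]

def matches_metarule(rule):
    """Keep rules with exactly two distinct trait predicates and the required consequent."""
    antecedent, consequent = rule
    predicate_names = {name for name, _ in antecedent}
    return (
        len(antecedent) == 2
        and len(predicate_names) == 2
        and consequent == ("buys", "office software")
    )

selected = [rule for rule in rules if matches_metarule(rule)]
print(selected)
```

In an actual miner the template would be used to restrict the search itself rather than to filter rules after the fact.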
Constraint Pushing
Rule Constraint
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain all of them with example of DMQL
Frequent Itemset, Closed Itemset and Association Rules
• confidence(A => B) = P(B|A) = support(A ∪ B) / support(A) = support_count(A ∪ B) / support_count(A)
• Association rule mining can be viewed as a two-step process:
1. Find all the frequent itemsets
2. Generate strong association rules from the frequent itemsets
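A minimal sketch of how support and confidence can be computed from a transaction list. The transactions and the thresholds used to call a rule strong are made up for illustration.

```python
# Illustrative transactions only.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"milk", "bread", "eggs"},
]

def support_count(itemset):
    """Absolute support: number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Relative support: fraction of transactions that contain the itemset."""
    return support_count(itemset) / len(transactions)

def confidence(a, b):
    """confidence(A => B) = support_count(A union B) / support_count(A)."""
    return support_count(a | b) / support_count(a)

rule_support = support({"milk", "bread"})
rule_confidence = confidence({"milk"}, {"bread"})

# Assumed thresholds for the example.
MIN_SUP, MIN_CONF = 0.5, 0.7
is_strong = rule_support >= MIN_SUP and rule_confidence >= MIN_CONF
print(rule_support, rule_confidence, is_strong)
```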
Closed Frequent Itemset & Maximal Frequent Itemset
• An itemset X is closed in a data set S if there exists no proper super-itemset Y such that Y has the same support count as X in S
• An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S
• An itemset X is a maximal frequent itemset in set S if X is frequent and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in S
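The definitions above can be checked by brute force on a tiny data set. The transactions and the minimum support count are assumptions for the example, and enumerating every itemset is only feasible for a handful of items.

```python
from itertools import combinations

# Illustrative transactions and threshold.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
min_count = 2

items = sorted(set().union(*transactions))

# Support count of every nonempty itemset.
counts = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        itemset = frozenset(combo)
        counts[itemset] = sum(1 for t in transactions if itemset <= t)

frequent = {s: c for s, c in counts.items() if c >= min_count}

# Closed: no proper superset has the same support count.
closed_frequent = [
    s for s, c in frequent.items()
    if not any(s < y and counts[y] == c for y in counts)
]

# Maximal: no proper superset is frequent.
maximal_frequent = [s for s in frequent if not any(s < y for y in frequent)]

print(sorted(sorted(s) for s in closed_frequent))
print(sorted(sorted(s) for s in maximal_frequent))
```

In this example every maximal frequent itemset is also closed, but {a} and {a, b} are closed without being maximal.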
Frequent Pattern mining
• Market basket analysis is just one form of frequent pattern mining
• Frequent pattern mining can be classified in various ways, based on the following criteria:
• Based on the completeness of patterns to be mined
• Based on the levels of abstraction involved in the rule set
• Based on the number of data dimensions involved in the rule
• Based on the types of values handled in the rule
• Based on the kinds of rules to be mined
• Based on the kinds of patterns to be mined
Efficient and Scalable Frequent Itemset Mining method
• Apriori is the basic algorithm for finding frequent itemsets
• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules. The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties.
• Apriori employs an iterative approach known as level-wise search
Apriori property
• All nonempty subsets of a frequent itemset must also be frequent. The Apriori property is based on the fact that if an itemset I does not satisfy the minimum support threshold, min_sup, then I is not frequent, that is, P(I) < min_sup. The property is called antimonotone in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well.
Apriori Algorithm
Apriori Algo
• Input:
  I // Itemsets
  D // Database of transactions
  s // Support
• Output:
  L // Large itemsets
Apriori algorithm:
  k = 0; // k is used as the scan number
  L = ∅;
  C1 = I; // Initial candidates are set to be the items
  repeat
    k = k + 1;
    Lk = ∅;
    for each Ii ∈ Ck do
      ci = 0; // Initial counts for each itemset are 0
    for each tj ∈ D do
      for each Ii ∈ Ck do
        if Ii ⊆ tj then
          ci = ci + 1;
    for each Ii ∈ Ck do
      if ci >= (s × |D|) do
        Lk = Lk ∪ Ii;
    L = L ∪ Lk;
    Ck+1 = Apriori-Gen(Lk)
  until Ck+1 = ∅;
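The pseudocode above could be rendered in Python roughly as follows. This is a sketch, not a tuned implementation: the slides do not spell out Apriori-Gen, so the join-and-prune version below is an assumed, simplified variant, and the sample database is made up.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets into k-candidates, pruning by the Apriori property."""
    prev = list(prev_frequent)
    candidates = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            union = prev[i] | prev[j]
            # Keep the candidate only if every (k-1)-subset is itself frequent.
            if len(union) == k and all(
                frozenset(sub) in prev_frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates

def apriori(transactions, s):
    """Return {itemset: count} for all itemsets whose count is at least s * |D|."""
    min_count = s * len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}  # C1
    large = {}
    k = 1
    while candidates:
        # One scan of the database per level: count each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}  # Lk
        large.update(level)
        k += 1
        candidates = apriori_gen(set(level), k)  # Ck+1
    return large

# Illustrative database of four transactions; s = 0.5 means a count of at least 2.
D = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
large_itemsets = apriori(D, 0.5)
for itemset, count in sorted(large_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The loop stops as soon as no candidates are generated for the next level, which corresponds to the until condition in the pseudocode.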
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting
Mining Frequent Itemset without Candidate Generation(FP Growth)
FP Growth
FP Growth Algo
Mining Frequent Itemset using Vertical Format
Mining frequent Itemset using Vertical Data Format
Mining Various Kind of Association Rules
Mining
• Multilevel association rules: involve concepts at different levels of abstraction
• Multidimensional association rules: involve more than one dimension or predicate
• Quantitative association rules: involve numeric attributes that have an implicit ordering among values
Mining Multi Level Association Rules
• Finding strong association rules at low or primitive levels of abstraction is difficult because the data at those levels are sparse
• A data mining system should provide capabilities for mining association rules at multiple levels of abstraction
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (ancestors)
A concept hierarchy for All Electronics
Multiple level Association Rules
• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules
• Multilevel association rules can be mined by using concept hierarchies under a support-confidence framework
• A top-down strategy is used, where counts are accumulated for the calculation of frequent itemsets at each concept level
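The top-down strategy can be sketched with a two-level concept hierarchy: counts are accumulated first for the general concepts and then for the specific items. The hierarchy, the transactions and the thresholds are all hypothetical, and the comparison between one uniform threshold and a reduced lower-level threshold is only an illustration.

```python
from collections import Counter

# Hypothetical concept hierarchy: low-level item -> higher-level concept.
PARENT = {
    "Dell laptop": "computer",
    "IBM desktop": "computer",
    "HP printer": "printer",
    "Canon printer": "printer",
}

# Illustrative transactions at the lowest level of abstraction.
transactions = [
    {"Dell laptop", "HP printer"},
    {"IBM desktop", "Canon printer"},
    {"Dell laptop"},
    {"IBM desktop", "HP printer"},
]

def frequent_items(level_transactions, min_sup):
    """Return {item: support} for items whose relative support reaches min_sup."""
    counts = Counter(item for t in level_transactions for item in t)
    n = len(level_transactions)
    return {item: c / n for item, c in counts.items() if c / n >= min_sup}

# Level 1: replace every item by its ancestor, then count.
level1_transactions = [{PARENT[item] for item in t} for t in transactions]
level1_frequent = frequent_items(level1_transactions, 0.6)

# Level 2 with the same (uniform) threshold versus a reduced threshold.
level2_uniform = frequent_items(transactions, 0.6)
level2_reduced = frequent_items(transactions, 0.5)

print(level1_frequent)
print(level2_uniform)
print(level2_reduced)
```

With the uniform threshold no specific item is frequent even though both general concepts are, which is the sparsity problem at low levels; lowering the threshold for the lower level recovers three items.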
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
Generation of redundant rules across multiple levels of abstraction due to the ancestor relationships among items
Disadvantages of mining multilevel association rules
• If a rule does not provide any new information, it should be removed
• If a rule R1 is more general than (an ancestor of) a rule R2 and R2 adds no new information, there is no need to specify R2
Frequent ItemsetClosed Itemset and Association Rules
bull Confidence(A=gtB)=P(BA)=support(AUB)bull support_count(A)
bull Association Rules can be viewed as a 2 step process-
bull Find all the frequent itemsetbull Generate strong association rules from the
frequent itemset
Closed Frequent Itemset amp Maximal Frequent Item set
bull An item set X is a closed frequent itemset in a set S if there exist no proper super-itemset Y such as Y has a same support count as X in S
bull An itemset X is a closed frequent itemset in set S if X is both closed and frequent in S
bull An itemset X is a maximal frequent itemset in set S if X frequent and there exists no super-itemset Y such that XCY and Y is frequent in S
Frequent Pattern Mining
• Market basket analysis is just one form of frequent pattern mining.
• Frequent pattern mining can be classified in various ways, based on the following criteria:
  • the completeness of the patterns to be mined
  • the levels of abstraction involved in the rule set
  • the number of data dimensions involved in the rule
  • the types of values handled in the rule
  • the kinds of rules to be mined
  • the kinds of patterns to be mined
Efficient and Scalable Frequent Itemset Mining Methods
• Apriori is the basic algorithm for finding frequent itemsets.
• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules. The name of the algorithm reflects the fact that it uses prior knowledge of frequent itemset properties.
• Apriori employs an iterative approach known as level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets.
Apriori property
• All nonempty subsets of a frequent itemset must also be frequent.
• The Apriori property is based on the following observation: if an itemset I does not satisfy the minimum support threshold min_sup, then I is not frequent, that is, P(I) < min_sup.
• The property is called antimonotone in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well.
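A short sketch of how the Apriori property is used for pruning: a k-item candidate can be discarded without counting it if any of its (k-1)-item subsets is not frequent. The helper name and the sample itemsets are assumptions for illustration.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_prev):
    """True if some (k-1)-subset of the k-item `candidate` is not frequent."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))


# Hypothetical frequent 2-itemsets; {milk, eggs} is missing on purpose.
frequent_2 = {frozenset({"bread", "milk"}), frozenset({"bread", "eggs"})}
candidate = frozenset({"bread", "milk", "eggs"})

# The candidate is pruned because {milk, eggs} is not frequent.
pruned = has_infrequent_subset(candidate, frequent_2)

# Once {milk, eggs} is frequent as well, the candidate survives pruning.
frequent_2_full = frequent_2 | {frozenset({"milk", "eggs"})}
survives = not has_infrequent_subset(candidate, frequent_2_full)
```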
Apriori Algorithm
• Input: I (itemsets), D (database of transactions), s (support)
• Output: L (large itemsets)
Apriori Algorithm (contd.)
    k = 0                          // k is used as the scan number
    L = ∅
    C1 = I                         // initial candidates are set to be the items
    repeat
        k = k + 1
        Lk = ∅
        for each Ii ∈ Ck do
            ci = 0                 // initial counts for each itemset are 0
        for each tj ∈ D do
            for each Ii ∈ Ck do
                if Ii ⊆ tj then
                    ci = ci + 1
        for each Ii ∈ Ck do
            if ci >= (s × |D|) then
                Lk = Lk ∪ {Ii}
        L = L ∪ Lk
        Ck+1 = Apriori-Gen(Lk)
    until Ck+1 = ∅
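The pseudocode above can be sketched in Python as follows. This is one possible reading, not the slides' own implementation: apriori_gen here uses a simple join-and-prune, and the transaction data and support threshold are made-up assumptions.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Join frequent (k-1)-itemsets into k-item candidates, then prune."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            # Keep the union only if all of its (k-1)-subsets are frequent.
            if len(union) == k and all(
                frozenset(sub) in prev_frequent
                for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def apriori(transactions, s):
    """Return {itemset: count} for every itemset with count >= s * |D|."""
    min_count = s * len(transactions)
    large = {}
    # C1: every single item is an initial candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        counts = {c: 0 for c in candidates}
        for t in transactions:            # one scan of D per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}  # Lk
        large.update(level)
        k += 1
        candidates = apriori_gen(set(level), k)                      # Ck+1
    return large


# Hypothetical market-basket data.
D = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer", "eggs"}),
    frozenset({"milk", "diapers", "beer", "cola"}),
    frozenset({"bread", "milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers", "cola"}),
]
L = apriori(D, s=0.6)
```

With s = 0.6 the minimum count is 3, so four single items and four pairs are large. The only 3-item candidate that survives pruning, {bread, milk, diapers}, occurs in just two transactions, so the search stops at the third level.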
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting
Mining Frequent Itemset without Candidate Generation(FP Growth)
FP Growth
FP Growth Algo
Mining Frequent Itemset using Vertical Format
Mining frequent Itemset using Vertical Data Format
Mining Various Kinds of Association Rules
• Multilevel association rules involve concepts at different levels of abstraction.
• Multidimensional association rules involve more than one dimension or predicate.
• Quantitative association rules involve numeric attributes that have an implicit ordering among values.
Mining Multi Level Association Rules
bull Finding strong association rule at low or primitive level of abstraction is very difficult
bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction
bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor
A concept hierarchy for All Electronics
Multiple-level Association Rules
• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.
• Multilevel association rules can be mined by using concept hierarchies under a support-confidence framework.
• A top-down strategy is employed, in which counts are accumulated for the calculation of frequent itemsets at each concept level.
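A small sketch of why counting at several concept levels matters: items that are too rare at the primitive level can become frequent once they are rolled up to their ancestor. The hierarchy, item names and threshold below are hypothetical.

```python
from collections import Counter

# Hypothetical two-level concept hierarchy: specific item -> ancestor.
PARENT = {
    "laptop_a": "computer",
    "laptop_b": "computer",
    "printer_a": "printer",
    "printer_b": "printer",
}


def generalize(transaction, parent):
    """Replace each item by its ancestor (items without one stay as-is)."""
    return frozenset(parent.get(item, item) for item in transaction)


def item_support_counts(transactions):
    """Support count of every single item (counted once per transaction)."""
    counts = Counter()
    for t in transactions:
        counts.update(t)
    return counts


transactions = [
    frozenset({"laptop_a", "printer_a"}),
    frozenset({"laptop_b", "printer_b"}),
    frozenset({"laptop_a"}),
    frozenset({"laptop_b", "printer_a"}),
]

MIN_COUNT = 3  # assumed uniform minimum support count for both levels
low = item_support_counts(transactions)
high = item_support_counts([generalize(t, PARENT) for t in transactions])

frequent_low = {i for i, c in low.items() if c >= MIN_COUNT}
frequent_high = {i for i, c in high.items() if c >= MIN_COUNT}
```

With the same threshold at both levels, no primitive item is frequent, while both ancestors are. This is the usual motivation for lowering the minimum support at lower levels.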
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
• Redundant rules are generated across multiple levels of abstraction because of the ancestor relationships among items.
• If a rule does not provide any new information, it should be removed.
• If a rule R1 is more general than a rule R2 and R2 carries no additional information, there is no need to report R2.
Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
• Association rules that involve a single predicate, such as the predicate buys, are called single-dimensional or intradimensional association rules. For example:
  buys(X, "digital camera") => buys(X, "HP printer")
Mining Multidimensional Association Rules from Relational Databases and Data Warehouses (contd.)
• Association rules that involve two or more dimensions or predicates are called multidimensional association rules. For example:
  age(X, "20…29") ∧ buys(X, "laptop") => buys(X, "HP printer")
Two Basic Approaches
• Database attributes can be of two types: categorical and quantitative.
• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.
• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
• Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.
• Multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
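A minimal sketch of static discretization: numeric ages are replaced by interval labels before mining, so each tuple becomes a set of predicate items. The cut points, labels and attribute names are assumptions chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical predefined cut points and their interval labels.
AGE_BOUNDS = [20, 30, 40]
AGE_LABELS = ["<20", "20-29", "30-39", "40+"]


def age_label(age):
    """Map a numeric age to the label of the interval that contains it."""
    return AGE_LABELS[bisect_right(AGE_BOUNDS, age)]


def to_predicates(row):
    """Turn one relational tuple into a set of attribute=value items."""
    return frozenset({
        "age=" + age_label(row["age"]),
        "buys=" + row["buys"],
    })


rows = [
    {"age": 23, "buys": "laptop"},
    {"age": 29, "buys": "laptop"},
    {"age": 30, "buys": "printer"},
    {"age": 19, "buys": "laptop"},
]
items = [to_predicates(r) for r in rows]
```

After this step the predicate sets can be fed to any frequent itemset miner, since each attribute=value pair behaves like an ordinary item.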
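Mining Quantitative Association Rules
• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy some mining criteria.
• A 2-D quantitative association rule has the form A_quan1 ∧ A_quan2 => A_cat, where A_quan1 and A_quan2 are tests on quantitative attribute intervals and A_cat tests a categorical attribute.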
Association Rule Clustering System
• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.
• The following steps are involved in ARCS:
Steps involved in ARCS
1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:
   • Equal-width binning
   • Equal-frequency binning
   • Clustering-based binning
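The first two binning strategies can be sketched in a few lines of Python. The function names and the sample ages are assumptions; equal_width_bins also assumes that the values are not all identical, so the bin width is nonzero.

```python
def equal_width_bins(values, n_bins):
    """Bin index per value; every bin spans an interval of the same size."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins          # assumes hi > lo
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Bin index per value; bins hold roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


ages = [21, 22, 23, 25, 40, 61]         # hypothetical attribute values
by_width = equal_width_bins(ages, 2)
by_frequency = equal_frequency_bins(ages, 2)
```

On skewed data the two strategies differ noticeably: equal-width puts five of the six ages into the first bin, while equal-frequency splits them three and three.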
Steps involved in ARCS (contd.)
2. Finding frequent predicate sets: Once binning has distributed the tuples, each category can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.
3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
From Association Mining to Correlation Analysis
• Even strong association rules can be misleading.
• The support-confidence framework can be supplemented with additional measures based on statistical significance and correlation analysis.
• Lift is a simple correlation measure.
• The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A) P(B); otherwise, A and B are dependent and correlated as events.
From Association Mining to Correlation Analysis (contd.)
• lift(A, B) = P(A ∪ B) / (P(A) P(B))
• If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.
• If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.
• If lift(A, B) = 1, A and B are independent and there is no correlation between them.
From Association Mining to Correlation Analysis (contd.)
• Equivalently, lift(A, B) = P(B | A) / P(B) = conf(A => B) / sup(B).
• Correlation analysis using lift:
  Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.
  The probability of purchasing a computer game is P({game}) = 0.60, the probability of purchasing a video is P({video}) = 0.75, and the probability of purchasing both is P({game, video}) = 0.40.
  lift(game, video) = 0.40 / (0.60 × 0.75) = 0.89
  Because the lift is less than 1, the purchase of games is negatively correlated with the purchase of videos.
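The lift calculation in this example can be reproduced with a few lines of Python. The probabilities are the ones quoted above; the function names are illustrative.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A u B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)


def interpret(value, tol=1e-9):
    """Name the kind of correlation that a lift value indicates."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"


p_game, p_video, p_both = 0.60, 0.75, 0.40
value = lift(p_game, p_video, p_both)

# The same number via conf(game => video) / sup(video).
conf_game_video = p_both / p_game
value_via_conf = conf_game_video / p_video
verdict = interpret(value)
```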
Examples
Correlation analysis using χ²
• χ² = Σ (observed − expected)² / expected
     = (4000 − 4500)² / 4500 + (3500 − 3000)² / 3000 + (2000 − 1500)² / 1500 + (500 − 1000)² / 1000
     = 555.6
• Because the χ² value is greater than 1 and the observed count for the (game, video) cell, 4000, is less than its expected count, 4500, buying games and buying videos are negatively correlated.
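As a check on the arithmetic, the statistic can be computed from the observed counts alone, deriving each expected count from the row and column totals. The cell layout below (rows for video / no video, columns for game / no game) is an assumption that is consistent with the probabilities in the lift example.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat


# Rows: video, no video. Columns: game, no game.
observed = [
    [4000, 3500],
    [2000, 500],
]
stat = chi_square(observed)
```

The expected counts come out as 4500, 3000, 1500 and 1000, matching the denominators in the worked calculation, and the statistic rounds to 555.6.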
Other correlation measures
• all_confidence: all_conf(X) = sup(X) / max_item_sup(X), where max_item_sup(X) is the maximum support of the individual items in X.
• cosine: cosine(A, B) = P(A ∪ B) / sqrt(P(A) × P(B)) = sup(A ∪ B) / sqrt(sup(A) × sup(B))
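Both measures are easy to compute from supports. The sketch below reuses the game and video supports from the lift example (0.60, 0.75 and 0.40 for both); the function names are illustrative.

```python
from math import sqrt


def all_confidence(sup_x, item_sups):
    """all_conf(X) = sup(X) / maximum support of the single items in X."""
    return sup_x / max(item_sups)


def cosine(sup_a, sup_b, sup_ab):
    """cosine(A, B) = sup(A u B) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)


sup_game, sup_video, sup_both = 0.60, 0.75, 0.40
ac = all_confidence(sup_both, [sup_game, sup_video])
cs = cosine(sup_game, sup_video, sup_both)
```

Unlike lift, neither formula uses the total number of transactions, only the supports of the itemsets involved.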
Comparison of four correlation measures on typical data sets
• A null-transaction is a transaction that does not contain any of the itemsets being examined.
• A measure is null-invariant if its value is free from the influence of null-transactions.
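The effect of null-transactions can be seen by keeping the counts for A and B fixed and changing only the total number of transactions. The counts below are made up for illustration.

```python
from math import sqrt


def measures(n_ab, n_a, n_b, n_total):
    """Lift and cosine computed from support counts."""
    lift = (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))
    cosine = n_ab / sqrt(n_a * n_b)
    return lift, cosine


# Same counts for A, B and both; only the null-transactions differ.
lift_few_nulls, cos_few_nulls = measures(100, 1000, 1000, 2_000)
lift_many_nulls, cos_many_nulls = measures(100, 1000, 1000, 100_000)
```

Adding null-transactions moves lift from 0.2 (negative correlation) to 10 (positive correlation), while cosine stays at 0.1. Cosine does not depend on the null-transactions, whereas lift does.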
Constraint-Based Association Mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.
• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.
• The constraints can include the following:
  • Knowledge type constraints
  • Data constraints
  • Dimension/level constraints
  • Interestingness constraints
  • Rule constraints
Metarule-Guided Mining of Association Rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.
• Example of metarule-guided mining: finding associations between customer traits and the items that customers buy, where the analyst is interested only in determining which pairs of customer traits promote the sale of office software.
Constraint Pushing
Rule Constraints
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain each of them with an example in DMQL.
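As an illustration of the first two categories (in Python rather than DMQL), consider constraints on the total price of an itemset. The price table and the limit are hypothetical, and prices are assumed to be non-negative, which is what makes the two properties hold.

```python
# Hypothetical item prices (assumed non-negative).
PRICES = {"pen": 5, "ink": 30, "printer": 90}


def total_price(itemset):
    return sum(PRICES[item] for item in itemset)


def within_budget(itemset, limit=100):
    """sum(price) <= limit is antimonotonic: once violated, every
    superset violates it too, so the itemset can be pruned early."""
    return total_price(itemset) <= limit


def expensive_enough(itemset, limit=100):
    """sum(price) >= limit is monotonic: once satisfied, every
    superset satisfies it too, so it need not be rechecked."""
    return total_price(itemset) >= limit


base = frozenset({"ink", "printer"})        # total price 120
superset = base | {"pen"}                   # total price 125

antimonotone_holds = (not within_budget(base)) and (not within_budget(superset))
monotone_holds = expensive_enough(base) and expensive_enough(superset)
```

Pushing an antimonotonic constraint into the mining loop means that an itemset failing it is dropped before any of its supersets are generated, in the same way that Apriori drops infrequent itemsets.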
Frequent Pattern miningbull Market basket is just one form of frequent pattern miningbull Frequent pattern mining can be classified in various ways
based on the following criteriabull Based on completeness of patterns to be minedbull Based on level of abstraction involved in the rule setbull Based on the number of data dimension involved in the
rulebull Base on the types of values handled in the rulebull Based on kind of rules to be minedbull Based on the kind of patterns to be mined
Efficient and Scalable Frequent Itemset Mining method
bull Apriori is the basic algorithm for designing frequent itemset
bull Apriori is a seminal algorithm proposed by RAgrawal and RSrikant in 1994 for mining frequent itemset for Boolean association rulesThe name of the algorithm is based on the fact that the algorithm usest prior knowledge of frequent item set properties
bull Apriori employs an iterative approach known as level wise search
Apriori property
bull All non empty subsets of a frequent itemset must also be frequentThe Apriori property is based on the fact that if the itemset doesnot satisfy the minimum support threshold min_sup then I is not frequent that
is P(I)lt min_supThe property is called antimonotone in the sense
that if the a set cannot pass the test all of its superset will fail the same test as well
Apriori Algorithm
Apriori Algo
bull Input-I ItemsetD Database of transactionS Support
bull OutputL Large itemset
Apriori Algo(Contdhellip)
Apriori algorrithmK=0 k is used as the scan numberL=nullC1=I intial candidate are set to be the itemsRepeatK=k+1Lk=nullFor each Ii in Ck do
Apriori algo(contd)Ci=0 Intial counts for each itemset are 0For each tj in D do For each Ii in Ck do if Ii C tj then Ci=Ci+1 For each Ii C Ck do
if ci gt= (s|D|) do
LK=LK UII
L=L U Lk
CK+1=Apriori-Gen(LK)
Until Ck+1= null
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
bull Hash based techniquebull Transaction reductionbull Partitioningbull Samplingbull Dynamic Item set Counting
Mining Frequent Itemset without Candidate Generation(FP Growth)
FP Growth
FP Growth Algo
Mining Frequent Itemset using Vertical Format
Mining frequent Itemset using Vertical Data Format
Mining Various Kind of Association Rules
Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit
ordering among values
Mining Multi Level Association Rules
bull Finding strong association rule at low or primitive level of abstraction is very difficult
bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction
bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor
A concept hierarchy for All Electronics
Multiple level Association Rules
bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules
bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework
bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items
Disadvantages of mining multilevel association rules
Disadvantages of mining multilevel association rules
bull If a rule doesnrsquot provide any new information then it should be removed
bull A rule R1 is more generalized than rule R2so need to specify rule R2
Mining Multidimensional association rules from Relational Database and DW
Association rules that imply a single predicate that is the predicate buys
Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra
dimensional association rules
Mining Multidimensional association rules from Relational Database and DW
Mining multidimensional database association rules-
Associations rules that involves two or more dimensions or predicate are called multidimensional association rules
Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)
Two Basic Approaches
bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of
possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes
Two Basic Approaches
Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques where numeric values are replaced by interval labels
Multidimensional data are used to construct data cubeData cube are well suited for mining multidimensional association rules
Mining Quantitative Association Rules
bull Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria
bull A(quant1) ^ A(quant2) =gt Acat
Association Rule Clustering System
bull Association Rule Clustering System maps pairs of quantitative attributes on to a 2-D grid for tuples statisying a given categorical attribute condition
bull The following steps are involved in ARCS-
Steps involved in ARCS
1 Binning- Quantitative attributes can have a very wide range of values defining their domain
The partioning process is called binning The intervals are considered as binsThe common binning strategies are-
Equal width binning Equal Frequency Binning Clustering based binning
Steps involved in ARCS(contd)
2 Finding frequent predicate sets-Once the distribution takes place later each category can be scanned to find most frquent itemset that statisfy minimum support and minimum confidence
3Clustering the association rules-the strong association rules are then mapped to 2-D grid
Steps involved in ARCS(contdhellip)
Steps involved in ARCS(contdhellip)
From Association Mining to Correlation Analysis
Even strong association rules can be misleadingSupport Confidence framework can be
supplemented by additional measure based on statistical significance and correlational analysis
Lift is a simple correlation measureThe occurrence of itemset A is independent of the
occurrence of itemset B if P(AUB)=P(A)P(B) otherwise A and B are correlated and dependent as event
From Association Mining to Correlation Analysis
lift(AB)= P(AUB)P(A)P(B)If lift(AB)lt1 then the occurrence of A is
negatively correlated with occurrence of BIf Lift(AB) gt1 then the occurrence of A is
positively correlated with occurrence of BIf Lift(AB)=1 then A and B are independent and
no correlation
From Association Mining to Correlation Analysis
P(BA)P(B) or con f(A=gtB)sup(B)
bull Correlation Analysis using lift-
Let game refer to tranasction that donot contains games And Video refer to transaction that donot contain videosThe transaction can be summarized in the contingency table
Probabiltiy of purchasing a computer game=60Probability of purchsing a video=75Probability of purchasing both=40By rule=406075=89So as by rule it is less than 1 so it is negatively correlated
Examples
Correlation analysis using X2bull c2 = Sbull (observed - expected)2expectedbull =bull (400010485764500)2bull 4500bull +bull (350010485763000)2bull 3000bull +bull (200010485761500)2bull 1500bull +bull (50010485761000)2bull 1000bull = 5556bull So it is negatively correlated
Other correlation measure
bull All_confidencebull Cosine
bull all_conf(X)=sup(X)mx_item_sup(x)
bull Cos(AB)=P(AUB)sqrt(P(A)P(B)=sup(AUB)sqrtsup(A)sup(B)
Comparison of four correlation measures on typical data set
bull A null transaction is a transaction that does not contain any of the itemsets being examined
bull A measure is null-variant if its value is free from the influence of null transaction
Constraint Based Association mining
bull A data mining process may uncover thousands of rules from a given datamost of which end up being unrelated or uninteresting to the users
bull A good heuristic is to have the users specify such intuition or expectation as constraints to confine the search spaceThis strategy is known as constraint based mining
Constraint Based Association mining
• The constraints can include the following:
  - Knowledge type constraints
  - Data constraints
  - Dimension/level constraints
  - Interestingness constraints
  - Rule constraints
Metarule-Guided Mining of association rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.
• Metarule-guided mining example: finding associations between customer traits and the items that customers buy, where the analyst is only interested in determining which pairs of customer traits promote the sale of office software.
Constraint Pushing
Rule Constraint
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain all of them with an example in DMQL.
- Mining Frequent itemset
- Mining Frequent Item set
- Market Basket Analysis
- Market Basket Analysis (2)
- Market Basket Analysis (3)
- Frequent Itemset, Closed Itemset and Association Rules
- Frequent Itemset, Closed Itemset and Association Rules (2)
- Frequent Itemset, Closed Itemset and Association Rules (3)
- Closed Frequent Itemset & Maximal Frequent Item set
- Frequent Pattern mining
- Efficient and Scalable Frequent Itemset Mining method
- Apriori property
- Apriori Algorithm
- Slide 14
- Apriori Algo
- Apriori Algo (Contd…)
- Apriori algo(contd)
- Generating Association Rules from Frequent Itemsets
- Improving the Efficiency of Apriori
- Mining Frequent Itemset without Candidate Generation(FP Growth)
- FP Growth
- FP Growth Algo
- Mining Frequent Itemset using Vertical Format
- Mining frequent Itemset using Vertical Data Format
- Slide 25
- Slide 26
- Mining Various Kind of Association Rules
- Mining Multi Level Association Rules
- A concept hierarchy for All Electronics
- Multiple level Association Rules
- Using uniform minimum support for all levels
- Using reduced minimum support at lower levels
- Disadvantages of mining multilevel association rules
- Disadvantages of mining multilevel association rules (2)
- Disadvantages of mining multilevel association rules (3)
- Mining Multidimensional association rules from Relational Datab
- Mining Multidimensional association rules from Relational Datab (2)
- Two Basic Approaches
- Two Basic Approaches (2)
- Mining Multidimensional Association Rules using Static Discreti
- Slide 41
- Mining Quantitative Association Rules
- Association Rule Clustering System
- Steps involved in ARCS
- Steps involved in ARCS(contd)
- Steps involved in ARCS (contd…)
- Steps involved in ARCS (contd…) (2)
- From Association Mining to Correlation Analysis
- From Association Mining to Correlation Analysis (2)
- From Association Mining to Correlation Analysis (3)
- Examples
- Correlation analysis using X2
- Other correlation measure
- Comparison of four correlation measures on typical data set
- Constraint Based Association mining
- Constraint Based Association mining (2)
- metarule-Guided Mining of association rules
- Constraint Pushing
- Rule Constraint
Efficient and Scalable Frequent Itemset Mining method
• Apriori is the basic algorithm for finding frequent itemsets.
• Apriori is a seminal algorithm proposed by R. Agrawal and R. Srikant in 1994 for mining frequent itemsets for Boolean association rules. The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties.
• Apriori employs an iterative approach known as level-wise search, in which frequent k-itemsets are used to explore (k+1)-itemsets.
Apriori property
• All nonempty subsets of a frequent itemset must also be frequent. The Apriori property is based on the fact that if an itemset I does not satisfy the minimum support threshold min_sup, then I is not frequent, that is, P(I) < min_sup. The property is called antimonotone in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well.
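The property can be observed on a toy data set. In this sketch the transactions and the helper names `support` and `has_infrequent_subset` are hypothetical. The first function shows that support can only shrink as an itemset grows. The second applies the property as a pruning test: a candidate k-itemset is discarded as soon as one of its (k−1)-subsets is not frequent.

```python
from itertools import combinations

# Hypothetical toy transactions.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def has_infrequent_subset(candidate, frequent_prev):
    """True if some (k-1)-subset of the k-item candidate is not frequent."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_prev
               for s in combinations(candidate, k - 1))


# Support never increases when items are added to an itemset.
print(support({"milk"}, transactions),
      support({"milk", "bread"}, transactions),
      support({"milk", "bread", "butter"}, transactions))

# {milk, butter} is missing from the frequent 2-itemsets, so the
# 3-item candidate can be pruned without counting it.
frequent_2 = {frozenset({"milk", "bread"}), frozenset({"bread", "butter"})}
print(has_infrequent_subset({"milk", "bread", "butter"}, frequent_2))
```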
Apriori Algorithm
Apriori Algo
• Input:
    I  // itemsets
    D  // database of transactions
    s  // support
• Output:
    L  // large itemsets
Apriori Algo (Contd…)
Apriori algorithm:
    k = 0;  // k is used as the scan number
    L = ∅;
    C1 = I;  // initial candidates are set to be the items
    repeat
        k = k + 1;
        Lk = ∅;
        for each Ii ∈ Ck do
            ci = 0;  // initial counts for each itemset are 0
        for each tj ∈ D do
            for each Ii ∈ Ck do
                if Ii ⊆ tj then
                    ci = ci + 1;
        for each Ii ∈ Ck do
            if ci ≥ (s × |D|) then
                Lk = Lk ∪ {Ii};
        L = L ∪ Lk;
        Ck+1 = Apriori-Gen(Lk);
    until Ck+1 = ∅;
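The pseudocode above translates fairly directly into Python. The sketch below is one possible rendering, not the original authors' implementation: the slides do not spell out Apriori-Gen, so `apriori_gen` here is an assumed join-and-prune step, and the five toy transactions are hypothetical.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Join frequent k-itemsets into (k+1)-candidates, then prune any
    candidate that has an infrequent k-subset (the Apriori property)."""
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(s) in frequent_k
                for s in combinations(union, len(a))
            ):
                candidates.add(union)
    return candidates


def apriori(transactions, min_support):
    """Return {frequent itemset: count}; min_support is a fraction of |D|."""
    transactions = [frozenset(t) for t in transactions]
    threshold = min_support * len(transactions)
    # C1: every single item is an initial candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while candidates:
        # One scan of the database per level k.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c for c, n in counts.items() if n >= threshold}
        frequent.update({c: counts[c] for c in level})
        candidates = apriori_gen(level)
    return frequent


demo_transactions = [
    {"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"},
]
frequent_itemsets = apriori(demo_transactions, min_support=0.5)
for itemset, count in sorted(frequent_itemsets.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this data every single item and every pair is frequent, while {A, B, C} appears in only two of five transactions and is dropped, which ends the loop.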
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting
Mining Frequent Itemset without Candidate Generation(FP Growth)
FP Growth
FP Growth Algo
Mining Frequent Itemset using Vertical Format
Mining frequent Itemset using Vertical Data Format
Mining Various Kinds of Association Rules
Mining
• Multilevel association rules: involve concepts at different levels of abstraction.
• Multidimensional association rules: involve more than one dimension or predicate.
• Quantitative association rules: involve numeric attributes that have an implicit ordering among values.
Mining Multi Level Association Rules
• Finding strong association rules at low or primitive levels of abstraction is very difficult because the data at those levels are sparse.
• A data mining system should provide capabilities for mining association rules at multiple levels of abstraction.
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level concepts, or ancestors.
A concept hierarchy for All Electronics
Multiple level Association Rules
• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.
• Multilevel association rules can be mined by using concept hierarchies under a support-confidence framework.
• A top-down strategy is used, in which counts are accumulated for the calculation of frequent itemsets at each concept level.
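The top-down strategy can be sketched on a two-level hierarchy. Everything named here is hypothetical: the `PARENT` mapping, the five transactions, and the thresholds. For brevity the sketch counts single items only. It first finds the frequent concepts at the higher level, then examines only the children of those concepts at the lower level, with either the same or a reduced minimum support.

```python
from collections import Counter

# Hypothetical concept hierarchy: low-level item -> higher-level concept.
PARENT = {
    "laptop computer": "computer",
    "desktop computer": "computer",
    "inkjet printer": "printer",
    "laser printer": "printer",
}

# Hypothetical transactions recorded at the lowest level.
transactions = [
    {"laptop computer", "inkjet printer"},
    {"laptop computer"},
    {"desktop computer", "laser printer"},
    {"laptop computer", "inkjet printer"},
    {"desktop computer"},
]


def item_support(transactions):
    """Relative support of every single item."""
    counts = Counter(item for t in transactions for item in t)
    return {item: c / len(transactions) for item, c in counts.items()}


def mine_top_down(transactions, min_sup_by_level):
    """Frequent level-1 concepts, then frequent children of those concepts."""
    level1_tx = [{PARENT[item] for item in t} for t in transactions]
    level1 = {c: s for c, s in item_support(level1_tx).items()
              if s >= min_sup_by_level[1]}
    level2 = {i: s for i, s in item_support(transactions).items()
              if PARENT[i] in level1 and s >= min_sup_by_level[2]}
    return level1, level2


uniform = mine_top_down(transactions, {1: 0.5, 2: 0.5})
reduced = mine_top_down(transactions, {1: 0.5, 2: 0.3})
print(sorted(uniform[1]), sorted(reduced[1]))
```

With a uniform threshold only the laptop survives at the lower level; lowering the threshold there also keeps the desktop and the inkjet printer, which illustrates why reduced support is often used at lower levels.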
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
Generation of redundant rules across multiple levels of abstraction due to the ancestor relationships among items.
Disadvantages of mining multilevel association rules
Disadvantages of mining multilevel association rules
• If a rule doesn't provide any new information, then it should be removed.
• When a rule R1 is more general than a rule R2, R2 needs to be kept only if it provides new information beyond R1.
Mining Multidimensional association rules from Relational Database and DW
Association rules that imply a single predicate, that is, the predicate buys:
buys(X, "digital camera") ⇒ buys(X, "HP printer")
Such rules are called single-dimensional or intradimensional association rules.
Mining Multidimensional association rules from Relational Database and DW
Mining multidimensional association rules:
Association rules that involve two or more dimensions or predicates are called multidimensional association rules.
age(X, "20…29") ∧ buys(X, "laptop") ⇒ buys(X, "HP printer")
Two Basic Approaches
• Database attributes can be of two types: categorical and quantitative. Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.
Two Basic Approaches
Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques, where numeric values are replaced by interval labels.
Multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
Mining Quantitative Association Rules
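• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy some mining criteria, such as maximizing the confidence or compactness of the rules mined.
• A_quan1 ∧ A_quan2 ⇒ A_cat, where A_quan1 and A_quan2 are tests on quantitative attribute intervals and A_cat tests a categorical attribute.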
Association Rule Clustering System
• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.
• The following steps are involved in ARCS:
Steps involved in ARCS
1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals.
The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:
• Equal-width binning
• Equal-frequency binning
• Clustering-based binning
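The first two binning strategies listed above can be sketched in plain Python. The function names and the sample `ages` list are hypothetical, and clustering-based binning is left out because it depends on the choice of clustering method.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index; every bin spans the same value range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)
    # The maximum value falls on the upper edge, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins):
    """Assign bin indices so each bin holds roughly the same number of values.
    Ties are split by rank, so equal values may land in adjacent bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


ages = [21, 22, 23, 25, 38, 40, 41, 67]  # hypothetical sample
print(equal_width_bins(ages, 3))
print(equal_frequency_bins(ages, 3))
```

On skewed data the two strategies diverge: equal-width leaves the single oldest customer alone in the last bin, while equal-frequency spreads the customers more evenly.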
Steps involved in ARCS(contd)
2. Finding frequent predicate sets: Once the values have been distributed into bins, each category can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.
3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
Steps involved in ARCS (contd…)
Steps involved in ARCS (contd…)
From Association Mining to Correlation Analysis
Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.
Apriori Algorithm
Apriori Algo
bull Input-I ItemsetD Database of transactionS Support
bull OutputL Large itemset
Apriori Algo(Contdhellip)
Apriori algorrithmK=0 k is used as the scan numberL=nullC1=I intial candidate are set to be the itemsRepeatK=k+1Lk=nullFor each Ii in Ck do
Apriori algo(contd)Ci=0 Intial counts for each itemset are 0For each tj in D do For each Ii in Ck do if Ii C tj then Ci=Ci+1 For each Ii C Ck do
if ci gt= (s|D|) do
LK=LK UII
L=L U Lk
CK+1=Apriori-Gen(LK)
Until Ck+1= null
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
bull Hash based techniquebull Transaction reductionbull Partitioningbull Samplingbull Dynamic Item set Counting
Mining Frequent Itemset without Candidate Generation(FP Growth)
FP Growth
FP Growth Algo
Mining Frequent Itemset using Vertical Format
Mining frequent Itemset using Vertical Data Format
Mining Various Kind of Association Rules
Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit
ordering among values
Mining Multi Level Association Rules
bull Finding strong association rule at low or primitive level of abstraction is very difficult
bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction
bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor
A concept hierarchy for All Electronics
Multiple level Association Rules
bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules
bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework
bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items
Disadvantages of mining multilevel association rules
Disadvantages of mining multilevel association rules
bull If a rule doesnrsquot provide any new information then it should be removed
bull A rule R1 is more generalized than rule R2so need to specify rule R2
Mining Multidimensional association rules from Relational Database and DW
Association rules that imply a single predicate that is the predicate buys
Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra
dimensional association rules
Mining Multidimensional association rules from Relational Database and DW
Mining multidimensional database association rules-
Associations rules that involves two or more dimensions or predicate are called multidimensional association rules
Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)
Two Basic Approaches
bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of
possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes
Two Basic Approaches
Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques where numeric values are replaced by interval labels
Multidimensional data are used to construct data cubeData cube are well suited for mining multidimensional association rules
Mining Quantitative Association Rules
bull Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria
bull A(quant1) ^ A(quant2) =gt Acat
Association Rule Clustering System
bull Association Rule Clustering System maps pairs of quantitative attributes on to a 2-D grid for tuples statisying a given categorical attribute condition
bull The following steps are involved in ARCS-
Steps involved in ARCS
1 Binning- Quantitative attributes can have a very wide range of values defining their domain
The partioning process is called binning The intervals are considered as binsThe common binning strategies are-
Equal width binning Equal Frequency Binning Clustering based binning
Steps involved in ARCS(contd)
2 Finding frequent predicate sets-Once the distribution takes place later each category can be scanned to find most frquent itemset that statisfy minimum support and minimum confidence
3Clustering the association rules-the strong association rules are then mapped to 2-D grid
Steps involved in ARCS(contdhellip)
Steps involved in ARCS(contdhellip)
From Association Mining to Correlation Analysis
Even strong association rules can be misleadingSupport Confidence framework can be
supplemented by additional measure based on statistical significance and correlational analysis
Lift is a simple correlation measureThe occurrence of itemset A is independent of the
occurrence of itemset B if P(AUB)=P(A)P(B) otherwise A and B are correlated and dependent as event
From Association Mining to Correlation Analysis
lift(AB)= P(AUB)P(A)P(B)If lift(AB)lt1 then the occurrence of A is
negatively correlated with occurrence of BIf Lift(AB) gt1 then the occurrence of A is
positively correlated with occurrence of BIf Lift(AB)=1 then A and B are independent and
no correlation
From Association Mining to Correlation Analysis
P(BA)P(B) or con f(A=gtB)sup(B)
bull Correlation Analysis using lift-
Let game refer to tranasction that donot contains games And Video refer to transaction that donot contain videosThe transaction can be summarized in the contingency table
Probabiltiy of purchasing a computer game=60Probability of purchsing a video=75Probability of purchasing both=40By rule=406075=89So as by rule it is less than 1 so it is negatively correlated
Examples
Correlation analysis using X2bull c2 = Sbull (observed - expected)2expectedbull =bull (400010485764500)2bull 4500bull +bull (350010485763000)2bull 3000bull +bull (200010485761500)2bull 1500bull +bull (50010485761000)2bull 1000bull = 5556bull So it is negatively correlated
Other correlation measure
bull All_confidencebull Cosine
bull all_conf(X)=sup(X)mx_item_sup(x)
bull Cos(AB)=P(AUB)sqrt(P(A)P(B)=sup(AUB)sqrtsup(A)sup(B)
Comparison of four correlation measures on typical data set
bull A null transaction is a transaction that does not contain any of the itemsets being examined
bull A measure is null-variant if its value is free from the influence of null transaction
Constraint Based Association mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.
• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.
Constraint Based Association mining
• The constraints can include the following: knowledge type constraints, data constraints, dimension/level constraints, interestingness constraints, and rule constraints.
Metarule-Guided Mining of Association Rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.
• Example of metarule-guided mining: finding associations between customer traits and the items that customers buy, where the analyst is only interested in determining which pairs of customer traits promote the sale of office software.
Constraint Pushing
Rule Constraint
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain all of them with an example in DMQL.
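As one illustration of the first category, a constraint is antimonotonic when any itemset that violates it guarantees that all of its supersets violate it too, so the itemset can be pruned from the search. The sketch below is in Python rather than DMQL. The item prices and the constraint sum(price) <= 15 are hypothetical, and the property holds here because prices are non-negative.

```python
from itertools import combinations

# Hypothetical item prices, chosen only for this illustration
price = {"pen": 2, "notebook": 5, "stapler": 12, "printer": 90}
BUDGET = 15  # hypothetical rule constraint: sum(price) <= 15


def within_budget(itemset) -> bool:
    return sum(price[item] for item in itemset) <= BUDGET


def violators_stay_violated(items) -> bool:
    """Check that every proper superset of a violating itemset also violates the constraint."""
    all_sets = [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]
    for itemset in all_sets:
        if not within_budget(itemset):
            if any(within_budget(other) for other in all_sets if itemset < other):
                return False
    return True


is_antimonotonic_here = violators_stay_violated(list(price))
print(is_antimonotonic_here)
```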
Apriori Algo
• Input: I (itemsets), D (database of transactions), s (support)
• Output: L (large itemsets)
Apriori Algo (contd.)
Apriori algorithm:
k = 0            // k is used as the scan number
L = ∅
C1 = I           // initial candidates are set to be the items
repeat
    k = k + 1
    Lk = ∅
    for each Ii ∈ Ck do
        ci = 0   // initial counts for each itemset are 0
    for each tj ∈ D do
        for each Ii ∈ Ck do
            if Ii ⊆ tj then
                ci = ci + 1
    for each Ii ∈ Ck do
        if ci ≥ (s × |D|) then
            Lk = Lk ∪ Ii
    L = L ∪ Lk
    Ck+1 = Apriori-Gen(Lk)
until Ck+1 = ∅
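The pseudocode translates fairly directly into Python. The following is a minimal sketch, not an optimized implementation. The slides do not spell out Apriori-Gen, so `apriori_gen` below is an assumed version that joins frequent (k-1)-itemsets and prunes any candidate with an infrequent (k-1)-subset. The sample transactions are made up.

```python
from itertools import combinations


def apriori_gen(prev_frequent: set, k: int) -> set:
    """Build candidate k-itemsets from frequent (k-1)-itemsets (join, then prune)."""
    candidates = set()
    for a in prev_frequent:
        for b in prev_frequent:
            union = a | b
            # Keep the union only if it has k items and all its (k-1)-subsets are frequent
            if len(union) == k and all(
                frozenset(sub) in prev_frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def apriori(transactions, min_support: float) -> dict:
    """Return {frequent itemset: support count}; min_support is a fraction of |D|."""
    transactions = [frozenset(t) for t in transactions]
    min_count = min_support * len(transactions)                          # s * |D|
    candidates = {frozenset([item]) for t in transactions for item in t}  # C1
    frequent = {}                                                         # L
    k = 1
    while candidates:
        counts = {c: 0 for c in candidates}
        for t in transactions:          # one scan of D per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c for c, n in counts.items() if n >= min_count}         # Lk
        frequent.update({c: counts[c] for c in level})
        k += 1
        candidates = apriori_gen(level, k)                                # Ck+1
    return frequent


# Made-up transactions for illustration
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
frequent_itemsets = apriori(baskets, min_support=0.5)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With this data, {milk, butter} appears only once and is dropped at level 2, so the 3-itemset candidate is pruned without ever being counted.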
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting
Mining Frequent Itemset without Candidate Generation(FP Growth)
FP Growth
FP Growth Algo
Mining Frequent Itemset using Vertical Format
Mining frequent Itemset using Vertical Data Format
Mining Various Kinds of Association Rules
• Multilevel association rules: involve concepts at different levels of abstraction
• Multidimensional association rules: involve more than one dimension or predicate
• Quantitative association rules: involve numeric attributes that have an implicit ordering among values
Mining Multi Level Association Rules
• Finding strong association rules at low or primitive levels of abstraction is very difficult.
• Data mining systems should provide capabilities for mining association rules at multiple levels of abstraction.
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level concepts (their ancestors).
A concept hierarchy for All Electronics
Multiple level Association Rules
• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.
• Multilevel association rules can be mined by using concept hierarchies under a support-confidence framework.
• A top-down strategy is used, in which counts are accumulated for the calculation of frequent itemsets at each concept level.
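The effect of mining at different concept levels can be sketched by mapping each item to its ancestor and recounting support. The hierarchy, the transactions and the 0.6 threshold below are all hypothetical, chosen only to show that items which are infrequent at the primitive level can become frequent one level up.

```python
from collections import Counter

# Hypothetical one-level concept hierarchy: item -> ancestor
parent = {
    "laptop": "computer",
    "desktop": "computer",
    "inkjet printer": "printer",
    "laser printer": "printer",
}

# Made-up transactions at the primitive (lowest) level
transactions = [
    {"laptop", "inkjet printer"},
    {"desktop", "laser printer"},
    {"laptop"},
    {"desktop", "inkjet printer"},
]


def item_support(transactions, mapping=None) -> dict:
    """Relative support of each item, optionally after generalizing items via `mapping`."""
    counts = Counter()
    for t in transactions:
        items = {mapping.get(i, i) for i in t} if mapping else set(t)
        counts.update(items)  # each item counted at most once per transaction
    return {item: n / len(transactions) for item, n in counts.items()}


MIN_SUPPORT = 0.6  # hypothetical uniform threshold for both levels
low_level = item_support(transactions)
high_level = item_support(transactions, parent)
print({i: s for i, s in low_level.items() if s >= MIN_SUPPORT})
print({i: s for i, s in high_level.items() if s >= MIN_SUPPORT})
```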
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
Generation of redundant rules across multiple levels of abstraction due to the ancestor relationships among items
Disadvantages of mining multilevel association rules
Disadvantages of mining multilevel association rules
• If a rule doesn't provide any new information, it should be removed.
• If a rule R1 is more general than a rule R2 (R1 is an ancestor of R2) and R2 adds no new information, then R2 is redundant and need not be specified.
Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
Association rules that involve a single predicate, that is, the predicate buys:
buys(X, "digital camera") ⇒ buys(X, "HP printer")
Such rules are called single-dimensional or intradimensional association rules.
Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
Mining multidimensional association rules:
Association rules that involve two or more dimensions or predicates are called multidimensional association rules.
age(X, "20…29") ∧ buys(X, "laptop") ⇒ buys(X, "HP printer")
Two Basic Approaches
• Database attributes can be of two types: categorical and quantitative.
• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.
Two Basic Approaches
• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, where numeric values are replaced by interval labels.
Multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
Mining Quantitative Association Rules
• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria.
• $A_{quant1} \wedge A_{quant2} \Rightarrow A_{cat}$
Association Rule Clustering System
• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.
• The following steps are involved in ARCS:
Steps involved in ARCS
1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:
• Equal-width binning
• Equal-frequency binning
• Clustering-based binning
Steps involved in ARCS (contd.)
2. Finding frequent predicate sets: Once the values have been distributed into bins, each category can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.
3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
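The first two binning strategies named in step 1 can be sketched in plain Python. This is an illustrative sketch with made-up ages. Equal-width binning splits the value range into intervals of the same size, while equal-frequency binning places roughly the same number of values in each bin. The function names are hypothetical, and the equal-width version assumes the values are not all identical.

```python
def equal_width_bins(values, n_bins: int) -> list:
    """Assign each value a bin index, using intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # assumes hi > lo
    # The maximum value would land in bin n_bins, so clamp it into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values, n_bins: int) -> list:
    """Assign each value a bin index so bins hold about the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


ages = [22, 25, 27, 31, 38, 45, 52, 60, 79]  # made-up sample
width_bins = equal_width_bins(ages, 3)
frequency_bins = equal_frequency_bins(ages, 3)
print(width_bins)
print(frequency_bins)
```

On skewed data like this sample, equal-width binning crowds most values into the first bin, which is one reason equal-frequency binning is sometimes preferred.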
Steps involved in ARCS (contd…)
Steps involved in ARCS (contd…)
- Mining Frequent itemset
- Mining Frequent Item set
- Market Basket Analysis
- Market Basket Analysis (2)
- Market Basket Analysis (3)
- Frequent ItemsetClosed Itemset and Association Rules
- Frequent ItemsetClosed Itemset and Association Rules (2)
- Frequent ItemsetClosed Itemset and Association Rules (3)
- Closed Frequent Itemset amp Maximal Frequent Item set
- Frequent Pattern mining
- Efficient and Scalable Frequent Itemset Mining method
- Apriori property
- Apriori Algorithm
- Slide 14
- Apriori Algo
- Apriori Algo(Contdhellip)
- Apriori algo(contd)
- Generating Association Rules from Frequent Itemsets
- Improving the Efficiency of Apriori
- Mining Frequent Itemset without Candidate Generation(FP Growth)
- FP Growth
- FP Growth Algo
- Mining Frequent Itemset using Vertical Format
- Mining frequent Itemset using Vertical Data Format
- Slide 25
- Slide 26
- Mining Various Kind of Association Rules
- Mining Multi Level Association Rules
- A concept hierarchy for All Electronics
- Multiple level Association Rules
- Using uniform minimum support for all levels
- Using reduced minimum support at lower levels
- Disadvantages of mining multilevel association rules
- Disadvantages of mining multilevel association rules (2)
- Disadvantages of mining multilevel association rules (3)
- Mining Multidimensional association rules from Relational Datab
- Mining Multidimensional association rules from Relational Datab (2)
- Two Basic Approaches
- Two Basic Approaches (2)
- Mining Multidimensional Association Rules using Static Discreti
- Slide 41
- Mining Quantitative Association Rules
- Association Rule Clustering System
- Steps involved in ARCS
- Steps involved in ARCS(contd)
- Steps involved in ARCS (contd…)
- Steps involved in ARCS (contd…) (2)
- From Association Mining to Correlation Analysis
- From Association Mining to Correlation Analysis (2)
- From Association Mining to Correlation Analysis (3)
- Examples
- Correlation analysis using X2
- Other correlation measure
- Comparison of four correlation measures on typical data set
- Constraint Based Association mining
- Constraint Based Association mining (2)
- metarule-Guided Mining of association rules
- Constraint Pushing
- Rule Constraint
Generating Association Rules from Frequent Itemsets
Improving the Efficiency of Apriori
• Hash-based technique
• Transaction reduction
• Partitioning
• Sampling
• Dynamic itemset counting
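As an illustration of one item in this list, transaction reduction relies on the fact that a transaction containing no frequent k-itemset cannot contain any frequent (k+1)-itemset, so it can be skipped in later scans. The sketch below is a minimal example under that assumption; the function name `reduce_transactions` and the toy data are hypothetical and do not come from the slides.

```python
def reduce_transactions(transactions, frequent_k_itemsets):
    """Keep only transactions that contain at least one frequent k-itemset."""
    return [t for t in transactions
            if any(itemset <= t for itemset in frequent_k_itemsets)]

# Hypothetical toy data: four transactions and the frequent 2-itemsets found so far.
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "bread", "eggs"}),
    frozenset({"eggs", "tea"}),
    frozenset({"tea"}),
]
frequent_2_itemsets = [frozenset({"milk", "bread"})]

# Only the first two transactions can still contribute to frequent 3-itemsets.
remaining = reduce_transactions(transactions, frequent_2_itemsets)
```

The later Apriori passes would then scan `remaining` instead of the full list.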
Mining Frequent Itemset without Candidate Generation(FP Growth)
FP Growth
FP Growth Algo
Mining Frequent Itemset using Vertical Format
Mining frequent Itemset using Vertical Data Format
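The slides give only the title here, so the following is a rough sketch of the general idea rather than the presenter's method: in the vertical data format each item is stored with the set of transaction IDs (TIDs) that contain it, and the support count of an itemset is the size of the intersection of its items' TID sets. The transactions and the helper names `to_vertical` and `frequent_pairs` are made up for illustration.

```python
from itertools import combinations

# Hypothetical toy transactions in horizontal format: TID -> items
transactions = {
    "T1": {"milk", "bread", "butter"},
    "T2": {"milk", "bread"},
    "T3": {"bread", "butter"},
    "T4": {"milk", "butter"},
    "T5": {"milk", "bread", "butter"},
}

def to_vertical(horizontal):
    """Convert {tid: items} into {item: set of tids}."""
    vertical = {}
    for tid, items in horizontal.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical

def frequent_pairs(vertical, min_count):
    """Find frequent 2-itemsets by intersecting the TID sets of frequent items."""
    frequent_items = sorted(i for i, tids in vertical.items() if len(tids) >= min_count)
    result = {}
    for a, b in combinations(frequent_items, 2):
        tids = vertical[a] & vertical[b]   # transactions containing both a and b
        if len(tids) >= min_count:
            result[(a, b)] = tids
    return result

vertical = to_vertical(transactions)
pairs = frequent_pairs(vertical, min_count=3)
```

No rescan of the original transactions is needed to count a pair: the length of the intersected TID set is its support count.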
Mining Various Kinds of Association Rules
• Multilevel association rules involve concepts at different levels of abstraction.
• Multidimensional association rules involve more than one dimension or predicate.
• Quantitative association rules involve numeric attributes that have an implicit ordering among values.
Mining Multilevel Association Rules
• Finding strong association rules at low or primitive levels of abstraction is difficult because the data at those levels are sparse.
• A data mining system should provide capabilities for mining association rules at multiple levels of abstraction.
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (their ancestors).
A concept hierarchy for All Electronics
Multiple-Level Association Rules
• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.
• Multilevel association rules can be mined using concept hierarchies under a support-confidence framework.
• A top-down strategy is used: counts are accumulated to compute the frequent itemsets at each concept level, starting at the top level and working down toward the more specific levels.
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
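To make the difference between a uniform and a reduced minimum support concrete, here is a small sketch. It assumes a hypothetical two-level concept hierarchy (`parent`), made-up transactions, and a brute-force counter (`frequent_itemsets`) in place of Apriori, so it only illustrates how the threshold choice changes what is found at the lower level.

```python
from itertools import combinations
from collections import Counter

# Hypothetical concept hierarchy: specific item -> ancestor concept
parent = {
    "IBM laptop": "computer",
    "Dell laptop": "computer",
    "HP printer": "printer",
    "Canon printer": "printer",
}

transactions = [
    {"IBM laptop", "HP printer"},
    {"Dell laptop", "HP printer"},
    {"IBM laptop", "Canon printer"},
    {"Dell laptop"},
    {"HP printer"},
]

def frequent_itemsets(rows, min_support, max_size=2):
    """Brute-force count of itemsets up to max_size; keep those with relative support >= min_support."""
    counts = Counter()
    for row in rows:
        for size in range(1, max_size + 1):
            for itemset in combinations(sorted(row), size):
                counts[itemset] += 1
    n = len(rows)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

# Level 1 (general): replace every item by its ancestor before counting.
level1_rows = [{parent[i] for i in row} for row in transactions]
level1 = frequent_itemsets(level1_rows, min_support=0.5)

# Level 2 (specific): same threshold as level 1 versus a reduced threshold.
level2_uniform = frequent_itemsets(transactions, min_support=0.5)
level2_reduced = frequent_itemsets(transactions, min_support=0.3)
```

With the uniform threshold only "HP printer" survives at the specific level, while the reduced threshold also keeps the two laptop brands. A full top-down miner would additionally examine only the descendants of itemsets that were frequent one level up.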
Disadvantages of mining multilevel association rules
• Redundant rules are generated across multiple levels of abstraction because of the ancestor relationships among items.
• If a rule does not provide any new information, it should be removed.
• If a rule R1 is a generalization (ancestor) of a rule R2 and R2 conveys nothing beyond what R1 already implies, R2 does not need to be reported.
Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
• Association rules that involve a single predicate, here the predicate buys, such as
  buys(X, "digital camera") => buys(X, "HP printer")
  are called single-dimensional or intradimensional association rules.
• Association rules that involve two or more dimensions or predicates are called multidimensional association rules, for example
  age(X, "20…29") ^ buys(X, "laptop") => buys(X, "HP printer")
Two Basic Approaches
• Database attributes can be of two types: categorical and quantitative.
• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). They are also called nominal attributes.
• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.
Multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
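A minimal sketch of static discretization, assuming a hypothetical predefined hierarchy that groups ages into ten-year intervals: each numeric value is replaced by its interval label before mining, so every tuple becomes a set of (attribute, value) items. The function `age_label` and the sample rows are illustrative only.

```python
def age_label(age):
    """Map a numeric age to a hypothetical predefined ten-year interval label."""
    low = (age // 10) * 10
    return f"{low}...{low + 9}"

# Hypothetical relational tuples with one quantitative and one categorical attribute.
rows = [
    {"age": 23, "buys": "laptop"},
    {"age": 27, "buys": "laptop"},
    {"age": 34, "buys": "printer"},
]

# After discretization every tuple is a set of (attribute, value) items,
# which any frequent-itemset miner can treat like ordinary items.
items = [{("age", age_label(r["age"])), ("buys", r["buys"])} for r in rows]
```

The first two tuples now share the item `("age", "20...29")`, so their association with `("buys", "laptop")` can be counted directly.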
Mining Quantitative Association Rules
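• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy some mining criteria, such as maximizing the confidence of the rules mined.
• The rules considered here have two quantitative attributes on the left-hand side and one categorical attribute on the right-hand side:
  A_quan1 ^ A_quan2 => A_cat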
Association Rule Clustering System
• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.
• The following steps are involved in ARCS:
Steps involved in ARCS
1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are equal-width binning, equal-frequency binning, and clustering-based binning.
2. Finding frequent predicate sets: Once the count distribution for each category has been set up, it can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.
3. Clustering the association rules: The strong association rules are then mapped onto a 2-D grid, where adjacent rules can be clustered.
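The first two binning strategies named in step 1 can be sketched as follows. This is a simplified illustration, not the ARCS implementation: the function names and the sample ages are hypothetical, and the equal-width version assumes the values are not all identical (otherwise the bin width would be zero).

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins that all span the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k   # assumes hi > lo
    # The maximum value would fall into bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

ages = [21, 22, 23, 25, 30, 41, 58, 60]
width_bins = equal_width_bins(ages, 3)      # bins of width 13: [21,34), [34,47), [47,60]
freq_bins = equal_frequency_bins(ages, 4)   # two values per bin
```

On this skewed sample, equal-width binning puts five of the eight ages into the first bin, whereas equal-frequency binning spreads them evenly. That difference is the usual reason for choosing one strategy over the other.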
From Association Mining to Correlation Analysis
Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.
Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A) P(B); otherwise A and B are dependent and correlated as events.
lift(A, B) = P(A ∪ B) / (P(A) P(B))
If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.
If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.
If lift(A, B) = 1, A and B are independent and there is no correlation between them.
Equivalently, lift(A, B) = P(B|A) / P(B) = conf(A => B) / sup(B).
• Correlation analysis using lift:
Let ¬game refer to transactions that do not contain computer games, and ¬video refer to transactions that do not contain videos. The transactions can be summarized in a contingency table.
Probability of purchasing a computer game = 0.60. Probability of purchasing a video = 0.75. Probability of purchasing both = 0.40. By the rule above, lift = 0.40 / (0.60 × 0.75) = 0.89. Because this value is less than 1, the purchase of computer games and the purchase of videos are negatively correlated.
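The arithmetic of the game/video example can be checked with a few lines of code. The probabilities are the ones stated in the slide; the function names `lift` and `interpret` and the tolerance used to test for independence are assumptions made for this sketch.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A ∪ B) / (P(A) P(B))."""
    return p_ab / (p_a * p_b)

def interpret(value, tol=1e-9):
    """Translate a lift value into the three cases listed in the slides."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"

# P(game) = 0.60, P(video) = 0.75, P(game and video) = 0.40
game_video = lift(p_a=0.60, p_b=0.75, p_ab=0.40)
verdict = interpret(game_video)
```

Here `game_video` rounds to 0.89 and `verdict` is "negatively correlated", which matches the conclusion of the example.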
Examples
Correlation analysis using χ²
• χ² = Σ (observed - expected)² / expected
• = (4000 - 4500)² / 4500 + (3500 - 3000)² / 3000 + (2000 - 1500)² / 1500 + (500 - 1000)² / 1000
• = 555.6
• The χ² value is much greater than 1, and the observed count for (game, video), 4000, is below its expected count of 4500, so buying games and buying videos are negatively correlated.
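The χ² value can be reproduced from the four observed counts alone, since the expected count of each cell is (row total × column total) / grand total. In the sketch below the cell labels are inferred from the figures in the example (a 60% game rate, a 75% video rate and a 40% joint rate over 10,000 transactions); the contingency table itself is not part of the transcript, and the function name `chi_square` is hypothetical.

```python
# Observed counts; labels inferred from the probabilities given in the lift example.
observed = {
    ("game", "video"): 4000,
    ("not game", "video"): 3500,
    ("game", "not video"): 2000,
    ("not game", "not video"): 500,
}

def chi_square(obs):
    """Return the chi-square statistic and the expected counts for a 2x2 table."""
    total = sum(obs.values())
    row_totals, col_totals = {}, {}
    for (g, v), count in obs.items():
        row_totals[g] = row_totals.get(g, 0) + count
        col_totals[v] = col_totals.get(v, 0) + count
    # Expected count under independence: row total * column total / grand total
    expected = {(g, v): row_totals[g] * col_totals[v] / total for (g, v) in obs}
    stat = sum((obs[c] - expected[c]) ** 2 / expected[c] for c in obs)
    return stat, expected

chi2, expected = chi_square(observed)
```

The computed expected counts are 4500, 3000, 1500 and 1000, and `chi2` rounds to 555.6, in agreement with the worked calculation.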
Other correlation measures
• all_confidence
• cosine
• all_conf(X) = sup(X) / max_item_sup(X), where max_item_sup(X) is the maximum support of any single item in X
• cosine(A, B) = P(A ∪ B) / sqrt(P(A) × P(B)) = sup(A ∪ B) / sqrt(sup(A) × sup(B))
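Both measures are simple ratios of support counts. The sketch below applies them to counts consistent with the game/video example (6,000 game transactions, 7,500 video transactions and 4,000 containing both, out of 10,000); the function names are chosen for this illustration.

```python
from math import sqrt

def all_confidence(sup_itemset, item_supports):
    """all_conf(X) = sup(X) / max_item_sup(X)."""
    return sup_itemset / max(item_supports)

def cosine(sup_a, sup_b, sup_ab):
    """cosine(A, B) = sup(A ∪ B) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)

# Support counts taken from the game/video example
sup_game, sup_video, sup_both = 6000, 7500, 4000

ac = all_confidence(sup_both, [sup_game, sup_video])
cs = cosine(sup_game, sup_video, sup_both)
```

Neither formula uses the total number of transactions, which is why transactions containing neither item do not affect these two measures.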
Comparison of four correlation measures on typical data set
bull A null transaction is a transaction that does not contain any of the itemsets being examined
• A measure is null-invariant if its value is free from the influence of null transactions.
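A small experiment shows what being free from the influence of null transactions means in practice. The counts below are hypothetical: the transactions containing A or B are held fixed while null transactions are added, and lift is compared with cosine.

```python
from math import sqrt

def lift(n_a, n_b, n_ab, n_total):
    """Lift computed from counts; depends on the total number of transactions."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))

def cosine(n_a, n_b, n_ab):
    """Cosine computed from counts; the total number of transactions does not appear."""
    return n_ab / sqrt(n_a * n_b)

# Hypothetical counts: 1000 transactions contain A, 1000 contain B, 100 contain both.
n_a, n_b, n_ab = 1000, 1000, 100
non_null = n_a + n_b - n_ab   # transactions containing A or B

results = {}
for n_null in (0, 100_000):   # number of added null transactions
    n_total = non_null + n_null
    results[n_null] = (lift(n_a, n_b, n_ab, n_total), cosine(n_a, n_b, n_ab))
```

With no null transactions the lift is below 1, and with 100,000 of them it rises above 1, so the same items flip from "negatively" to "positively" correlated. The cosine value is identical in both cases.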
Constraint-Based Association Mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.
• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.
• The constraints can include the following: knowledge type constraints, data constraints, dimension/level constraints, interestingness constraints, and rule constraints.
Metarule-Guided Mining of Association Rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.
• Example of metarule-guided mining: finding associations between customer traits and the items that customers buy, where the analyst is only interested in determining which pairs of customer traits promote the sale of office software.
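One way to picture a metarule is as a template that candidate rules must match. The slide does not write the template out; the sketch assumes the form P1(X, Y) ^ P2(X, W) => buys(X, "office software"), with two distinct customer-trait predicates on the left. The rule encoding, the sample rules and the function `matches_metarule` are all hypothetical.

```python
# A rule is (antecedent, consequent); each predicate is an (attribute, value) pair.
rules = [
    ((("age", "30...39"), ("income", "41K...60K")), ("buys", "office software")),
    ((("age", "30...39"),), ("buys", "office software")),
    ((("age", "20...29"), ("occupation", "student")), ("buys", "laptop")),
]

def matches_metarule(rule, n_antecedent=2, consequent=("buys", "office software")):
    """Check a rule against the assumed template P1 ^ P2 => buys(X, "office software")."""
    antecedent, cons = rule
    attributes = [name for name, _ in antecedent]
    return (len(antecedent) == n_antecedent          # exactly two traits on the left
            and len(set(attributes)) == n_antecedent  # the traits are distinct attributes
            and cons == consequent)                   # fixed right-hand side

selected = [r for r in rules if matches_metarule(r)]
```

Only the first rule survives: the second has a single trait and the third concludes a different purchase. In an actual miner the template would be used to restrict the search itself rather than to filter finished rules.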
Constraint Pushing
Rule Constraints
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain each of them with a DMQL example.
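The slide asks for DMQL examples, which are not given in the transcript. As a substitute, the Python sketch below illustrates the first category only. A constraint such as sum(price) <= budget is antimonotonic when prices are non-negative: once an itemset violates it, every superset violates it too, so the violating itemset never needs to be extended. The prices, the budget and the function names are hypothetical, and support counting is left out to keep the focus on the constraint.

```python
# Hypothetical non-negative item prices, used only for illustration.
price = {"pen": 2, "notebook": 5, "mouse": 20, "monitor": 150}

def total(itemset):
    """Sum of the prices of the items in the itemset."""
    return sum(price[i] for i in itemset)

def itemsets_within_budget(items, budget):
    """Enumerate itemsets with sum(price) <= budget, level by level.

    Because the constraint is antimonotonic, only itemsets that satisfy it
    are extended; violating itemsets are pruned together with all supersets.
    """
    level = [frozenset([i]) for i in items if price[i] <= budget]
    found = []
    while level:
        found.extend(level)
        next_level = set()
        for s in level:
            for i in items:
                if i not in s:
                    candidate = s | {i}
                    if total(candidate) <= budget:
                        next_level.add(candidate)
        level = list(next_level)
    return found

cheap = itemsets_within_budget(list(price), budget=25)
```

With a budget of 25 the search keeps three single items and three pairs and never builds a set containing "monitor". The reverse constraint, sum(price) >= v, is monotonic: once an itemset satisfies it, every superset does as well, so it does not need to be rechecked.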
FP Growth
FP Growth Algo
Mining Frequent Itemset using Vertical Format
Mining frequent Itemset using Vertical Data Format
Mining Various Kind of Association Rules
Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit
ordering among values
Mining Multi Level Association Rules
bull Finding strong association rule at low or primitive level of abstraction is very difficult
bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction
bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor
A concept hierarchy for All Electronics
Multiple level Association Rules
bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules
bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework
bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
• Redundant rules are generated across multiple levels of abstraction because of the ancestor relationships among items.
• If a rule does not provide any new information, it should be removed.
• If a rule R1 is more general than (an ancestor of) a rule R2, and R2 adds no information beyond what R1 already implies, then R2 is redundant and need not be reported.
Mining Multidimensional Association Rules from Relational Databases and Data Warehouses
• Association rules that involve a single predicate (here, the predicate buys) are called single-dimensional or intradimensional association rules. Example:
  buys(X, "digital camera") => buys(X, "HP printer")
• Association rules that involve two or more dimensions or predicates are called multidimensional association rules. Example:
  age(X, "20…29") ^ buys(X, "laptop") => buys(X, "HP printer")
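As a rough illustration of how a multidimensional rule such as the age/laptop/HP printer example is evaluated, the sketch below computes its support and confidence over a handful of records. The records, field names, and helper functions are hypothetical and made up for this example only.

```python
# Hypothetical customer records: one quantitative dimension (age)
# and one set-valued dimension (the items bought).
records = [
    {"age": 23, "buys": {"laptop", "HP printer"}},
    {"age": 27, "buys": {"laptop"}},
    {"age": 25, "buys": {"laptop", "HP printer"}},
    {"age": 41, "buys": {"laptop", "HP printer"}},
    {"age": 35, "buys": {"digital camera"}},
]

def antecedent(r):
    """age(X, "20…29") ^ buys(X, "laptop")"""
    return 20 <= r["age"] <= 29 and "laptop" in r["buys"]

def consequent(r):
    """buys(X, "HP printer")"""
    return "HP printer" in r["buys"]

def support_confidence(records, antecedent, consequent):
    """Support = P(antecedent and consequent); confidence = P(consequent | antecedent)."""
    n_ante = sum(1 for r in records if antecedent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / len(records)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

support, confidence = support_confidence(records, antecedent, consequent)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Three records satisfy the antecedent and two of those also satisfy the consequent, so on this toy data the rule has support 2/5 and confidence 2/3.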
Two Basic Approaches
• Database attributes can be of two types: categorical or quantitative.
• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.
• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).
Mining Multidimensional Association Rules Using Static Discretization of Quantitative Attributes
• Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques in which numeric values are replaced by interval labels.
• Multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
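Mining Quantitative Association Rules
• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy some mining criterion, such as maximizing the confidence of the rules mined.
• A common form has two quantitative attributes on the left-hand side and one categorical attribute on the right-hand side: A_quant1 ^ A_quant2 => A_cat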
Association Rule Clustering System
• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.
• The following steps are involved in ARCS:
Steps involved in ARCS
1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:
   • Equal-width binning
   • Equal-frequency binning
   • Clustering-based binning
2. Finding frequent predicate sets: Once the count distribution for each category has been set up, it can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.
3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
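The binning step (step 1 above) can be sketched as follows. This is a minimal illustration of equal-width and equal-frequency binning only; the function names and the sample ages are assumptions made for the example, and clustering-based binning is not shown.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything falls into one bin.
        return [0] * len(values)
    # The maximum value would land on index k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign values to k bins that hold (roughly) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

# Hypothetical ages, made up for illustration.
ages = [21, 22, 23, 25, 30, 41, 58, 60]

width_bins = equal_width_bins(ages, 3)
freq_bins = equal_frequency_bins(ages, 4)
print(width_bins)
print(freq_bins)
```

Equal-width bins can be very unevenly filled when the data are skewed (here most ages fall into the first interval), while equal-frequency bins hold the same number of values but cover intervals of different widths.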
From Association Mining to Correlation Analysis
• Even strong association rules can be misleading.
• The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.
• Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if $P(A \cup B) = P(A)P(B)$; otherwise, A and B are dependent and correlated as events.
• $\mathrm{lift}(A, B) = \dfrac{P(A \cup B)}{P(A)P(B)}$
• If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.
• If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.
• If lift(A, B) = 1, A and B are independent and there is no correlation between them.
• Equivalently, $\mathrm{lift}(A, B) = \dfrac{P(B \mid A)}{P(B)} = \dfrac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{sup}(B)}$
Correlation analysis using lift
• Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.
• Probability of purchasing a computer game = 60%; probability of purchasing a video = 75%; probability of purchasing both = 40%.
• lift(game, video) = 0.40 / (0.60 × 0.75) = 0.89. Because the lift is less than 1, the purchase of computer games is negatively correlated with the purchase of videos.
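The lift computation for the game/video example can be reproduced with a few lines of Python. The probabilities 0.60, 0.75, and 0.40 are taken from the example above; the function names are illustrative only.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A and B together) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

def interpret(value, tol=1e-9):
    """Map a lift value to the three cases listed on the slide."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "negatively correlated" if value < 1.0 else "positively correlated"

game_video_lift = lift(p_a=0.60, p_b=0.75, p_ab=0.40)
print(round(game_video_lift, 2), interpret(game_video_lift))
```

The tolerance in `interpret` is an assumption added for floating-point comparison; with real data a lift of exactly 1 is rare, so values close to 1 are usually treated as showing little or no correlation.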
Examples
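Correlation analysis using χ²
• $\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$
• $\chi^2 = \dfrac{(4000 - 4500)^2}{4500} + \dfrac{(3500 - 3000)^2}{3000} + \dfrac{(2000 - 1500)^2}{1500} + \dfrac{(500 - 1000)^2}{1000} = 555.6$
• The χ² value is large, so the two purchases are not independent. Because the observed count for (game, video), 4000, is less than the expected count, 4500, buying games and buying videos are negatively correlated.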
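The χ² value can be checked by computing the expected counts from the row and column totals of the 2×2 contingency table. In the sketch below, the observed counts 4000, 3500, 2000, and 500 are the ones used in the computation above; their arrangement into rows (video, no video) and columns (game, no game) is an assumption chosen so that it reproduces the expected counts 4500, 3000, 1500, and 1000.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of the row and column events.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2

# Rows: video, no video. Columns: game, no game.
observed = [
    [4000, 3500],
    [2000, 500],
]

chi2_value = chi_square(observed)
print(round(chi2_value, 1))
```

The statistic itself only signals dependence; the direction of the correlation comes from comparing the observed (game, video) count with its expected count, as in the last bullet above.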
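Other correlation measures
• all_confidence
• cosine
• $\mathrm{all\_conf}(X) = \dfrac{\mathrm{sup}(X)}{\mathrm{max\_item\_sup}(X)}$, where max_item_sup(X) is the maximum support of any single item in X
• $\mathrm{cosine}(A, B) = \dfrac{P(A \cup B)}{\sqrt{P(A)P(B)}} = \dfrac{\mathrm{sup}(A \cup B)}{\sqrt{\mathrm{sup}(A)\,\mathrm{sup}(B)}}$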
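Both measures are simple ratios of supports, as the following sketch shows. It reuses the game/video figures from the lift example (supports 0.60 and 0.75, joint support 0.40) purely as sample input; the function and variable names are illustrative assumptions.

```python
from math import sqrt

def all_confidence(sup_itemset, item_supports):
    """all_conf(X) = sup(X) / (maximum support of any single item in X)."""
    return sup_itemset / max(item_supports)

def cosine(sup_a, sup_b, sup_ab):
    """cosine(A, B) = sup(A and B together) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)

all_conf_value = all_confidence(0.40, [0.60, 0.75])
cosine_value = cosine(0.60, 0.75, 0.40)
print(round(all_conf_value, 3), round(cosine_value, 3))
```

Both values lie between 0 and 1. Unlike lift, neither formula involves the total number of transactions once the supports are written as counts, which is what makes these two measures insensitive to transactions that contain neither itemset.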
Comparison of four correlation measures on typical data sets
• A null transaction is a transaction that does not contain any of the itemsets being examined.
• A measure is null-invariant if its value is free from the influence of null transactions.
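The effect of null transactions can be seen by holding the counts for A and B fixed and changing only the number of transactions that contain neither. The counts below are made up for illustration, and the functions take raw counts rather than probabilities; this is a sketch, not a reproduction of the comparison table from the slides.

```python
from math import sqrt

def lift(n_ab, n_a, n_b, n_total):
    """Lift from counts; depends on the total number of transactions."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))

def cosine(n_ab, n_a, n_b):
    """Cosine from counts; the total number of transactions cancels out."""
    return n_ab / sqrt(n_a * n_b)

# Hypothetical counts: 2000 transactions contain A, 2000 contain B, 1000 contain both.
n_a, n_b, n_ab = 2000, 2000, 1000
n_without_nulls = n_a + n_b - n_ab          # 3000 transactions, no null transactions
n_with_nulls = n_without_nulls + 97000      # add 97000 null transactions

lift_without = lift(n_ab, n_a, n_b, n_without_nulls)
lift_with = lift(n_ab, n_a, n_b, n_with_nulls)
cosine_value = cosine(n_ab, n_a, n_b)
print(lift_without, lift_with, cosine_value)
```

On these counts lift moves from below 1 to far above 1 when null transactions are added, even though the relationship between A and B is unchanged, while cosine stays the same because it never uses the transaction total.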
Constraint-Based Association Mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.
• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.
• The constraints can include the following:
  • Knowledge type constraints
  • Data constraints
  • Dimension/level constraints
  • Interestingness constraints
  • Rule constraints
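Metarule-Guided Mining of Association Rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.
• Example of metarule-guided mining: in finding associations between customer traits and the items that customers buy, the analyst is interested only in determining which pairs of customer traits promote the sale of office software. Such a metarule can be written as P1(X, Y) ^ P2(X, W) => buys(X, "office software"), where P1 and P2 are predicate variables that are instantiated with customer attributes during mining.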
Constraint Pushing
Rule Constraints
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain each of them with an example in DMQL.
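The first two categories can be illustrated in Python rather than DMQL. The sketch below assumes the usual definitions: a constraint is antimonotonic if, once an itemset violates it, every superset also violates it, and monotonic if, once an itemset satisfies it, every superset also satisfies it. The item prices, the thresholds, and all function names are hypothetical, and the brute-force check only inspects one small item universe; it is an empirical check, not a proof.

```python
from itertools import combinations

# Hypothetical item prices (non-negative), made up for illustration.
PRICES = {"pen": 2, "notebook": 5, "mouse": 20, "monitor": 150}

def all_itemsets(items):
    """Every non-empty subset of the item universe."""
    for k in range(1, len(items) + 1):
        for combo in combinations(sorted(items), k):
            yield frozenset(combo)

def total_price(itemset):
    return sum(PRICES[i] for i in itemset)

def is_antimonotonic(constraint, items):
    """Whenever an itemset violates the constraint, all its proper supersets do too."""
    sets = list(all_itemsets(items))
    return all(not constraint(t)
               for s in sets if not constraint(s)
               for t in sets if s < t)

def is_monotonic(constraint, items):
    """Whenever an itemset satisfies the constraint, all its proper supersets do too."""
    sets = list(all_itemsets(items))
    return all(constraint(t)
               for s in sets if constraint(s)
               for t in sets if s < t)

def cheap(itemset):
    """sum(price) <= 25"""
    return total_price(itemset) <= 25

def expensive(itemset):
    """sum(price) >= 25"""
    return total_price(itemset) >= 25

print(is_antimonotonic(cheap, PRICES), is_monotonic(cheap, PRICES))
print(is_antimonotonic(expensive, PRICES), is_monotonic(expensive, PRICES))
```

An antimonotonic constraint such as `cheap` can be pushed into an Apriori-style search in the same way as minimum support: once an itemset fails it, none of its supersets needs to be generated. A monotonic constraint such as `expensive` need not be rechecked for supersets of an itemset that already satisfies it.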
- Mining Frequent itemset
- Mining Frequent Item set
- Market Basket Analysis
- Market Basket Analysis (2)
- Market Basket Analysis (3)
- Frequent ItemsetClosed Itemset and Association Rules
- Frequent ItemsetClosed Itemset and Association Rules (2)
- Frequent ItemsetClosed Itemset and Association Rules (3)
- Closed Frequent Itemset amp Maximal Frequent Item set
- Frequent Pattern mining
- Efficient and Scalable Frequent Itemset Mining method
- Apriori property
- Apriori Algorithm
- Slide 14
- Apriori Algo
- Apriori Algo(Contdhellip)
- Apriori algo(contd)
- Generating Association Rules from Frequent Itemsets
- Improving the Efficiency of Apriori
- Mining Frequent Itemset without Candidate Generation(FP Growth)
- FP Growth
- FP Growth Algo
- Mining Frequent Itemset using Vertical Format
- Mining frequent Itemset using Vertical Data Format
- Slide 25
- Slide 26
- Mining Various Kind of Association Rules
- Mining Multi Level Association Rules
- A concept hierarchy for All Electronics
- Multiple level Association Rules
- Using uniform minimum support for all levels
- Using reduced minimum support at lower levels
- Disadvantages of mining multilevel association rules
- Disadvantages of mining multilevel association rules (2)
- Disadvantages of mining multilevel association rules (3)
- Mining Multidimensional association rules from Relational Datab
- Mining Multidimensional association rules from Relational Datab (2)
- Two Basic Approaches
- Two Basic Approaches (2)
- Mining Multidimensional Association Rules using Static Discreti
- Slide 41
- Mining Quantitative Association Rules
- Association Rule Clustering System
- Steps involved in ARCS
- Steps involved in ARCS(contd)
- Steps involved in ARCS(contdhellip)
- Steps involved in ARCS(contdhellip) (2)
- From Association Mining to Correlation Analysis
- From Association Mining to Correlation Analysis (2)
- From Association Mining to Correlation Analysis (3)
- Examples
- Correlation analysis using X2
- Other correlation measure
- Comparison of four correlation measures on typical data set
- Constraint Based Association mining
- Constraint Based Association mining (2)
- metarule-Guided Mining of association rules
- Constraint Pushing
- Rule Constraint
-
Mining frequent Itemset using Vertical Data Format
Mining Various Kind of Association Rules
Mining bull multilevel association rulesInvolves concept at different level of abstractionbull multidimensional association rules Involves more than one dimension or predicatebull quantitative association rulesInvolves numeric attribute that have implicit
ordering among values
Mining Multi Level Association Rules
bull Finding strong association rule at low or primitive level of abstraction is very difficult
bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction
bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor
A concept hierarchy for All Electronics
Multiple level Association Rules
bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules
bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework
bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items
Disadvantages of mining multilevel association rules
Disadvantages of mining multilevel association rules
bull If a rule doesnrsquot provide any new information then it should be removed
bull A rule R1 is more generalized than rule R2so need to specify rule R2
Mining Multidimensional association rules from Relational Database and DW
Association rules that imply a single predicate that is the predicate buys
Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra
dimensional association rules
Mining Multidimensional association rules from Relational Database and DW
Mining multidimensional database association rules-
Associations rules that involves two or more dimensions or predicate are called multidimensional association rules
Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)
Two Basic Approaches
bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of
possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes
Two Basic Approaches
Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques where numeric values are replaced by interval labels
Multidimensional data are used to construct data cubeData cube are well suited for mining multidimensional association rules
Mining Quantitative Association Rules
bull Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria
bull A(quant1) ^ A(quant2) =gt Acat
Association Rule Clustering System
bull Association Rule Clustering System maps pairs of quantitative attributes on to a 2-D grid for tuples statisying a given categorical attribute condition
bull The following steps are involved in ARCS-
Steps involved in ARCS
1 Binning- Quantitative attributes can have a very wide range of values defining their domain
The partioning process is called binning The intervals are considered as binsThe common binning strategies are-
Equal width binning Equal Frequency Binning Clustering based binning
Steps involved in ARCS(contd)
2 Finding frequent predicate sets-Once the distribution takes place later each category can be scanned to find most frquent itemset that statisfy minimum support and minimum confidence
3Clustering the association rules-the strong association rules are then mapped to 2-D grid
Steps involved in ARCS(contdhellip)
Steps involved in ARCS(contdhellip)
From Association Mining to Correlation Analysis
Even strong association rules can be misleadingSupport Confidence framework can be
supplemented by additional measure based on statistical significance and correlational analysis
Lift is a simple correlation measureThe occurrence of itemset A is independent of the
occurrence of itemset B if P(AUB)=P(A)P(B) otherwise A and B are correlated and dependent as event
From Association Mining to Correlation Analysis
lift(AB)= P(AUB)P(A)P(B)If lift(AB)lt1 then the occurrence of A is
negatively correlated with occurrence of BIf Lift(AB) gt1 then the occurrence of A is
positively correlated with occurrence of BIf Lift(AB)=1 then A and B are independent and
no correlation
From Association Mining to Correlation Analysis
P(BA)P(B) or con f(A=gtB)sup(B)
• Correlation analysis using lift:
Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.
The probability of purchasing a computer game is 0.60, the probability of purchasing a video is 0.75, and the probability of purchasing both is 0.40. By the lift formula, lift = 0.40 / (0.60 × 0.75) = 0.89. Because this value is less than 1, the purchase of games and the purchase of videos are negatively correlated.
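The same arithmetic can be checked from transaction counts. The sketch below assumes 10,000 transactions in total, which turns the probabilities above into 6,000 game purchases, 7,500 video purchases, and 4,000 joint purchases; the function name `lift` is chosen for the example.

```python
def lift(n_ab, n_a, n_b, n):
    """lift(A, B) = P(A ∪ B) / (P(A) P(B)), computed from transaction counts."""
    return (n_ab / n) / ((n_a / n) * (n_b / n))


# Counts assumed from the probabilities above with 10,000 transactions.
game_video = lift(n_ab=4000, n_a=6000, n_b=7500, n=10000)

# Equivalent form: conf(game => video) / sup(video).
confidence = 4000 / 6000
same_value = confidence / (7500 / 10000)

print(round(game_video, 2))  # 0.89
```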
Examples
Correlation analysis using χ²
χ² = Σ (observed − expected)² / expected
= (4000 − 4500)² / 4500 + (3500 − 3000)² / 3000 + (2000 − 1500)² / 1500 + (500 − 1000)² / 1000
= 555.6
Because the χ² value is greater than 1 and the observed count for (game, video), 4000, is less than its expected count of 4500, buying games and buying videos are negatively correlated.
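As a check on the sum above, the expected counts can be derived from the row and column totals of the observed counts. The layout of the `observed` table (rows for video and ¬video, columns for game and ¬game) is an assumption that is consistent with the four terms of the formula.

```python
def chi_square(observed):
    """Chi-square statistic of a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of the row and column events.
            expected = row_totals[i] * col_totals[j] / n
            total += (obs - expected) ** 2 / expected
    return total


# Rows: video, ¬video. Columns: game, ¬game.
observed = [[4000, 3500],
            [2000, 500]]
chi2 = chi_square(observed)
print(round(chi2, 1))  # 555.6
```

The statistic only says that the two events are dependent. The direction of the correlation comes from comparing an observed cell with its expected count, as in the last sentence above.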
Other correlation measures
• All_confidence
• Cosine
• all_conf(X) = sup(X) / max_item_sup(X), where max_item_sup(X) is the maximum support of any single item in X
• cosine(A, B) = P(A ∪ B) / sqrt(P(A)P(B)) = sup(A ∪ B) / sqrt(sup(A) sup(B))
Comparison of four correlation measures on typical data set
• A null transaction is a transaction that does not contain any of the itemsets being examined.
• A measure is null-invariant if its value is free from the influence of null transactions. Of the four measures, all_confidence and cosine are null-invariant, whereas lift and χ² are not.
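The effect of null transactions can be seen by recomputing the measures after padding the data with transactions that contain neither item. The sketch below reuses the game and video counts under the assumption of 10,000 transactions, then adds 90,000 hypothetical null transactions; the function name `measures` is illustrative.

```python
from math import sqrt


def measures(n_ab, n_a, n_b, n):
    """Lift, all_confidence and cosine for a pair of items, from counts."""
    p_ab, p_a, p_b = n_ab / n, n_a / n, n_b / n
    return {
        "lift": p_ab / (p_a * p_b),
        "all_conf": p_ab / max(p_a, p_b),
        "cosine": p_ab / sqrt(p_a * p_b),
    }


base = measures(4000, 6000, 7500, 10000)
# Same item counts, but 90,000 extra transactions containing neither item.
padded = measures(4000, 6000, 7500, 100000)

for name in base:
    print(name, round(base[name], 3), round(padded[name], 3))
# lift 0.889 8.889
# all_conf 0.533 0.533
# cosine 0.596 0.596
```

Lift flips from below 1 to well above 1 purely because of the added null transactions, while all_confidence and cosine stay the same.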
Constraint Based Association mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.
• A good heuristic is to have the users specify their intuition or expectations as constraints that confine the search space. This strategy is known as constraint-based mining.
• The constraints can include the following: knowledge type constraints, data constraints, dimension/level constraints, interestingness constraints, and rule constraints.
metarule-Guided Mining of association rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.
• Example of metarule-guided mining: finding associations between customer traits and the items that customers buy, where the analyst is interested only in determining which pairs of customer traits promote the sale of office software.
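One way to picture a metarule is as a template that candidate rules must match. The sketch below assumes the template "two different customer-trait predicates => buys office software" and represents each rule as a pair (antecedent predicates, consequent predicate). The representation, the function name `matches_metarule`, and the sample rules are all hypothetical.

```python
def matches_metarule(rule, consequent=("buys", "office software")):
    """True if the rule has two distinct trait predicates and the required consequent."""
    antecedent, head = rule
    predicate_names = {name for name, _ in antecedent}
    return head == consequent and len(antecedent) == 2 and len(predicate_names) == 2


# Hypothetical candidate rules: ((predicate, value), ...) => (predicate, value)
rules = [
    ((("age", "30..39"), ("income", "41K..60K")), ("buys", "office software")),
    ((("age", "30..39"),), ("buys", "office software")),
    ((("age", "30..39"), ("income", "41K..60K")), ("buys", "printer")),
]
kept = [rule for rule in rules if matches_metarule(rule)]
# Only the first rule fits the template.
```

In practice the template would be used to restrict the search itself rather than to filter finished rules; the filter form is used here only because it is short.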
Constraint Pushing
Rule Constraint
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain each of them with an example in DMQL
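As an illustration of the first category (in Python rather than DMQL), the constraint sum(price) <= limit is antimonotonic when prices are non-negative: once an itemset violates it, every superset violates it too, so the itemset can be pruned before it is extended. The item names, prices, and limit below are made up, and support counting is left out to keep the focus on the constraint.

```python
# Hypothetical price list; prices are assumed to be non-negative.
prices = {"pen": 2, "ink": 5, "printer": 120, "laptop": 900}
LIMIT = 125


def within_budget(itemset):
    """Antimonotonic constraint: the total price must not exceed LIMIT."""
    return sum(prices[i] for i in itemset) <= LIMIT


# Level 1: single items that satisfy the constraint.
singles = sorted(i for i in prices if within_budget((i,)))
level = {(i,) for i in singles}
survivors = set(level)

# Grow level by level, extending only itemsets that still satisfy the constraint.
while level:
    grown = {tuple(sorted(s + (i,))) for s in level for i in singles if i not in s}
    level = {s for s in grown if within_budget(s)}
    survivors |= level

print(sorted(survivors, key=lambda s: (len(s), s)))
```

The reverse constraint, sum(price) >= limit, is monotonic: once an itemset satisfies it, every superset does as well, so the check need not be repeated for supersets.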
- Mining Frequent itemset
- Mining Frequent Item set
- Market Basket Analysis
- Market Basket Analysis (2)
- Market Basket Analysis (3)
- Frequent ItemsetClosed Itemset and Association Rules
- Frequent ItemsetClosed Itemset and Association Rules (2)
- Frequent ItemsetClosed Itemset and Association Rules (3)
- Closed Frequent Itemset & Maximal Frequent Item set
- Frequent Pattern mining
- Efficient and Scalable Frequent Itemset Mining method
- Apriori property
- Apriori Algorithm
- Slide 14
- Apriori Algo
- Apriori Algo(Contd…)
- Apriori algo(contd)
- Generating Association Rules from Frequent Itemsets
- Improving the Efficiency of Apriori
- Mining Frequent Itemset without Candidate Generation(FP Growth)
- FP Growth
- FP Growth Algo
- Mining Frequent Itemset using Vertical Format
- Mining frequent Itemset using Vertical Data Format
- Slide 25
- Slide 26
- Mining Various Kind of Association Rules
- Mining Multi Level Association Rules
- A concept hierarchy for All Electronics
- Multiple level Association Rules
- Using uniform minimum support for all levels
- Using reduced minimum support at lower levels
- Disadvantages of mining multilevel association rules
- Disadvantages of mining multilevel association rules (2)
- Disadvantages of mining multilevel association rules (3)
- Mining Multidimensional association rules from Relational Datab
- Mining Multidimensional association rules from Relational Datab (2)
- Two Basic Approaches
- Two Basic Approaches (2)
- Mining Multidimensional Association Rules using Static Discreti
- Slide 41
- Mining Quantitative Association Rules
- Association Rule Clustering System
- Steps involved in ARCS
- Steps involved in ARCS(contd)
- Steps involved in ARCS(contd…)
- Steps involved in ARCS(contd…) (2)
- From Association Mining to Correlation Analysis
- From Association Mining to Correlation Analysis (2)
- From Association Mining to Correlation Analysis (3)
- Examples
- Correlation analysis using X2
- Other correlation measure
- Comparison of four correlation measures on typical data set
- Constraint Based Association mining
- Constraint Based Association mining (2)
- metarule-Guided Mining of association rules
- Constraint Pushing
- Rule Constraint
Mining Various Kinds of Association Rules
Mining:
• Multilevel association rules: involve concepts at different levels of abstraction
• Multidimensional association rules: involve more than one dimension or predicate
• Quantitative association rules: involve numeric attributes that have an implicit ordering among values
Mining Multi Level Association Rules
• Finding strong association rules at low or primitive levels of abstraction is difficult because the data at those levels are sparse.
• A data mining system should provide capabilities for mining association rules at multiple levels of abstraction.
• A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts (their ancestors).
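A concept hierarchy can be held as a simple child-to-parent mapping. The sketch below uses made-up items and category names; `parent`, `ancestors`, and `generalize` are names chosen for the example.

```python
# Hypothetical hierarchy: each concept maps to its more general parent.
parent = {
    "Dell laptop": "laptop computer",
    "IBM desktop": "desktop computer",
    "laptop computer": "computer",
    "desktop computer": "computer",
    "HP printer": "printer",
    "computer": "all",
    "printer": "all",
}


def ancestors(item):
    """Return the chain of ancestors of an item, nearest first."""
    chain = []
    while item in parent:
        item = parent[item]
        chain.append(item)
    return chain


def generalize(transaction, steps):
    """Replace each item by its ancestor `steps` levels up, stopping below the root."""
    result = set()
    for item in transaction:
        for _ in range(steps):
            if parent.get(item, "all") == "all":
                break
            item = parent[item]
        result.add(item)
    return result


basket = {"Dell laptop", "HP printer"}
print(ancestors("Dell laptop"))   # ['laptop computer', 'computer', 'all']
print(generalize(basket, 1) == {"laptop computer", "printer"})  # True
```

Generalizing transactions this way lets sparse low-level items pool their counts under a shared ancestor.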
A concept hierarchy for All Electronics
Multiple level Association Rules
• Association rules generated from mining data at multiple levels of abstraction are called multiple-level or multilevel association rules.
• Multilevel association rules can be mined using concept hierarchies under a support-confidence framework.
• A top-down strategy is employed, in which counts are accumulated for the calculation of frequent itemsets at each concept level.
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
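The difference between the two threshold schemes can be sketched as follows. With uniform support the same minimum support applies at every level; with reduced support each level has its own threshold, smaller at the lower levels. The items, counts, and thresholds are invented for the example, and for brevity the sketch counts all levels in one pass instead of descending level by level.

```python
from collections import Counter

# Hypothetical two-level hierarchy.
parent = {"laptop computer": "computer", "desktop computer": "computer"}
level_of = {"computer": 1, "laptop computer": 2, "desktop computer": 2}

# 50 made-up transactions: 3 with a laptop, 2 with a desktop, 45 with neither.
transactions = [{"laptop computer"}] * 3 + [{"desktop computer"}] * 2 + [set()] * 45


def frequent_by_level(transactions, min_sup_at):
    """Return the items whose support meets the threshold set for their level."""
    counts = Counter()
    for t in transactions:
        seen = set()
        for item in t:
            # Count the item and each of its ancestors once per transaction.
            while item is not None:
                seen.add(item)
                item = parent.get(item)
        counts.update(seen)
    n = len(transactions)
    return {i for i, c in counts.items() if c / n >= min_sup_at[level_of[i]]}


uniform = frequent_by_level(transactions, {1: 0.05, 2: 0.05})
reduced = frequent_by_level(transactions, {1: 0.05, 2: 0.03})
# uniform keeps computer (10%) and laptop computer (6%); desktop computer (4%) is lost.
# reduced keeps all three.
```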
Disadvantages of mining multilevel association rules
Redundant rules are generated across multiple levels of abstraction because of the ancestor relationships among items.
• If a rule doesn't provide any new information, it should be removed.
• If a rule R1 is more general than a rule R2 (R1 is an ancestor of R2) and the support and confidence of R2 are close to the values expected from R1, then R2 is redundant and there is no need to report it.
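A redundancy check of this kind might look like the sketch below. It assumes the expected support of the specific rule is the ancestor's support scaled by the share of the specific item within its ancestor, and that the expected confidence equals the ancestor's confidence. The 10% relative tolerance, the rule values, and the name `is_redundant` are assumptions made for illustration.

```python
def is_redundant(rule, ancestor, share, tol=0.1):
    """True if the rule's support and confidence are close to what its ancestor predicts."""
    expected_sup = ancestor["sup"] * share
    expected_conf = ancestor["conf"]

    def close(got, expected):
        return abs(got - expected) <= tol * expected

    return close(rule["sup"], expected_sup) and close(rule["conf"], expected_conf)


# Hypothetical rules: R1 is about laptops in general, R2 about one brand of laptop
# that is assumed to make up a quarter of laptop sales.
r1 = {"sup": 0.08, "conf": 0.70}
r2 = {"sup": 0.02, "conf": 0.72}
r3 = {"sup": 0.05, "conf": 0.90}

print(is_redundant(r2, r1, share=0.25))  # True: R2 tells us nothing beyond R1
print(is_redundant(r3, r1, share=0.25))  # False: R3 deviates from the expectation
```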
Mining Multidimensional association rules from Relational Database and DW
Association rules that involve a single predicate, here the predicate buys, for example:
buys(X, "digital camera") => buys(X, "HP printer")
Such rules are called single-dimensional or intradimensional association rules.
Association rules that involve two or more dimensions or predicates are called multidimensional association rules, for example:
age(X, "20…29") ^ buys(X, "laptop") => buys(X, "HP printer")
Two Basic Approaches
• Database attributes can be of two types: categorical and quantitative.
• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.
• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.
The transformed multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
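Static discretization can be sketched by replacing each age with a ten-year interval label and then treating every (predicate, value) pair as an item. The records, the label format, and the helper names below are assumptions made for the example; a data cube is not built here, only the counts that such a cube would store.

```python
def age_label(age):
    """Replace a numeric age by a fixed ten-year interval label."""
    low = age // 10 * 10
    return f"{low}..{low + 9}"


def to_predicates(age, items):
    """Turn one record into a set of (predicate, value) pairs."""
    return {("age", age_label(age))} | {("buys", item) for item in items}


# Hypothetical records: (age, items bought).
records = [
    (23, {"laptop", "HP printer"}),
    (27, {"laptop", "HP printer"}),
    (25, {"laptop"}),
    (41, {"laptop", "HP printer"}),
    (35, {"digital camera"}),
]
rows = [to_predicates(age, items) for age, items in records]

antecedent = {("age", "20..29"), ("buys", "laptop")}
consequent = {("buys", "HP printer")}

n_antecedent = sum(antecedent <= row for row in rows)
n_rule = sum((antecedent | consequent) <= row for row in rows)

support = n_rule / len(rows)        # 2 of 5 records
confidence = n_rule / n_antecedent  # 2 of the 3 records matching the antecedent
```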
Mining Quantitative Association Rules
bull Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria
bull A(quant1) ^ A(quant2) =gt Acat
Association Rule Clustering System
bull Association Rule Clustering System maps pairs of quantitative attributes on to a 2-D grid for tuples statisying a given categorical attribute condition
bull The following steps are involved in ARCS-
Steps involved in ARCS
1 Binning- Quantitative attributes can have a very wide range of values defining their domain
The partioning process is called binning The intervals are considered as binsThe common binning strategies are-
Equal width binning Equal Frequency Binning Clustering based binning
Steps involved in ARCS(contd)
2 Finding frequent predicate sets-Once the distribution takes place later each category can be scanned to find most frquent itemset that statisfy minimum support and minimum confidence
3Clustering the association rules-the strong association rules are then mapped to 2-D grid
Steps involved in ARCS(contdhellip)
Steps involved in ARCS(contdhellip)
From Association Mining to Correlation Analysis
Even strong association rules can be misleadingSupport Confidence framework can be
supplemented by additional measure based on statistical significance and correlational analysis
Lift is a simple correlation measureThe occurrence of itemset A is independent of the
occurrence of itemset B if P(AUB)=P(A)P(B) otherwise A and B are correlated and dependent as event
From Association Mining to Correlation Analysis
lift(AB)= P(AUB)P(A)P(B)If lift(AB)lt1 then the occurrence of A is
negatively correlated with occurrence of BIf Lift(AB) gt1 then the occurrence of A is
positively correlated with occurrence of BIf Lift(AB)=1 then A and B are independent and
no correlation
From Association Mining to Correlation Analysis
P(BA)P(B) or con f(A=gtB)sup(B)
bull Correlation Analysis using lift-
Let game refer to tranasction that donot contains games And Video refer to transaction that donot contain videosThe transaction can be summarized in the contingency table
Probabiltiy of purchasing a computer game=60Probability of purchsing a video=75Probability of purchasing both=40By rule=406075=89So as by rule it is less than 1 so it is negatively correlated
Examples
Correlation analysis using X2bull c2 = Sbull (observed - expected)2expectedbull =bull (400010485764500)2bull 4500bull +bull (350010485763000)2bull 3000bull +bull (200010485761500)2bull 1500bull +bull (50010485761000)2bull 1000bull = 5556bull So it is negatively correlated
Other correlation measure
bull All_confidencebull Cosine
bull all_conf(X)=sup(X)mx_item_sup(x)
bull Cos(AB)=P(AUB)sqrt(P(A)P(B)=sup(AUB)sqrtsup(A)sup(B)
Comparison of four correlation measures on typical data set
bull A null transaction is a transaction that does not contain any of the itemsets being examined
bull A measure is null-variant if its value is free from the influence of null transaction
Constraint Based Association mining
bull A data mining process may uncover thousands of rules from a given datamost of which end up being unrelated or uninteresting to the users
bull A good heuristic is to have the users specify such intuition or expectation as constraints to confine the search spaceThis strategy is known as constraint based mining
Constraint Based Association mining
bull The constraint can include the following-Knowledge type constraintData constraintDimensionlevel constraintInterestingness constraintRule constraint
metarule-Guided Mining of association rules
bull Metarules allows users to specify the syntatic form of rules that they are interested in mining
bull Metarule-guided mining-Finding association between customer traits and
the items that customers buyOnly interested in determining which pairs of customer traits promote the sale of office software
Constraint Pushing
Rule Constraint
bull Antimonotonicbull Monotonicbull Convertiblebull Inconvertible
Explain all of them with example of DMQL
- Mining Frequent itemset
- Mining Frequent Item set
- Market Basket Analysis
- Market Basket Analysis (2)
- Market Basket Analysis (3)
- Frequent ItemsetClosed Itemset and Association Rules
- Frequent ItemsetClosed Itemset and Association Rules (2)
- Frequent ItemsetClosed Itemset and Association Rules (3)
- Closed Frequent Itemset amp Maximal Frequent Item set
- Frequent Pattern mining
- Efficient and Scalable Frequent Itemset Mining method
- Apriori property
- Apriori Algorithm
- Slide 14
- Apriori Algo
- Apriori Algo(Contdhellip)
- Apriori algo(contd)
- Generating Association Rules from Frequent Itemsets
- Improving the Efficiency of Apriori
- Mining Frequent Itemset without Candidate Generation(FP Growth)
- FP Growth
- FP Growth Algo
- Mining Frequent Itemset using Vertical Format
- Mining frequent Itemset using Vertical Data Format
- Slide 25
- Slide 26
- Mining Various Kind of Association Rules
- Mining Multi Level Association Rules
- A concept hierarchy for All Electronics
- Multiple level Association Rules
- Using uniform minimum support for all levels
- Using reduced minimum support at lower levels
- Disadvantages of mining multilevel association rules
- Disadvantages of mining multilevel association rules (2)
- Disadvantages of mining multilevel association rules (3)
- Mining Multidimensional association rules from Relational Datab
- Mining Multidimensional association rules from Relational Datab (2)
- Two Basic Approaches
- Two Basic Approaches (2)
- Mining Multidimensional Association Rules using Static Discreti
- Slide 41
- Mining Quantitative Association Rules
- Association Rule Clustering System
- Steps involved in ARCS
- Steps involved in ARCS(contd)
- Steps involved in ARCS(contdhellip)
- Steps involved in ARCS(contdhellip) (2)
- From Association Mining to Correlation Analysis
- From Association Mining to Correlation Analysis (2)
- From Association Mining to Correlation Analysis (3)
- Examples
- Correlation analysis using X2
- Other correlation measure
- Comparison of four correlation measures on typical data set
- Constraint Based Association mining
- Constraint Based Association mining (2)
- metarule-Guided Mining of association rules
- Constraint Pushing
- Rule Constraint
-
Mining Multi Level Association Rules
bull Finding strong association rule at low or primitive level of abstraction is very difficult
bull Data mining system should provide capabilities for mining association rules at multiple levels of abstraction
bull A concept hierarchy defines a sequence of mappings from a set of low level concepts to higher level concept or ancestor
A concept hierarchy for All Electronics
Multiple level Association Rules
bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules
bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework
bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items
Disadvantages of mining multilevel association rules
Disadvantages of mining multilevel association rules
bull If a rule doesnrsquot provide any new information then it should be removed
bull A rule R1 is more generalized than rule R2so need to specify rule R2
Mining Multidimensional association rules from Relational Database and DW
Association rules that imply a single predicate that is the predicate buys
Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra
dimensional association rules
Mining Multidimensional association rules from Relational Database and DW
Mining multidimensional database association rules-
Associations rules that involves two or more dimensions or predicate are called multidimensional association rules
Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)
Two Basic Approaches
bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of
possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes
Two Basic Approaches
Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques where numeric values are replaced by interval labels
Multidimensional data are used to construct data cubeData cube are well suited for mining multidimensional association rules
Mining Quantitative Association Rules
bull Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria
bull A(quant1) ^ A(quant2) =gt Acat
Association Rule Clustering System
bull Association Rule Clustering System maps pairs of quantitative attributes on to a 2-D grid for tuples statisying a given categorical attribute condition
bull The following steps are involved in ARCS-
Steps involved in ARCS
1 Binning- Quantitative attributes can have a very wide range of values defining their domain
The partioning process is called binning The intervals are considered as binsThe common binning strategies are-
Equal width binning Equal Frequency Binning Clustering based binning
Steps involved in ARCS(contd)
2 Finding frequent predicate sets-Once the distribution takes place later each category can be scanned to find most frquent itemset that statisfy minimum support and minimum confidence
3Clustering the association rules-the strong association rules are then mapped to 2-D grid
Steps involved in ARCS(contdhellip)
Steps involved in ARCS(contdhellip)
From Association Mining to Correlation Analysis
Even strong association rules can be misleadingSupport Confidence framework can be
supplemented by additional measure based on statistical significance and correlational analysis
Lift is a simple correlation measureThe occurrence of itemset A is independent of the
occurrence of itemset B if P(AUB)=P(A)P(B) otherwise A and B are correlated and dependent as event
From Association Mining to Correlation Analysis
lift(AB)= P(AUB)P(A)P(B)If lift(AB)lt1 then the occurrence of A is
negatively correlated with occurrence of BIf Lift(AB) gt1 then the occurrence of A is
positively correlated with occurrence of BIf Lift(AB)=1 then A and B are independent and
no correlation
From Association Mining to Correlation Analysis
P(BA)P(B) or con f(A=gtB)sup(B)
bull Correlation Analysis using lift-
Let game refer to tranasction that donot contains games And Video refer to transaction that donot contain videosThe transaction can be summarized in the contingency table
Probabiltiy of purchasing a computer game=60Probability of purchsing a video=75Probability of purchasing both=40By rule=406075=89So as by rule it is less than 1 so it is negatively correlated
Examples
Correlation analysis using X2bull c2 = Sbull (observed - expected)2expectedbull =bull (400010485764500)2bull 4500bull +bull (350010485763000)2bull 3000bull +bull (200010485761500)2bull 1500bull +bull (50010485761000)2bull 1000bull = 5556bull So it is negatively correlated
Other correlation measure
bull All_confidencebull Cosine
bull all_conf(X)=sup(X)mx_item_sup(x)
bull Cos(AB)=P(AUB)sqrt(P(A)P(B)=sup(AUB)sqrtsup(A)sup(B)
Comparison of four correlation measures on typical data set
bull A null transaction is a transaction that does not contain any of the itemsets being examined
bull A measure is null-variant if its value is free from the influence of null transaction
Constraint Based Association mining
bull A data mining process may uncover thousands of rules from a given datamost of which end up being unrelated or uninteresting to the users
bull A good heuristic is to have the users specify such intuition or expectation as constraints to confine the search spaceThis strategy is known as constraint based mining
Constraint Based Association mining
bull The constraint can include the following-Knowledge type constraintData constraintDimensionlevel constraintInterestingness constraintRule constraint
metarule-Guided Mining of association rules
bull Metarules allows users to specify the syntatic form of rules that they are interested in mining
bull Metarule-guided mining-Finding association between customer traits and
the items that customers buyOnly interested in determining which pairs of customer traits promote the sale of office software
Constraint Pushing
Rule Constraint
bull Antimonotonicbull Monotonicbull Convertiblebull Inconvertible
Explain all of them with example of DMQL
- Mining Frequent itemset
- Mining Frequent Item set
- Market Basket Analysis
- Market Basket Analysis (2)
- Market Basket Analysis (3)
- Frequent ItemsetClosed Itemset and Association Rules
- Frequent ItemsetClosed Itemset and Association Rules (2)
- Frequent ItemsetClosed Itemset and Association Rules (3)
- Closed Frequent Itemset amp Maximal Frequent Item set
- Frequent Pattern mining
- Efficient and Scalable Frequent Itemset Mining method
- Apriori property
- Apriori Algorithm
- Slide 14
- Apriori Algo
- Apriori Algo(Contdhellip)
- Apriori algo(contd)
- Generating Association Rules from Frequent Itemsets
- Improving the Efficiency of Apriori
- Mining Frequent Itemset without Candidate Generation(FP Growth)
- FP Growth
- FP Growth Algo
- Mining Frequent Itemset using Vertical Format
- Mining frequent Itemset using Vertical Data Format
- Slide 25
- Slide 26
- Mining Various Kind of Association Rules
- Mining Multi Level Association Rules
- A concept hierarchy for All Electronics
- Multiple level Association Rules
- Using uniform minimum support for all levels
- Using reduced minimum support at lower levels
- Disadvantages of mining multilevel association rules
- Disadvantages of mining multilevel association rules (2)
- Disadvantages of mining multilevel association rules (3)
- Mining Multidimensional association rules from Relational Datab
- Mining Multidimensional association rules from Relational Datab (2)
- Two Basic Approaches
- Two Basic Approaches (2)
- Mining Multidimensional Association Rules using Static Discreti
- Slide 41
- Mining Quantitative Association Rules
- Association Rule Clustering System
- Steps involved in ARCS
- Steps involved in ARCS(contd)
- Steps involved in ARCS(contdhellip)
- Steps involved in ARCS(contdhellip) (2)
- From Association Mining to Correlation Analysis
- From Association Mining to Correlation Analysis (2)
- From Association Mining to Correlation Analysis (3)
- Examples
- Correlation analysis using X2
- Other correlation measure
- Comparison of four correlation measures on typical data set
- Constraint Based Association mining
- Constraint Based Association mining (2)
- metarule-Guided Mining of association rules
- Constraint Pushing
- Rule Constraint
-
A concept hierarchy for All Electronics
Multiple level Association Rules
bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules
bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework
bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
Generation of redundant rules across multilevel of abstraction due to the ancestor relationship among items
Disadvantages of mining multilevel association rules
Disadvantages of mining multilevel association rules
bull If a rule doesnrsquot provide any new information then it should be removed
bull A rule R1 is more generalized than rule R2so need to specify rule R2
Mining Multidimensional association rules from Relational Database and DW
Association rules that imply a single predicate that is the predicate buys
Buys(Xrdquo digital camerardquo)=gtbuys(XrdquoHp printerrdquo)Such rules are called single dimension or intra
dimensional association rules
Mining Multidimensional association rules from Relational Database and DW
Mining multidimensional database association rules-
Associations rules that involves two or more dimensions or predicate are called multidimensional association rules
Age(xrdquo20hellip29rdquo)^buys(xrdquolaptoprdquo)=gtbuys(xrdquohp printerrdquo)
Two Basic Approaches
bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of
possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes
Two Basic Approaches
Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques where numeric values are replaced by interval labels
Multidimensional data are used to construct data cubeData cube are well suited for mining multidimensional association rules
Mining Quantitative Association Rules
bull Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the minimum criteria
bull A(quant1) ^ A(quant2) =gt Acat
Association Rule Clustering System
bull Association Rule Clustering System maps pairs of quantitative attributes on to a 2-D grid for tuples statisying a given categorical attribute condition
bull The following steps are involved in ARCS-
Steps involved in ARCS
1 Binning- Quantitative attributes can have a very wide range of values defining their domain
The partioning process is called binning The intervals are considered as binsThe common binning strategies are-
Equal width binning Equal Frequency Binning Clustering based binning
Steps involved in ARCS(contd)
2 Finding frequent predicate sets-Once the distribution takes place later each category can be scanned to find most frquent itemset that statisfy minimum support and minimum confidence
3Clustering the association rules-the strong association rules are then mapped to 2-D grid
Steps involved in ARCS(contdhellip)
Steps involved in ARCS(contdhellip)
From Association Mining to Correlation Analysis
Even strong association rules can be misleadingSupport Confidence framework can be
supplemented by additional measure based on statistical significance and correlational analysis
Lift is a simple correlation measureThe occurrence of itemset A is independent of the
occurrence of itemset B if P(AUB)=P(A)P(B) otherwise A and B are correlated and dependent as event
From Association Mining to Correlation Analysis
lift(AB)= P(AUB)P(A)P(B)If lift(AB)lt1 then the occurrence of A is
negatively correlated with occurrence of BIf Lift(AB) gt1 then the occurrence of A is
positively correlated with occurrence of BIf Lift(AB)=1 then A and B are independent and
no correlation
From Association Mining to Correlation Analysis
P(BA)P(B) or con f(A=gtB)sup(B)
bull Correlation Analysis using lift-
Let game refer to tranasction that donot contains games And Video refer to transaction that donot contain videosThe transaction can be summarized in the contingency table
Probabiltiy of purchasing a computer game=60Probability of purchsing a video=75Probability of purchasing both=40By rule=406075=89So as by rule it is less than 1 so it is negatively correlated
Examples
Correlation analysis using X2bull c2 = Sbull (observed - expected)2expectedbull =bull (400010485764500)2bull 4500bull +bull (350010485763000)2bull 3000bull +bull (200010485761500)2bull 1500bull +bull (50010485761000)2bull 1000bull = 5556bull So it is negatively correlated
Other correlation measure
bull All_confidencebull Cosine
bull all_conf(X)=sup(X)mx_item_sup(x)
bull Cos(AB)=P(AUB)sqrt(P(A)P(B)=sup(AUB)sqrtsup(A)sup(B)
Comparison of four correlation measures on typical data set
bull A null transaction is a transaction that does not contain any of the itemsets being examined
bull A measure is null-variant if its value is free from the influence of null transaction
Constraint Based Association mining
bull A data mining process may uncover thousands of rules from a given datamost of which end up being unrelated or uninteresting to the users
bull A good heuristic is to have the users specify such intuition or expectation as constraints to confine the search spaceThis strategy is known as constraint based mining
Constraint Based Association mining
bull The constraint can include the following-Knowledge type constraintData constraintDimensionlevel constraintInterestingness constraintRule constraint
metarule-Guided Mining of association rules
bull Metarules allows users to specify the syntatic form of rules that they are interested in mining
bull Metarule-guided mining-Finding association between customer traits and
the items that customers buyOnly interested in determining which pairs of customer traits promote the sale of office software
Constraint Pushing
Rule Constraint
bull Antimonotonicbull Monotonicbull Convertiblebull Inconvertible
Explain all of them with example of DMQL
- Mining Frequent itemset
- Mining Frequent Item set
- Market Basket Analysis
- Market Basket Analysis (2)
- Market Basket Analysis (3)
- Frequent ItemsetClosed Itemset and Association Rules
- Frequent ItemsetClosed Itemset and Association Rules (2)
- Frequent ItemsetClosed Itemset and Association Rules (3)
- Closed Frequent Itemset amp Maximal Frequent Item set
- Frequent Pattern mining
- Efficient and Scalable Frequent Itemset Mining method
- Apriori property
- Apriori Algorithm
- Slide 14
- Apriori Algo
- Apriori Algo(Contdhellip)
- Apriori algo(contd)
- Generating Association Rules from Frequent Itemsets
- Improving the Efficiency of Apriori
- Mining Frequent Itemset without Candidate Generation(FP Growth)
- FP Growth
- FP Growth Algo
- Mining Frequent Itemset using Vertical Format
- Mining frequent Itemset using Vertical Data Format
- Slide 25
- Slide 26
- Mining Various Kind of Association Rules
- Mining Multi Level Association Rules
- A concept hierarchy for All Electronics
- Multiple level Association Rules
- Using uniform minimum support for all levels
- Using reduced minimum support at lower levels
- Disadvantages of mining multilevel association rules
- Disadvantages of mining multilevel association rules (2)
- Disadvantages of mining multilevel association rules (3)
- Mining Multidimensional association rules from Relational Datab
- Mining Multidimensional association rules from Relational Datab (2)
- Two Basic Approaches
- Two Basic Approaches (2)
- Mining Multidimensional Association Rules using Static Discreti
- Slide 41
- Mining Quantitative Association Rules
- Association Rule Clustering System
- Steps involved in ARCS
- Steps involved in ARCS(contd)
- Steps involved in ARCS(contdhellip)
- Steps involved in ARCS(contdhellip) (2)
- From Association Mining to Correlation Analysis
- From Association Mining to Correlation Analysis (2)
- From Association Mining to Correlation Analysis (3)
- Examples
- Correlation analysis using X2
- Other correlation measure
- Comparison of four correlation measures on typical data set
- Constraint Based Association mining
- Constraint Based Association mining (2)
- metarule-Guided Mining of association rules
- Constraint Pushing
- Rule Constraint
-
Multiple level Association Rules
bull Association Rules generated from mining data at multiple level of abstraction are called multiple-level or multilevel association rules
bull Multilevel association rules can be mined by using concept hierarchy under a support-confidence framework
bull A top down strategy is involved where counts are accumulated for the calculation of frequent item set at each concept level
Using uniform minimum support for all levels
Using reduced minimum support at lower levels
Disadvantages of mining multilevel association rules
Redundant rules are generated across multiple levels of abstraction because of the ancestor relationships among items.
• If a rule doesn't provide any new information, it should be removed.
• If a rule R1 is more general than a rule R2 (R1 is an ancestor of R2) and R2 conveys nothing beyond what R1 already implies, R2 is redundant and need not be reported.
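One common way to decide whether a specialized rule adds "new information" is to compare its support and confidence with the values expected from its ancestor rule. The sketch below assumes that criterion; the function name `is_redundant`, the `tolerance` parameter, and the example numbers are hypothetical and not taken from the slides.

```python
def is_redundant(ancestor_support, ancestor_conf, child_share,
                 child_support, child_conf, tolerance=0.1):
    """Return True if the specialized rule is close to what its ancestor predicts.

    child_share is the (assumed known) fraction of the ancestor item's
    occurrences that belong to the specialized item.
    """
    expected_support = ancestor_support * child_share
    support_close = abs(child_support - expected_support) <= tolerance * expected_support
    conf_close = abs(child_conf - ancestor_conf) <= tolerance * ancestor_conf
    return support_close and conf_close


# Hypothetical numbers: R1 "laptop => HP printer" has support 8%, confidence 70%;
# about a quarter of laptops sold are IBM laptops.
r2_redundant = is_redundant(0.08, 0.70, 0.25, child_support=0.02, child_conf=0.72)
r2_informative = is_redundant(0.08, 0.70, 0.25, child_support=0.05, child_conf=0.95)
```

In the first call the specialized rule behaves as its ancestor predicts, so it can be dropped; in the second it deviates strongly and is worth keeping.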
Mining Multidimensional association rules from Relational Database and DW
Association rules that involve a single predicate, here the predicate buys, such as
buys(X, "digital camera") => buys(X, "HP printer")
are called single-dimensional or intradimensional association rules.
Mining multidimensional association rules:
Association rules that involve two or more dimensions or predicates are called multidimensional association rules, for example
age(X, "20…29") ^ buys(X, "laptop") => buys(X, "HP printer")
Two Basic Approaches
• Database attributes can be of two types: categorical and quantitative.
• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.
• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.
The multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
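As a rough illustration of static discretization, the sketch below replaces a numeric age by an interval label before any mining happens and then treats each (attribute, value) pair as a predicate. The interval boundaries in `AGE_BINS`, the sample rows, and the helper names are assumptions made for this example.

```python
# Hypothetical, predefined age intervals (a one-level concept hierarchy).
AGE_BINS = [(20, 29), (30, 39), (40, 49)]


def age_label(age):
    """Map a numeric age to its interval label, fixed before mining."""
    for low, high in AGE_BINS:
        if low <= age <= high:
            return f"{low}...{high}"
    return "other"


def to_predicates(row):
    """Turn one relational tuple into a set of (dimension, value) predicates."""
    return {
        ("age", age_label(row["age"])),
        ("occupation", row["occupation"]),
        ("buys", row["buys"]),
    }


def support(predicates, predicate_sets):
    """Fraction of tuples that satisfy every predicate in `predicates`."""
    hits = sum(1 for p in predicate_sets if predicates <= p)
    return hits / len(predicate_sets)


# Made-up relational tuples.
rows = [
    {"age": 23, "occupation": "student", "buys": "laptop"},
    {"age": 27, "occupation": "student", "buys": "laptop"},
    {"age": 35, "occupation": "engineer", "buys": "HP printer"},
]
predicate_sets = [to_predicates(r) for r in rows]
young_laptop = support({("age", "20...29"), ("buys", "laptop")}, predicate_sets)
```

Once every tuple is a set of predicates, any frequent itemset algorithm can be applied unchanged; the only difference from the single-dimensional case is that items now carry their dimension name.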
Mining Quantitative Association Rules
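• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the mining criteria.
• Such rules have the form $A_{quant1} \wedge A_{quant2} \Rightarrow A_{cat}$, where $A_{quant1}$ and $A_{quant2}$ are tests on quantitative attribute intervals and $A_{cat}$ tests a categorical attribute.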
Association Rule Clustering System
• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.
• ARCS involves the following steps:
Steps involved in ARCS
1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. This partitioning process is called binning, and the intervals are treated as bins. Common binning strategies are:
   Equal-width binning
   Equal-frequency binning
   Clustering-based binning
2. Finding frequent predicate sets: Once the count distribution for each category has been set up, it can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.
3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
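The first two binning strategies and the 2-D grid of counts can be sketched as follows. This is a minimal illustration, not the ARCS implementation: the function names and the sample ages and incomes are hypothetical, and clustering-based binning is omitted.

```python
from collections import Counter


def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:  # all values identical: a single bin is enough
        return [0] * len(values)
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign values to k bins holding roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = min(rank * k // len(values), k - 1)
    return bins


def grid_counts(x_bins, y_bins):
    """Count tuples per (x bin, y bin) cell of the 2-D grid."""
    return Counter(zip(x_bins, y_bins))


# Made-up ages and incomes (in thousands) for six customers.
ages = [20, 22, 25, 30, 41, 60]
incomes = [18, 25, 31, 44, 52, 75]

age_width = equal_width_bins(ages, 3)
age_freq = equal_frequency_bins(ages, 3)
grid = grid_counts(age_freq, equal_frequency_bins(incomes, 3))
```

Equal-width binning puts four of the six ages into the first bin because the values are skewed, while equal-frequency binning places two ages in each bin. Which strategy suits a data set depends on how the attribute is distributed.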
From Association Mining to Correlation Analysis
Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.
Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if $P(A \cup B) = P(A)P(B)$; otherwise, itemsets A and B are dependent and correlated as events.
$\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}$
• If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.
• If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.
• If lift(A, B) = 1, A and B are independent and there is no correlation between them.
Lift can equivalently be written as $P(B \mid A) / P(B)$, or $\mathrm{conf}(A \Rightarrow B) / \sup(B)$.
• Correlation analysis using lift:
Let $\overline{game}$ refer to the transactions that do not contain computer games, and $\overline{video}$ refer to those that do not contain videos. The transactions can be summarized in a contingency table.
The probability of purchasing a computer game is 0.60, the probability of purchasing a video is 0.75, and the probability of purchasing both is 0.40. The lift is therefore 0.40 / (0.60 × 0.75) = 0.89. Because this value is less than 1, the purchases of games and videos are negatively correlated.
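The lift calculation for the game and video example can be checked with a short Python sketch. The function name `lift` and its parameter names are chosen here for illustration; the probabilities are the ones quoted in the example.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A u B) / (P(A) * P(B)), with P(A u B) the joint probability."""
    return p_ab / (p_a * p_b)


p_game, p_video, p_both = 0.60, 0.75, 0.40

lift_value = lift(p_game, p_video, p_both)

# Equivalent form: confidence of (game => video) divided by the support of video.
confidence = p_both / p_game
lift_from_conf = confidence / p_video

negatively_correlated = lift_value < 1
```

Both forms give roughly 0.89, so the rule game => video, despite a confidence of about 67%, describes a negative correlation.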
Examples
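Correlation analysis using χ²
• $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
• $= \frac{(4000 - 4500)^2}{4500} + \frac{(3500 - 3000)^2}{3000} + \frac{(2000 - 1500)^2}{1500} + \frac{(500 - 1000)^2}{1000} = 555.6$
• Because the χ² value is greater than 1 and the observed count for transactions containing both game and video (4000) is less than the expected count (4500), buying games and buying videos are negatively correlated.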
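The expected counts in the χ² sum come from the row and column totals of the contingency table. The sketch below rebuilds them from the four observed counts used above; the dictionary layout and the function name `chi_square` are assumptions made for this example.

```python
# Observed counts, keyed by (game or not, video or not).
observed = {
    ("game", "video"): 4000,
    ("not game", "video"): 3500,
    ("game", "not video"): 2000,
    ("not game", "not video"): 500,
}


def chi_square(observed):
    """Return (chi-square statistic, expected counts) for a contingency table."""
    total = sum(observed.values())
    row_totals, col_totals = {}, {}
    for (row, col), count in observed.items():
        row_totals[row] = row_totals.get(row, 0) + count
        col_totals[col] = col_totals.get(col, 0) + count

    chi2 = 0.0
    expected = {}
    for (row, col), count in observed.items():
        # Expected count under independence of the row and column events.
        e = row_totals[row] * col_totals[col] / total
        expected[(row, col)] = e
        chi2 += (count - e) ** 2 / e
    return chi2, expected


chi2, expected = chi_square(observed)
```

The expected counts come out as 4500, 3000, 1500 and 1000, matching the denominators in the formula, and the statistic is about 555.6.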
Other correlation measure
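• All_confidence
• Cosine
• $\mathrm{all\_conf}(X) = \frac{\sup(X)}{\mathrm{max\_item\_sup}(X)}$, where max_item_sup(X) is the largest support of any single item in X
• $\mathrm{cosine}(A, B) = \frac{P(A \cup B)}{\sqrt{P(A)\,P(B)}} = \frac{\sup(A \cup B)}{\sqrt{\sup(A)\,\sup(B)}}$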
Comparison of four correlation measures on typical data set
• A null transaction is a transaction that does not contain any of the itemsets being examined.
• A measure is null-invariant if its value is free from the influence of null transactions.
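Null-invariance can be seen by computing several measures on the same counts, once as given and once after adding transactions that contain neither itemset. The sketch below reuses the game and video counts from the χ² example; the function name `measures` and the choice of 90,000 extra null transactions are assumptions for illustration.

```python
from math import sqrt


def measures(n_ab, n_a, n_b, n_total):
    """Compute lift, all_confidence and cosine for itemsets A and B from counts."""
    sup_ab = n_ab / n_total
    sup_a = n_a / n_total
    sup_b = n_b / n_total
    return {
        "lift": sup_ab / (sup_a * sup_b),
        "all_conf": sup_ab / max(sup_a, sup_b),
        "cosine": sup_ab / sqrt(sup_a * sup_b),
    }


# 10,000 transactions: 6,000 with game, 7,500 with video, 4,000 with both.
base = measures(n_ab=4000, n_a=6000, n_b=7500, n_total=10000)

# Same data plus 90,000 hypothetical null transactions (neither game nor video).
padded = measures(n_ab=4000, n_a=6000, n_b=7500, n_total=100000)
```

Lift moves from below 1 to well above 1 once the null transactions are added, while all_confidence and cosine stay the same, because the total number of transactions cancels out of both formulas.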
Constraint Based Association mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.
• A good heuristic is to have the users specify such intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.
• The constraints can include the following:
  Knowledge type constraints
  Data constraints
  Dimension/level constraints
  Interestingness constraints
  Rule constraints
metarule-Guided Mining of association rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.
• Metarule-guided mining example: finding associations between customer traits and the items that customers buy, where the user is only interested in determining which pairs of customer traits promote the sale of office software.
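A metarule for this example could be written as P1(X, Y) ^ P2(X, W) => buys(X, "office software"), where P1 and P2 are predicate variables that are instantiated to customer attributes during mining. The Python sketch below follows that template under stated assumptions: the customer records, the attribute list `TRAITS`, and the thresholds are all made up for illustration.

```python
from itertools import combinations

# Made-up customer records; trait values are already discretized.
customers = [
    {"age": "30...39", "income": "41K...60K", "occupation": "engineer", "buys_office_software": True},
    {"age": "30...39", "income": "41K...60K", "occupation": "teacher", "buys_office_software": True},
    {"age": "30...39", "income": "41K...60K", "occupation": "engineer", "buys_office_software": False},
    {"age": "20...29", "income": "21K...40K", "occupation": "student", "buys_office_software": False},
    {"age": "20...29", "income": "41K...60K", "occupation": "engineer", "buys_office_software": True},
]

# Attributes that may instantiate the predicate variables P1 and P2.
TRAITS = ("age", "income", "occupation")


def mine_with_metarule(customers, min_support, min_confidence):
    """Return rules (trait1, trait2, support, confidence) matching the metarule."""
    n = len(customers)
    rules = []
    for attr1, attr2 in combinations(TRAITS, 2):
        value_pairs = sorted({(c[attr1], c[attr2]) for c in customers})
        for v1, v2 in value_pairs:
            matching = [c for c in customers if c[attr1] == v1 and c[attr2] == v2]
            both = sum(1 for c in matching if c["buys_office_software"])
            support = both / n
            confidence = both / len(matching)
            if support >= min_support and confidence >= min_confidence:
                rules.append(((attr1, v1), (attr2, v2), support, confidence))
    return rules


rules = mine_with_metarule(customers, min_support=0.3, min_confidence=0.6)
```

Only rules with exactly two customer traits on the left and the office software purchase on the right are ever generated, which is the point of the metarule: the syntactic template prunes the search space before any counting is done.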
Constraint Pushing
Rule Constraint
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain all of them with an example in DMQL.
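As an illustration of pushing an antimonotonic constraint into the mining process, consider the constraint "the total price of the itemset is at most a given budget". With non-negative prices, once an itemset violates it, every superset violates it too, so the itemset never needs to be extended. The sketch below is written in Python rather than DMQL, and the price table and function names are hypothetical.

```python
# Hypothetical item prices (non-negative, which makes the sum constraint antimonotonic).
PRICES = {"pen": 2, "notebook": 5, "printer": 120, "laptop": 900}


def satisfies(itemset, max_total):
    """Antimonotonic rule constraint: sum of prices <= max_total."""
    return sum(PRICES[item] for item in itemset) <= max_total


def itemsets_under_budget(items, max_total):
    """Level-wise enumeration that extends only itemsets satisfying the constraint."""
    items = sorted(items)
    level = [(item,) for item in items if satisfies((item,), max_total)]
    result = []
    while level:
        result.extend(level)
        next_level = []
        for itemset in level:
            for item in items:
                # Extend in lexicographic order so each itemset is built once.
                if item > itemset[-1]:
                    candidate = itemset + (item,)
                    if satisfies(candidate, max_total):
                        next_level.append(candidate)
        level = next_level
    return result


result = itemsets_under_budget(PRICES, max_total=130)
```

Because "laptop" alone already exceeds the budget, no itemset containing it is ever generated. In a full miner the same check would sit next to the minimum support test, which is itself antimonotonic.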
- Mining Frequent itemset
- Mining Frequent Item set
- Market Basket Analysis
- Market Basket Analysis (2)
- Market Basket Analysis (3)
- Frequent ItemsetClosed Itemset and Association Rules
- Frequent ItemsetClosed Itemset and Association Rules (2)
- Frequent ItemsetClosed Itemset and Association Rules (3)
- Closed Frequent Itemset amp Maximal Frequent Item set
- Frequent Pattern mining
- Efficient and Scalable Frequent Itemset Mining method
- Apriori property
- Apriori Algorithm
- Slide 14
- Apriori Algo
- Apriori Algo(Contdhellip)
- Apriori algo(contd)
- Generating Association Rules from Frequent Itemsets
- Improving the Efficiency of Apriori
- Mining Frequent Itemset without Candidate Generation(FP Growth)
- FP Growth
- FP Growth Algo
- Mining Frequent Itemset using Vertical Format
- Mining frequent Itemset using Vertical Data Format
- Slide 25
- Slide 26
- Mining Various Kind of Association Rules
- Mining Multi Level Association Rules
- A concept hierarchy for All Electronics
- Multiple level Association Rules
- Using uniform minimum support for all levels
- Using reduced minimum support at lower levels
- Disadvantages of mining multilevel association rules
- Disadvantages of mining multilevel association rules (2)
- Disadvantages of mining multilevel association rules (3)
- Mining Multidimensional association rules from Relational Datab
- Mining Multidimensional association rules from Relational Datab (2)
- Two Basic Approaches
- Two Basic Approaches (2)
- Mining Multidimensional Association Rules using Static Discreti
- Slide 41
- Mining Quantitative Association Rules
- Association Rule Clustering System
- Steps involved in ARCS
- Steps involved in ARCS(contd)
- Steps involved in ARCS (contd…)
- Steps involved in ARCS (contd…) (2)
- From Association Mining to Correlation Analysis
- From Association Mining to Correlation Analysis (2)
- From Association Mining to Correlation Analysis (3)
- Examples
- Correlation analysis using X2
- Other correlation measure
- Comparison of four correlation measures on typical data set
- Constraint Based Association mining
- Constraint Based Association mining (2)
- metarule-Guided Mining of association rules
- Constraint Pushing
- Rule Constraint
Disadvantages of mining multilevel association rules
• Mining at multiple levels of abstraction can generate redundant rules. If a rule does not provide any new information, it should be removed.
• If a rule R1 is more general than a rule R2 (R1 is an ancestor of R2 in the concept hierarchy) and R2 adds nothing beyond what R1 already implies, there is no need to specify R2.
Mining Multidimensional association rules from Relational Database and DW
• Association rules that involve a single predicate, here the predicate buys, are called single-dimensional or intradimensional association rules. For example:
$\mathit{buys}(X, \text{"digital camera"}) \Rightarrow \mathit{buys}(X, \text{"HP printer"})$
• Association rules that involve two or more dimensions or predicates are called multidimensional association rules. For example:
$\mathit{age}(X, \text{"20} \ldots \text{29"}) \wedge \mathit{buys}(X, \text{"laptop"}) \Rightarrow \mathit{buys}(X, \text{"HP printer"})$
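The distinction between the two kinds of rule can be checked mechanically by counting the distinct predicates a rule mentions. The sketch below is only an illustration; the `Predicate` type and the `rule_kind` helper are hypothetical names introduced here, not part of the slides.

```python
from typing import NamedTuple


class Predicate(NamedTuple):
    name: str   # dimension / predicate name, e.g. "buys" or "age"
    value: str  # attribute value or interval label


def rule_kind(antecedent, consequent):
    """Classify a rule by the number of distinct predicate names it uses."""
    names = {p.name for p in antecedent} | {p.name for p in consequent}
    return "single-dimensional" if len(names) == 1 else "multidimensional"


single = rule_kind(
    [Predicate("buys", "digital camera")],
    [Predicate("buys", "HP printer")],
)
multi = rule_kind(
    [Predicate("age", "20...29"), Predicate("buys", "laptop")],
    [Predicate("buys", "HP printer")],
)
print(single)  # single-dimensional
print(multi)   # multidimensional
```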
Two Basic Approaches
• Database attributes can be of two types: categorical or quantitative.
• Categorical attributes have a finite number of possible values with no ordering among the values (e.g., occupation, brand, color). Categorical attributes are also called nominal attributes.
• Quantitative attributes are numeric and have an implicit ordering among values (e.g., age, income, price).
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
• Quantitative attributes are discretized before mining, using predefined concept hierarchies or data discretization techniques, so that numeric values are replaced by interval labels.
• The multidimensional data can be used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
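As a minimal sketch of static discretization, the snippet below replaces numeric ages with interval labels before any mining takes place. The interval boundaries in `AGE_INTERVALS` and the sample customers are assumptions made up for illustration; in practice they would come from a predefined concept hierarchy.

```python
# Hypothetical, fixed age intervals (inclusive bounds); not taken from the slides.
AGE_INTERVALS = [(0, 19), (20, 29), (30, 39), (40, 49), (50, 120)]


def age_label(age):
    """Map a numeric age to the label of the interval that contains it."""
    for low, high in AGE_INTERVALS:
        if low <= age <= high:
            return f"{low}...{high}"
    raise ValueError(f"age {age} is outside every interval")


customers = [
    {"age": 23, "buys": "laptop"},
    {"age": 35, "buys": "HP printer"},
]

# After this step every attribute is categorical, so an ordinary
# frequent-itemset miner can treat (attribute, label) pairs as items.
discretized = [{"age": age_label(c["age"]), "buys": c["buys"]} for c in customers]
print(discretized[0])  # {'age': '20...29', 'buys': 'laptop'}
```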
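Mining Quantitative Association Rules
• Quantitative association rules are multidimensional association rules in which the numeric attributes are discretized dynamically during the mining process so as to satisfy the mining criteria (for example, maximizing the confidence of the rules mined).
• A common form has two quantitative attributes on the left-hand side and one categorical attribute on the right-hand side: $A_{quant1} \wedge A_{quant2} \Rightarrow A_{cat}$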
Association Rule Clustering System
• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.
• The following steps are involved in ARCS:
Steps involved in ARCS
1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the ranges are partitioned into intervals. The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are equal-width binning, equal-frequency binning, and clustering-based binning.
2. Finding frequent predicate sets: Once the count distribution for each category has been set up, it can be scanned to find the frequent predicate sets (those satisfying minimum support) that also satisfy minimum confidence.
3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
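The binning step and the 2-D grid can be sketched as follows. This is an illustrative outline under stated assumptions, not the ARCS implementation: the helper names, the sample ages and incomes, and the `buys_tv` condition are all hypothetical, and the equal-width helper assumes the values are not all identical.

```python
from collections import Counter


def equal_width_bins(values, k):
    """Bin index (0..k-1) for each value, using k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes hi > lo
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Bin index for each value so that every bin holds about the same count."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


ages = [22, 25, 31, 38, 44, 59]
incomes = [20, 30, 42, 48, 60, 80]           # in thousands, made-up data
buys_tv = [False, True, True, True, False, False]

age_bins = equal_width_bins(ages, 3)
income_bins = equal_width_bins(incomes, 3)

# 2-D grid: count, per (age bin, income bin) cell, the tuples that satisfy
# the categorical condition. Cells with enough support are rule candidates.
grid = Counter(
    (a, i) for a, i, buys in zip(age_bins, income_bins, buys_tv) if buys
)
print(age_bins)     # [0, 0, 0, 1, 1, 2]
print(income_bins)  # [0, 0, 1, 1, 2, 2]
print(dict(grid))   # {(0, 0): 1, (0, 1): 1, (1, 1): 1}
```

Adjacent occupied cells in such a grid are what a clustering step would merge into a single, more general rule.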
From Association Mining to Correlation Analysis
• Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.
• Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if $P(A \cup B) = P(A)P(B)$; otherwise A and B are dependent and correlated as events.
• $\mathrm{lift}(A, B) = \dfrac{P(A \cup B)}{P(A)P(B)}$
• If $\mathrm{lift}(A, B) < 1$, the occurrence of A is negatively correlated with the occurrence of B.
• If $\mathrm{lift}(A, B) > 1$, the occurrence of A is positively correlated with the occurrence of B.
• If $\mathrm{lift}(A, B) = 1$, A and B are independent and there is no correlation between them.
• Equivalently, $\mathrm{lift}(A, B) = \dfrac{P(B \mid A)}{P(B)} = \dfrac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{sup}(B)}$
• Correlation analysis using lift:
Let $\overline{game}$ refer to the transactions that do not contain computer games, and $\overline{video}$ refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.
The probability of purchasing a computer game is 0.60, the probability of purchasing a video is 0.75, and the probability of purchasing both is 0.40. By the lift formula, $\mathrm{lift}(game, video) = 0.40 / (0.60 \times 0.75) = 0.89$. Because this value is less than 1, the purchases of computer games and videos are negatively correlated.
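The arithmetic of the game/video example can be reproduced in a few lines. The function names `lift` and `interpret` are chosen here for illustration only; the probabilities are the ones quoted in the example.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A and B together) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)


def interpret(value, tol=1e-9):
    """Translate a lift value into the wording used for correlation."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"


game_video_lift = lift(p_a=0.60, p_b=0.75, p_ab=0.40)
print(round(game_video_lift, 2))   # 0.89
print(interpret(game_video_lift))  # negatively correlated
```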
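Examples
Correlation analysis using χ²
• $\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$
• $\chi^2 = \dfrac{(4000 - 4500)^2}{4500} + \dfrac{(3500 - 3000)^2}{3000} + \dfrac{(2000 - 1500)^2}{1500} + \dfrac{(500 - 1000)^2}{1000} = 555.6$
• The χ² value is far greater than 1, so the two purchases are not independent. In the first term the observed count (4000) is below the expected count (4500), the same 0.40 versus 0.45 relationship as in the lift example, so the two are negatively correlated.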
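The χ² sum can be recomputed from a 2×2 contingency table. The slide text does not label which cell each term belongs to, so the row and column layout below is an assumption; it was chosen because its marginals (6000 game, 7500 video out of 10000) match the 0.60 and 0.75 probabilities of the lift example.

```python
# Assumed layout: rows = game / not game, columns = video / not video.
observed = [
    [4000, 2000],
    [3500, 500],
]

total = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# Expected count of a cell under independence: row total * column total / total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
print(expected)        # [[4500.0, 1500.0], [3000.0, 1000.0]]
print(round(chi2, 1))  # 555.6
```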
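Other correlation measures
• All_confidence
• Cosine
• $\mathrm{all\_conf}(X) = \dfrac{\mathrm{sup}(X)}{\mathrm{max\_item\_sup}(X)}$, where $\mathrm{max\_item\_sup}(X)$ is the maximum support of any single item in X.
• $\mathrm{cosine}(A, B) = \dfrac{P(A \cup B)}{\sqrt{P(A)P(B)}} = \dfrac{\mathrm{sup}(A \cup B)}{\sqrt{\mathrm{sup}(A)\,\mathrm{sup}(B)}}$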
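Both measures are simple ratios of supports. The sketch below applies them to the game/video supports used earlier (0.60, 0.75, and 0.40 for both); reusing those numbers here is an illustration chosen for this note, and the function names are not from the slides.

```python
from math import sqrt


def all_confidence(sup_itemset, item_supports):
    """sup(X) divided by the largest support of any single item in X."""
    return sup_itemset / max(item_supports)


def cosine(sup_a, sup_b, sup_ab):
    """sup(A and B together) divided by sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)


all_conf_value = all_confidence(0.40, [0.60, 0.75])
cosine_value = cosine(0.60, 0.75, 0.40)
print(round(all_conf_value, 3))  # 0.533
print(round(cosine_value, 3))    # 0.596
```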
Comparison of four correlation measures on typical data sets
• A null transaction is a transaction that does not contain any of the itemsets being examined.
• A measure is null-invariant if its value is free from the influence of null transactions.
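The effect of null transactions can be seen by holding the item counts fixed and only enlarging the data set with transactions that contain neither item. In the sketch below (counts and function names are assumptions for illustration), lift changes with the total while cosine does not, because cosine never uses the total number of transactions.

```python
from math import sqrt


def lift_from_counts(n_a, n_b, n_ab, n_total):
    """Lift computed from absolute counts; depends on n_total."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))


def cosine_from_counts(n_a, n_b, n_ab):
    """Cosine computed from absolute counts; n_total cancels out."""
    return n_ab / sqrt(n_a * n_b)


n_a, n_b, n_ab = 6000, 7500, 4000

# 10,000 transactions, then the same data padded with 90,000 null transactions.
results = {
    n_total: (
        round(lift_from_counts(n_a, n_b, n_ab, n_total), 2),
        round(cosine_from_counts(n_a, n_b, n_ab), 3),
    )
    for n_total in (10_000, 100_000)
}
print(results[10_000])   # (0.89, 0.596)
print(results[100_000])  # (8.89, 0.596)
```

With the extra null transactions, lift flips from below 1 to well above 1 even though the relationship between the two items is unchanged.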
Constraint Based Association mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.
• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.
• The constraints can include the following: knowledge type constraints, data constraints, dimension/level constraints, interestingness constraints, and rule constraints.
metarule-Guided Mining of association rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.
• Metarule-guided mining example: finding associations between customer traits and the items that customers buy, where we are only interested in determining which pairs of customer traits promote the sale of office software.
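One way to read the office-software example is as a rule template: two customer-trait predicates on the left, and the purchase of office software on the right. The sketch below filters candidate rules against such a template. The exact template, the `matches_metarule` helper, and the candidate rules are assumptions made for illustration.

```python
def matches_metarule(antecedent, consequent):
    """Assumed template: two different traits imply buys "office software".

    antecedent is a list of (predicate, value) pairs;
    consequent is a single (predicate, value) pair.
    """
    distinct_traits = {name for name, _ in antecedent}
    return (
        len(antecedent) == 2
        and len(distinct_traits) == 2
        and consequent == ("buys", "office software")
    )


# Made-up candidate rules, as a miner might produce them.
candidate_rules = [
    ([("age", "30...39"), ("income", "41K...60K")], ("buys", "office software")),
    ([("age", "30...39")], ("buys", "office software")),
    ([("age", "20...29"), ("occupation", "student")], ("buys", "laptop")),
]

selected = [rule for rule in candidate_rules if matches_metarule(*rule)]
print(len(selected))   # 1
print(selected[0][0])  # [('age', '30...39'), ('income', '41K...60K')]
```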
Constraint Pushing
Rule Constraint
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain all of them with an example in DMQL.
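The first two categories can be illustrated with a brute-force check in Python (this is not DMQL). A constraint is antimonotonic if, once an itemset violates it, every superset also violates it; it is monotonic if, once an itemset satisfies it, every superset also satisfies it. The prices and helper names below are hypothetical, prices are assumed non-negative, and the check only covers this small item universe. Convertible and inconvertible constraints are not covered.

```python
from itertools import combinations

# Hypothetical, non-negative item prices.
PRICES = {"pen": 2, "mouse": 15, "printer": 120, "laptop": 900}


def total_price(itemset):
    return sum(PRICES[item] for item in itemset)


def at_most(limit):
    return lambda itemset: total_price(itemset) <= limit


def at_least(limit):
    return lambda itemset: total_price(itemset) >= limit


def all_itemsets():
    items = sorted(PRICES)
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def is_antimonotonic(constraint):
    """Every proper superset of a violating itemset must also violate."""
    sets = all_itemsets()
    return all(
        not constraint(t)
        for s in sets if not constraint(s)
        for t in sets if s < t
    )


def is_monotonic(constraint):
    """Every proper superset of a satisfying itemset must also satisfy."""
    sets = all_itemsets()
    return all(
        constraint(t)
        for s in sets if constraint(s)
        for t in sets if s < t
    )


print(is_antimonotonic(at_most(100)))   # True
print(is_monotonic(at_least(100)))      # True
print(is_antimonotonic(at_least(100)))  # False
print(is_monotonic(at_most(100)))       # False
```

An antimonotonic constraint lets a level-wise miner discard an itemset and all of its supersets as soon as the constraint fails; a monotonic one needs no further checking on supersets once it holds.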
Two Basic Approaches
bull Database attributes can be of two types-CategoricalQuantitativeCategorical attributes have a finite number of
possible values with no ordering among values(eg occupationbrandcolor)Categorical attributes are also called nominal attributes
Two Basic Approaches
Quantitative attributes are numeric and have a implicit ordering among values(eg ageincomeprice)
Mining Multidimensional Association Rules using Static Discretization of Quantitative Attributes
Quantitative attributes are discretized before mining using predefined concept hierarchies or data discretization techniques, where numeric values are replaced by interval labels.
Multidimensional data are used to construct a data cube. Data cubes are well suited for mining multidimensional association rules.
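As a minimal sketch of static discretization, the snippet below replaces numeric ages with interval labels before any mining takes place. The cut points, the label strings, and the helper name discretize are hypothetical and chosen only for illustration; in practice they would come from a predefined concept hierarchy.

```python
from bisect import bisect_right

# Hypothetical cut points and interval labels for the attribute "age".
AGE_CUTS = [20, 30, 40, 50]
AGE_LABELS = ["<20", "20-29", "30-39", "40-49", "50+"]


def discretize(value, cuts, labels):
    """Return the interval label whose range contains value."""
    # bisect_right counts how many cut points are <= value,
    # which is exactly the index of the matching label.
    return labels[bisect_right(cuts, value)]


ages = [18, 25, 34, 41, 67]
age_labels = [discretize(a, AGE_CUTS, AGE_LABELS) for a in ages]
print(age_labels)
```

Once every numeric value has been replaced by a label, the attribute can be treated like a categorical one during mining.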
Mining Quantitative Association Rules
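• Quantitative association rules are multidimensional association rules in which the numeric attributes are dynamically discretized during the mining process so as to satisfy the mining criteria (for example, maximizing the confidence or compactness of the rules mined).
• $A_{quant1} \wedge A_{quant2} \Rightarrow A_{cat}$, where $A_{quant1}$ and $A_{quant2}$ are tests on quantitative attribute intervals and $A_{cat}$ is a test on a categorical attribute.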
Association Rule Clustering System
• The Association Rule Clustering System (ARCS) maps pairs of quantitative attributes onto a 2-D grid for tuples satisfying a given categorical attribute condition.
• The following steps are involved in ARCS:
Steps involved in ARCS
1. Binning: Quantitative attributes can have a very wide range of values defining their domain, so the range of each attribute is partitioned into intervals.
The partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:
• Equal-width binning
• Equal-frequency binning
• Clustering-based binning
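The first two binning strategies can be sketched as follows. This is an illustrative sketch only: the function names and the income values are hypothetical, and the equal-width version assumes the values are not all identical.

```python
def equal_width_bins(values, k):
    """Split the value range into k intervals of equal width.

    Returns one bin index per value. Assumes max(values) > min(values).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign bin indices so each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


incomes = [20, 22, 25, 30, 31, 90]  # hypothetical incomes, in thousands
width_bins = equal_width_bins(incomes, 3)
freq_bins = equal_frequency_bins(incomes, 3)
print(width_bins)
print(freq_bins)
```

With skewed data such as the single large income here, equal-width binning leaves most values in one bin, while equal-frequency binning spreads them evenly.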
Steps involved in ARCS(contd)
2. Finding frequent predicate sets: Once the count distribution for each category has been recorded, it can be scanned to find the frequent predicate sets that satisfy minimum support and minimum confidence.
3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
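Steps 2 and 3 can be sketched on a toy grid. The tuples, the thresholds, and the function name strong_cells below are hypothetical; each tuple is assumed to have been binned already, and the Boolean flag stands for the categorical condition on the right-hand side of the rule.

```python
from collections import Counter

# Hypothetical tuples after binning: (age_bin, income_bin, buys_item).
tuples = [
    (3, 4, True), (3, 4, True), (3, 4, False),
    (3, 5, True), (3, 5, True),
    (1, 1, False), (1, 1, False), (2, 2, True),
]


def strong_cells(tuples, min_sup, min_conf):
    """Return the grid cells whose rule 'cell => category' is strong."""
    total = len(tuples)
    # Number of tuples falling in each (age_bin, income_bin) cell.
    cell_count = Counter((a, i) for a, i, _ in tuples)
    # Number of those tuples that also satisfy the categorical condition.
    hit_count = Counter((a, i) for a, i, buys in tuples if buys)
    result = []
    for cell, hits in hit_count.items():
        support = hits / total
        confidence = hits / cell_count[cell]
        if support >= min_sup and confidence >= min_conf:
            result.append(cell)
    return sorted(result)


cells = strong_cells(tuples, min_sup=0.2, min_conf=0.6)
print(cells)
```

The two surviving cells are adjacent on the grid, which is the kind of neighbourhood that the clustering step would merge into a single rule over a wider income interval.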
Steps involved in ARCS (contd…)
Steps involved in ARCS (contd…)
From Association Mining to Correlation Analysis
Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.
Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if $P(A \cup B) = P(A)P(B)$; otherwise A and B are dependent and correlated as events.
From Association Mining to Correlation Analysis
$\mathrm{lift}(A, B) = \dfrac{P(A \cup B)}{P(A)\,P(B)}$
If lift(A, B) < 1, then the occurrence of A is negatively correlated with the occurrence of B.
If lift(A, B) > 1, then the occurrence of A is positively correlated with the occurrence of B.
If lift(A, B) = 1, then A and B are independent and there is no correlation between them.
From Association Mining to Correlation Analysis
Equivalently, $\mathrm{lift}(A, B) = P(B \mid A) / P(B) = \mathrm{conf}(A \Rightarrow B) / \mathrm{sup}(B)$.
• Correlation analysis using lift:
Let ¬game refer to the transactions that do not contain computer games, and ¬video refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.
Probability of purchasing a computer game = 0.60. Probability of purchasing a video = 0.75. Probability of purchasing both = 0.40. By the lift rule, lift = 0.40 / (0.60 × 0.75) = 0.89. Since the lift is less than 1, the purchases of computer games and videos are negatively correlated.
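The lift calculation for the game and video example can be checked with a few lines of code. This is a minimal sketch; the function names lift and interpret are illustrative and not part of any library.

```python
def lift(p_a, p_b, p_ab):
    """Lift of A and B: P(A and B together) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)


def interpret(value, tol=1e-9):
    """Translate a lift value into the three cases described in the slides."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"


# Probabilities from the example: game = 0.60, video = 0.75, both = 0.40.
game_video_lift = lift(p_a=0.60, p_b=0.75, p_ab=0.40)
print(round(game_video_lift, 2), interpret(game_video_lift))
```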
Examples
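Correlation analysis using χ²
$\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}} = \dfrac{(4000 - 4500)^2}{4500} + \dfrac{(3500 - 3000)^2}{3000} + \dfrac{(2000 - 1500)^2}{1500} + \dfrac{(500 - 1000)^2}{1000} = 555.6$
• Because the χ² value is greater than 1 and the observed count for transactions containing both a game and a video (4000) is less than the expected count (4500), the two purchases are negatively correlated.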
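The χ² value can be recomputed from the observed counts alone, since each expected count is (row total × column total) / N. In the sketch below the assignment of the four counts to cells is an assumption inferred from the probabilities in the lift example (game = 0.60, video = 0.75, both = 0.40, which implies 10,000 transactions); the function name chi_square is illustrative.

```python
# Observed counts; the cell labels are inferred, not stated in the slides.
observed = {
    ("game", "video"): 4000,
    ("no_game", "video"): 3500,
    ("game", "no_video"): 2000,
    ("no_game", "no_video"): 500,
}


def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    n = sum(observed.values())
    game_totals, video_totals = {}, {}
    for (g, v), count in observed.items():
        game_totals[g] = game_totals.get(g, 0) + count
        video_totals[v] = video_totals.get(v, 0) + count
    total = 0.0
    for (g, v), count in observed.items():
        # Expected count under independence of the two attributes.
        expected = game_totals[g] * video_totals[v] / n
        total += (count - expected) ** 2 / expected
    return total


chi2 = chi_square(observed)
print(round(chi2, 1))
```

The expected counts derived this way are 4500, 3000, 1500 and 1000, matching the denominators in the formula.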
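Other correlation measures
• All_confidence
• Cosine
• $\mathrm{all\_conf}(X) = \dfrac{\mathrm{sup}(X)}{\mathrm{max\_item\_sup}(X)}$, where max_item_sup(X) is the maximum support of the individual items in X
• $\mathrm{cosine}(A, B) = \dfrac{P(A \cup B)}{\sqrt{P(A)\,P(B)}} = \dfrac{\mathrm{sup}(A \cup B)}{\sqrt{\mathrm{sup}(A)\,\mathrm{sup}(B)}}$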
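Both measures can be evaluated on the game and video figures used in the lift example (supports 0.60 and 0.75, joint support 0.40). This is a sketch with illustrative function names.

```python
from math import sqrt


def all_confidence(sup_itemset, item_sups):
    """Support of the itemset divided by the largest single-item support."""
    return sup_itemset / max(item_sups)


def cosine(sup_ab, sup_a, sup_b):
    """Joint support divided by the geometric mean of the two supports."""
    return sup_ab / sqrt(sup_a * sup_b)


all_conf_gv = all_confidence(0.40, [0.60, 0.75])
cosine_gv = cosine(0.40, 0.60, 0.75)
print(round(all_conf_gv, 3), round(cosine_gv, 3))
```

Unlike lift, neither formula involves the total number of transactions, only the supports of the itemsets themselves.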
Comparison of four correlation measures on typical data set
• A null transaction is a transaction that does not contain any of the itemsets being examined.
• A measure is null-invariant if its value is free from the influence of null transactions. Of the four measures, all_confidence and cosine are null-invariant, whereas lift and χ² are not.
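The effect of null transactions can be seen by computing lift and cosine on the same itemset counts with a small and a large number of null transactions. The counts and the function name measures are hypothetical, chosen only to make the contrast visible.

```python
from math import sqrt


def measures(n_ab, n_a_only, n_b_only, n_null):
    """Compute lift and cosine for A and B from transaction counts."""
    n = n_ab + n_a_only + n_b_only + n_null
    p_ab = n_ab / n
    p_a = (n_ab + n_a_only) / n
    p_b = (n_ab + n_b_only) / n
    return {
        "lift": p_ab / (p_a * p_b),
        "cosine": p_ab / sqrt(p_a * p_b),
    }


# Same counts for A and B; only the number of null transactions differs.
few_nulls = measures(n_ab=100, n_a_only=50, n_b_only=50, n_null=100)
many_nulls = measures(n_ab=100, n_a_only=50, n_b_only=50, n_null=100_000)
print(few_nulls)
print(many_nulls)
```

Cosine stays the same in both runs, while lift grows by orders of magnitude simply because more unrelated transactions were added.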
Constraint Based Association mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.
• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.
Constraint Based Association mining
• The constraints can include the following: knowledge type constraints, data constraints, dimension/level constraints, interestingness constraints, and rule constraints.
metarule-Guided Mining of association rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.
• Metarule-guided mining example: finding associations between customer traits and the items that customers buy, where we are only interested in determining which pairs of customer traits promote the sale of office software.
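One way to write the metarule for this example is P1(X, Y) ∧ P2(X, W) ⇒ buys(X, "office software"), where P1 and P2 stand for any two customer traits. The sketch below uses that template as a filter over already-mined rules; the rule representation, the sample rules, and the function name matches_metarule are all hypothetical.

```python
# Each rule is (antecedent predicates, consequent predicate);
# a predicate is a (name, value) pair. All rules here are made up.
rules = [
    ((("age", "30-39"), ("income", "41K-60K")), ("buys", "office software")),
    ((("age", "20-29"),), ("buys", "office software")),
    ((("age", "30-39"), ("income", "41K-60K")), ("buys", "laptop")),
]


def matches_metarule(rule):
    """True if the rule has two distinct traits implying office software."""
    antecedent, consequent = rule
    trait_names = {name for name, _ in antecedent}
    return (
        len(antecedent) == 2
        and len(trait_names) == 2
        and consequent == ("buys", "office software")
    )


matching = [r for r in rules if matches_metarule(r)]
print(matching)
```

In an actual miner the metarule would be used to restrict the search itself rather than to filter its output afterwards; the filter is only the simplest way to show the template.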
Constraint Pushing
Rule Constraint
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain all of them with an example in DMQL.
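As an illustration of the first category, a constraint is antimonotonic when an itemset that violates it can never be repaired by adding more items. With non-negative prices, sum(price) ≤ budget behaves this way, so violating itemsets can be pruned early; the reverse constraint sum(price) ≥ v is monotonic. The sketch below is in Python rather than DMQL, and the prices, budget, and function names are hypothetical.

```python
prices = {"pen": 5, "ink": 20, "printer": 120, "paper": 10}  # hypothetical


def satisfies(itemset, budget):
    """Antimonotonic constraint: total price must not exceed the budget."""
    return sum(prices[i] for i in itemset) <= budget


def itemsets_within_budget(items, budget):
    """Level-wise search that only extends itemsets satisfying the constraint."""
    level = [frozenset([i]) for i in items if satisfies([i], budget)]
    kept = list(level)
    while level:
        next_level = set()
        for a in level:
            for b in level:
                union = a | b
                # Join two surviving k-itemsets into a (k+1)-itemset.
                if len(union) == len(a) + 1 and satisfies(union, budget):
                    next_level.add(union)
        level = list(next_level)
        kept.extend(level)
    return kept


result = itemsets_within_budget(prices, budget=40)
print(len(result))
```

Because "printer" alone already exceeds the budget, no itemset containing it is ever generated, which is the saving that pushing an antimonotonic constraint into the mining process provides.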
- Mining Frequent itemset
- Mining Frequent Item set
- Market Basket Analysis
- Market Basket Analysis (2)
- Market Basket Analysis (3)
- Frequent ItemsetClosed Itemset and Association Rules
- Frequent ItemsetClosed Itemset and Association Rules (2)
- Frequent ItemsetClosed Itemset and Association Rules (3)
- Closed Frequent Itemset amp Maximal Frequent Item set
- Frequent Pattern mining
- Efficient and Scalable Frequent Itemset Mining method
- Apriori property
- Apriori Algorithm
- Slide 14
- Apriori Algo
- Apriori Algo(Contdhellip)
- Apriori algo(contd)
- Generating Association Rules from Frequent Itemsets
- Improving the Efficiency of Apriori
- Mining Frequent Itemset without Candidate Generation(FP Growth)
- FP Growth
- FP Growth Algo
- Mining Frequent Itemset using Vertical Format
- Mining frequent Itemset using Vertical Data Format
- Slide 25
- Slide 26
- Mining Various Kind of Association Rules
- Mining Multi Level Association Rules
- A concept hierarchy for All Electronics
- Multiple level Association Rules
- Using uniform minimum support for all levels
- Using reduced minimum support at lower levels
- Disadvantages of mining multilevel association rules
- Disadvantages of mining multilevel association rules (2)
- Disadvantages of mining multilevel association rules (3)
- Mining Multidimensional association rules from Relational Datab
- Mining Multidimensional association rules from Relational Datab (2)
- Two Basic Approaches
- Two Basic Approaches (2)
- Mining Multidimensional Association Rules using Static Discreti
- Slide 41
- Mining Quantitative Association Rules
- Association Rule Clustering System
- Steps involved in ARCS
- Steps involved in ARCS(contd)
- Steps involved in ARCS (contd…)
- Steps involved in ARCS (contd…) (2)
- From Association Mining to Correlation Analysis
- From Association Mining to Correlation Analysis (2)
- From Association Mining to Correlation Analysis (3)
- Examples
- Correlation analysis using χ²
- Other correlation measure
- Comparison of four correlation measures on typical data set
- Constraint Based Association mining
- Constraint Based Association mining (2)
- metarule-Guided Mining of association rules
- Constraint Pushing
- Rule Constraint
Steps involved in ARCS
1. Binning: Quantitative attributes can have a very wide range of values defining their domain. To keep the grid to a manageable size, the range of each attribute is partitioned into intervals. This partitioning process is called binning, and the intervals are considered bins. The common binning strategies are:
• Equal-width binning
• Equal-frequency binning
• Clustering-based binning
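The first two strategies can be sketched as follows. This is a minimal illustration, not the ARCS implementation: the function names and the sample ages are hypothetical, and clustering-based binning is omitted because it depends on the clustering method chosen.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall into bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign values to k bins that hold roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


ages = [21, 22, 23, 25, 30, 41, 58, 60]  # hypothetical ages
width_bins = equal_width_bins(ages, 3)   # [0, 0, 0, 0, 0, 1, 2, 2]
freq_bins = equal_frequency_bins(ages, 4)  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With skewed data, equal-width binning leaves most values in one bin, while equal-frequency binning balances the counts at the cost of uneven interval widths. Note that this simple equal-frequency version may place identical values in different bins.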
Steps involved in ARCS(contd)
2. Finding frequent predicate sets: Once the values have been distributed into bins, each category can be scanned to find the frequent predicate sets (those satisfying minimum support) that also satisfy minimum confidence.
3. Clustering the association rules: The strong association rules are then mapped to a 2-D grid.
From Association Mining to Correlation Analysis
Even strong association rules can be misleading. The support-confidence framework can be supplemented by additional measures based on statistical significance and correlation analysis.
Lift is a simple correlation measure. The occurrence of itemset A is independent of the occurrence of itemset B if $P(A \cup B) = P(A)P(B)$; otherwise, A and B are dependent and correlated as events.
The lift between the occurrences of A and B is defined as
$$\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$
• If lift(A, B) < 1, the occurrence of A is negatively correlated with the occurrence of B.
• If lift(A, B) > 1, the occurrence of A is positively correlated with the occurrence of B.
• If lift(A, B) = 1, A and B are independent and there is no correlation between them.
Lift can equivalently be written as $P(B \mid A) / P(B)$, or $\mathrm{conf}(A \Rightarrow B) / \mathrm{sup}(B)$.
• Correlation analysis using lift:
Let $\overline{game}$ refer to the transactions that do not contain computer games, and $\overline{video}$ refer to the transactions that do not contain videos. The transactions can be summarized in a contingency table.
The probability of purchasing a computer game is 0.60, the probability of purchasing a video is 0.75, and the probability of purchasing both is 0.40. The lift is therefore 0.40 / (0.60 × 0.75) = 0.89. Because this value is less than 1, the purchases of computer games and videos are negatively correlated.
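The lift calculation for the game and video example can be reproduced with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the tolerance used to decide that lift equals 1 is an assumption made to cope with floating-point rounding.

```python
def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A and B together) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)


def interpret(value, tol=1e-9):
    """Classify a lift value as in the three cases listed on the slide."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"


# Probabilities quoted in the example: game 0.60, video 0.75, both 0.40.
game_video = lift(p_a=0.60, p_b=0.75, p_ab=0.40)
print(round(game_video, 2), interpret(game_video))  # 0.89 negatively correlated
```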
Examples
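Correlation analysis using χ²
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(4000 - 4500)^2}{4500} + \frac{(3500 - 3000)^2}{3000} + \frac{(2000 - 1500)^2}{1500} + \frac{(500 - 1000)^2}{1000} = 555.6$$
Because the χ² value is greater than 1 and the observed count of transactions containing both items (4000) is less than the expected count (4500), buying games and buying videos are negatively correlated.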
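The χ² value can be recomputed from the observed counts. In the sketch below the assignment of the four counts to the cells of the contingency table is an assumption inferred from the probabilities quoted in the lift example (0.60, 0.75 and 0.40 of 10,000 transactions), since the table itself is not reproduced in this transcript. The function name and cell labels are hypothetical.

```python
# Observed counts; the cell assignment is inferred, not copied from a table.
observed = {
    ("game", "video"): 4000,
    ("game", "no_video"): 2000,
    ("no_game", "video"): 3500,
    ("no_game", "no_video"): 500,
}


def chi_square(table):
    """Return the chi-square statistic and the expected count of each cell."""
    total = sum(table.values())
    row_totals, col_totals = {}, {}
    for (row, col), count in table.items():
        row_totals[row] = row_totals.get(row, 0) + count
        col_totals[col] = col_totals.get(col, 0) + count
    statistic = 0.0
    expected = {}
    for (row, col), count in table.items():
        # Expected count under independence: row total * column total / N.
        e = row_totals[row] * col_totals[col] / total
        expected[(row, col)] = e
        statistic += (count - e) ** 2 / e
    return statistic, expected


chi2, expected = chi_square(observed)
print(round(chi2, 1))  # 555.6
```

The expected counts produced this way (4500, 1500, 3000 and 1000) match the denominators in the formula above.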
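Other correlation measures
• All_confidence
• Cosine
• $\mathrm{all\_conf}(X) = \dfrac{\mathrm{sup}(X)}{\mathrm{max\_item\_sup}(X)}$, where max_item_sup(X) is the maximum single-item support among the items in X.
• $\mathrm{cosine}(A, B) = \dfrac{P(A \cup B)}{\sqrt{P(A)\,P(B)}} = \dfrac{\mathrm{sup}(A \cup B)}{\sqrt{\mathrm{sup}(A)\,\mathrm{sup}(B)}}$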
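Both measures are easy to compute once the supports are known. The sketch below applies them to the game and video supports from the lift example (0.60, 0.75 and 0.40). The function names are hypothetical, and reusing those numbers here is an illustration rather than something the slides do.

```python
from math import sqrt


def all_confidence(sup_itemset, item_supports):
    """all_conf(X) = sup(X) / (largest support of any single item in X)."""
    return sup_itemset / max(item_supports)


def cosine(sup_a, sup_b, sup_ab):
    """cosine(A, B) = sup(A and B together) / sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)


all_conf_value = all_confidence(0.40, [0.60, 0.75])
cosine_value = cosine(0.60, 0.75, 0.40)
print(round(all_conf_value, 2), round(cosine_value, 2))  # 0.53 0.6
```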
Comparison of four correlation measures on typical data sets
• A null-transaction is a transaction that does not contain any of the itemsets being examined.
• A measure is null-invariant if its value is free from the influence of null-transactions.
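The effect of null-transactions can be checked numerically. The sketch below uses invented counts and a hypothetical `measures` helper: it keeps the transactions containing A or B fixed and changes only the number of null-transactions. Lift changes sharply, while cosine and all_confidence stay the same, which is what null-invariance means.

```python
from math import sqrt


def measures(n_ab, n_a_only, n_b_only, n_null):
    """Compute lift, cosine and all_confidence from transaction counts."""
    total = n_ab + n_a_only + n_b_only + n_null
    sup_a = (n_ab + n_a_only) / total
    sup_b = (n_ab + n_b_only) / total
    sup_ab = n_ab / total
    return {
        "lift": sup_ab / (sup_a * sup_b),
        "cosine": sup_ab / sqrt(sup_a * sup_b),
        "all_conf": sup_ab / max(sup_a, sup_b),
    }


# Hypothetical counts: only the number of null-transactions differs.
few_nulls = measures(n_ab=100, n_a_only=50, n_b_only=50, n_null=100)
many_nulls = measures(n_ab=100, n_a_only=50, n_b_only=50, n_null=100_000)
```

Here `few_nulls["lift"]` is about 1.33 and `many_nulls["lift"]` is about 445, whereas both cosine values are about 0.67.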
Constraint Based Association mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.
• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.
• The constraints can include the following: knowledge type constraints, data constraints, dimension/level constraints, interestingness constraints, and rule constraints.
- Mining Frequent itemset
- Mining Frequent Item set
From Association Mining to Correlation Analysis
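• Correlation analysis using lift: $\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)} = \frac{P(B \mid A)}{P(B)} = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{sup}(B)}$
• A lift below 1 means A and B are negatively correlated, a lift above 1 means they are positively correlated, and a lift equal to 1 means they are independent.
• Let $\overline{\text{game}}$ refer to transactions that do not contain computer games, and $\overline{\text{video}}$ to transactions that do not contain videos. The transactions can be summarized in a contingency table.
• Probability of purchasing a computer game = 60%. Probability of purchasing a video = 75%. Probability of purchasing both = 40%. By the lift formula, $0.40 / (0.60 \times 0.75) = 0.89$. Because the lift is less than 1, buying a computer game and buying a video are negatively correlated.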
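The lift calculation above can be checked with a few lines of Python. This is a minimal sketch that uses only the three probabilities from the example. The function and variable names are illustrative and not part of the slides.

```python
# Probabilities from the game/video example on this slide.
p_game = 0.60   # P(game)
p_video = 0.75  # P(video)
p_both = 0.40   # P(game and video together)


def lift(p_a, p_b, p_ab):
    """lift(A, B) = P(A and B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)


# Equivalent form: conf(A => B) / sup(B), with conf(A => B) = P(B | A).
confidence = p_both / p_game
lift_from_conf = confidence / p_video

game_video_lift = lift(p_game, p_video, p_both)
print(round(game_video_lift, 2), round(lift_from_conf, 2))
print("negatively correlated" if game_video_lift < 1 else "not negatively correlated")
```

Both forms of the formula give the same value, which is why lift can be read directly off a rule's confidence and the support of its consequent.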
Examples
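Correlation analysis using $\chi^2$
• $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
• $\chi^2 = \frac{(4000 - 4500)^2}{4500} + \frac{(3500 - 3000)^2}{3000} + \frac{(2000 - 1500)^2}{1500} + \frac{(500 - 1000)^2}{1000} = 555.6$
• The $\chi^2$ value is large, so game and video are not independent. The observed count for (game, video), 4000, is below the expected count, 4500, so the two are negatively correlated.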
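The expected counts in the $\chi^2$ sum come from the row and column totals of the contingency table. The sketch below rebuilds them from the four observed counts used above. The dictionary layout and key names are assumptions made for illustration.

```python
# Observed counts from the game/video contingency table.
observed = {
    ("game", "video"): 4000,
    ("no_game", "video"): 3500,
    ("game", "no_video"): 2000,
    ("no_game", "no_video"): 500,
}

total = sum(observed.values())

# Marginal totals for the game dimension and the video dimension.
game_totals = {}
video_totals = {}
for (game_key, video_key), count in observed.items():
    game_totals[game_key] = game_totals.get(game_key, 0) + count
    video_totals[video_key] = video_totals.get(video_key, 0) + count

# Expected count under independence: (row total * column total) / grand total.
expected = {
    (g, v): game_totals[g] * video_totals[v] / total
    for (g, v) in observed
}

chi2 = sum(
    (observed[cell] - expected[cell]) ** 2 / expected[cell]
    for cell in observed
)

print(expected[("game", "video")], round(chi2, 1))
```

Comparing the observed and expected count of the (game, video) cell gives the direction of the correlation, which the $\chi^2$ value alone does not.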
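Other correlation measures
• All_confidence
• Cosine
• $\mathrm{all\_conf}(X) = \frac{\mathrm{sup}(X)}{\mathrm{max\_item\_sup}(X)}$
• $\mathrm{cosine}(A, B) = \frac{P(A \cup B)}{\sqrt{P(A)\,P(B)}} = \frac{\mathrm{sup}(A \cup B)}{\sqrt{\mathrm{sup}(A)\,\mathrm{sup}(B)}}$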
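As an illustration, both measures can be applied to the game/video example from the earlier slides (supports 0.60, 0.75, and 0.40 for both together). This is a small sketch, and the function names are hypothetical.

```python
from math import sqrt

# Relative supports from the game/video example.
sup_game = 0.60
sup_video = 0.75
sup_both = 0.40


def all_confidence(sup_itemset, item_supports):
    """sup(X) divided by the largest support of any single item in X."""
    return sup_itemset / max(item_supports)


def cosine(sup_a, sup_b, sup_ab):
    """sup(A and B) divided by sqrt(sup(A) * sup(B))."""
    return sup_ab / sqrt(sup_a * sup_b)


all_conf_value = all_confidence(sup_both, [sup_game, sup_video])
cosine_value = cosine(sup_game, sup_video, sup_both)

print(round(all_conf_value, 3), round(cosine_value, 3))
```

Both measures range from 0 to 1, with larger values indicating a stronger association between the items.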
Comparison of four correlation measures on typical data set
• A null transaction is a transaction that does not contain any of the itemsets being examined.
• A measure is null-invariant if its value is free from the influence of null transactions.
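The effect of null transactions can be seen by padding the game/video data with transactions that contain neither item. In this sketch the counts 6000, 7500, and 4000 come from the contingency table of the earlier example. The 90,000 extra null transactions are an assumption chosen only for illustration.

```python
from math import sqrt


def lift(n_a, n_b, n_ab, n_total):
    """Lift depends on the total number of transactions."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))


def cosine(n_a, n_b, n_ab):
    """Cosine uses only the counts of A, B, and A with B."""
    return n_ab / sqrt(n_a * n_b)


def all_conf(n_a, n_b, n_ab):
    """All_confidence also ignores the total number of transactions."""
    return n_ab / max(n_a, n_b)


n_game, n_video, n_both = 6000, 7500, 4000
n_original = 10_000
n_with_nulls = n_original + 90_000  # assumed number of added null transactions

lift_before = lift(n_game, n_video, n_both, n_original)
lift_after = lift(n_game, n_video, n_both, n_with_nulls)

print(round(lift_before, 2), round(lift_after, 2))
print(round(cosine(n_game, n_video, n_both), 3))
print(round(all_conf(n_game, n_video, n_both), 3))
```

Lift moves from below 1 to above 1 once the null transactions are added, while cosine and all_confidence never see the total and so stay the same.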
Constraint Based Association mining
• A data mining process may uncover thousands of rules from a given data set, most of which end up being unrelated or uninteresting to the users.
• A good heuristic is to have the users specify their intuition or expectations as constraints to confine the search space. This strategy is known as constraint-based mining.
Constraint Based Association mining
• The constraints can include the following:
• Knowledge type constraints
• Data constraints
• Dimension/level constraints
• Interestingness constraints
• Rule constraints
Metarule-Guided Mining of Association Rules
• Metarules allow users to specify the syntactic form of the rules that they are interested in mining.
• Metarule-guided mining example: finding associations between customer traits and the items that customers buy, where the user is only interested in determining which pairs of customer traits promote the sale of office software.
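One way to picture a metarule is as a template that candidate rules must match. The sketch below assumes a template of the form "trait 1 AND trait 2 => buys office software". The `Rule` class, the trait names, and the candidate rules are all made up for illustration and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: tuple  # tuple of (predicate, value) pairs
    consequent: tuple  # a single (predicate, value) pair


# Hypothetical set of predicates that count as customer traits.
CUSTOMER_TRAITS = {"age", "income", "occupation"}


def matches_metarule(rule):
    """True if the rule has two distinct customer traits on the left
    and buys(office software) on the right."""
    predicates = [predicate for predicate, _ in rule.antecedent]
    return (
        len(predicates) == 2
        and len(set(predicates)) == 2
        and set(predicates) <= CUSTOMER_TRAITS
        and rule.consequent == ("buys", "office software")
    )


# Made-up candidate rules, only to exercise the template.
candidates = [
    Rule((("age", "30..39"), ("income", "41K..60K")), ("buys", "office software")),
    Rule((("age", "30..39"),), ("buys", "office software")),
    Rule((("age", "30..39"), ("income", "41K..60K")), ("buys", "printer")),
    Rule((("buys", "laptop"), ("income", "41K..60K")), ("buys", "office software")),
]

selected = [rule for rule in candidates if matches_metarule(rule)]
print(len(selected))
```

In a real miner the template would be used to restrict the search itself rather than to filter rules afterwards. Filtering is used here only because it is the shortest way to show what the template accepts.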
Constraint Pushing
Rule Constraint
• Antimonotonic
• Monotonic
• Convertible
• Inconvertible
Explain each of them with a DMQL example.
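The slide asks for DMQL examples. The sketch below uses Python instead, and covers only the first two categories. It assumes a hypothetical price list with non-negative prices and checks, by brute force over this small item set, that "total price at most v" behaves antimonotonically and "total price at least v" behaves monotonically. All names and prices are illustrative.

```python
from itertools import combinations

# Hypothetical item prices (non-negative), used only for this sketch.
PRICES = {"pen": 2, "paper": 5, "printer": 120, "laptop": 900}


def total_price(itemset):
    return sum(PRICES[item] for item in itemset)


def at_most(limit):
    """Constraint: sum of prices <= limit."""
    return lambda itemset: total_price(itemset) <= limit


def at_least(limit):
    """Constraint: sum of prices >= limit."""
    return lambda itemset: total_price(itemset) >= limit


def all_itemsets(items):
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_antimonotonic(constraint, items):
    """Once an itemset violates the constraint, every proper superset does too."""
    sets = list(all_itemsets(items))
    return all(
        not constraint(t)
        for s in sets if not constraint(s)
        for t in sets if s < t
    )


def is_monotonic(constraint, items):
    """Once an itemset satisfies the constraint, every proper superset does too."""
    sets = list(all_itemsets(items))
    return all(
        constraint(t)
        for s in sets if constraint(s)
        for t in sets if s < t
    )


print(is_antimonotonic(at_most(130), PRICES), is_monotonic(at_most(130), PRICES))
print(is_antimonotonic(at_least(130), PRICES), is_monotonic(at_least(130), PRICES))
```

An antimonotonic constraint lets a miner stop extending an itemset as soon as it fails. A monotonic constraint does not need to be re-checked on supersets once it holds.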