Association Rule
Dr. Jieh-Shan George YEH
jsyeh@pu.edu.tw
ARULES: MINING ASSOCIATION RULES AND FREQUENT ITEMSETS
http://cran.r-project.org/web/packages/arules/index.html
Association Rule
• Association rules are rules presenting association or correlation between itemsets.
• An association rule is in the form of A => B, where A and B are two disjoint itemsets, referred to respectively as the lhs (left-hand side) and rhs (right-hand side) of the rule.
• The three most widely-used measures for selecting interesting rules are support, confidence and lift.
• support is the percentage of cases in the data that contain both A and B.
• confidence is the percentage of cases containing A that also contain B.
• lift is the ratio of confidence to the percentage of cases containing B.
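The three definitions above can be written as formulas. The notation is introduced here for illustration and is not taken from the slides: D is the set of all cases, and supp(X) is the fraction of cases that contain every item of the itemset X.

```latex
\begin{align*}
\operatorname{supp}(X) &= \frac{\lvert \{\, t \in D : X \subseteq t \,\} \rvert}{\lvert D \rvert} \\
\operatorname{support}(A \Rightarrow B) &= \operatorname{supp}(A \cup B) \\
\operatorname{confidence}(A \Rightarrow B) &= \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)} \\
\operatorname{lift}(A \Rightarrow B) &= \frac{\operatorname{confidence}(A \Rightarrow B)}{\operatorname{supp}(B)}
\end{align*}
```

A lift above 1 means A and B occur together more often than would be expected if they were independent; a lift of exactly 1 corresponds to independence.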
Example: The Titanic Dataset
• The Titanic dataset in the datasets package is a 4-dimensional table with summarized information on the fate of passengers on the Titanic according to social class, sex, age and survival.
• titanic.raw.rdata
  – http://www.rdatamining.com/data/titanic.raw.rdata?attredirects=0&d=1
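# load the downloaded file (expected to create the object titanic.raw used later)
load("titanic.raw.rdata")
# structure of the built-in Titanic table from the datasets package
str(Titanic)
## table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
##  - attr(*, "dimnames")=List of 4
##   ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
##   ..$ Sex     : chr [1:2] "Male" "Female"
##   ..$ Age     : chr [1:2] "Child" "Adult"
##   ..$ Survived: chr [1:2] "No" "Yes"
# flatten the table into a data frame with one row per combination and a Freq column
df <- as.data.frame(Titanic)
head(df)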
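If the download is unavailable, a per-passenger data frame can be rebuilt from the built-in table. The sketch below assumes that titanic.raw holds one row per passenger with the columns Class, Sex, Age and Survived (the slides do not show its structure); the name titanic.rebuilt is hypothetical. It then checks support, confidence and lift by hand for one rule, using only base R.

```r
# Expand the aggregated counts into one row per passenger.
# Assumption: this mirrors the layout of titanic.raw, which the slides do not show.
df <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq
titanic.rebuilt <- df[rep(seq_len(nrow(df)), df$Freq),
                      c("Class", "Sex", "Age", "Survived")]
rownames(titanic.rebuilt) <- NULL

# Measures of the rule {Class=2nd, Age=Child} => {Survived=Yes}, computed by hand.
n     <- nrow(titanic.rebuilt)
has.A <- titanic.rebuilt$Class == "2nd" & titanic.rebuilt$Age == "Child"
has.B <- titanic.rebuilt$Survived == "Yes"

support    <- sum(has.A & has.B) / n            # cases with both A and B
confidence <- sum(has.A & has.B) / sum(has.A)   # cases with A that also have B
lift       <- confidence / (sum(has.B) / n)     # confidence relative to P(B)

round(c(support = support, confidence = confidence, lift = lift), 3)
```

Computing the measures directly from logical vectors like this is a convenient cross-check on the values that a rule-mining package reports for the same rule.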
Mining Associations with Apriori
• Description
  – Mine frequent itemsets, association rules or association hyperedges using the Apriori algorithm. The Apriori algorithm employs level-wise search for frequent itemsets. The implementation of Apriori used includes some improvements (e.g., a prefix tree and item sorting).
• Usage
  apriori(data, parameter = NULL, appearance = NULL, control = NULL)
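To make "level-wise search" concrete, here is a minimal base-R sketch of the idea on hypothetical toy transactions. It is not the arules implementation (which, as noted above, uses a prefix tree and item sorting); all names in it are made up for illustration. Each pass joins the frequent k-itemsets into (k+1)-item candidates, drops any candidate with an infrequent k-subset, and counts the rest.

```r
# Toy transactions (hypothetical data), each a character vector of items.
transactions <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)
min.count <- 3   # an itemset is frequent if at least 3 of the 5 transactions contain it

# Number of transactions that contain every item of an itemset.
support.count <- function(itemset) {
  sum(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}
as.key <- function(itemset) paste(itemset, collapse = ",")

# Level 1: frequent single items.
items <- sort(unique(unlist(transactions)))
level <- Filter(function(s) support.count(s) >= min.count, as.list(items))
frequent <- list()

# Level k + 1 is built only from the frequent itemsets of level k.
while (length(level) > 0) {
  frequent <- c(frequent, level)
  k <- length(level[[1]])
  keys <- vapply(level, as.key, character(1))
  candidates <- list()
  for (i in seq_along(level)) {
    for (j in seq_len(i - 1)) {
      cand <- sort(union(level[[i]], level[[j]]))
      if (length(cand) != k + 1) next
      # Pruning step: every k-subset of a candidate must itself be frequent.
      sub.keys <- vapply(combn(cand, k, simplify = FALSE), as.key, character(1))
      if (all(sub.keys %in% keys)) candidates[[as.key(cand)]] <- cand
    }
  }
  level <- Filter(function(s) support.count(s) >= min.count, unname(candidates))
}

vapply(frequent, as.key, character(1))
```

The pruning step is what keeps the search tractable: a candidate is only counted against the data when all of its smaller subsets have already been found frequent.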
• Arguments
• data
  – object of class transactions or any data structure which can be coerced into transactions (e.g., a binary matrix or data.frame).
• parameter
  – object of class APparameter or named list. The default behavior is to mine rules with support 0.1, confidence 0.8, and maxlen 10.
• appearance
  – object of class APappearance or named list. With this argument item appearance can be restricted. By default all items can appear unrestricted.
• control
  – object of class APcontrol or named list. Controls the performance of the mining algorithm (item sorting, etc.)
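As a small illustration of how the four arguments fit together, the sketch below passes each of parameter, appearance and control as a named list. The data frame toy and its values are hypothetical; the thresholds are chosen only so that the tiny example returns something.

```r
library(arules)

# Hypothetical toy data: one row per case, every column a factor.
toy <- data.frame(
  Class    = factor(c("1st", "1st", "3rd", "3rd", "3rd")),
  Survived = factor(c("Yes", "Yes", "No", "No", "Yes"))
)

# data: coerce the data frame to transactions; items are named "Column=value".
trans <- as(toy, "transactions")

rules <- apriori(
  trans,
  # parameter: mining thresholds; minlen = 2 excludes rules with an empty lhs
  parameter  = list(supp = 0.2, conf = 0.6, minlen = 2),
  # appearance: only Survived items may appear on the rhs, all other items on the lhs
  appearance = list(rhs = c("Survived=No", "Survived=Yes"), default = "lhs"),
  # control: suppress progress output
  control    = list(verbose = FALSE)
)
inspect(rules)
```

Passing named lists keeps the call readable; the same settings could also be supplied as APparameter, APappearance and APcontrol objects, as the argument descriptions state.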
library(arules)
# find association rules with default settings
rules.all <- apriori(titanic.raw)
rules.all
inspect(rules.all)
rhs containing "Survived" only
# rules with rhs containing "Survived" only
rules <- apriori(titanic.raw,
                 control = list(verbose=FALSE),
                 parameter = list(minlen=2, supp=0.005, conf=0.8),
                 appearance = list(rhs=c("Survived=No", "Survived=Yes"),
                                   default="lhs"))
# round the quality measures to three digits for display
quality(rules) <- round(quality(rules), digits=3)
# strongest associations first
rules.sorted <- sort(rules, by="lift")
inspect(rules.sorted)
find redundant rules
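# find redundant rules
# as.matrix() is a guard: some arules versions return a sparse matrix here, which cannot hold NA
subset.matrix <- as.matrix(is.subset(rules.sorted, rules.sorted))
# keep only comparisons against rules ranked higher (the upper triangle)
subset.matrix[lower.tri(subset.matrix, diag=TRUE)] <- NA
# a rule is flagged when a rule ranked above it (higher lift) is a subset of it
redundant <- colSums(subset.matrix, na.rm=TRUE) >= 1
which(redundant)
# remove redundant rules
rules.pruned <- rules.sorted[!redundant]
inspect(rules.pruned)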
# rules predicting survival from class and age only
rules <- apriori(titanic.raw,
                 parameter = list(minlen=3, supp=0.002, conf=0.2),
                 appearance = list(rhs=c("Survived=Yes"),
                                   lhs=c("Class=1st", "Class=2nd", "Class=3rd",
                                         "Age=Child", "Age=Adult"),
                                   default="none"),
                 control = list(verbose=FALSE))
rules.sorted <- sort(rules, by="confidence")
inspect(rules.sorted)
ARULESVIZ: VISUALIZING ASSOCIATION RULES AND FREQUENT ITEMSETS
http://cran.r-project.org/web/packages/arulesViz/index.html
### Visualizing Association Rules
library(arulesViz)
plot(rules.all)
plot(rules.all, method="grouped")
plot(rules.all, method="graph")
plot(rules.all, method="graph", control=list(type="items"))
plot(rules.all, method="paracoord", control=list(reorder=TRUE))
Figure: output of plot(rules.all)
Figure: output of plot(rules.all, method="grouped")
Figure: output of plot(rules.all, method="graph")
Figure: output of plot(rules.all, method="graph", control=list(type="items"))
Figure: output of plot(rules.all, method="paracoord", control=list(reorder=TRUE))
Other Packages
• arulesSequences: Mining frequent sequences
  – http://cran.r-project.org/web/packages/arulesSequences/index.html
• arulesNBMiner: Mining NB-Frequent Itemsets and NB-Precise Rules
  – http://cran.r-project.org/web/packages/arulesNBMiner/index.html