Peter Fox
Data Analytics – ITWS-4963/ITWS-6965
Week 5a, February 24, 2015
Weighted kNN, ~ clustering, trees and Bayesian classification
Plot tools / tips
http://statmethods.net/advgraphs/layout.html
http://flowingdata.com/2014/02/27/how-to-read-histograms-and-use-them-in-r/
pairs, gpairs, scatterplot.matrix, clustergram, etc.
data()
# precip, presidents, iris, swiss, sunspot.month (!), environmental, ethanol, ionosphere
More script fragments in R will be available on the web site (http://escience.rpi.edu/data/DA).
Weighted kNN?
require(kknn)
data(iris)
m <- dim(iris)[1]
val <- sample(1:m, size = round(m/3), replace = FALSE, prob = rep(1/m, m))
iris.learn <- iris[-val,]
iris.valid <- iris[val,]
iris.kknn <- kknn(Species ~ ., iris.learn, iris.valid, distance = 1, kernel = "triangular")
summary(iris.kknn)
fit <- fitted(iris.kknn)
table(iris.valid$Species, fit)
pcol <- as.character(as.numeric(iris.valid$Species))
pairs(iris.valid[1:4], pch = pcol, col = c("green3", "red")[(iris.valid$Species != fit) + 1])
Look at Lab5b_wknn_2015.R
ctree
> iris_ctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=iris)
> print(iris_ctree)

Conditional inference tree with 4 terminal nodes

Response: Species
Inputs: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width
Number of observations: 150

1) Petal.Length <= 1.9; criterion = 1, statistic = 140.264
  2)* weights = 50
1) Petal.Length > 1.9
  3) Petal.Width <= 1.7; criterion = 1, statistic = 67.894
    4) Petal.Length <= 4.8; criterion = 0.999, statistic = 13.865
      5)* weights = 46
    4) Petal.Length > 4.8
      6)* weights = 8
  3) Petal.Width > 1.7
    7)* weights = 46
plot(iris_ctree)
Try Lab6b_5_2014.R
> plot(iris_ctree, type="simple") # try this
Swiss - pairs
pairs(~ Fertility + Education + Catholic, data = swiss, subset = Education < 20, main = "Swiss data, Education < 20")
New dataset - ionosphere
require(kknn)
data(ionosphere)
ionosphere.learn <- ionosphere[1:200,]
ionosphere.valid <- ionosphere[-c(1:200),]
fit.kknn <- kknn(class ~ ., ionosphere.learn, ionosphere.valid)
table(ionosphere.valid$class, fit.kknn$fit)
# vary kernel
(fit.train1 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
    kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 1))
table(predict(fit.train1, ionosphere.valid), ionosphere.valid$class)
# alter distance
(fit.train2 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
    kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 2))
table(predict(fit.train2, ionosphere.valid), ionosphere.valid$class)
Results
ionosphere.learn <- ionosphere[1:200,]
# convenience sampling!!!!
ionosphere.valid <- ionosphere[-c(1:200),]
fit.kknn <- kknn(class ~ ., ionosphere.learn, ionosphere.valid)
table(ionosphere.valid$class, fit.kknn$fit)
     b   g
  b 19   8
  g  2 122
(fit.train1 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
+    kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 1))

Call:
train.kknn(formula = class ~ ., data = ionosphere.learn, kmax = 15, distance = 1, kernel = c("triangular", "rectangular", "epanechnikov", "optimal"))

Type of response variable: nominal
Minimal misclassification: 0.12
Best kernel: rectangular
Best k: 2

table(predict(fit.train1, ionosphere.valid), ionosphere.valid$class)
     b   g
  b 25   4
  g  2 120
(fit.train2 <- train.kknn(class ~ ., ionosphere.learn, kmax = 15,
+    kernel = c("triangular", "rectangular", "epanechnikov", "optimal"), distance = 2))

Call:
train.kknn(formula = class ~ ., data = ionosphere.learn, kmax = 15, distance = 2, kernel = c("triangular", "rectangular", "epanechnikov", "optimal"))

Type of response variable: nominal
Minimal misclassification: 0.12
Best kernel: rectangular
Best k: 2

table(predict(fit.train2, ionosphere.valid), ionosphere.valid$class)
     b   g
  b 20   5
  g  7 119
However… there is more
Bayes
> cl <- kmeans(iris[,1:4], 3)
> table(cl$cluster, iris[,5])

    setosa versicolor virginica
  2      0          2        36
  1      0         48        14
  3     50          0         0

# naiveBayes is in package e1071
> require(e1071)
> m <- naiveBayes(iris[,1:4], iris[,5])
> table(predict(m, iris[,1:4]), iris[,5])

             setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          3        47

pairs(iris[1:4], main="Iris Data (red=setosa,green=versicolor,blue=virginica)", pch=21, bg=c("red","green3","blue")[unclass(iris$Species)])
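A quick way to summarize either confusion table is the overall misclassification rate; a minimal sketch for the naiveBayes table above:

tab <- table(predict(m, iris[,1:4]), iris[,5])
1 - sum(diag(tab))/sum(tab)   # 6 of 150 off the diagonal = 0.04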
Using a contingency table
> data(Titanic)
> mdl <- naiveBayes(Survived ~ ., data = Titanic)
> mdl

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.formula(formula = Survived ~ ., data = Titanic)

A-priori probabilities:
Survived
      No      Yes
0.676965 0.323035

Conditional probabilities:
        Class
Survived        1st        2nd        3rd       Crew
     No  0.08187919 0.11208054 0.35436242 0.45167785
     Yes 0.28551336 0.16596343 0.25035162 0.29817159

        Sex
Survived       Male     Female
     No  0.91543624 0.08456376
     Yes 0.51617440 0.48382560

        Age
Survived      Child      Adult
     No  0.03489933 0.96510067
     Yes 0.08016878 0.91983122
Using a contingency table
> predict(mdl, as.data.frame(Titanic)[,1:3])
[1] Yes No No No Yes Yes Yes Yes No No No No Yes Yes Yes Yes Yes No No No Yes Yes Yes Yes No
[26] No No No Yes Yes Yes Yes
Levels: No Yes
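predict() returns class labels by default; e1071 also supports type = "raw" (used again on the HouseVotes84 slide later) if you want the posterior probabilities behind each Yes/No:

> predict(mdl, as.data.frame(Titanic)[,1:3], type = "raw")   # one row of P(No), P(Yes) per case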
Naïve Bayes – what is it?
• Example: testing for a specific item of knowledge that 1% of the population has been informed of (don't ask how).
• An imperfect test:
  – 99% of knowledgeable people test positive
  – 99% of ignorant people test negative
• If a person tests positive – what is the probability that they know the fact?
Naïve approach…
• We have 10,000 representative people
• 100 know the fact/item, 9,900 do not
• We test them all:
  – Get 99 knowing people testing knowing
  – Get 9,801 not knowing people testing not knowing
  – But 99 not knowing people testing as knowing
• Testing positive (knowing) – equally likely to know or not = 50%
Tree diagram

10,000 ppl
– 1% know (100 ppl)
  – 99% test to know (99 ppl)
  – 1% test not to know (1 person)
– 99% do not know (9,900 ppl)
  – 1% test to know (99 ppl)
  – 99% test not to know (9,801 ppl)
Relation between probabilities
• For outcomes x and y there are probabilities p(x) and p(y) that either happened
• If there's a connection, then the joint probability (both happen) is p(x,y)
• If x happens given y happens, that is p(x|y), or vice versa; then:
  – p(x|y)*p(y) = p(x,y) = p(y|x)*p(x)
• So p(y|x) = p(x|y)*p(y)/p(x) (Bayes' Law)
• E.g. p(know|+ve) = p(+ve|know)*p(know)/p(+ve) = (.99*.01)/(.99*.01 + .01*.99) = 0.5
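The worked example is easy to verify numerically; the same arithmetic in R:

p_know <- 0.01                                        # prior: 1% know the fact
p_pos_know <- 0.99                                    # knowledgeable people testing positive
p_pos_not <- 0.01                                     # ignorant people testing positive
p_pos <- p_pos_know*p_know + p_pos_not*(1 - p_know)   # total probability of a positive test
p_pos_know*p_know / p_pos                             # p(know|+ve) = 0.5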
How do you use it?
• If the population contains x, what is the chance that y is true?
• p(SPAM|word) = p(word|SPAM)*p(SPAM)/p(word)
• Base this on data:
  – p(spam) counts the proportion of spam versus not
  – p(word|spam) counts the prevalence of spam containing the 'word'
  – p(word|!spam) counts the prevalence of non-spam containing the 'word'
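Plugging counts into those three estimates gives the posterior directly. A sketch with made-up counts (400 spam and 600 non-spam messages, with the 'word' in 200 of the spam and 30 of the non-spam):

p_spam <- 400/1000            # p(spam)
p_word_spam <- 200/400        # p(word|SPAM)
p_word_ham <- 30/600          # p(word|!SPAM)
p_word <- p_word_spam*p_spam + p_word_ham*(1 - p_spam)
p_word_spam*p_spam / p_word   # p(SPAM|word), about 0.87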
Or..
• What is the probability that you are in one class (i) over another class (j), given another factor (X)?
• Invoke Bayes: p(Ci|X) = p(X|Ci)*p(Ci)/p(X)
• Maximize p(X|Ci)*p(Ci)/p(X) (p(X) is ~constant, and the p(Ci) are taken as equal if not known)
• So, with conditional independence: p(X|Ci) = p(x1|Ci)*p(x2|Ci)*…*p(xk|Ci)
• P(xk | Ci) is estimated from the training samples
  – Categorical: estimate P(xk | Ci) as the percentage of samples of class i with value xk
    • Training involves counting the percentage of occurrence of each possible value for each class
  – Numeric: the actual form of the density function is generally not known, so a "normal" (Gaussian) density is often assumed
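For the categorical case, that counting is exactly what naiveBayes stores in its conditional tables. A sketch that reproduces the Sex table from the Titanic slide by hand (expanding the contingency table to one row per passenger is just one way to do the counting):

data(Titanic)
tt <- as.data.frame(Titanic)
tt_rows <- tt[rep(seq_len(nrow(tt)), tt$Freq), 1:4]           # one row per passenger
prop.table(table(tt_rows$Survived, tt_rows$Sex), margin = 1)  # P(Sex | Survived)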
Digging into iris
classifier <- naiveBayes(iris[,1:4], iris[,5])
table(predict(classifier, iris[,-5]), iris[,5], dnn=list('predicted','actual'))
classifier$apriori
classifier$tables$Petal.Length
plot(function(x) dnorm(x, 1.462, 0.1736640), 0, 8, col="red", main="Petal length distribution for the 3 different species")
curve(dnorm(x, 4.260, 0.4699110), add=TRUE, col="blue")
curve(dnorm(x, 5.552, 0.5518947), add=TRUE, col="green")
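The hard-coded means and standard deviations above come straight from classifier$tables$Petal.Length (one row per species, columns mean and sd), so the same plot can be drawn without retyping them:

pl <- classifier$tables$Petal.Length   # per-class mean and sd
plot(function(x) dnorm(x, pl[1,1], pl[1,2]), 0, 8, col="red", main="Petal length distribution for the 3 different species")
curve(dnorm(x, pl[2,1], pl[2,2]), add=TRUE, col="blue")
curve(dnorm(x, pl[3,1], pl[3,2]), add=TRUE, col="green")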
Decision tree (example)
> require(party) # don't get me started!
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
> iris_ctree <- ctree(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data=iris)
plot(iris_ctree)
Try Lab6b_5_2014.R
> plot(iris_ctree, type="simple") # try this
Beyond plot: pairs
pairs(iris[1:4], main = "Anderson's Iris Data -- 3 species", pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])
Try Lab6b_2_2014.R - USJudgeRatings
Try hclust for iris
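A minimal sketch of one way to do that, following the mtcars pattern later in the deck (restricting to the four numeric columns is an assumption; dist() needs numeric data):

d_iris <- dist(as.matrix(iris[, 1:4]))   # Euclidean distances on the measurements
hc_iris <- hclust(d_iris)
plot(hc_iris, labels = as.character(iris$Species), cex = 0.5)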
require(gpairs) # gpairs() is in the gpairs package
gpairs(iris)
Try Lab6b_3_2014.R
Better scatterplots
install.packages("car")
require(car)
scatterplotMatrix(iris)
Try Lab6b_4_2014.R
require(lattice) # splom() is in lattice
splom(iris) # default
Try Lab6b_7_2014.R
splom extra!
require(lattice)
super.sym <- trellis.par.get("superpose.symbol")
splom(~iris[1:4], groups = Species, data = iris,
      panel = panel.superpose,
      key = list(title = "Three Varieties of Iris",
                 columns = 3,
                 points = list(pch = super.sym$pch[1:3],
                               col = super.sym$col[1:3]),
                 text = list(c("Setosa", "Versicolor", "Virginica"))))
splom(~iris[1:3]|Species, data = iris,
      layout = c(2,2), pscales = 0,
      varnames = c("Sepal\nLength", "Sepal\nWidth", "Petal\nLength"),
      page = function(...) {
        ltext(x = seq(.6, .8, length.out = 4),
              y = seq(.9, .6, length.out = 4),
              labels = c("Three", "Varieties", "of", "Iris"),
              cex = 2)
      })
parallelplot(~iris[1:4] | Species, iris)
parallelplot(~iris[1:4], iris, groups = Species,
             horizontal.axis = FALSE, scales = list(x = list(rot = 90)))

Try Lab6b_7_2014.R
Using a contingency table
> data(Titanic)
> mdl <- naiveBayes(Survived ~ ., data = Titanic)
> mdl

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.formula(formula = Survived ~ ., data = Titanic)

A-priori probabilities:
Survived
      No      Yes
0.676965 0.323035

Conditional probabilities:
        Class
Survived        1st        2nd        3rd       Crew
     No  0.08187919 0.11208054 0.35436242 0.45167785
     Yes 0.28551336 0.16596343 0.25035162 0.29817159

        Sex
Survived       Male     Female
     No  0.91543624 0.08456376
     Yes 0.51617440 0.48382560

        Age
Survived      Child      Adult
     No  0.03489933 0.96510067
     Yes 0.08016878 0.91983122

Try Lab6b_9_2014.R
http://www.ugrad.stat.ubc.ca/R/library/mlbench/html/HouseVotes84.html
require(mlbench)
require(e1071) # naiveBayes lives in e1071
data(HouseVotes84)
model <- naiveBayes(Class ~ ., data = HouseVotes84)
predict(model, HouseVotes84[1:10,-1])
predict(model, HouseVotes84[1:10,-1], type = "raw")
pred <- predict(model, HouseVotes84[,-1])
table(pred, HouseVotes84$Class)
Exercise for you
> data(HairEyeColor)
> mosaicplot(HairEyeColor)
> margin.table(HairEyeColor, 3)
Sex
  Male Female
   279    313
> margin.table(HairEyeColor, c(1,3))
       Sex
Hair    Male Female
  Black   56     52
  Brown  143    143
  Red     34     37
  Blond   46     81

How would you construct a naïve Bayes classifier and test it?
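One possible answer, mirroring the Titanic contingency-table example (treating Sex as the class and Hair/Eye as predictors is just one choice):

require(e1071)
data(HairEyeColor)
hec_mdl <- naiveBayes(Sex ~ ., data = HairEyeColor)
hec <- as.data.frame(HairEyeColor)
pred <- predict(hec_mdl, hec[, c("Hair", "Eye")])
table(pred, hec$Sex)   # crude check over table cells, not weighted by Freq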
Hierarchical clustering
> d <- dist(as.matrix(mtcars))
> hc <- hclust(d)
> plot(hc)
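Once the dendrogram is drawn, cutree() turns it into explicit cluster labels at a chosen k (5 here is arbitrary):

> groups <- cutree(hc, k = 5)   # cut the tree into 5 clusters
> table(groups)                 # cluster sizes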
ctree
require(party)
swiss_ctree <- ctree(Fertility ~ Agriculture + Education + Catholic, data = swiss)
plot(swiss_ctree)
Hierarchical clustering
> dswiss <- dist(as.matrix(swiss))
> hs <- hclust(dswiss)
> plot(hs)
scatterplotMatrix
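The plot itself did not survive the transcript; given the surrounding swiss slides, it was presumably produced by something like:

require(car)
scatterplotMatrix(swiss)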
require(lattice); splom(swiss)
At this point…
• You may realize the inter-relation among classification at an absolute and relative level (i.e. hierarchical -> trees…)
  – Trees are interesting from a decision perspective: if this or that, then this….
• Beyond just distance measures (kmeans) to probabilities (Bayesian)
• So many ways to visualize them…