Gaussian Naïve Bayes

10-601 Introduction to Machine Learning
Machine Learning Department
School of Computer Science
Carnegie Mellon University

Matt Gormley
Lecture 6
February 6, 2017
Naïve Bayes Readings:
- "Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression" (Mitchell, 2016)
- Murphy Ch. 3; Mitchell 6.1-6.10 (no assigned readings in Bishop or HTF)

Optimization Readings (next lecture):
- Lecture notes from 10-600 (see Piazza note)
- "Convex Optimization," Boyd and Vandenberghe (2009) [See Chapter 9. This advanced reading is entirely optional.]
Reminders
• Homework 2: Naive Bayes
  – Release: Wed, Feb. 1
  – Due: Mon, Feb. 13 at 5:30pm
• Homework 3: Linear / Logistic Regression
  – Release: Mon, Feb. 13
  – Due: Wed, Feb. 22 at 5:30pm
Naïve Bayes Outline
• Probabilistic (Generative) View of Classification
  – Decision rule for probability model
• Real-world Dataset
  – Economist vs. Onion articles
  – Document → bag-of-words → binary feature vector
• Naive Bayes: Model
  – Generating synthetic "labeled documents"
  – Definition of model
  – Naive Bayes assumption
  – Counting # of parameters with / without NB assumption
• Naïve Bayes: Learning from Data
  – Data likelihood
  – MLE for Naive Bayes
  – MAP for Naive Bayes
• Visualizing Gaussian Naive Bayes

(Covered across last lecture and this lecture.)
Naive Bayes: Model
Whiteboard:
– Generating synthetic "labeled documents"
– Definition of model
– Naive Bayes assumption
– Counting # of parameters with / without NB assumption
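The parameter-counting point from the whiteboard can be sketched in a few lines (a sketch, not from the slides; function names are my own; class-prior parameters are omitted for simplicity): modeling the full joint over M binary features needs a table of size 2^M per class, while the Naive Bayes assumption reduces this to one Bernoulli parameter per feature per class.

```python
# Counting parameters for a model of M binary features and C classes,
# with and without the Naive Bayes assumption.

def n_params_full_joint(M, C):
    # p(x_1, ..., x_M | y): a table of 2^M probabilities per class,
    # minus 1 per class because each table must sum to one.
    return C * (2 ** M - 1)

def n_params_naive_bayes(M, C):
    # p(x_m | y): one Bernoulli parameter per feature, per class.
    return C * M

# e.g. M = 10 binary features, C = 2 classes
print(n_params_full_joint(10, 2))   # 2046
print(n_params_naive_bayes(10, 2))  # 20
```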
What’s wrong with the Naïve Bayes Assumption?
The features might not be independent!

• Example 1:
  – If a document contains the word "Donald", it's extremely likely to contain the word "Trump"
  – These are not independent!
• Example 2:
  – If the petal width is very high, the petal length is also likely to be very high
Naïve Bayes: Learning from Data
Whiteboard:
– Data likelihood
– MLE for Naive Bayes
– MAP for Naive Bayes
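The MLE and MAP estimates from the whiteboard can be sketched for the Bernoulli case (a sketch, not from the slides; the function name and data are my own): the MLE is the empirical fraction of times each feature fires per class, and a common MAP choice is add-one (Laplace) smoothing, i.e. a Beta(2, 2) prior on each Bernoulli parameter.

```python
# MLE (alpha=0) and MAP with add-one smoothing (alpha=1) for Bernoulli
# Naive Bayes on binary feature vectors.

def train_bernoulli_nb(X, y, alpha=0.0):
    """X: list of binary feature vectors; y: list of class labels."""
    M = len(X[0])
    prior = {}   # prior[c]    = p(y = c)
    theta = {}   # theta[c][m] = p(x_m = 1 | y = c)
    for c in sorted(set(y)):
        Xc = [x for x, yi in zip(X, y) if yi == c]
        prior[c] = len(Xc) / len(X)
        # smoothed count of x_m = 1 over smoothed class count
        theta[c] = [(sum(x[m] for x in Xc) + alpha) / (len(Xc) + 2 * alpha)
                    for m in range(M)]
    return prior, theta

X = [[1, 0], [1, 1], [0, 0], [0, 1]]
y = [1, 1, 0, 0]
prior, theta = train_bernoulli_nb(X, y)
print(prior[1])     # 0.5
print(theta[1][0])  # 1.0 (both class-1 examples have x_0 = 1)
```

Note that the MLE assigns probability exactly 1.0 (or 0.0) to features seen only one way in a class; the MAP estimate with `alpha=1` pulls this to 0.75, which is the usual motivation for smoothing.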
VISUALIZING NAÏVE BAYES
Slides in this section from William Cohen (10-601B, Spring 2016)
Fisher Iris Dataset

Fisher (1936) used 150 measurements of flowers from 3 different species: Iris setosa (0), Iris virginica (1), Iris versicolor (2), collected by Anderson (1936)
Full dataset: https://en.wikipedia.org/wiki/Iris_flower_data_set
| Species | Sepal Length | Sepal Width | Petal Length | Petal Width |
|---------|--------------|-------------|--------------|-------------|
| 0       | 4.3          | 3.0         | 1.1          | 0.1         |
| 0       | 4.9          | 3.6         | 1.4          | 0.1         |
| 0       | 5.3          | 3.7         | 1.5          | 0.2         |
| 1       | 4.9          | 2.4         | 3.3          | 1.0         |
| 1       | 5.7          | 2.8         | 4.1          | 1.3         |
| 1       | 6.3          | 3.3         | 4.7          | 1.6         |
| 1       | 6.7          | 3.0         | 5.0          | 1.7         |
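Gaussian Naive Bayes treats each real-valued feature as a class-conditional Gaussian. A minimal sketch (not from the slides; function name is my own) of the MLE parameter estimates on the rows above:

```python
# MLE of the per-class, per-feature Gaussian parameters (mean, variance)
# that Gaussian Naive Bayes needs, on the sample iris rows.

rows = [  # (species, sepal length, sepal width, petal length, petal width)
    (0, 4.3, 3.0, 1.1, 0.1),
    (0, 4.9, 3.6, 1.4, 0.1),
    (0, 5.3, 3.7, 1.5, 0.2),
    (1, 4.9, 2.4, 3.3, 1.0),
    (1, 5.7, 2.8, 4.1, 1.3),
    (1, 6.3, 3.3, 4.7, 1.6),
    (1, 6.7, 3.0, 5.0, 1.7),
]

def gaussian_mle(rows, species, feature):
    vals = [r[feature] for r in rows if r[0] == species]
    mu = sum(vals) / len(vals)
    var = sum((v - mu) ** 2 for v in vals) / len(vals)  # MLE (biased) variance
    return mu, var

mu, var = gaussian_mle(rows, 0, 3)  # petal length of species 0
print(round(mu, 4))  # 1.3333
```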
Slide from William Cohen
Slide from William Cohen
Plot the difference of the probabilities
Slide from William Cohen
z-axis is the difference of the posterior probabilities: p(y=1 | x) - p(y=0 | x)
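The plotted quantity can be computed directly from Bayes' rule. A sketch for a 2-class, 2-feature Gaussian Naive Bayes model (all parameter values below are made up for illustration; function names are my own):

```python
# p(y=1 | x) - p(y=0 | x) for a 2-class Gaussian Naive Bayes model.
import math

def gauss_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_diff(x, prior, mus, vars_):
    # unnormalized joint p(y=c) * prod_m p(x_m | y=c) for each class
    joint = [prior[c] * math.prod(gauss_pdf(x[m], mus[c][m], vars_[c][m])
                                  for m in range(len(x)))
             for c in (0, 1)]
    z = joint[0] + joint[1]  # normalizer p(x)
    return joint[1] / z - joint[0] / z

# two features, class-conditional means differ, shared unit variance
prior = [0.5, 0.5]
mus = [[0.0, 0.0], [2.0, 2.0]]
vars_ = [[1.0, 1.0], [1.0, 1.0]]
print(posterior_diff([0.0, 0.0], prior, mus, vars_))  # close to -1 (class 0)
print(posterior_diff([1.0, 1.0], prior, mus, vars_))  # 0.0 (on the boundary)
```

Evaluating `posterior_diff` on a grid of `x` values produces the surface in the figure: near -1 deep in class 0's region, near +1 in class 1's, and 0 on the decision boundary.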
Question: what does the boundary between positive and negative look like for Naïve Bayes?
Slide from William Cohen (10-601B, Spring 2016)
Iris Data (2 classes)
Iris Data (sigma not shared)
Iris Data (sigma=1)
Iris Data (3 classes)
Iris Data (sigma not shared)
Iris Data (sigma=1)
Naïve Bayes has a linear decision boundary (if sigma is shared across classes)
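The shared-sigma claim can be sketched in a line of algebra (using per-feature, per-class means $\mu_{m,c}$ and shared per-feature variances $\sigma_m^2$): the quadratic terms cancel in the log posterior odds,

$$
\log\frac{p(y=1\mid\mathbf{x})}{p(y=0\mid\mathbf{x})}
= \log\frac{p(y=1)}{p(y=0)}
+ \sum_{m}\frac{(x_m-\mu_{m,0})^2-(x_m-\mu_{m,1})^2}{2\sigma_m^2}
= \sum_{m}\frac{\mu_{m,1}-\mu_{m,0}}{\sigma_m^2}\,x_m + b,
$$

where $b$ collects all terms that do not depend on $\mathbf{x}$. The log odds are linear in $\mathbf{x}$, so the decision boundary (log odds $= 0$) is a hyperplane. If the variances are not shared across classes, the quadratic terms survive and the boundary is quadratic, as in the "sigma not shared" plots above.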
Slide from William Cohen (10-601B, Spring 2016)
Figure from William Cohen (10-601B, Spring 2016)
Why don't we drop the generative model and try to learn this hyperplane directly?
Figure from William Cohen (10-601B, Spring 2016)
Beyond the Scope of this Lecture
• Multinomial Naïve Bayes can be used for integer features
• Multi-class Naïve Bayes can be used if your classification problem has > 2 classes
Summary
1. Naïve Bayes provides a framework for generative modeling
2. Choose p(x_m | y) appropriate to the data (e.g. Bernoulli for binary features, Gaussian for continuous features)
3. Train by MLE or MAP
4. Classify by maximizing the posterior
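Step 4, "classify by maximizing the posterior," amounts to an argmax over classes of the (log) joint probability. A minimal sketch for the Gaussian case (function names and the toy 1-feature, 3-class parameters below are made up for illustration):

```python
# Classify by maximizing the posterior under a Gaussian Naive Bayes model.
import math

def log_gauss(x, mu, var):
    return -((x - mu) ** 2) / (2 * var) - 0.5 * math.log(2 * math.pi * var)

def classify(x, priors, mus, variances):
    # argmax_c [ log p(y=c) + sum_m log p(x_m | y=c) ];
    # working in log space avoids floating-point underflow
    scores = [math.log(priors[c])
              + sum(log_gauss(x[m], mus[c][m], variances[c][m])
                    for m in range(len(x)))
              for c in range(len(priors))]
    return max(range(len(scores)), key=scores.__getitem__)

priors = [1/3, 1/3, 1/3]
mus = [[1.4], [4.3], [5.5]]      # e.g. petal length per species
variances = [[0.1], [0.3], [0.3]]
print(classify([1.5], priors, mus, variances))  # 0
print(classify([4.0], priors, mus, variances))  # 1
```

Note the normalizer p(x) is the same for every class, so it can be dropped from the argmax.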