naïve bayes and logistic regressionawm/10701/slides/nbayes-9-27-05.pdfsep 27, 2005  · naïve...

31
Naïve Bayes and Logistic Regression Machine Learning 10-701 Tom M. Mitchell Center for Automated Learning and Discovery Carnegie Mellon University September 27, 2005 Required reading: • Mitchell draft chapter (see course website) Recommended reading: • Mitchell, 6.10 (text learning example) • Bishop, Chapter 3.1.3, 3.1.4 • Ng and Jordan paper

Upload: others

Post on 20-Jan-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Naïve Bayes and Logistic Regression

Machine Learning 10-701

Tom M. MitchellCenter for Automated Learning and Discovery

Carnegie Mellon University

September 27, 2005

Required reading:

• Mitchell draft chapter (see course website)

Recommended reading:

• Mitchell, 6.10 (text learning example)

• Bishop, Chapter 3.1.3, 3.1.4

• Ng and Jordan paper

Page 2: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Naïve Bayes and Logistic Regression

• Design learning algorithms based on our understanding of probability

• Two of the most widely used

• Interesting relationship between these two

• Generative and Discriminative classifiers

Page 3: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Bayes Rule

Which is shorthand for:

Random Variable It’s ith possible value

Page 4: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Bayes Rule

Which is shorthand for:

Equivalently:

Page 5: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Bayes Rule

Which is shorthand for:

Common abbreviation:

Page 6: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Bayes Classifier

Training data:

Learning = estimating P(X|Y), P(Y)Classification = using Bayes rule to

calculate P(Y | Xnew)

X Y

Page 7: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Bayes Classifier

Training data:

How shall we represent P(X|Y), P(Y)?How many parameters must we estimate?

X Y

Page 8: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Bayes Classifier

Training data:

How shall we represent P(X|Y), P(Y)?How many parameters must we estimate?

X Y

Full joint P(X 1

... Xn| Y)

usually impractical!

Page 9: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Naïve Bayes

Naïve Bayes assumesX=h X1, …, Xn i, Y discrete-valued

i.e., that Xi and Xj are conditionally independent given Y, for all i≠j

Page 10: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Conditional IndependenceDefinition: X is conditionally independent of Y given Z,

if the probability distribution governing X is independent of the value of Y, given the value of Z

Which we often write

E.g.,

Page 11: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Naïve Bayes uses assumption that the Xi are conditionally independent, given Y

then:

How many parameters needed now for P(X|Y)? P(Y)?

Page 12: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Naïve Bayes classificationBayes rule:

Assuming conditional independence:

So, classification rule for Xnew =h X1, …, Xn i is:

Page 13: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Naïve Bayes Algorithm

• Train Naïve Bayes (examples) for each* value yk

estimatefor each* value xij of each attribute Xi

estimate

• Classify (Xnew)

* parameters must sum to 1

Page 14: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Estimating Parameters: Y, Xi discrete-valued

Maximum likelihood estimates:

MAP estimates (uniform Dirichlet priors):

Page 15: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding
Page 16: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Learning to classify text documents

• Classify which emails are spam• Classify which emails are meeting invites• Classify which web pages are student

home pages

How shall we represent text documents for Naïve Bayes?

Page 17: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding
Page 18: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding
Page 19: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Baseline: Bag of Words Approach

aardvark 0

about 2

all 2

Africa 1

apple 0

anxious 0

...

gas 1

...

oil 1

Zaire 0

Page 20: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding
Page 21: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

For code, seewww.cs.cmu.edu/~tom/mlbook.htmlclick on “Software and Data”

Page 22: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding
Page 23: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding
Page 24: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

What if we have continuous Xi ?Eg., image classification: Xi is ith pixel

Gaussian Naïve Bayes (GNB): assume

Sometimes assume variance• is independent of Y (i.e., σi), • or independent of Xi (i.e., σk)• or both (i.e., σ)

Page 25: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Estimating Parameters: Y discrete, Xi continuous

Maximum likelihood estimates: jth training example

δ(x)=1 if x true, else 0

Page 26: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Example: GNB for classifying mental states

~1 mm resolution

~2 images per sec.

15,000 voxels/image

non-invasive, safe

measures Blood Oxygen Level Dependent (BOLD) response

Typical impulse response

10 sec

Page 27: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Brain scans can track activation with precision and sensitivity

Page 28: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Gaussian Naïve Bayes: Learned μvoxel,wordP(BrainActivity | WordCategory = People)

Page 29: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Learned Bayes Models – Means forP(BrainActivity | WordCategory)

Animal wordsPeople words

Pairwise classification accuracy: 85%

Page 30: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

Plot of single-voxel classification accuracies.

Gaussian naïve Bayes classifier

(yellow and red are most predictive).

Subject 1 Subject 2 Subject 3

Page 31: Naïve Bayes and Logistic Regressionawm/10701/slides/NBayes-9-27-05.pdfSep 27, 2005  · Naïve Bayes and Logistic Regression • Design learning algorithms based on our understanding

What you should know:

• Learning (generative) classifiers based on Bayes rule

• Conditional independence– What it is– Why it’s important

• Naïve Bayes assumption and its consequences

• Naïve Bayes with discrete inputs, continuous inputs