
Page 1

Naïve Bayes Lecture 6: Self-Study

-----

Marina Santini

Acknowledgements: slides borrowed and adapted from Data Mining by I. H. Witten, E. Frank and M. A. Hall

Page 2

Lecture 6: Required Reading

Daumé III (2015: 53-59; 107-110)
Witten et al. (2011: 90-99; 305-308; 314-315; 322-323; 328-329; 331-332; 334)

Page 3

Outline

• Naïve Bayes
• Zero-probability problem: smoothing
• Multinomial Naïve Bayes
• Discussion

Page 4

Statistical modeling

• Use all the attributes
• Two assumptions: attributes are
  ♦ equally important
  ♦ statistically independent (given the class value)
• I.e., knowing the value of one attribute says nothing about the value of another (if the class is known)
• The independence assumption is never correct!
• But … this scheme works well in practice

Page 5

Probabilities for weather data

Counts and relative frequencies (Yes | No):

Outlook      Sunny     2 | 3    2/9 | 3/5
             Overcast  4 | 0    4/9 | 0/5
             Rainy     3 | 2    3/9 | 2/5
Temperature  Hot       2 | 2    2/9 | 2/5
             Mild      4 | 2    4/9 | 2/5
             Cool      3 | 1    3/9 | 1/5
Humidity     High      3 | 4    3/9 | 4/5
             Normal    6 | 1    6/9 | 1/5
Windy        False     6 | 2    6/9 | 2/5
             True      3 | 3    3/9 | 3/5
Play                   9 | 5    9/14 | 5/14

The underlying weather data:

Outlook   Temp.  Humidity  Windy  Play
Sunny     Hot    High      False  No
Sunny     Hot    High      True   No
Overcast  Hot    High      False  Yes
Rainy     Mild   High      False  Yes
Rainy     Cool   Normal    False  Yes
Rainy     Cool   Normal    True   No
Overcast  Cool   Normal    True   Yes
Sunny     Mild   High      False  No
Sunny     Cool   Normal    False  Yes
Rainy     Mild   Normal    False  Yes
Sunny     Mild   Normal    True   Yes
Overcast  Mild   High      True   Yes
Overcast  Hot    Normal    False  Yes
Rainy     Mild   High      True   No
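A minimal sketch (my addition, not from the slides) of how these counts and relative frequencies can be derived with plain Python; the data list mirrors the table above.

```python
from collections import Counter, defaultdict

# The 14 weather instances: (outlook, temp, humidity, windy, play)
data = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]
attributes = ["Outlook", "Temp", "Humidity", "Windy"]

# Count class occurrences and attribute-value/class co-occurrences.
class_counts = Counter(row[-1] for row in data)
cond_counts = defaultdict(Counter)  # (attribute, class) -> value counts
for *values, play in data:
    for attr, value in zip(attributes, values):
        cond_counts[(attr, play)][value] += 1

# Relative frequency, e.g. Pr[Outlook = Sunny | yes] = 2/9:
print(cond_counts[("Outlook", "Yes")]["Sunny"] / class_counts["Yes"])  # 0.222...
```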

Page 6

Probabilities for weather data (continued)

(The counts and relative-frequencies table from page 5 is repeated on this slide.)

A new day:

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?

Likelihood of the two classes:

For "yes" = 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
For "no" = 3/5 × 1/5 × 4/5 × 3/5 × 5/14 = 0.0206

Conversion into a probability by normalization:

P("yes") = 0.0053 / (0.0053 + 0.0206) = 0.205
P("no") = 0.0206 / (0.0053 + 0.0206) = 0.795
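A sketch of this computation (my addition), reusing the hypothetical data, cond_counts and class_counts built in the snippet above:

```python
def class_score(play, instance):
    """Class prior times the product of conditional relative frequencies."""
    score = class_counts[play] / len(data)          # e.g. Pr[yes] = 9/14
    for attr, value in instance.items():
        score *= cond_counts[(attr, play)][value] / class_counts[play]
    return score

new_day = {"Outlook": "Sunny", "Temp": "Cool", "Humidity": "High", "Windy": True}
scores = {c: class_score(c, new_day) for c in ("Yes", "No")}
total = sum(scores.values())
print({c: round(s / total, 3) for c, s in scores.items()})  # {'Yes': 0.205, 'No': 0.795}
```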

Page 7

Bayes's rule

• Probability of event H given evidence E:

    Pr[H | E] = Pr[E | H] × Pr[H] / Pr[E]

• A priori probability of H, Pr[H]:
  ♦ probability of the event before the evidence is seen
• A posteriori probability of H, Pr[H | E]:
  ♦ probability of the event after the evidence is seen

Thomas Bayes. Born: 1702 in London, England. Died: 1761 in Tunbridge Wells, Kent, England.

Page 8

Naïve Bayes for classification

• Classification learning: what's the probability of the class given an instance?
  ♦ Evidence E = instance
  ♦ Event H = class value for instance
• Naïve assumption: the evidence splits into parts (i.e. the attributes) that are independent given the class, so

    Pr[H | E] = Pr[E1 | H] × Pr[E2 | H] × … × Pr[En | H] × Pr[H] / Pr[E]

Page 9

Weather data example

Evidence E:

Outlook  Temp.  Humidity  Windy  Play
Sunny    Cool   High      True   ?

Probability of class "yes":

Pr[yes | E] = Pr[Outlook = Sunny | yes]
            × Pr[Temperature = Cool | yes]
            × Pr[Humidity = High | yes]
            × Pr[Windy = True | yes]
            × Pr[yes] / Pr[E]

            = (2/9 × 3/9 × 3/9 × 3/9 × 9/14) / Pr[E]

Page 10

The "zero-frequency problem"

• What if an attribute value doesn't occur with every class value? (e.g. "Outlook = Overcast" never occurs with class "no" in the data above)
  ♦ The conditional probability will be zero: Pr[Outlook = Overcast | no] = 0
  ♦ The a posteriori probability will also be zero: Pr[no | E] = 0
  (no matter how likely the other values are!)
• Remedy: add 1 to the count for every attribute value-class combination (Laplace estimator)
• Result: probabilities will never be zero! (also: stabilizes probability estimates)
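A small standalone sketch of the Laplace estimator (my addition, with counts taken from the table on page 5):

```python
# Laplace smoothing: add 1 to every attribute value-class count.
outlook_no = {"Sunny": 3, "Overcast": 0, "Rainy": 2}  # Outlook counts for Play = no
n_no = 5                      # total instances with Play = no
k = len(outlook_no)           # number of possible Outlook values

for value, count in outlook_no.items():
    smoothed = (count + 1) / (n_no + k)
    print(value, smoothed)    # Overcast: (0 + 1) / (5 + 3) = 0.125, never zero
```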

Page 11

Modified probability estimates

• In some cases adding a constant µ different from 1 might be more appropriate
• Example: attribute Outlook for class "yes", with µ split evenly across the three values:

    Sunny:    (2 + µ/3) / (9 + µ)
    Overcast: (4 + µ/3) / (9 + µ)
    Rainy:    (3 + µ/3) / (9 + µ)

• Weights don't need to be equal (but they must sum to 1), e.g. p1, p2, p3 with p1 + p2 + p3 = 1:

    Sunny:    (2 + µp1) / (9 + µ)
    Overcast: (4 + µp2) / (9 + µ)
    Rainy:    (3 + µp3) / (9 + µ)
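A standalone sketch of these modified estimates (my addition); mu is the weight given to the prior value probabilities p:

```python
def modified_estimate(count, class_total, mu, p):
    """Modified probability estimate: (count + mu*p) / (class_total + mu)."""
    return (count + mu * p) / (class_total + mu)

# Outlook counts for class "yes", with equal prior weights p = 1/3:
for value, count in [("Sunny", 2), ("Overcast", 4), ("Rainy", 3)]:
    print(value, modified_estimate(count, class_total=9, mu=3.0, p=1/3))
# With mu = 3 and equal weights this reduces to the Laplace estimator.
```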

Page 12

Missing values

• Training: the instance is not included in the frequency count for that attribute value-class combination
• Classification: the attribute is omitted from the calculation
• Example (Outlook missing):

Outlook  Temp.  Humidity  Windy  Play
?        Cool   High      True   ?

Likelihood of "yes" = 3/9 × 3/9 × 3/9 × 9/14 = 0.0238
Likelihood of "no" = 1/5 × 4/5 × 3/5 × 5/14 = 0.0343
P("yes") = 0.0238 / (0.0238 + 0.0343) = 41%
P("no") = 0.0343 / (0.0238 + 0.0343) = 59%
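A sketch of how a missing value can be handled at classification time (my addition): the attribute is simply left out of the product.

```python
def likelihood(prior, cond_probs):
    """Prior times conditional probabilities; None marks a missing value."""
    score = prior
    for p in cond_probs:
        if p is None:      # missing value: omit from the calculation
            continue
        score *= p
    return score

like_yes = likelihood(9/14, [None, 3/9, 3/9, 3/9])  # Outlook missing
like_no = likelihood(5/14, [None, 1/5, 4/5, 3/5])
total = like_yes + like_no
print(round(like_yes / total, 2), round(like_no / total, 2))  # 0.41 0.59
```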

Page 13

Numeric attributes

• Usual assumption: attributes have a normal (Gaussian) probability distribution given the class
• The probability density function for the normal distribution is defined by two parameters:
  ♦ sample mean µ
  ♦ standard deviation σ
• The density function f(x) is then

    f(x) = 1 / (√(2π) σ) × e^(−(x − µ)² / (2σ²))
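A minimal sketch of this density function (my addition):

```python
import math

def normal_density(x, mu, sigma):
    """Gaussian probability density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# Example from the next slide: Temperature = 66 for class "yes" (mu = 73, sigma = 6.2):
print(round(normal_density(66, 73, 6.2), 4))  # 0.034
```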

Page 14

Statistics for weather data

Outlook      Sunny     2 | 3    2/9 | 3/5
             Overcast  4 | 0    4/9 | 0/5
             Rainy     3 | 2    3/9 | 2/5
Temperature  Yes: 64, 68, 69, 70, 72, …   µ = 73, σ = 6.2
             No:  65, 71, 72, 80, 85, …   µ = 75, σ = 7.9
Humidity     Yes: 65, 70, 70, 75, 80, …   µ = 79, σ = 10.2
             No:  70, 85, 90, 91, 95, …   µ = 86, σ = 9.7
Windy        False     6 | 2    6/9 | 2/5
             True      3 | 3    3/9 | 3/5
Play         Yes 9 (9/14), No 5 (5/14)

Example density value:

    f(Temperature = 66 | yes) = 1 / (√(2π) × 6.2) × e^(−(66 − 73)² / (2 × 6.2²)) = 0.0340

Page 15

Classifying a new day

A new day:

Outlook  Temp.  Humidity  Windy  Play
Sunny    66     90        true   ?

Likelihood of "yes" = 2/9 × 0.0340 × 0.0221 × 3/9 × 9/14 = 0.000036
Likelihood of "no" = 3/5 × 0.0221 × 0.0381 × 3/5 × 5/14 = 0.000108
P("yes") = 0.000036 / (0.000036 + 0.000108) = 25%
P("no") = 0.000108 / (0.000036 + 0.000108) = 75%

Note: missing values during training are not included in the calculation of the mean and standard deviation.
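A quick arithmetic check (my addition), reproducing the slide's numbers in Python; the density values such as 0.0340 are taken from the slide (0.0340 matches normal_density(66, 73, 6.2) from the sketch above):

```python
# Likelihoods for the new day: Outlook, Temp. density, Humidity density, Windy, prior.
yes = (2/9) * 0.0340 * 0.0221 * (3/9) * (9/14)
no = (3/5) * 0.0221 * 0.0381 * (3/5) * (5/14)
total = yes + no
print(round(yes / total, 2), round(no / total, 2))  # 0.25 0.75
```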

Page 16

Probability densities

• Relationship between probability and density (for a small interval of width ε):

    Pr[c − ε/2 ≤ x ≤ c + ε/2] ≈ ε × f(c)

• But: this doesn't change the calculation of a posteriori probabilities, because ε cancels out
• Exact relationship:

    Pr[a ≤ x ≤ b] = ∫_a^b f(t) dt

Page 17

Multinomial naïve Bayes I

• Version of naïve Bayes used for document classification with the bag-of-words model
• n1, n2, …, nk: number of times word i occurs in the document
• P1, P2, …, Pk: probability of obtaining word i when sampling from documents in class H
• Probability of observing document E given class H (based on the multinomial distribution):

    Pr[E | H] ≈ N! × (P1^n1 / n1!) × (P2^n2 / n2!) × … × (Pk^nk / nk!),  where N = n1 + n2 + … + nk

• This ignores the probability of generating a document of the right length (that probability is assumed constant for each class)

Page 18

Multinomial naïve Bayes II

• Suppose the dictionary has two words, yellow and blue
• Suppose Pr[yellow | H] = 75% and Pr[blue | H] = 25%
• Suppose E is the document "blue yellow blue"
• Probability of observing the document:

    Pr[E | H] = 3! × (0.75^1 / 1!) × (0.25^2 / 2!) = 9/64 ≈ 0.14

• Suppose there is another class H' with Pr[yellow | H'] = 10% and Pr[blue | H'] = 90%:

    Pr[E | H'] = 3! × (0.10^1 / 1!) × (0.90^2 / 2!) = 0.243

• Need to take the prior probability of each class into account to make the final classification
• Factorials don't actually need to be computed: they are the same for every class, so they cancel in the comparison
• Underflows can be prevented by using logarithms
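A sketch of the log-space trick (my addition); the common N!/∏ nᵢ! factor is dropped because it is identical across classes:

```python
import math
from collections import Counter

def log_score(doc_words, word_probs, log_prior=0.0):
    """Log of the multinomial class score, omitting the shared factorial term."""
    counts = Counter(doc_words)
    return log_prior + sum(n * math.log(word_probs[w]) for w, n in counts.items())

doc = ["blue", "yellow", "blue"]
h = {"yellow": 0.75, "blue": 0.25}
h_prime = {"yellow": 0.10, "blue": 0.90}
print(log_score(doc, h), log_score(doc, h_prime))
# H' scores higher (matching 0.243 > 0.14 before the priors are taken into account).
```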

Page 19

Naïve Bayes: discussion

• Naïve Bayes works surprisingly well (even if the independence assumption is clearly violated)
• Why? Because classification doesn't require accurate probability estimates as long as the maximum probability is assigned to the correct class
• However: adding too many redundant attributes will cause problems (e.g. identical attributes)
• Note also: many numeric attributes are not normally distributed (→ kernel density estimators)

Page 20

The end
