TRANSCRIPT
Geoff Gordon—10-701 Machine Learning—Fall 2013
Related reading
• Bishop 2.5: nearest neighbor and Parzen windows
• Bishop 3-3.1: least squares for regression
• Bishop 4-4.1: linear classifiers
• Bishop p46, p380: naive Bayes
Bayes rule
• recall def of conditional: ‣ P(a | b) = P(a ∧ b) / P(b) if P(b) ≠ 0
Bayes rule: sum version
• P(a | b) = P(b | a) P(a) / P(b)
• sum rule for the denominator: P(b) = Σa′ P(b | a′) P(a′)
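As a sanity check, Bayes rule with the sum-rule denominator can be verified numerically on a small joint distribution (the numbers below are illustrative, not from the slides):

```python
# Joint distribution over two binary variables a and b (illustrative numbers).
P = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def p_a_given_b(a, b):
    """P(a | b) = P(a ^ b) / P(b), with P(b) expanded by the sum rule."""
    p_b = sum(P[(ap, b)] for ap in (0, 1))   # P(b) = sum_a' P(a' ^ b)
    return P[(a, b)] / p_b

# Bayes rule: P(a|b) should equal P(b|a) P(a) / P(b)
a, b = 1, 1
p_a = sum(P[(a, bp)] for bp in (0, 1))
p_b = sum(P[(ap, b)] for ap in (0, 1))
p_b_given_a = P[(a, b)] / p_a
assert abs(p_a_given_b(a, b) - p_b_given_a * p_a / p_b) < 1e-12
```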
Bayes rule in ML
• P(model | data) = P(data | model) P(model) / P(data)
Bayes rule vs. MAP vs. MLE
• P(model | data) = P(data | model) P(model) / P(data)
• MLE: maximize P(data | model); MAP: maximize P(data | model) P(model); full Bayes: keep the whole posterior
Jerzy Neyman
Frequentist vs. Bayes
• Nature as adversary vs. Nature as probability distribution
• Probability as long-run frequency of repeatable events vs. odds for bets I'm willing to take
rev. Thomas Bayes
FIGHT!!!
see also: http://www.xkcd.com/1132/
Test for a rare disease
• About 0.1% of all people are infected
• Test detects all infections
• Test is highly specific: 1% false positive
• You test positive. What is the probability you have the disease?
Bonus: what is probability an average med student gets this question wrong?
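The slide's question can be answered directly with Bayes rule; a short sketch using the numbers given:

```python
# Bayes rule for the rare-disease test on the slide.
prior = 0.001        # P(disease): about 0.1% of all people infected
sensitivity = 1.0    # test detects all infections: P(+ | disease) = 1
false_pos = 0.01     # highly specific test: P(+ | ~disease) = 0.01

p_pos = sensitivity * prior + false_pos * (1 - prior)   # P(+) by the sum rule
p_disease_given_pos = sensitivity * prior / p_pos       # Bayes rule

print(round(p_disease_given_pos, 3))  # about 0.091: under 10%, despite the positive test
```

The counterintuitive answer is the point of the slide: false positives among the healthy 99.9% swamp the true positives among the 0.1%.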
Follow-up test
• Test 2: detects 90% of infections, 5% false positives
‣ P(+disease | +test1, +test2) =
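Chaining the two tests, assuming they are conditionally independent given disease status (the assumption the following slides build toward), can be sketched as:

```python
# Two tests, assumed conditionally independent given disease status.
prior = 0.001
sens1, fpr1 = 1.0, 0.01     # test 1: detects all infections, 1% false positives
sens2, fpr2 = 0.9, 0.05     # test 2: 90% detection, 5% false positives

num = prior * sens1 * sens2                   # P(disease) P(+1 | d) P(+2 | d)
den = num + (1 - prior) * fpr1 * fpr2         # + P(~d) P(+1 | ~d) P(+2 | ~d)
p_disease_given_both = num / den

print(round(p_disease_given_both, 3))  # about 0.643
```

Two individually weak positives combine into fairly strong evidence, precisely because false positives on both tests together are rare.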
Independence
Conditional independence
xkcd.com
London taxi drivers: A survey has pointed out a positive and significant correlation between the number of accidents and wearing coats. They concluded that coats could hinder movements of drivers and be the cause of accidents. A new law was prepared to prohibit drivers from wearing coats when driving. Finally another study pointed out that people wear coats when it rains…
Conditionally Independent
slide credit: Barnabas
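The taxi-driver story can be checked numerically: build a joint distribution in which coats and accidents both depend only on rain (the numbers below are made up for illustration), and verify that the two are correlated marginally but independent once we condition on rain.

```python
from itertools import product

# P(rain), P(coat=1 | rain), P(accident=1 | rain): both depend only on rain.
p_rain = {1: 0.3, 0: 0.7}
p_coat = {1: 0.9, 0: 0.1}
p_acc = {1: 0.2, 0: 0.02}

def joint(r, c, a):
    """P(rain=r, coat=c, accident=a) under the rain-causes-both model."""
    pc = p_coat[r] if c else 1 - p_coat[r]
    pa = p_acc[r] if a else 1 - p_acc[r]
    return p_rain[r] * pc * pa

# Marginally, coat and accident are NOT independent:
p_c = sum(joint(r, 1, a) for r, a in product((0, 1), repeat=2))
p_a = sum(joint(r, c, 1) for r, c in product((0, 1), repeat=2))
p_ca = sum(joint(r, 1, 1) for r in (0, 1))
print(p_ca > p_c * p_a)  # positive correlation, as in the survey

# Conditioned on rain, they ARE independent: P(c, a | r) = P(c | r) P(a | r)
for r in (0, 1):
    assert abs(joint(r, 1, 1) / p_rain[r] - p_coat[r] * p_acc[r]) < 1e-12
```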
More on the importance of conditioning
Samples
…
Recall: spam filtering
Bag of words
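A minimal sketch of the bag-of-words representation (word order discarded, only counts kept; the example sentence is made up):

```python
from collections import Counter

def bag_of_words(text):
    """Map a document to word counts, ignoring word order."""
    return Counter(text.lower().split())

bow = bag_of_words("award email award for internet users")
print(bow["award"], bow["email"])  # 2 1
```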
A ridiculously naive assumption
• Assume:
• Clearly false:
• Given this assumption, use Bayes rule
Graphical model
A Graphical Model
[Figure: a "spam" class node with arrows to observed words x1, x2, …, xn; drawn equivalently as spam → xi in a plate over i = 1..n]
Naive Bayes
• P(spam | email ∧ award ∧ program ∧ for ∧ internet ∧ users ∧ lump ∧ sum ∧ of ∧ Five ∧ Million)
In log space
zspam = ln(P(email | spam) P(award | spam) ... P(Million | spam) P(spam))
z~spam = ln(P(email | ~spam) ... P(Million | ~spam) P(~spam))
Collect terms
zspam = ln(P(email | spam) P(award | spam) ... P(Million | spam) P(spam))
z~spam = ln(P(email | ~spam) ... P(Million | ~spam) P(~spam))
z = zspam – z~spam
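The two scores and their difference can be sketched in code; the per-word probabilities below are made up for illustration, and working in log space avoids underflow when multiplying many small probabilities:

```python
import math

# Made-up conditional word probabilities, for illustration only.
p_word_spam = {"email": 0.2, "award": 0.1, "million": 0.05}
p_word_ham = {"email": 0.1, "award": 0.001, "million": 0.0005}
p_spam, p_ham = 0.4, 0.6

words = ["email", "award", "million"]

# Sums of logs instead of products of probabilities.
z_spam = sum(math.log(p_word_spam[w]) for w in words) + math.log(p_spam)
z_ham = sum(math.log(p_word_ham[w]) for w in words) + math.log(p_ham)

z = z_spam - z_ham   # z > 0 means "spam" wins
print("spam" if z > 0 else "not spam")  # spam
```

Because z is a sum of per-word terms plus a constant, it is a linear function of the word indicators, which is exactly the linear discriminant of the next slide.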
Linear discriminant
Intuitions
How to get probabilities?
Discrete Distributions
• Bernoulli distribution: Ber(p)
Suppose a coin with head prob. p is tossed n times. What is the probability of getting k heads and n–k tails?
• Binomial distribution: Bin(n, p)
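The coin question is answered by the binomial pmf, C(n, k) pᵏ (1−p)ⁿ⁻ᵏ; a direct sketch using only the standard library:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(k heads in n tosses of a coin with head prob. p): C(n,k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(binom_pmf(3, 10, 0.5), 4))  # 0.1172
```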
Improvements