MLE’s, Bayesian Classifiers and Naïve Bayes
TRANSCRIPT
[Page 1]
MLE’s, Bayesian Classifiers and Naïve Bayes
Required reading:
• Mitchell draft chapter (on class website)
Machine Learning 10-601
Tom M. Mitchell, Machine Learning Department
Carnegie Mellon University
January 28, 2008
[Page 2]
Estimating Parameters

• Maximum Likelihood Estimate (MLE): choose the θ that maximizes the probability of the observed data

• Maximum a Posteriori (MAP) estimate: choose the θ that is most probable given the prior probability and the data
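Written out (standard formulations consistent with the two definitions above; D denotes the observed data):

```latex
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta}\, P(D \mid \theta)
\qquad
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, P(\theta \mid D)
                            = \arg\max_{\theta}\, P(D \mid \theta)\, P(\theta)
```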
[Page 3]

[Page 4]
Estimating MLE for Bernoulli model
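As a minimal sketch of the Bernoulli MLE (this is the standard result, not copied from the slide): for α₁ observed ones and α₀ observed zeros, maximizing the log-likelihood α₁·log θ + α₀·log(1 − θ) gives θ̂ = α₁ / (α₁ + α₀).

```python
# Minimal sketch (standard Bernoulli MLE): for data with alpha1 ones
# and alpha0 zeros, setting d/d(theta) of the log-likelihood
#   L(theta) = alpha1*log(theta) + alpha0*log(1 - theta)
# to zero yields theta_hat = alpha1 / (alpha1 + alpha0).

def bernoulli_mle(data):
    """MLE of P(X=1) from a list of 0/1 observations."""
    alpha1 = sum(data)
    return alpha1 / len(data)

flips = [1, 0, 1, 1, 0, 1, 1, 0]   # 5 ones, 3 zeros
print(bernoulli_mle(flips))        # 5/8 = 0.625
```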
[Page 5]
Naïve Bayes and Logistic Regression
• Design learning algorithms based on a probabilistic model
  – Learn f: X → Y, or better yet P(Y|X)
• Two of the most widely used algorithms
• Interesting relationship between these two:
  – Generative vs. discriminative classifiers
[Page 6]
Bayes Rule:

  P(Y|X) = P(X|Y) P(Y) / P(X)

Which is shorthand for:

  P(Y = y_i | X = x_j) = P(X = x_j | Y = y_i) P(Y = y_i) / P(X = x_j)

(Here Y is a random variable and y_i denotes the ith possible value of Y.)
[Page 7]
Bayes Rule:

  P(Y|X) = P(X|Y) P(Y) / P(X)

Which is shorthand for:

  P(Y = y_i | X = x_j) = P(X = x_j | Y = y_i) P(Y = y_i) / P(X = x_j)

Equivalently:

  P(Y = y_i | X = x_j) = P(X = x_j | Y = y_i) P(Y = y_i) / Σ_k P(X = x_j | Y = y_k) P(Y = y_k)
[Page 8]
Bayes Rule:

  P(Y|X) = P(X|Y) P(Y) / P(X)

Which is shorthand for:

  P(Y = y_i | X = x_j) = P(X = x_j | Y = y_i) P(Y = y_i) / Σ_k P(X = x_j | Y = y_k) P(Y = y_k)

Common abbreviation:

  P(y_i | x_j) = P(x_j | y_i) P(y_i) / Σ_k P(x_j | y_k) P(y_k)
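A quick numeric illustration of Bayes rule (the probabilities below are invented for the example, and `posterior` is a helper name of mine, not from the slides):

```python
# Compute P(Y=y_i | X=x) = P(x | y_i) P(y_i) / sum_k P(x | y_k) P(y_k)
# from per-class likelihoods and priors (hypothetical numbers).

def posterior(likelihoods, priors):
    """Return P(Y=y_i | X=x) for each class i."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)               # P(X=x), by the sum rule
    return [j / evidence for j in joint]

# Two classes with prior 0.5 each; P(x|y1)=0.8, P(x|y2)=0.2
print(posterior([0.8, 0.2], [0.5, 0.5]))   # [0.8, 0.2]
```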
[Page 9]
Bayes Classifier

Training data: pairs of (X, Y)

Learning = estimating P(X|Y), P(Y)
Classification = using Bayes rule to calculate P(Y | Xnew)
[Page 10]
Bayes Classifier

Training data: pairs of (X, Y)

How shall we represent P(X|Y), P(Y)?
How many parameters must we estimate?
[Page 11]
Bayes Classifier

Training data: pairs of (X, Y)

How shall we represent P(X|Y), P(Y)?
How many parameters must we estimate?
[Page 12]
Naïve Bayes
Naïve Bayes assumes

  P(X1, ..., Xn | Y) = Π_i P(Xi | Y)

i.e., that Xi and Xj are conditionally independent given Y, for all i ≠ j.
[Page 13]
Conditional Independence

Definition: X is conditionally independent of Y given Z, if the probability distribution governing X is independent of the value of Y, given the value of Z:

  (∀ i, j, k)  P(X = x_i | Y = y_j, Z = z_k) = P(X = x_i | Z = z_k)

Which we often write

  P(X | Y, Z) = P(X | Z)

E.g.,  P(Thunder | Rain, Lightning) = P(Thunder | Lightning)
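A tiny numeric check of the definition (the joint distribution below is invented so that X ⊥ Y | Z holds by construction):

```python
# Invented joint P(x, y, z) over booleans, built so that X is
# conditionally independent of Y given Z: P(x,y,z) = P(x|z) P(y|z) P(z).
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: 0.9, 1: 0.3}   # P(X=1 | Z=z)
p_y_given_z = {0: 0.2, 1: 0.7}   # P(Y=1 | Z=z)

def joint(x, y, z):
    px = p_x_given_z[z] if x else 1 - p_x_given_z[z]
    py = p_y_given_z[z] if y else 1 - p_y_given_z[z]
    return px * py * p_z[z]

# Check P(X=1 | Y=y, Z=z) == P(X=1 | Z=z) for every (y, z):
for z in (0, 1):
    for y in (0, 1):
        p_yz = joint(0, y, z) + joint(1, y, z)
        assert abs(joint(1, y, z) / p_yz - p_x_given_z[z]) < 1e-12
print("conditional independence verified")
```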
[Page 14]
Naïve Bayes uses the assumption that the Xi are conditionally independent, given Y.

Given this assumption:

  P(X1, X2 | Y) = P(X1 | X2, Y) P(X2 | Y) = P(X1 | Y) P(X2 | Y)

in general:

  P(X1, ..., Xn | Y) = Π_i P(Xi | Y)

How many parameters are needed to describe P(X|Y)? P(Y)?
• Without the conditional independence assumption?
• With the conditional independence assumption?
[Page 15]
How many parameters to estimate?

P(X1, ..., Xn | Y), all variables boolean.

Without the conditional independence assumption: 2(2^n − 1) parameters

With the conditional independence assumption: 2n parameters
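The parameter counts for boolean variables can be checked mechanically (these are the standard answers: 2(2^n − 1) without the assumption, 2n with it):

```python
def n_params(n, conditional_independence):
    """Parameters needed for P(X1..Xn | Y), all variables boolean.

    Without the assumption, each of the 2 values of Y needs a full
    joint over 2**n settings of X, minus 1 for normalization.
    With the assumption, each value of Y needs one number P(Xi=1|Y)
    per attribute.
    """
    if conditional_independence:
        return 2 * n
    return 2 * (2 ** n - 1)

print(n_params(30, False))   # 2147483646 -- over two billion
print(n_params(30, True))    # 60
```

For even a modest n = 30, the unconstrained representation is hopeless to estimate from data, while the Naïve Bayes representation needs only 60 numbers.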
[Page 16]
Naïve Bayes in a Nutshell

Bayes rule:

  P(Y = y_k | X1, ..., Xn) = P(Y = y_k) P(X1, ..., Xn | Y = y_k) / Σ_j P(Y = y_j) P(X1, ..., Xn | Y = y_j)

Assuming conditional independence among the Xi’s:

  P(Y = y_k | X1, ..., Xn) = P(Y = y_k) Π_i P(Xi | Y = y_k) / Σ_j P(Y = y_j) Π_i P(Xi | Y = y_j)

So, the classification rule for Xnew = ⟨X1, ..., Xn⟩ is:

  Ynew = argmax_{y_k} P(Y = y_k) Π_i P(Xi_new | Y = y_k)
[Page 17]
Naïve Bayes Algorithm – discrete Xi

• Train Naïve Bayes (examples):
  for each* value y_k
    estimate π_k ≡ P(Y = y_k)
  for each* value x_ij of each attribute Xi
    estimate θ_ijk ≡ P(Xi = x_ij | Y = y_k)

• Classify (Xnew):

  Ynew = argmax_{y_k} π_k Π_i P(Xi_new = x_ij | Y = y_k)

* probabilities must sum to 1, so we need estimate only n−1 parameters...
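The train/classify procedure above can be sketched directly. This is a minimal implementation under my own assumptions (discrete attribute tuples, plain MLE counts, no smoothing; the function names are mine, not from the slides):

```python
from collections import Counter

def train_nb(examples):
    """examples: list of (x, y) with x a tuple of discrete attribute values.

    Returns (pi, theta):
      pi[y]            ~ P(Y=y)            (class priors)
      theta[(i, v, y)] ~ P(Xi=v | Y=y)     (per-attribute conditionals)
    """
    y_counts = Counter(y for _, y in examples)
    xy_counts = Counter()
    for x, y in examples:
        for i, v in enumerate(x):
            xy_counts[(i, v, y)] += 1
    n = len(examples)
    pi = {y: c / n for y, c in y_counts.items()}
    theta = {k: c / y_counts[k[2]] for k, c in xy_counts.items()}
    return pi, theta

def classify_nb(pi, theta, x_new):
    """Return argmax_y pi[y] * prod_i theta[(i, x_i, y)]."""
    best, best_score = None, -1.0
    for y, p in pi.items():
        score = p
        for i, v in enumerate(x_new):
            score *= theta.get((i, v, y), 0.0)  # unseen value -> 0 (no smoothing)
        if score > best_score:
            best, best_score = y, score
    return best

data = [((1, 0), 'a'), ((1, 1), 'a'), ((0, 1), 'b'), ((0, 0), 'b')]
pi, theta = train_nb(data)
print(classify_nb(pi, theta, (1, 0)))   # 'a'
```

Note the `.get(..., 0.0)`: with raw MLE counts, an attribute value never seen with a class zeroes out that class, which is exactly the problem MAP-style smoothed estimates address.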
[Page 18]
Estimating Parameters

• Maximum Likelihood Estimate (MLE): choose the θ that maximizes the probability of the observed data

• Maximum a Posteriori (MAP) estimate: choose the θ that is most probable given the prior probability and the data
[Page 19]
Estimating Parameters: Y, Xi discrete-valued
Maximum likelihood estimates (MLE’s):

  π̂_k = P̂(Y = y_k) = #D{Y = y_k} / |D|

  θ̂_ijk = P̂(Xi = x_ij | Y = y_k) = #D{Xi = x_ij ∧ Y = y_k} / #D{Y = y_k}

where #D{Y = y_k} denotes the number of items in the set D for which Y = y_k.