daphne koller parameter estimation maximum likelihood estimation probabilistic graphical models...
TRANSCRIPT
Daphne Koller
Parameter Estimation
Maximum Likelihood Estimation
Probabilistic Graphical Models
Learning
Biased Coin Example
• Tosses are independent of each other
• Tosses are sampled from the same distribution (identically distributed)
P is a Bernoulli distribution: $P(X=1) = \theta$, $P(X=0) = 1-\theta$
Tosses are sampled IID from P
IID as a PGM
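[Figure: plate model with X inside a plate over data instances m, unrolled as X[1], . . ., X[M], each depending on the shared parameter θ]

$$P(x[m] \mid \theta) = \begin{cases} \theta & \text{if } x[m] = x^1 \\ 1-\theta & \text{if } x[m] = x^0 \end{cases}$$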
Maximum Likelihood Estimation
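• Goal: find $\theta \in [0,1]$ that predicts D well
• Prediction quality = likelihood of D given $\theta$

$$L(D:\theta) = P(D \mid \theta) = \prod_{m=1}^{M} P(x[m] \mid \theta)$$

[Plot: $L(D:\theta)$ against $\theta$ from 0 to 1, for $D = \langle H,T,T,H,H \rangle$]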
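As a minimal sketch of the likelihood curve for the toss sequence H, T, T, H, H, the following evaluates the product of per-toss probabilities over a grid of candidate θ values. The function name `likelihood`, the 0/1 encoding, and the grid step are illustrative assumptions, not part of the slides.

```python
def likelihood(tosses, theta):
    """L(D:theta): product of P(x[m] | theta) over all tosses (1 = heads, 0 = tails)."""
    result = 1.0
    for x in tosses:
        result *= theta if x == 1 else 1.0 - theta
    return result

# D = <H, T, T, H, H>, encoded as 1 = H and 0 = T (an assumed encoding)
D = [1, 0, 0, 1, 1]

# Evaluate the likelihood on a grid of candidate parameters (step 0.01 is arbitrary)
grid = [i / 100 for i in range(101)]
values = [likelihood(D, t) for t in grid]

# Grid point with the highest likelihood
best_theta = grid[values.index(max(values))]
print(best_theta)
```

On this grid the curve peaks at θ = 0.6, the fraction of heads in the sequence (3 out of 5).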
Maximum Likelihood Estimator
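• Observations: $M_H$ heads and $M_T$ tails
• Find $\theta$ maximizing the likelihood
$$L(\theta : M_H, M_T) = \theta^{M_H} (1-\theta)^{M_T}$$
• Equivalent to maximizing the log-likelihood
$$l(\theta : M_H, M_T) = M_H \log\theta + M_T \log(1-\theta)$$
• Differentiating the log-likelihood and solving for $\theta$:
$$\hat{\theta} = \frac{M_H}{M_H + M_T}$$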
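The differentiation step can be written out as follows. This is the standard derivation for the coin-toss log-likelihood with $M_H$ heads and $M_T$ tails, assuming $0 < \theta < 1$:

$$\frac{\partial}{\partial\theta}\Big[ M_H \log\theta + M_T \log(1-\theta) \Big] = \frac{M_H}{\theta} - \frac{M_T}{1-\theta} = 0 \;\Longrightarrow\; M_H (1-\theta) = M_T\,\theta \;\Longrightarrow\; \hat{\theta} = \frac{M_H}{M_H + M_T}$$

The second derivative, $-\frac{M_H}{\theta^2} - \frac{M_T}{(1-\theta)^2}$, is negative on $(0,1)$, so this stationary point is a maximum.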
Sufficient Statistics
• For computing $\theta$ in the coin toss example, we only needed $M_H$ and $M_T$, since
$$L(D:\theta) = \theta^{M_H} (1-\theta)^{M_T}$$
• $M_H$ and $M_T$ are sufficient statistics
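A small sketch of why the counts suffice: two toss sequences with the same number of heads and tails get the same likelihood for any θ, and that value can be computed from the counts alone. The datasets, the value θ = 0.3, and the function names are hypothetical choices for illustration.

```python
import math
from collections import Counter

def likelihood_from_sequence(tosses, theta):
    """Multiply P(x[m] | theta) over the tosses, one factor per toss."""
    result = 1.0
    for x in tosses:
        result *= theta if x == "H" else 1.0 - theta
    return result

def likelihood_from_counts(m_h, m_t, theta):
    """theta^M_H * (1 - theta)^M_T, using only the sufficient statistics."""
    return theta ** m_h * (1.0 - theta) ** m_t

d1 = list("HTTHH")
d2 = list("HHHTT")  # same counts as d1, different order
counts = Counter(d1)
theta = 0.3

a = likelihood_from_sequence(d1, theta)
b = likelihood_from_sequence(d2, theta)
c = likelihood_from_counts(counts["H"], counts["T"], theta)

# Compare with a tolerance, since floating-point products depend on evaluation order
print(math.isclose(a, b), math.isclose(a, c))
```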
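Sufficient Statistics
• A function s from instances to a vector in $\mathbb{R}^k$ is a sufficient statistic if for any two datasets D and D' and any $\theta \in \Theta$ we have
$$\sum_{x[i] \in D} s(x[i]) = \sum_{x[i] \in D'} s(x[i]) \;\Longrightarrow\; L(D:\theta) = L(D':\theta)$$
[Figure: mapping from Datasets to Statistics]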
Sufficient Statistic for Multinomial
• For a dataset D over variable X with k values, the sufficient statistics are the counts $\langle M_1, \ldots, M_k \rangle$, where $M_i$ is the number of times that $X[m] = x^i$ in D
• Sufficient statistic s(x) is a tuple of dimension k
– $s(x^i) = (0, \ldots, 0, 1, 0, \ldots, 0)$, with the 1 in position i
$$L(D:\theta) = \prod_{i=1}^{k} \theta_i^{M_i}$$
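The one-hot construction can be sketched as below: summing $s(x[m])$ over the dataset yields the count vector, and the likelihood then needs only those counts. The value set `values`, the dataset `data`, and the parameter vector are hypothetical; positions are 0-based in the code, whereas the slide counts from 1.

```python
def one_hot(index, k):
    """s(x^i): a k-tuple with 1 at position `index` (0-based) and 0 elsewhere."""
    return tuple(1 if j == index else 0 for j in range(k))

def sufficient_statistics(data, values):
    """Sum s(x[m]) over the dataset; the result is the count vector <M_1, ..., M_k>."""
    k = len(values)
    position = {v: j for j, v in enumerate(values)}
    totals = [0] * k
    for x in data:
        for j, bit in enumerate(one_hot(position[x], k)):
            totals[j] += bit
    return totals

def multinomial_likelihood(counts, theta):
    """L(D:theta) = prod_i theta_i ** M_i, computed from the counts alone."""
    result = 1.0
    for m_i, t_i in zip(counts, theta):
        result *= t_i ** m_i
    return result

values = ["a", "b", "c"]           # hypothetical value set for X
data = ["a", "c", "a", "b", "a"]   # hypothetical dataset
counts = sufficient_statistics(data, values)
print(counts)
print(multinomial_likelihood(counts, [0.5, 0.25, 0.25]))
```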
Sufficient Statistic for Gaussian
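• Gaussian distribution: $P(X) \sim N(\mu, \sigma^2)$ if
$$p(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
• Rewrite as
$$p(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -x^2 \frac{1}{2\sigma^2} + x \frac{\mu}{\sigma^2} - \frac{\mu^2}{2\sigma^2} \right)$$
• Sufficient statistics for Gaussian: $s(x) = \langle 1, x, x^2 \rangle$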
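To illustrate the rewritten form, this sketch computes the Gaussian log-likelihood twice: once instance by instance, and once from the summed statistics $\langle 1, x, x^2 \rangle$ only. The data values and the parameters μ = 1.0, σ = 2.0 are arbitrary assumptions, and the function names are illustrative.

```python
import math

def gaussian_stats(data):
    """Sum s(x) = <1, x, x^2> over the dataset."""
    return (len(data), sum(data), sum(x * x for x in data))

def loglik_direct(data, mu, sigma):
    """Sum of log p(x[m]) using the usual (x - mu)^2 form."""
    return sum(
        -math.log(math.sqrt(2 * math.pi) * sigma) - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in data
    )

def loglik_from_stats(stats, mu, sigma):
    """Same quantity using the expanded exponent and the summed statistics only."""
    s0, s1, s2 = stats
    return (
        s0 * (-math.log(math.sqrt(2 * math.pi) * sigma) - mu ** 2 / (2 * sigma ** 2))
        + s1 * mu / sigma ** 2
        - s2 / (2 * sigma ** 2)
    )

data = [1.0, 2.5, 0.5, 3.0]  # hypothetical observations
stats = gaussian_stats(data)
print(stats)
print(math.isclose(loglik_direct(data, 1.0, 2.0), loglik_from_stats(stats, 1.0, 2.0)))
```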
Maximum Likelihood Estimation
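• MLE Principle: Choose $\theta$ to maximize $L(D:\theta)$
• Multinomial MLE:
$$\hat{\theta}_i = \frac{M_i}{\sum_{j=1}^{k} M_j}$$
• Gaussian MLE:
$$\hat{\mu} = \frac{1}{M} \sum_{m} x[m] \qquad \hat{\sigma}^2 = \frac{1}{M} \sum_{m} \left( x[m] - \hat{\mu} \right)^2$$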
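The closed-form estimators can be sketched in a few lines. The counts and observations below are hypothetical, and the function names are illustrative. Note that the Gaussian MLE divides by M, so it matches the population standard deviation (`statistics.pstdev`) rather than the sample version that divides by M - 1.

```python
import math
import statistics

def multinomial_mle(counts):
    """theta_i = M_i / sum_j M_j for each of the k values."""
    total = sum(counts)
    return [m_i / total for m_i in counts]

def gaussian_mle(data):
    """Return (mu_hat, sigma_hat), dividing by M (not M - 1) for the variance."""
    m = len(data)
    mu_hat = sum(data) / m
    var_hat = sum((x - mu_hat) ** 2 for x in data) / m
    return mu_hat, math.sqrt(var_hat)

counts = [3, 1, 1]            # hypothetical counts <M_1, M_2, M_3>
data = [1.0, 2.5, 0.5, 3.0]   # hypothetical observations

theta_hat = multinomial_mle(counts)
mu_hat, sigma_hat = gaussian_mle(data)
print(theta_hat)
print(mu_hat, sigma_hat)
print(math.isclose(sigma_hat, statistics.pstdev(data)))
```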
Summary
• Maximum likelihood estimation is a simple principle for parameter selection given D
• Likelihood function uniquely determined by sufficient statistics that summarize D
• MLE has a closed-form solution for many parametric distributions