Maximum Likelihood Estimation
INFO-2301: Quantitative Reasoning 2 | Michael Paul and Jordan Boyd-Graber | March 7, 2017
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 1 of 9
Why MLE?
• Before: Distribution + Parameter → x

• Now: x + Distribution → Parameter

• (Much more realistic)

• But: says nothing about how good a fit the distribution is
Likelihood
• The likelihood is p(x; θ)

• We want the estimate of θ that best explains the data we have seen

• I.e., the Maximum Likelihood Estimate (MLE)
Likelihood
• The likelihood function refers to the PMF (discrete) or PDF (continuous).
• For discrete distributions, the likelihood of x is P(X = x).
• For continuous distributions, the likelihood of x is the density f (x).
• We will often refer to likelihood rather than probability/mass/density so that the term applies to either scenario.
Optimizing Unconstrained Functions

Suppose we wanted to optimize

ℓ = −x² − 2x + 2 (1)

∂ℓ/∂x = −2x − 2 (2)

[Figure: plot of ℓ(x) for x from −5 to 5]
Optimizing Unconstrained Functions

Setting the derivative to zero:

∂ℓ/∂x = 0 (3)

−2x − 2 = 0 (4)

x = −1 (5)

(Should also check that the second derivative is negative.)
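As a quick sanity check (a sketch, not part of the slides), we can confirm this critical point numerically:

```python
def ell(x):
    """The objective from the slides: ℓ = -x² - 2x + 2."""
    return -x**2 - 2*x + 2

def d_ell(x):
    """Its derivative: dℓ/dx = -2x - 2."""
    return -2*x - 2

x_star = -1.0
assert d_ell(x_star) == 0  # critical point
# Second derivative is -2 < 0, so x* is a maximum:
assert all(ell(x_star) >= ell(x_star + h) for h in (-0.5, -0.1, 0.1, 0.5))
print(x_star, ell(x_star))  # -1.0 3.0
```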
Optimizing Constrained Functions
Theorem: Lagrange Multiplier Method
Given functions f(x₁, …, xₙ) and g(x₁, …, xₙ), the critical points of f restricted to the set g = 0 are solutions to the equations:

∂f/∂xᵢ (x₁, …, xₙ) = λ ∂g/∂xᵢ (x₁, …, xₙ) for all i

g(x₁, …, xₙ) = 0

This is n + 1 equations in the n + 1 variables x₁, …, xₙ, λ.
Lagrange Example
Maximize ℓ(x, y) = √(xy) subject to the constraint 20x + 10y = 200.

• Compute derivatives

∂ℓ/∂x = (1/2)√(y/x)    ∂g/∂x = 20

∂ℓ/∂y = (1/2)√(x/y)    ∂g/∂y = 10

• Create the new system of equations

(1/2)√(y/x) = 20λ

(1/2)√(x/y) = 10λ

20x + 10y = 200
Lagrange Example
• Dividing the first equation by the second gives us

y/x = 2 (6)

• which means y = 2x. Plugging this into the constraint equation gives:

20x + 10(2x) = 200

x = 5 ⇒ y = 10
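A small numeric check of this solution (a sketch; the grid search is mine, not from the slides):

```python
import math

def ell(x, y):
    """The objective ℓ(x, y) = sqrt(xy)."""
    return math.sqrt(x * y)

# Candidate from the Lagrange conditions.
x_star, y_star = 5.0, 10.0
assert 20 * x_star + 10 * y_star == 200  # constraint holds

# Parameterize the constraint as y = (200 - 20x)/10 and scan x on a grid.
best = max(ell(x, (200 - 20 * x) / 10) for x in [i / 100 for i in range(1, 1000)])
assert ell(x_star, y_star) >= best - 1e-9
print(round(ell(x_star, y_star), 4))  # sqrt(50) ≈ 7.0711
```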
Continuous Distribution: Gaussian

• Recall the density function

f(x) = (1/√(2πσ²)) exp(−(x − µ)² / (2σ²)) (1)

• Taking the log makes the math easier and doesn't change the answer (log is monotonic)

• If we observe x₁, …, x_N, then the log likelihood is

ℓ(µ, σ) ≡ −N log σ − (N/2) log(2π) − (1/(2σ²)) ∑ᵢ (xᵢ − µ)² (2)
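Equation (2) can be checked directly against the sum of per-observation log densities (a sketch; the data values are made up):

```python
import math

def gaussian_log_likelihood(xs, mu, sigma):
    """Log likelihood ℓ(µ, σ) from equation (2)."""
    n = len(xs)
    return (-n * math.log(sigma)
            - n / 2 * math.log(2 * math.pi)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

def log_density(x, mu, sigma):
    """log f(x) for a single observation, from equation (1)."""
    return (-math.log(math.sqrt(2 * math.pi * sigma ** 2))
            - (x - mu) ** 2 / (2 * sigma ** 2))

xs = [1.0, 2.0, 4.0, 7.0]
total = sum(log_density(x, 3.5, 2.0) for x in xs)
assert abs(gaussian_log_likelihood(xs, 3.5, 2.0) - total) < 1e-12
```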
MLE of Gaussian µ

ℓ(µ, σ) = −N log σ − (N/2) log(2π) − (1/(2σ²)) ∑ᵢ (xᵢ − µ)² (3)

∂ℓ/∂µ = 0 + (1/σ²) ∑ᵢ (xᵢ − µ) (4)

Solve for µ:

0 = (1/σ²) ∑ᵢ (xᵢ − µ) (5)

0 = ∑ᵢ xᵢ − Nµ (6)

µ = (∑ᵢ xᵢ) / N (7)

Consistent with what we said before
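Numerically, the sample mean in (7) does beat nearby values of µ (a sketch with made-up data; σ is fixed at 1):

```python
import math

def log_lik_mu(xs, mu, sigma=1.0):
    """Gaussian log likelihood as a function of µ (σ held fixed)."""
    n = len(xs)
    return (-n * math.log(sigma) - n / 2 * math.log(2 * math.pi)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

xs = [1.0, 2.0, 4.0, 7.0]
mu_hat = sum(xs) / len(xs)  # equation (7)
assert mu_hat == 3.5
for delta in (-1.0, -0.1, 0.1, 1.0):
    assert log_lik_mu(xs, mu_hat) > log_lik_mu(xs, mu_hat + delta)
```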
MLE of Gaussian σ

ℓ(µ, σ) = −N log σ − (N/2) log(2π) − (1/(2σ²)) ∑ᵢ (xᵢ − µ)² (8)

∂ℓ/∂σ = −N/σ + 0 + (1/σ³) ∑ᵢ (xᵢ − µ)² (9)

Solve for σ:

0 = −N/σ + (1/σ³) ∑ᵢ (xᵢ − µ)² (10)

N/σ = (1/σ³) ∑ᵢ (xᵢ − µ)² (11)

σ² = (∑ᵢ (xᵢ − µ)²) / N (12)

Consistent with what we said before
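And the closed form in (12) similarly maximizes the log likelihood over σ (a sketch with made-up data):

```python
import math

def log_lik_sigma(xs, mu, sigma):
    """Gaussian log likelihood as a function of σ (µ held fixed)."""
    n = len(xs)
    return (-n * math.log(sigma) - n / 2 * math.log(2 * math.pi)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

xs = [1.0, 2.0, 4.0, 7.0]
mu = sum(xs) / len(xs)
sigma_hat = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))  # equation (12)
for factor in (0.5, 0.9, 1.1, 2.0):
    assert log_lik_sigma(xs, mu, sigma_hat) > log_lik_sigma(xs, mu, sigma_hat * factor)
```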
Discrete Distribution: Multinomial
• Recall the mass function (N is the total number of observations, xᵢ is the count for each cell, θᵢ the probability of each cell)

p(x⃗ | θ⃗) = (N! / ∏ᵢ xᵢ!) ∏ᵢ θᵢ^xᵢ (2)

• Taking the log makes the math easier and doesn't change the answer (log is monotonic)

• If we observe cell counts x₁, …, x_K, then the log likelihood is

ℓ(θ⃗) ≡ log(N!) − ∑ᵢ log(xᵢ!) + ∑ᵢ xᵢ log θᵢ (3)
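Equation (3) can be implemented with `math.lgamma` for the factorial terms (a sketch; the counts are made up):

```python
import math

def multinomial_log_lik(counts, thetas):
    """ℓ(θ) = log(N!) - Σᵢ log(xᵢ!) + Σᵢ xᵢ log θᵢ, using lgamma(x+1) = log(x!)."""
    n = sum(counts)
    return (math.lgamma(n + 1)
            - sum(math.lgamma(x + 1) for x in counts)
            + sum(x * math.log(t) for x, t in zip(counts, thetas)))

counts = [3, 1, 2]          # N = 6 observations over K = 3 cells
thetas = [0.5, 0.2, 0.3]
ll = multinomial_log_lik(counts, thetas)
# Compare with the probability computed directly: 6!/(3!·1!·2!) · 0.5³ · 0.2 · 0.3²
direct = 60 * 0.5 ** 3 * 0.2 * 0.3 ** 2
assert abs(ll - math.log(direct)) < 1e-12
```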
MLE of Multinomial θ

ℓ(θ⃗) = log(N!) − ∑ᵢ log(xᵢ!) + ∑ᵢ xᵢ log θᵢ + λ (1 − ∑ᵢ θᵢ) (4)

Where did the λ term come from? The constraint that θ⃗ must be a distribution.

• ∂ℓ/∂θᵢ = xᵢ/θᵢ − λ

• ∂ℓ/∂λ = 1 − ∑ᵢ θᵢ
![Page 46: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning](https://reader033.vdocuments.mx/reader033/viewer/2022053107/6075625f2d28cc1a095c70ad/html5/thumbnails/46.jpg)
MLE of Multinomial θ
`( ~θ ) = log(N!)−∑
i
log(xi !)+∑
i
xi logθi +λ
�
1−∑
i
θi
�
(4)
(5)
• ∂ `∂ θi
= xiθi−λ
• ∂ `∂ λ = 1−∑
i θi
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 8
![Page 47: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning](https://reader033.vdocuments.mx/reader033/viewer/2022053107/6075625f2d28cc1a095c70ad/html5/thumbnails/47.jpg)
MLE of Multinomial θ
`( ~θ ) = log(N!)−∑
i
log(xi !)+∑
i
xi logθi +λ
�
1−∑
i
θi
�
(4)
(5)
• ∂ `∂ θi
= xiθi−λ
• ∂ `∂ λ = 1−∑
i θi
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 8
![Page 48: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning](https://reader033.vdocuments.mx/reader033/viewer/2022053107/6075625f2d28cc1a095c70ad/html5/thumbnails/48.jpg)
MLE of Multinomial θ
`( ~θ ) = log(N!)−∑
i
log(xi !)+∑
i
xi logθi +λ
�
1−∑
i
θi
�
(4)
(5)
• ∂ `∂ θi
= xiθi−λ
• ∂ `∂ λ = 1−∑
i θi
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 8
MLE of Multinomial θ
• We have a system of equations

θ₁ = x₁ / λ (6)

⋮

θ_K = x_K / λ (8)

∑ᵢ θᵢ = 1 (9)

• So let's substitute the first K equations into the last:

∑ᵢ xᵢ / λ = 1 (10)

• So λ = ∑ᵢ xᵢ = N, and θᵢ = xᵢ / N
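A quick check that θᵢ = xᵢ/N really does beat other distributions (a sketch; the counts and the competing distributions are made up):

```python
import math

def log_lik(counts, thetas):
    """Only the θ-dependent term Σᵢ xᵢ log θᵢ matters when comparing θ's."""
    return sum(x * math.log(t) for x, t in zip(counts, thetas))

counts = [3, 1, 2]
n = sum(counts)
theta_mle = [x / n for x in counts]  # θᵢ = xᵢ/N
for other in ([0.4, 0.3, 0.3], [1 / 3, 1 / 3, 1 / 3], [0.6, 0.1, 0.3]):
    assert log_lik(counts, theta_mle) > log_lik(counts, other)
```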
Big Picture

• Ran through several common examples

• For existing distributions you can (and should) look up the MLE

• For new models, you can't (foreshadowing of later in the class)

  ◦ Classification models

  ◦ Unsupervised models (Expectation-Maximization)

• Not always so easy
Classification
• Classification can be viewed as p(y | x, θ)

• Have x, y; need θ

• Discovering θ is also a problem of MLE
Clustering
• Clustering can be viewed as p(x | z, θ)

• Have x; need z, θ

• z is guessed at iteratively (Expectation)

• θ is estimated to maximize the likelihood (Maximization)
Not always so easy: Bias
• An estimator is biased if E[θ̂] ≠ θ

• We won't prove it, but the MLE estimate for the variance is biased

• The bias comes from estimating µ, which "shrinks" the variance; the unbiased estimator is

σ̂² = (1/(N − 1)) ∑ᵢ (xᵢ − µ)² (1)
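The two estimators differ by an exact factor of (N − 1)/N, so the biased (MLE) estimate is always the smaller of the two (a sketch with made-up data):

```python
xs = [1.0, 2.0, 4.0, 7.0]
n = len(xs)
mu = sum(xs) / n
ss = sum((x - mu) ** 2 for x in xs)  # Σᵢ (xᵢ - µ)²
var_mle = ss / n                     # biased MLE estimate (divides by N)
var_unbiased = ss / (n - 1)          # unbiased estimate (divides by N - 1)
assert var_mle < var_unbiased
assert abs(var_mle - var_unbiased * (n - 1) / n) < 1e-12
```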
Not always so easy: Intractable Likelihoods
• It is not always possible to "solve for" the optimal estimator

• Use gradient optimization (we'll see this for logistic regression)

• Use other approximations (e.g., Monte Carlo sampling)

• A whole subfield of statistics / information science
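When there is no closed form, gradient ascent on the log likelihood is the workhorse. A minimal sketch for one-parameter logistic regression (the toy data and the learning rate are made up, not from the slides):

```python
import math

# Toy data: inputs x with binary labels y; model p(y=1|x) = sigmoid(w·x).
# The labels deliberately overlap so the MLE for w is finite.
data = [(-2.0, 0), (-1.0, 0), (-1.0, 1), (1.0, 0), (1.0, 1), (2.0, 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_lik(w):
    """Bernoulli log likelihood of the labels under p(y=1|x) = sigmoid(w·x)."""
    return sum(math.log(sigmoid(w * x)) if y == 1 else math.log(1 - sigmoid(w * x))
               for x, y in data)

def grad(w):
    """dℓ/dw = Σ (y - sigmoid(w·x))·x."""
    return sum((y - sigmoid(w * x)) * x for x, y in data)

w = 0.0
for _ in range(2000):
    w += 0.1 * grad(w)  # gradient *ascent* on the log likelihood

assert abs(grad(w)) < 1e-6        # near a stationary point
assert log_lik(w) > log_lik(0.0)  # better than the starting point
```

No equation `grad(w) = 0` is solvable in closed form here, which is exactly why the iterative update is needed.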