Maximum Likelihood Estimation
INFO-2301: Quantitative Reasoning 2 | Michael Paul and Jordan Boyd-Graber | March 7, 2017
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 1 of 9
Why MLE?
• Before: Distribution + Parameter → x

• Now: x + Distribution → Parameter

• (Much more realistic)

• But: says nothing about how good a fit the distribution is
Likelihood
• The likelihood is p(x; θ)

• We want the estimate of θ that best explains the data we have seen

• I.e., the Maximum Likelihood Estimate (MLE)
Likelihood
• The likelihood function refers to the PMF (discrete) or PDF (continuous).
• For discrete distributions, the likelihood of x is P(X = x).
• For continuous distributions, the likelihood of x is the density f (x).
• We will often refer to likelihood rather than probability/mass/density so that the term applies to either scenario.
Optimizing Unconstrained Functions

Suppose we wanted to optimize

ℓ = −x² − 2x + 2 (1)

∂ℓ/∂x = −2x − 2 (2)

[Figure: plot of ℓ(x) for x from −5 to 5]
Optimizing Unconstrained Functions

Setting the derivative to zero:

∂ℓ/∂x = 0 (3)

−2x − 2 = 0 (4)

x = −1 (5)

(Should also check that the second derivative is negative.)
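As a quick sanity check (a sketch, not part of the slides), we can confirm this critical point numerically:

```python
def ell(x):
    """The objective from the slides: ℓ = -x² - 2x + 2."""
    return -x**2 - 2*x + 2

def d_ell(x):
    """Its derivative: dℓ/dx = -2x - 2."""
    return -2*x - 2

x_star = -1.0
assert d_ell(x_star) == 0  # critical point
# Second derivative is -2 < 0, so x* is a maximum:
assert all(ell(x_star) >= ell(x_star + h) for h in (-0.5, -0.1, 0.1, 0.5))
print(x_star, ell(x_star))  # -1.0 3.0
```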
Optimizing Constrained Functions
Theorem: Lagrange Multiplier Method
Given functions f(x₁, …, xₙ) and g(x₁, …, xₙ), the critical points of f restricted to the set g = 0 are solutions to the equations:

∂f/∂xᵢ (x₁, …, xₙ) = λ ∂g/∂xᵢ (x₁, …, xₙ) for all i

g(x₁, …, xₙ) = 0

This is n + 1 equations in the n + 1 variables x₁, …, xₙ, λ.
Lagrange Example
Maximize ℓ(x, y) = √(xy) subject to the constraint 20x + 10y = 200.

• Compute derivatives

∂ℓ/∂x = (1/2)√(y/x)    ∂g/∂x = 20

∂ℓ/∂y = (1/2)√(x/y)    ∂g/∂y = 10

• Create the new system of equations

(1/2)√(y/x) = 20λ

(1/2)√(x/y) = 10λ

20x + 10y = 200
Lagrange Example
• Dividing the first equation by the second gives us

y/x = 2 (6)

• which means y = 2x. Plugging this into the constraint equation gives:

20x + 10(2x) = 200

x = 5 ⇒ y = 10
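A small numeric check of this solution (a sketch; the grid search is mine, not from the slides):

```python
import math

def ell(x, y):
    """The objective ℓ(x, y) = sqrt(xy)."""
    return math.sqrt(x * y)

# Candidate from the Lagrange conditions.
x_star, y_star = 5.0, 10.0
assert 20 * x_star + 10 * y_star == 200  # constraint holds

# Parameterize the constraint as y = (200 - 20x)/10 and scan x on a grid.
best = max(ell(x, (200 - 20 * x) / 10) for x in [i / 100 for i in range(1, 1000)])
assert ell(x_star, y_star) >= best - 1e-9
print(round(ell(x_star, y_star), 4))  # sqrt(50) ≈ 7.0711
```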
Continuous Distribution: Gaussian

• Recall the density function

f(x) = (1/√(2πσ²)) exp(−(x − µ)² / (2σ²)) (1)

• Taking the log makes the math easier and doesn't change the answer (log is monotonic)

• If we observe x₁, …, x_N, then the log likelihood is

ℓ(µ, σ) ≡ −N log σ − (N/2) log(2π) − (1/(2σ²)) ∑ᵢ (xᵢ − µ)² (2)
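Equation (2) can be checked directly against the sum of per-observation log densities (a sketch; the data values are made up):

```python
import math

def gaussian_log_likelihood(xs, mu, sigma):
    """Log likelihood ℓ(µ, σ) from equation (2)."""
    n = len(xs)
    return (-n * math.log(sigma)
            - n / 2 * math.log(2 * math.pi)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

def log_density(x, mu, sigma):
    """log f(x) for a single observation, from equation (1)."""
    return (-math.log(math.sqrt(2 * math.pi * sigma ** 2))
            - (x - mu) ** 2 / (2 * sigma ** 2))

xs = [1.0, 2.0, 4.0, 7.0]
total = sum(log_density(x, 3.5, 2.0) for x in xs)
assert abs(gaussian_log_likelihood(xs, 3.5, 2.0) - total) < 1e-12
```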
MLE of Gaussian µ

ℓ(µ, σ) = −N log σ − (N/2) log(2π) − (1/(2σ²)) ∑ᵢ (xᵢ − µ)² (3)

∂ℓ/∂µ = 0 + (1/σ²) ∑ᵢ (xᵢ − µ) (4)

Solve for µ:

0 = (1/σ²) ∑ᵢ (xᵢ − µ) (5)

0 = ∑ᵢ xᵢ − Nµ (6)

µ = (∑ᵢ xᵢ) / N (7)

Consistent with what we said before
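Numerically, the sample mean in (7) does beat nearby values of µ (a sketch with made-up data; σ is fixed at 1):

```python
import math

def log_lik_mu(xs, mu, sigma=1.0):
    """Gaussian log likelihood as a function of µ (σ held fixed)."""
    n = len(xs)
    return (-n * math.log(sigma) - n / 2 * math.log(2 * math.pi)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

xs = [1.0, 2.0, 4.0, 7.0]
mu_hat = sum(xs) / len(xs)  # equation (7)
assert mu_hat == 3.5
for delta in (-1.0, -0.1, 0.1, 1.0):
    assert log_lik_mu(xs, mu_hat) > log_lik_mu(xs, mu_hat + delta)
```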
MLE of Gaussian σ

ℓ(µ, σ) = −N log σ − (N/2) log(2π) − (1/(2σ²)) ∑ᵢ (xᵢ − µ)² (8)

∂ℓ/∂σ = −N/σ + 0 + (1/σ³) ∑ᵢ (xᵢ − µ)² (9)

Solve for σ:

0 = −N/σ + (1/σ³) ∑ᵢ (xᵢ − µ)² (10)

N/σ = (1/σ³) ∑ᵢ (xᵢ − µ)² (11)

σ² = (∑ᵢ (xᵢ − µ)²) / N (12)

Consistent with what we said before
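And the closed form in (12) similarly maximizes the log likelihood over σ (a sketch with made-up data):

```python
import math

def log_lik_sigma(xs, mu, sigma):
    """Gaussian log likelihood as a function of σ (µ held fixed)."""
    n = len(xs)
    return (-n * math.log(sigma) - n / 2 * math.log(2 * math.pi)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

xs = [1.0, 2.0, 4.0, 7.0]
mu = sum(xs) / len(xs)
sigma_hat = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))  # equation (12)
for factor in (0.5, 0.9, 1.1, 2.0):
    assert log_lik_sigma(xs, mu, sigma_hat) > log_lik_sigma(xs, mu, sigma_hat * factor)
```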
Discrete Distribution: Multinomial
• Recall the mass function (N is the total number of observations, xᵢ is the count for each cell, θᵢ the probability of each cell)

p(x⃗ | θ⃗) = (N! / ∏ᵢ xᵢ!) ∏ᵢ θᵢ^xᵢ (2)

• Taking the log makes the math easier and doesn't change the answer (log is monotonic)

• If we observe cell counts x₁, …, x_K, then the log likelihood is

ℓ(θ⃗) ≡ log(N!) − ∑ᵢ log(xᵢ!) + ∑ᵢ xᵢ log θᵢ (3)
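Equation (3) can be implemented with `math.lgamma` for the factorial terms (a sketch; the counts are made up):

```python
import math

def multinomial_log_lik(counts, thetas):
    """ℓ(θ) = log(N!) - Σᵢ log(xᵢ!) + Σᵢ xᵢ log θᵢ, using lgamma(x+1) = log(x!)."""
    n = sum(counts)
    return (math.lgamma(n + 1)
            - sum(math.lgamma(x + 1) for x in counts)
            + sum(x * math.log(t) for x, t in zip(counts, thetas)))

counts = [3, 1, 2]          # N = 6 observations over K = 3 cells
thetas = [0.5, 0.2, 0.3]
ll = multinomial_log_lik(counts, thetas)
# Compare with the probability computed directly: 6!/(3!·1!·2!) · 0.5³ · 0.2 · 0.3²
direct = 60 * 0.5 ** 3 * 0.2 * 0.3 ** 2
assert abs(ll - math.log(direct)) < 1e-12
```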
MLE of Multinomial θ

ℓ(θ⃗) = log(N!) − ∑ᵢ log(xᵢ!) + ∑ᵢ xᵢ log θᵢ + λ (1 − ∑ᵢ θᵢ) (4)

Where did the λ term come from? The constraint that θ⃗ must be a distribution.

• ∂ℓ/∂θᵢ = xᵢ/θᵢ − λ

• ∂ℓ/∂λ = 1 − ∑ᵢ θᵢ
![Page 46: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning](https://reader033.vdocuments.mx/reader033/viewer/2022053107/6075625f2d28cc1a095c70ad/html5/thumbnails/46.jpg)
MLE of Multinomial θ
`( ~θ ) = log(N!)−∑
i
log(xi !)+∑
i
xi logθi +λ
�
1−∑
i
θi
�
(4)
(5)
• ∂ `∂ θi
= xiθi−λ
• ∂ `∂ λ = 1−∑
i θi
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 8
![Page 47: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning](https://reader033.vdocuments.mx/reader033/viewer/2022053107/6075625f2d28cc1a095c70ad/html5/thumbnails/47.jpg)
MLE of Multinomial θ
`( ~θ ) = log(N!)−∑
i
log(xi !)+∑
i
xi logθi +λ
�
1−∑
i
θi
�
(4)
(5)
• ∂ `∂ θi
= xiθi−λ
• ∂ `∂ λ = 1−∑
i θi
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 8
![Page 48: Maximum Likelihood Estimation · Maximum Likelihood Estimation INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber MARCH 7, 2017 INFO-2301: Quantitative Reasoning](https://reader033.vdocuments.mx/reader033/viewer/2022053107/6075625f2d28cc1a095c70ad/html5/thumbnails/48.jpg)
MLE of Multinomial θ
`( ~θ ) = log(N!)−∑
i
log(xi !)+∑
i
xi logθi +λ
�
1−∑
i
θi
�
(4)
(5)
• ∂ `∂ θi
= xiθi−λ
• ∂ `∂ λ = 1−∑
i θi
INFO-2301: Quantitative Reasoning 2 | Paul and Boyd-Graber Maximum Likelihood Estimation | 6 of 8
MLE of Multinomial θ
• We have a system of equations

θ₁ = x₁ / λ (6)

⋮

θ_K = x_K / λ (8)

∑ᵢ θᵢ = 1 (9)

• So let's substitute the first K equations into the last:

∑ᵢ xᵢ / λ = 1 (10)

• So λ = ∑ᵢ xᵢ = N, and θᵢ = xᵢ / N
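A quick check that θᵢ = xᵢ/N really does beat other distributions (a sketch; the counts and the competing distributions are made up):

```python
import math

def log_lik(counts, thetas):
    """Only the θ-dependent term Σᵢ xᵢ log θᵢ matters when comparing θ's."""
    return sum(x * math.log(t) for x, t in zip(counts, thetas))

counts = [3, 1, 2]
n = sum(counts)
theta_mle = [x / n for x in counts]  # θᵢ = xᵢ/N
for other in ([0.4, 0.3, 0.3], [1 / 3, 1 / 3, 1 / 3], [0.6, 0.1, 0.3]):
    assert log_lik(counts, theta_mle) > log_lik(counts, other)
```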
Big Picture

• Ran through several common examples

• For existing distributions you can (and should) look up the MLE

• For new models, you can't (foreshadowing of later in the class)

  ◦ Classification models

  ◦ Unsupervised models (Expectation-Maximization)

• Not always so easy
Classification
• Classification can be viewed as p(y | x, θ)

• Have x, y; need θ

• Discovering θ is also a problem of MLE
Clustering
• Clustering can be viewed as p(x | z, θ)

• Have x; need z, θ

• z is guessed at iteratively (Expectation)

• θ is estimated to maximize the likelihood (Maximization)
Not always so easy: Bias
• An estimator is biased if E[θ̂] ≠ θ

• We won't prove it, but the MLE estimate for the variance is biased

• The bias comes from estimating µ, which "shrinks" the variance; the unbiased estimator is

σ̂² = (1/(N − 1)) ∑ᵢ (xᵢ − µ)² (1)
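The two estimators differ by an exact factor of (N − 1)/N, so the biased (MLE) estimate is always the smaller of the two (a sketch with made-up data):

```python
xs = [1.0, 2.0, 4.0, 7.0]
n = len(xs)
mu = sum(xs) / n
ss = sum((x - mu) ** 2 for x in xs)  # Σᵢ (xᵢ - µ)²
var_mle = ss / n                     # biased MLE estimate (divides by N)
var_unbiased = ss / (n - 1)          # unbiased estimate (divides by N - 1)
assert var_mle < var_unbiased
assert abs(var_mle - var_unbiased * (n - 1) / n) < 1e-12
```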
Not always so easy: Intractable Likelihoods
• It is not always possible to "solve for" the optimal estimator

• Use gradient optimization (we'll see this for logistic regression)

• Use other approximations (e.g., Monte Carlo sampling)

• A whole subfield of statistics / information science
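When there is no closed form, gradient ascent on the log likelihood is the workhorse. A minimal sketch for one-parameter logistic regression (the toy data and the learning rate are made up, not from the slides):

```python
import math

# Toy data: inputs x with binary labels y; model p(y=1|x) = sigmoid(w·x).
# The labels deliberately overlap so the MLE for w is finite.
data = [(-2.0, 0), (-1.0, 0), (-1.0, 1), (1.0, 0), (1.0, 1), (2.0, 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_lik(w):
    """Bernoulli log likelihood of the labels under p(y=1|x) = sigmoid(w·x)."""
    return sum(math.log(sigmoid(w * x)) if y == 1 else math.log(1 - sigmoid(w * x))
               for x, y in data)

def grad(w):
    """dℓ/dw = Σ (y - sigmoid(w·x))·x."""
    return sum((y - sigmoid(w * x)) * x for x, y in data)

w = 0.0
for _ in range(2000):
    w += 0.1 * grad(w)  # gradient *ascent* on the log likelihood

assert abs(grad(w)) < 1e-6        # near a stationary point
assert log_lik(w) > log_lik(0.0)  # better than the starting point
```

No equation `grad(w) = 0` is solvable in closed form here, which is exactly why the iterative update is needed.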