Hessian Matrices in Statistics

Ferris Jumah, David Schlueter, Matt Vance
MTH 327 Final Project
December 7, 2011



Topic Introduction

Today we are going to talk about . . .
- Introduce the Hessian matrix
- Brief description of relevant statistics
- Maximum Likelihood Estimation (MLE)
- Fisher information and applications

The Hessian Matrix

Recall the Hessian matrix

H(f) =
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1\,\partial x_n} \\
\frac{\partial^2 f}{\partial x_2\,\partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2\,\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n\,\partial x_1} & \frac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}   (1)
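The matrix in (1) can be approximated numerically with central finite differences. The sketch below (an illustrative helper, not part of the original slides) builds the Hessian of f(x, y) = x²y + y³ at a point and relies on the equality of mixed partials for symmetry.

```python
def hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at point x by central finite differences."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Four shifted copies of x for the 4-point stencil of d^2f/dx_i dx_j.
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

# f(x, y) = x^2 y + y^3; at (1, 2): f_xx = 2y = 4, f_xy = 2x = 2, f_yy = 6y = 12.
f = lambda v: v[0] ** 2 * v[1] + v[1] ** 3
H = hessian(f, [1.0, 2.0])
```

The symmetry H[0][1] == H[1][0] (up to rounding) is exactly the equality of the mixed partials shown off-diagonal in (1).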

Statistics: Some things to recall

Now, let's talk a bit about inferential statistics.

Parameters

Random variables
- Definition: a random variable X is a function X : \Omega \to \mathbb{R}
- Each r.v. follows a distribution that has an associated probability function f(x|\theta), e.g.

f(x|\mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]   (2)

What is a random sample?
- X_1, \ldots, X_n i.i.d.
- The outputs of these r.v.s are our sample data
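Equation (2) is straightforward to evaluate directly. The sketch below (illustrative, not from the slides) implements the normal density exactly as written.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density f(x | mu, sigma^2) of N(mu, sigma^2), as in equation (2)."""
    sigma = math.sqrt(sigma2)
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-(x - mu) ** 2 / (2 * sigma2))

# At the mean, the standard normal density is 1/sqrt(2*pi) ~ 0.3989.
peak = normal_pdf(0.0, 0.0, 1.0)
```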

Stats cont.

Estimators (\hat{\theta}) of population parameters
- Definition: an estimator is a rule, often a formula, for calculating an estimate of a parameter \theta from sample data
- There are many estimators, but which is the best?

Maximum Likelihood Estimation (MLE)

Key Concept: Maximum Likelihood Estimation

GOAL: to determine the best estimate of a parameter \theta from a sample.

Likelihood function
- We obtain a data vector x = (x_1, \ldots, x_n)
- Since the random sample is i.i.d., we express the probability of our observed data given \theta as

f(x_1, x_2, \ldots, x_n \mid \theta) = f(x_1|\theta) \cdot f(x_2|\theta) \cdots f(x_n|\theta)   (3)

f_n(\mathbf{x}|\theta) = \prod_{i=1}^{n} f(x_i|\theta)   (4)

- Implication of maximizing the likelihood function
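For the normal density in (2) with known variance, maximizing the product in (4) over \mu gives the sample mean. A small simulation (an illustrative sketch, not part of the slides) confirms that the log-likelihood is highest near \bar{x}.

```python
import math
import random

def log_likelihood(data, mu, sigma2=1.0):
    """Log of the product in (4) for the N(mu, sigma2) model."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)
        for x in data
    )

random.seed(0)
data = [random.gauss(3.0, 1.0) for _ in range(500)]
xbar = sum(data) / len(data)

# Scan candidate values of mu on a grid; the maximizer should sit at the
# grid point closest to the sample mean.
grid = [k / 100 for k in range(200, 401)]  # mu in [2.00, 4.00]
mle = max(grid, key=lambda mu: log_likelihood(data, mu))
```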

Example of MLE

Example: Gaussian (Normal) linear regression
- Recall least squares regression
- Wish to determine the weight vector \mathbf{w}
- Likelihood function given by

P(\mathbf{y}|\mathbf{x}, \mathbf{w}) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left[-\frac{\sum_i (y_i - \mathbf{w}^T\mathbf{x}_i)^2}{2\sigma^2}\right]   (5)

- Need to minimize

\sum_{i=1}^{n} (y_i - \mathbf{w}^T\mathbf{x}_i)^2 = (\mathbf{y} - A\mathbf{w})^T(\mathbf{y} - A\mathbf{w})   (6)

where A is the design matrix of our data.

Example of MLE cont.

Following the standard optimization procedure, we compute the gradient of the sum of squares S from (6) (up to a constant factor of 2):

\nabla S = -A^T\mathbf{y} + A^TA\mathbf{w}   (7)

- Notice the linear combination of the weights and the columns of A^TA
- Setting \nabla S = 0, our resulting critical point is

\mathbf{w} = (A^TA)^{-1}A^T\mathbf{y},   (8)

which we recognize to be the normal equations!
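Equation (8) can be checked on a tiny problem. The sketch below (illustrative pure-Python code, not from the slides) fits a line y = w_0 + w_1 x by forming A^TA and A^Ty explicitly and solving the resulting 2x2 normal equations; on noiseless data the fit is exact.

```python
# Noiseless data on the line y = 1 + 2x, so least squares recovers it exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x for x in xs]

# Design matrix A with rows (1, x_i): an intercept column and a slope column.
A = [[1.0, x] for x in xs]

# Form A^T A (2x2) and A^T y (length 2).
AtA = [[sum(row[r] * row[c] for row in A) for c in range(2)] for r in range(2)]
Aty = [sum(A[i][r] * ys[i] for i in range(len(A))) for r in range(2)]

# Solve (A^T A) w = A^T y for the 2x2 case by Cramer's rule.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
w0 = (Aty[0] * AtA[1][1] - AtA[0][1] * Aty[1]) / det
w1 = (AtA[0][0] * Aty[1] - Aty[0] * AtA[1][0]) / det
```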

Computing the Hessian Matrix

We compute the Hessian in order to show that this critical point is a minimum:

\frac{\partial}{\partial w_k}\nabla S = \frac{\partial}{\partial w_k}\left(w_1\begin{bmatrix}x_{1,1}\\ \vdots\\ x_{n,1}\end{bmatrix} + \cdots + w_k\begin{bmatrix}x_{1,k}\\ \vdots\\ x_{n,k}\end{bmatrix} + \cdots + w_n\begin{bmatrix}x_{1,n}\\ \vdots\\ x_{n,n}\end{bmatrix}\right) = \begin{bmatrix}x_{1,k}\\ \vdots\\ x_{n,k}\end{bmatrix}

Therefore,

H = A^TA   (9)

which is positive semi-definite. Therefore, our estimate for \mathbf{w} minimizes the sum of squares, and hence maximizes our likelihood function.
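The positive semi-definiteness claimed for (9) follows from the identity w^T A^T A w = (Aw)^T(Aw) = ||Aw||^2 >= 0. The sketch below (illustrative, not from the slides) checks this numerically for a random matrix and many random weight vectors.

```python
import random

def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

random.seed(1)
n, k = 5, 3
A = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(n)]

# The quadratic form w^T (A^T A) w equals ||A w||^2, so it is never negative.
quad_forms = []
for _ in range(100):
    w = [random.uniform(-1, 1) for _ in range(k)]
    Aw = matvec(A, w)
    quad_forms.append(sum(c * c for c in Aw))  # ||Aw||^2 = w^T A^T A w
```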

MLE cont.

Advantages and disadvantages
- Larger samples give better estimates: as n \to \infty, \hat{\theta}_n \to \theta
- Other advantages
- Disadvantages: uniqueness, existence, reliance upon distribution fit
- This raises the question: how much information about a parameter can be gathered from sample data?
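The consistency claim \hat{\theta}_n \to \theta is easy to see in a quick simulation (an illustrative sketch, not part of the slides): for N(\mu, \sigma^2) the MLE of \mu is the sample mean, and its average error shrinks as n grows.

```python
import random

random.seed(42)
true_mu = 5.0

def mle_mean(n):
    """MLE of mu for an N(mu, 1) sample of size n: the sample mean."""
    sample = [random.gauss(true_mu, 1.0) for _ in range(n)]
    return sum(sample) / n

def avg_error(n, reps=200):
    """Average absolute estimation error over repeated samples of size n."""
    return sum(abs(mle_mean(n) - true_mu) for _ in range(reps)) / reps

# Error should decrease (roughly like 1/sqrt(n)) as the sample grows.
errors = {n: avg_error(n) for n in (10, 100, 1000)}
```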

Fisher Information

Key Concept: Fisher Information

We determine the amount of information about a parameter in a sample using the Fisher information, defined by

I(\theta) = -E\left[\frac{\partial^2 \ln[f(x|\theta)]}{\partial \theta^2}\right].   (10)

Intuitive appeal: more data provides more information about the population parameter.

Fisher information example

Example: finding the Fisher information for the normal distribution N(\mu, \sigma^2)

The log likelihood function is

\ln[f(x|\theta)] = -\frac{1}{2}\ln(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}   (11)

where the parameter vector \theta = (\mu, \sigma^2).

The gradient of the log likelihood is

\left(\frac{\partial \ln[f(x|\theta)]}{\partial \mu}, \frac{\partial \ln[f(x|\theta)]}{\partial \sigma^2}\right) = \left(\frac{x-\mu}{\sigma^2}, \frac{(x-\mu)^2}{2\sigma^4} - \frac{1}{2\sigma^2}\right)   (12)

Fisher information example continued

We now compute the Hessian matrix that will lead us to our Fisher information matrix:

\frac{\partial^2 \ln[f(x|\theta)]}{\partial \theta^2} =
\begin{bmatrix}
\frac{\partial^2 \ln[f(x|\theta)]}{\partial \mu^2} & \frac{\partial^2 \ln[f(x|\theta)]}{\partial \mu\,\partial \sigma^2} \\
\frac{\partial^2 \ln[f(x|\theta)]}{\partial \mu\,\partial \sigma^2} & \frac{\partial^2 \ln[f(x|\theta)]}{\partial (\sigma^2)^2}
\end{bmatrix}
=
\begin{bmatrix}
-\frac{1}{\sigma^2} & -\frac{x-\mu}{\sigma^4} \\
-\frac{x-\mu}{\sigma^4} & \frac{1}{2\sigma^4} - \frac{(x-\mu)^2}{\sigma^6}
\end{bmatrix}   (13)

We now compute our Fisher information matrix. Using E[x] = \mu and E[(x-\mu)^2] = \sigma^2, we see that

I(\theta) = -E\left(\frac{\partial^2 \ln[f(x|\theta)]}{\partial \theta^2}\right)   (14)

= \begin{bmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{bmatrix}   (15)
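The (\mu, \mu) entry of (15) can be sanity-checked with the alternative characterization of Fisher information as the expected squared score, E[(\partial \ln f/\partial \mu)^2], which equals the negative expected second derivative under standard regularity conditions. The Monte Carlo sketch below (illustrative, not from the slides) estimates it for N(\mu, \sigma^2) and compares it against 1/\sigma^2.

```python
import random

random.seed(7)
mu, sigma2 = 2.0, 4.0
sigma = sigma2 ** 0.5

def score_mu(x):
    """Score for mu from (12): d/dmu ln f(x | mu, sigma^2) = (x - mu)/sigma^2."""
    return (x - mu) / sigma2

# Fisher information as the expected squared score, estimated by Monte Carlo.
n = 200_000
est = sum(score_mu(random.gauss(mu, sigma)) ** 2 for _ in range(n)) / n
# Theory from (15): I(mu) = 1 / sigma^2 = 0.25
```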

Applications of Fisher information

Fisher information is used in the calculation of . . .

Lower bound of Var(\hat{\theta}), given by

Var(\hat{\theta}) \ge \frac{1}{I(\theta)}   (16)

for an estimator \hat{\theta}.

Wald test: comparing a proposed value \theta_0 of \theta against the MLE. The test statistic is given by

W = \frac{\hat{\theta} - \theta_0}{s.e.(\hat{\theta})}   (17)

where

s.e.(\hat{\theta}) = \frac{1}{\sqrt{I(\hat{\theta})}}   (18)
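For the N(\mu, \sigma^2) example with known \sigma^2, equations (17)-(18) become concrete if I is read as the information in the whole sample (an assumption of this sketch): a sample of size n carries information n/\sigma^2 about \mu, so s.e.(\hat{\mu}) = \sigma/\sqrt{n}. The code below (illustrative, not from the slides) computes the Wald statistic for a true and a false null value of \mu.

```python
import random

random.seed(3)
true_mu, sigma2, n = 5.0, 1.0, 400
sigma = sigma2 ** 0.5

data = [random.gauss(true_mu, sigma) for _ in range(n)]
mu_hat = sum(data) / n              # MLE of mu: the sample mean

# Information about mu in the whole sample: n / sigma^2, so from (18)
se = 1.0 / (n / sigma2) ** 0.5      # = sigma / sqrt(n)

def wald(mu0):
    """Wald statistic (17) for H0: mu = mu0."""
    return (mu_hat - mu0) / se

w_true = wald(5.0)    # testing the true value: |W| should be moderate
w_false = wald(4.0)   # testing a wrong value: |W| should be large
```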