TAMS38 - Lecture 11: Linear models & Logistic regression
Lecturer: Jolanta Pielaszkiewicz
Matematisk statistik - Matematiska institutionen
Linköpings universitet
"When you reach the end of your rope, tie a knot in it and hang on." - Thomas Jefferson
13 December, 2016
Singull, Pielaszkiewicz, MAI - LiU TAMS38 - Lecture 11
Contents
Linear models
Factorial design and regression analysis
Logistic regression
Deviance
Two examples
(Poisson regression)
Linear models

The models of our different factorial designs and the models in regression analysis are included in the class of linear models. In particular, the models in factorial design can be written as regression models by using dummy variables.
The linear model can be written as
Y = Xβ + ε : n × 1,

where β : (k + 1) × 1 are unknown parameters, X : n × (k + 1) is a known design matrix, and

cov(Y) = cov(ε) = σ²I.
One-Way ANOVA with dummy variables
Let
y1, . . . , y4 be observations from N(µ1, σ),
y5, . . . , y7 be observations from N(µ2, σ),
y8, . . . , y10 be observations from N(µ3, σ),
and
y = (y1, . . . , y4, y5, . . . , y7, y8, . . . , y10)′.
One-Way ANOVA, cont.

We have

Y = Xµ + ε,

where Y = (Y1, . . . , Y10)′, µ = (µ1, µ2, µ3)′, ε = (ε1, . . . , ε10)′, and the design matrix X : 10 × 3 has rows (1, 0, 0) for observations 1-4, rows (0, 1, 0) for observations 5-7 and rows (0, 0, 1) for observations 8-10,

i.e., a regression model with no constant term, and we get the estimates

µ̂ = (X′X)⁻¹X′y.

Exercise: Show that this equation gives the ordinary µ̂-estimator.
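A quick numerical sketch (with hypothetical measurements, not tied to any particular data set): because each row of X contains a single 1, X′X is diagonal, and µ̂ = (X′X)⁻¹X′y reduces to the three group means.

```python
# Hypothetical data: 4 + 3 + 3 observations from three groups.
y = [0.25, 0.27, 0.22, 0.30,   # sample 1 (n1 = 4)
     0.18, 0.28, 0.21,         # sample 2 (n2 = 3)
     0.19, 0.25, 0.27]         # sample 3 (n3 = 3)

# Design matrix: row j has a single 1 in the column of j's group.
groups = [0]*4 + [1]*3 + [2]*3
X = [[1 if groups[j] == k else 0 for k in range(3)] for j in range(10)]

XtX = [[sum(X[j][a]*X[j][b] for j in range(10)) for b in range(3)] for a in range(3)]
Xty = [sum(X[j][a]*y[j] for j in range(10)) for a in range(3)]

# X'X = diag(n1, n2, n3), so the inversion is elementwise.
mu_hat = [Xty[a] / XtX[a][a] for a in range(3)]

group_means = [sum(y[:4])/4, sum(y[4:7])/3, sum(y[7:])/3]
print(mu_hat)        # identical to the ordinary group means
print(group_means)
```

This is exactly the content of the exercise: the regression estimator reproduces the ordinary µ̂-estimator, i.e. the group means.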
One-Way ANOVA, cont.

A parameterization which is common in regression analysis is to let

Yj = β0 + β1zj1 + β2zj2 + εj,

where

zj1 = 1 for sample 1, 0 otherwise,
zj2 = 1 for sample 2, 0 otherwise.

Exercise: Write the X-matrix.
One-Way ANOVA, cont.

Note that

E(Yj) = β0 + β1  for sample 1,
        β0 + β2  for sample 2,
        β0       for sample 3,

where β1 describes the difference between the expectations of sample 1 and sample 3, and β2 describes the difference between the expectations of sample 2 and sample 3.

If we want to compare samples 1 and 2 we should study β1 − β2.
Example - One-Way ANOVA

Measurements for the four laboratories from the example in Lecture 1 and Lecture 3.

  A     B     C     D
0.25  0.18  0.19  0.23
0.27  0.28  0.25  0.30
0.22  0.21  0.27  0.28
0.30  0.23  0.24  0.28
0.27  0.25  0.18  0.24
0.28  0.20  0.26  0.34
0.32  0.27  0.28  0.20
0.24  0.19  0.24  0.18
0.31  0.24  0.25  0.24
0.26  0.22  0.20  0.28
0.21  0.29  0.21  0.22
0.28  0.16  0.19  0.21
Example, cont.

Model:

Yj = β0 + β1zj1 + β2zj2 + β3zj3 + εj,

where

zjk = 1 for laboratory no. k, 0 otherwise,

for k = 1, 2, 3. Now, we have the expectations

E(Yj) = β0 + β1  for sample 1,
        β0 + β2  for sample 2,
        β0 + β3  for sample 3,
        β0       for sample 4.
Example, cont.

Regression Analysis: y versus z1, z2, z3

The regression equation is
y = 0.250 + 0.0175 z1 - 0.0233 z2 - 0.0200 z3

Predictor      Coef  SE Coef      T      P
Constant    0.25000  0.01134  22.05  0.000
z1          0.01750  0.01604   1.09  0.281
z2         -0.02333  0.01604  -1.46  0.153
z3         -0.02000  0.01604  -1.25  0.219

S = 0.0392809   R-Sq = 16.1%   R-Sq(adj) = 10.4%

Analysis of Variance

Source          DF        SS        MS     F      P
Regression       3  0.013006  0.004335  2.81  0.050
Residual Error  44  0.067892  0.001543
Total           47  0.080898
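The coefficients in the MINITAB output can be checked by hand: with the reference coding above (laboratory D as the baseline), the least-squares estimates are simply group-mean contrasts. A sketch in Python using the laboratory table:

```python
# Measurements from the laboratory table above.
lab = {
    "A": [0.25, 0.27, 0.22, 0.30, 0.27, 0.28, 0.32, 0.24, 0.31, 0.26, 0.21, 0.28],
    "B": [0.18, 0.28, 0.21, 0.23, 0.25, 0.20, 0.27, 0.19, 0.24, 0.22, 0.29, 0.16],
    "C": [0.19, 0.25, 0.27, 0.24, 0.18, 0.26, 0.28, 0.24, 0.25, 0.20, 0.21, 0.19],
    "D": [0.23, 0.30, 0.28, 0.28, 0.24, 0.34, 0.20, 0.18, 0.24, 0.28, 0.22, 0.21],
}
mean = {k: sum(v)/len(v) for k, v in lab.items()}

b0 = mean["D"]                # constant term: baseline laboratory D
b1 = mean["A"] - mean["D"]    # z1 coefficient
b2 = mean["B"] - mean["D"]    # z2 coefficient
b3 = mean["C"] - mean["D"]    # z3 coefficient
print(round(b0, 4), round(b1, 4), round(b2, 4), round(b3, 4))
# matches y = 0.250 + 0.0175 z1 - 0.0233 z2 - 0.0200 z3
```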
Example, cont.

MTB > print m1

Data Display

Matrix XPXI1

 0.0833333  -0.083333  -0.083333  -0.083333
-0.0833333   0.166667   0.083333   0.083333
-0.0833333   0.083333   0.166667   0.083333
-0.0833333   0.083333   0.083333   0.166667
Example, cont.

One-way ANOVA: C5 versus C6

Source  DF       SS       MS     F      P
C6       3  0.01301  0.00434  2.81  0.050
Error   44  0.06789  0.00154
Total   47  0.08090

S = 0.03928   R-Sq = 16.08%   R-Sq(adj) = 10.36%

Individual 95% CIs for the means, based on the pooled StDev:

Level   N     Mean    StDev
A      12  0.26750  0.03388
B      12  0.22667  0.04097
C      12  0.23000  0.03438
D      12  0.25000  0.04651

Pooled StDev = 0.03928
Two-Way ANOVA

Let us now have two factors with two observations per cell:

       B1       B2       B3
A1  y1, y2   y3, y4   y5, y6
A2  y7, y8   y9, y10  y11, y12

Let

z1 = 1 for A-level 1, 0 otherwise,
u1 = 1 for B-level 1, 0 otherwise,
u2 = 1 for B-level 2, 0 otherwise.

The two-factor model can then be written as

Yj = β0 + α1zj1 + γ1uj1 + γ2uj2 + δ11zj1·uj1 + δ12zj1·uj2 + εj,

which is equivalent to the usual two-factor model.

Exercise: Write the X-matrix.
Two-Way ANOVA, cont.

Here, δ11 and δ12 are our parameters for the interactions. Observe that only dummy variables that are related to different factors should be multiplied. We obtain (a − 1)(b − 1) parameters that correspond to the interaction between each pair of factors.

The matrix of expectations for the cells is given by

       B1                    B2                    B3
A1  β0 + α1 + γ1 + δ11   β0 + α1 + γ2 + δ12   β0 + α1
A2  β0 + γ1              β0 + γ2              β0

We have the additive model if and only if δ11 = δ12 = 0.
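As a sketch of how the X-matrix looks for this model (the observation order follows the table above), the columns 1, z1, u1, u2, z1·u1, z1·u2 can be built mechanically:

```python
# Cell (a, b) for each of the 12 observations y1, ..., y12.
cells = [(1, 1), (1, 1), (1, 2), (1, 2), (1, 3), (1, 3),   # row A1
         (2, 1), (2, 1), (2, 2), (2, 2), (2, 3), (2, 3)]   # row A2

X = []
for a, b in cells:
    z1 = 1 if a == 1 else 0   # A-level 1 indicator
    u1 = 1 if b == 1 else 0   # B-level 1 indicator
    u2 = 1 if b == 2 else 0   # B-level 2 indicator
    # columns: 1, z1, u1, u2, z1*u1, z1*u2
    X.append([1, z1, u1, u2, z1*u1, z1*u2])

for row in X:
    print(row)
```

The first row, [1, 1, 1, 0, 1, 0], picks out β0 + α1 + γ1 + δ11, in agreement with the cell-expectation table, and the last row, [1, 0, 0, 0, 0, 0], gives β0.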
The regression model in the example above can be used even if some y-observations are missing; one then obtains a method for analyzing an incomplete factorial design.

When one builds a model with three factors, information about the three-factor interaction is carried by the coefficient of the product of the three dummy variables corresponding to those factors.

Although one can analyze a factorial design as a regression model, the results are often more difficult to interpret than in the standard analysis; the usual hypotheses must be translated into the new parameters, etc.
Example 1 – Beetle mortality

The table below shows the number of beetles dead after five hours of exposure to gaseous carbon disulphide at various concentrations (data from Bliss, 1935).

Dose, xi (log10 CS2 mg l⁻¹)   Number of beetles, ni   Number killed, yi
1.6907                         59                       6
1.7242                         60                      13
1.7552                         62                      18
1.7842                         56                      28
1.8113                         63                      52
1.8369                         59                      53
1.8610                         62                      61
1.8839                         60                      60
Binomial distribution

A random variable Y follows the Binomial distribution, Y ∼ Bin(n, p), if its probability function is given by

pY(y) = C(n, y) p^y (1 − p)^(n−y),   y = 0, 1, . . . , n,

where C(n, y) is the binomial coefficient.

Assume that we have random variables Yi ∼ Bin(ni, pi), where Yi is the number of successes among ni trials, i = 1, . . . , m. Then one has m different parameters.
Log-likelihood function

The log-likelihood function (see Appendix) for the maximal model with m parameters is

l(p1, . . . , pm; y1, . . . , ym) = Σ_{i=1}^{m} [ yi log(pi/(1 − pi)) + ni log(1 − pi) + log C(ni, yi) ].
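The log-likelihood can be evaluated directly. A sketch in Python, using the beetle data from Example 1 and the convention 0 · log 0 = 0; the saturated-model MLE p̂i = yi/ni should give a larger value than any other probability vector:

```python
import math

def binom_loglik(p, y, n):
    """Binomial log-likelihood l(p; y) for grouped data."""
    total = 0.0
    for pi, yi, ni in zip(p, y, n):
        term = math.log(math.comb(ni, yi))
        if 0 < pi < 1:
            term += yi*math.log(pi/(1 - pi)) + ni*math.log(1 - pi)
        elif not ((pi == 0 and yi == 0) or (pi == 1 and yi == ni)):
            return float("-inf")   # outcome impossible under this p
        total += term
    return total

# Beetle mortality data (Bliss, 1935).
y = [6, 13, 18, 28, 52, 53, 61, 60]
n = [59, 60, 62, 56, 63, 59, 62, 60]
p_hat = [yi/ni for yi, ni in zip(y, n)]   # saturated-model MLE
print(binom_loglik(p_hat, y, n))
```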
Logistic regression

We want to explain the proportion of successes in each group, for which the maximum-likelihood estimator is

P̂i = Yi/ni,

with the help of a number of explanatory variables. As the expectations are

E(Yi) = ni pi  and  E(P̂i) = pi,

we can use the following model for the probabilities pi:

g(pi) = x′iβ.
Link function

The simplest case is the linear model

p = x′β.

The problem here is that x′β can become negative or larger than 1, while obviously 0 ≤ p ≤ 1.

If we let

p = g⁻¹(x′β) = ∫_{−∞}^{x′β} f(z) dz,

where f(z) is a probability density function, the so-called tolerance distribution, we ensure that p ∈ [0, 1].
Model: Linear

Tolerance function: Re[a, b] (uniform on [a, b])

p = (x − a)/(b − a),   a ≤ x ≤ b.

Link function:

g(p) = p = (x − a)/(b − a) = β1 + β2x,

where β1 = −a/(b − a) and β2 = 1/(b − a).
Model: Probit

Tolerance function: N(µ, σ)

p = (1/(σ√(2π))) ∫_{−∞}^{x} e^{−(z−µ)²/(2σ²)} dz = Φ((x − µ)/σ).

Link function:

g(p) = Φ⁻¹(p) = (x − µ)/σ = β1 + β2x   (Probit / Normit),

where β1 = −µ/σ and β2 = 1/σ.
Model: Logistic

Tolerance function: f(z) = β2 e^{β1+β2z} / (1 + e^{β1+β2z})²

p = e^{β1+β2x} / (1 + e^{β1+β2x}).

Link function:

g(p) = log(p/(1 − p)) = β1 + β2x   (Logit).
Model: Extreme value

Tolerance function: f(z) = β2 exp{β1 + β2z − e^{β1+β2z}}

p = 1 − exp{−exp(β1 + β2x)}.

Link function:

g(p) = log(−log(1 − p)) = β1 + β2x   (Complementary log-log, Gompit).
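A small sketch of the logit, probit and complementary log-log links and their inverses (the linear link needs no code); each g maps (0, 1) onto the real line, and g⁻¹ maps x′β back into (0, 1):

```python
import math
from statistics import NormalDist

logit       = lambda p: math.log(p/(1 - p))
inv_logit   = lambda t: math.exp(t)/(1 + math.exp(t))
probit      = lambda p: NormalDist().inv_cdf(p)        # Phi^{-1}(p)
inv_probit  = lambda t: NormalDist().cdf(t)            # Phi(t)
cloglog     = lambda p: math.log(-math.log(1 - p))     # complementary log-log
inv_cloglog = lambda t: 1 - math.exp(-math.exp(t))

# Round trips: g^{-1}(g(p)) = p for all three links.
for p in (0.1, 0.5, 0.9):
    for g, ginv in ((logit, inv_logit), (probit, inv_probit), (cloglog, inv_cloglog)):
        assert abs(ginv(g(p)) - p) < 1e-9
print("round trips ok")
```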
Deviance

Assume that we have two models, one with p parameters and one with the maximal number m of parameters, where m > p. Let the parameter vectors be β0 : p × 1 and β1 : m × 1, and assume that the smaller model is a special case of the bigger one. Then we want to test the hypothesis

H0: the smaller model with p parameters fits as well as the maximal model with m parameters,
versus
H1: the maximal model is better,

and we do it using the analysis of deviance.
Deviance

Definition. The deviance is defined as

D = 2( l(β̂1; y) − l(β̂0; y) ).

One can show that under H0 it holds that

D ≈ χ²(m − p),

and we reject H0 in favor of H1 for large values of the deviance D.
Deviance - Binomial distribution

We have random variables Yi ∼ Bin(ni, pi). The maximal model has m different parameters p1, . . . , pm with ML-estimates

P̂1 = (p̂1, . . . , p̂m)′,  where  p̂i = yi/ni.
Deviance - Binomial distribution, cont.

Let P̂0 be the ML-estimator for some other model (with fewer parameters), with fitted probabilities p̂0i and fitted values ŷi = ni p̂0i. Then the deviance is

D = 2( l(P̂1; y) − l(P̂0; y) )

  = 2 Σ_{i=1}^{m} [ yi log(p̂i/p̂0i) + (ni − yi) log((1 − p̂i)/(1 − p̂0i)) ]

  = 2 Σ_{i=1}^{m} [ yi log(yi/(ni p̂0i)) + (ni − yi) log((ni − yi)/(ni(1 − p̂0i))) ]

  = 2 Σ_{i=1}^{m} [ yi log(yi/ŷi) + (ni − yi) log((ni − yi)/(ni − ŷi)) ].
Deviance - Binomial distribution, cont.

Again, the deviance

D = 2 Σ_{i=1}^{m} [ yi log(yi/ŷi) + (ni − yi) log((ni − yi)/(ni − ŷi)) ]

has the form

D = 2 Σ oi log(oi/ei),

where the oi are the observed values (yi and ni − yi) and the ei are the fitted values (ŷi and ni − ŷi).
Example 1 – Beetle mortality, cont.

We now return to the beetle mortality data (Bliss, 1935) given in the table above.
Example, cont.

We will analyze the data using the different link functions given above. We start with the logit link function

log(p/(1 − p)) = β1 + β2x.

The log-likelihood function with the logit link function is

l = Σ_{i=1}^{m} [ yi(β1 + β2xi) − ni log(1 + e^{β1+β2xi}) + log C(ni, yi) ].

We use MINITAB.
Example, cont.

Binary Logistic Regression: y_i, n_i versus x_i

Link Function: Logit

Response Information

Variable  Value      Count
y_i       Event        291
          Non-event    190
n_i       Total        481

Logistic Regression Table
Predictor      Coef  SE Coef       Z      P
Constant   -60.7175  5.18071  -11.72  0.000
x_i         34.2703  2.91214   11.77  0.000

Log-Likelihood = -186.235
Test that all slopes are zero: G = 272.970, DF = 1, P-Value = 0.000

Goodness-of-Fit Tests
Method           Chi-Square  DF      P
Pearson             10.0268   6  0.124
Deviance            11.2322   6  0.081
Hosmer-Lemeshow     10.0268   6  0.124
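As a cross-check of the MINITAB output (a sketch, not the course workflow), the logit model can be fitted in pure Python with Newton-Raphson; the coefficients and the deviance should agree with the values above.

```python
import math

# Beetle mortality data (Bliss, 1935).
x = [1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839]
n = [59, 60, 62, 56, 63, 59, 62, 60]
y = [6, 13, 18, 28, 52, 53, 61, 60]

b1, b2 = 0.0, 0.0
for _ in range(25):
    # Score vector (g1, g2) and Fisher information (h11, h12, h22).
    g1 = g2 = h11 = h12 = h22 = 0.0
    for xi, ni, yi in zip(x, n, y):
        pi = 1/(1 + math.exp(-(b1 + b2*xi)))
        w = ni*pi*(1 - pi)
        g1 += yi - ni*pi
        g2 += (yi - ni*pi)*xi
        h11 += w; h12 += w*xi; h22 += w*xi*xi
    det = h11*h22 - h12*h12
    b1 += ( h22*g1 - h12*g2)/det      # Newton step
    b2 += (-h12*g1 + h11*g2)/det

# Deviance against the saturated model, D = 2*sum o*log(o/e).
D = 0.0
for xi, ni, yi in zip(x, n, y):
    e = ni/(1 + math.exp(-(b1 + b2*xi)))   # fitted number killed
    if yi > 0:
        D += 2*yi*math.log(yi/e)
    if ni - yi > 0:
        D += 2*(ni - yi)*math.log((ni - yi)/(ni - e))
print(b1, b2, D)   # approx -60.72, 34.27 and deviance approx 11.23
```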
ML-estimators

Maximum-likelihood estimators (MLE) have many good properties. For example, for large n we have

β̂ ≈ N(β, I⁻¹),

where the information matrix I is given by

I = (Ijk)_{j,k} = ( E(UjUk) )_{j,k},  with  Ui = ∂l/∂βi.

One can also prove that

I = ( −E( ∂²l/(∂βj∂βk) ) )_{j,k}.

Let the elements of the covariance matrix be denoted by I⁻¹ = (I^{jk})_{j,k}.
Example - Embryogenic anthers

The data in the table are taken from Sangwan-Norrell (1977). They are the numbers yjk of embryogenic anthers of the plant species Datura innoxia Mill. obtained when numbers njk of anthers were prepared under several different conditions.

                        Centrifuging force (g)
Storage condition         40    150    350
Control      y1k          55     52     57
             n1k         102     99    108
Treatment    y2k          55     50     50
             n2k          76     81     90
Example, cont.

We have one factor with two levels: storage at 3°C for 48 hours (treatment) and a control type of storage. There is also a continuous explanatory variable corresponding to the different centrifuging forces. We will investigate how the storage and the centrifuging force affect the number of embryogenic anthers.

One can plot p̂jk = yjk/njk against the logarithm xk of the different centrifuging forces to compare the two groups.
Example, cont.

We now fit two logistic models for πjk, the probability that an anther is embryogenic. The first model has different constant terms and different slopes for the two groups:

logit πjk = β0 + α0zj + β1xk + α1zjxk = β0 + α0zj + (β1 + α1zj)xk,

where zj = 0 for the control group and zj = 1 for the treatment group.

The second model has different constant terms but the same slope for the two groups:

logit πjk = β0 + α0zj + β1xk.

We use MINITAB.
Example, cont. - Model 1

Binary Logistic Regression: y, n versus z, x = logcf, zx

Link Function: Logit

Logistic Regression Table
Predictor        Coef   SE Coef      Z      P
Constant     0.233910  0.628418   0.37  0.710
z             1.97721  0.998079   1.98  0.048
x = logcf  -0.0227412  0.126851  -0.18  0.858
zx          -0.318628  0.198881  -1.60  0.109

Log-Likelihood = -374.109
Test that all slopes are zero: G = 10.424, DF = 3, P-Value = 0.015

Goodness-of-Fit Tests
Method           Chi-Square  DF      P
Pearson           0.0276564   2  0.986
Deviance          0.0276407   2  0.986
Hosmer-Lemeshow   0.0276564   4  1.000
Example, cont. - Model 2

Binary Logistic Regression: y, n versus z, x = logcf

Link Function: Logit

Logistic Regression Table
Predictor       Coef    SE Coef      Z      P
Constant    0.876775   0.487037   1.80  0.072
z           0.406841   0.174624   2.33  0.020
x = logcf  -0.154596  0.0970260  -1.59  0.111

Log-Likelihood = -375.404
Test that all slopes are zero: G = 7.833, DF = 2, P-Value = 0.020

Goodness-of-Fit Tests
Method           Chi-Square  DF      P
Pearson             2.59800   3  0.458
Deviance            2.61878   3  0.454
Hosmer-Lemeshow     2.59800   4  0.627
Example - Poisson regression

Assume that we have the following observations

yi    2   3   6   7   8   9  10  12  15
xi   -1  -1   0   0   0   0   1   1   1

and that we want to fit a Poisson regression. Then we assume that the data are Poisson distributed.

A random variable Y is Poisson distributed with parameter µ > 0, Y ∼ Po(µ), if the probability function is given by

pY(y) = e^{−µ} µ^y / y!,   y = 0, 1, . . . .
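As a quick sketch (with a hypothetical µ), the probability function indeed sums to one and has mean µ:

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Po(mu)."""
    return math.exp(-mu) * mu**y / math.factorial(y)

mu = 4.2
probs = [poisson_pmf(y, mu) for y in range(60)]   # tail beyond 60 is negligible here
print(sum(probs))                                 # close to 1
print(sum(y*p for y, p in enumerate(probs)))      # close to mu
```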
Example - Poisson regression, cont.

We assume the following model:

E Yi = µi = β1 + β2xi = x′iβ,

where xi = (1, xi)′ and β = (β1, β2)′. We take the link function g(µi) to be the identity function,

g(µi) = µi.

If we try to maximize the likelihood function we have to deal with

l(β1, β2) = Σ yi log(β1 + β2xi) − Σ log(yi!) − Nβ1 − β2 Σ xi,

which is difficult to maximize analytically. We use, for example, MATLAB.
Example - Poisson regression, cont.

The following MATLAB code can solve the problem (the constant term Σ log(yi!) is dropped):

y = [2 3 6 7 8 9 10 12 15]';
x = [-1 -1 0 0 0 0 1 1 1]';
m = 9;
lnL = @(b) -(y'*log(b(1) + b(2)*x) - m*b(1) - b(2)*sum(x));
[b, value] = fminsearch(lnL, [7 5]);

so the solution is

>> b

b =

    7.4516    4.9353
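A Python analogue of the MATLAB call (a sketch: it uses Newton-Raphson on the score equations instead of fminsearch, and drops the constant Σ log(yi!) just as the MATLAB code does) reaches the same solution:

```python
y = [2, 3, 6, 7, 8, 9, 10, 12, 15]
x = [-1, -1, 0, 0, 0, 0, 1, 1, 1]

b1, b2 = 7.0, 5.0                       # same starting point as fminsearch
for _ in range(50):
    # Gradient of l and entries of the negated Hessian for the identity-link model.
    g1 = sum(yi/(b1 + b2*xi) for yi, xi in zip(y, x)) - len(y)
    g2 = sum(yi*xi/(b1 + b2*xi) for yi, xi in zip(y, x)) - sum(x)
    h11 = sum(yi/(b1 + b2*xi)**2 for yi, xi in zip(y, x))
    h12 = sum(yi*xi/(b1 + b2*xi)**2 for yi, xi in zip(y, x))
    h22 = sum(yi*xi*xi/(b1 + b2*xi)**2 for yi, xi in zip(y, x))
    det = h11*h22 - h12*h12
    b1 += ( h22*g1 - h12*g2)/det        # Newton step
    b2 += (-h12*g1 + h11*g2)/det

print(b1, b2)   # approx 7.4516 and 4.9353, as in the MATLAB output
```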
Deviance - Poisson distribution

Assume that we have response variables Y1, . . . , Ym with Yi ∼ Po(µi). Assume that the big model we would like to test against is the one in which all µi, i = 1, . . . , m, are different. Then we have β1 = (µ1, . . . , µm)′ and the log-likelihood function is

l(β1; y) = Σ yi log µi − Σ µi − Σ log yi!,

and the MLE is µ̂i = yi, with the value

l(β̂1; y) = Σ yi log yi − Σ yi − Σ log yi!.
Deviance - Poisson distribution, cont.

Assume that the smaller model has p < m parameters, with MLEs λ̂i and the value

l(β̂0; y) = Σ yi log λ̂i − Σ λ̂i − Σ log yi!

of the log-likelihood function.

Now, the deviance is

D = 2( l(β̂1) − l(β̂0) ) = 2( Σ yi log(yi/λ̂i) − Σ (yi − λ̂i) ).
Deviance - Poisson distribution, cont.

The estimated means ŷi = λ̂i are called the fitted values, and one can show that Σ(yi − ŷi) = 0 in many cases.

The deviance is then

D = 2 Σ yi log(yi/ŷi) = 2 Σ oi log(oi/ei),

where oi is the observed value (yi) and ei is the estimated expected value (ŷi).
Deviance - Pearson's χ²-test

If one does a Taylor series expansion of the terms in the deviance, i.e.,

oi log(oi/ei) = (oi − ei) + (1/2)(oi − ei)²/ei + . . . ,

the deviance is approximately given by

D ≈ 2 Σ [ (oi − ei) + (1/2)(oi − ei)²/ei − (oi − ei) ] = Σ (oi − ei)²/ei = X².

Hence, the deviance is closely related to Pearson's χ²-test.
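A small numerical sketch (with hypothetical observed and expected counts chosen so that Σ(oi − ei) = 0) illustrates how close the two statistics are:

```python
import math

o = [18, 22, 31, 29]   # observed counts (hypothetical)
e = [20, 20, 30, 30]   # expected counts; note sum(o) == sum(e)

D  = 2*sum(oi*math.log(oi/ei) for oi, ei in zip(o, e))   # deviance
X2 = sum((oi - ei)**2/ei for oi, ei in zip(o, e))        # Pearson X^2
print(D, X2)   # the two statistics nearly coincide when o is close to e
```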
Example - Poisson regression, cont.

The fitted values in the numerical example above are ŷi = β̂1 + β̂2xi, with β̂1 = 7.4516 and β̂2 = 4.9353.

xi   yi    ŷi        yi log(yi/ŷi)
-1    2    2.5164   -0.4593
-1    3    2.5164    0.5274
 0    6    7.4516   -1.3000
 0    7    7.4516   -0.4377
 0    8    7.4516    0.5681
 0    9    7.4516    1.6991
 1   10   12.3869   -2.1406
 1   12   12.3869   -0.3808
 1   15   12.3869    2.8711
Σ    72   72         0.9473

The deviance is D = 2 · 0.9473 = 1.8946. If the small model fits as well as the maximal model, then approximately D ∼ χ²(9 − 2) = χ²(7). We choose the big model if D > χ²0.95(7) = 14.07. Hence, we cannot reject the small model!
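The deviance in the table can be reproduced directly from the observations and the fitted values ŷi = β̂1 + β̂2xi (a sketch using the estimates above):

```python
import math

y = [2, 3, 6, 7, 8, 9, 10, 12, 15]
x = [-1, -1, 0, 0, 0, 0, 1, 1, 1]
b1, b2 = 7.4516, 4.9353                  # ML-estimates from above

fitted = [b1 + b2*xi for xi in x]        # y_hat_i
D = 2*sum(yi*math.log(yi/fi) for yi, fi in zip(y, fitted))
print(D)   # approx 1.8946, well below the critical value 14.07
```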
Appendix: Estimators

There are several ways to estimate the parameters of a probability model:

- the method of moments,
- the least squares method,
- the maximum-likelihood method.

We now want to look more closely at the maximum-likelihood method, since it is the one we use most often.
Likelihood function

Let x1, . . . , xn be a random sample with independent observations from a distribution f(x; θ) that depends on the unknown parameters θ.

Definition. The function

L(θ) = ∏_{i=1}^{n} f(xi; θ) = f(x1; θ) · . . . · f(xn; θ)

is called the likelihood function.
ML-estimator

Definition. A value θ̂ for which the likelihood function L(θ) attains its highest value is called the maximum-likelihood estimate (ML-estimate) of θ.

Before one maximizes, it is often convenient to take the logarithm of the likelihood function,

l(θ) = log L(θ) = Σ_{i=1}^{n} log f(xi; θ),

and then differentiate with respect to the parameters over which one maximizes.
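As a small sketch (with a hypothetical Exp(λ) sample, density f(x; λ) = λe^{−λx}), one can compare a crude numerical maximization of l(λ) with the analytic ML-estimate λ̂ = 1/x̄:

```python
import math

xs = [0.8, 1.3, 0.4, 2.1, 0.9]           # hypothetical Exp(lambda) sample

def loglik(lam):
    # l(lambda) = n*log(lambda) - lambda*sum(x)
    return len(xs)*math.log(lam) - lam*sum(xs)

# Crude grid search over a range containing the maximum.
grid = [0.001*k for k in range(1, 5000)]
lam_numeric = max(grid, key=loglik)

lam_analytic = 1/(sum(xs)/len(xs))       # from dl/dlambda = 0
print(lam_numeric, lam_analytic)
```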
ML-estimator, cont.

Some properties of maximum-likelihood estimators (MLE) are given below.

If θ̂ is the MLE of θ then, under certain (rather mild) conditions, for large n we have

(θ̂ − E θ̂)/√(Var θ̂) ≈ N(0, 1).
ML-estimator, cont.

This can be generalized to the multidimensional case, where one can show that for large n we have

θ̂ ≈ N(θ, I⁻¹),

where the information matrix I is given by

I = (Ijk) = ( E(UjUk) ),  with  Ui = ∂l/∂θi.

One can also show that

I = ( −E( ∂²l/(∂θj∂θk) ) ).