
TAMS38 - Lecture 11
Linear models & Logistic regression

Lecturer: Jolanta Pielaszkiewicz

Matematisk statistik - Matematiska institutionen

Linköpings universitet

”When you reach the end of your rope, tie a knot in it and hang on.” - Thomas Jefferson

13 December, 2016


Contents 2

Linear models

Factorial design and regression analysis

Logistic regression

Deviance

Two examples

(Poisson regression)


Linear models 3

The models of our different factorial designs and the models in regression analysis both belong to the class of linear models. In particular, the models in factorial design can be written as regression models by using dummy variables.

The linear model can be written as

$$Y = X\beta + \varepsilon, \qquad Y : n \times 1,$$

where $\beta : (k+1) \times 1$ is a vector of unknown parameters, $X : n \times (k+1)$ is a known design matrix, and

$$\operatorname{cov}(Y) = \operatorname{cov}(\varepsilon) = \sigma^2 I.$$


One-Way ANOVA with dummy variables 4

Let

$y_1, \ldots, y_4$ be observations from $N(\mu_1, \sigma)$,
$y_5, \ldots, y_7$ be observations from $N(\mu_2, \sigma)$,
$y_8, \ldots, y_{10}$ be observations from $N(\mu_3, \sigma)$,

and

$$y = (y_1, \ldots, y_4, y_5, \ldots, y_7, y_8, \ldots, y_{10})'.$$


One-Way ANOVA, cont. 5

We have

$$
\underbrace{\begin{pmatrix} Y_1 \\ \vdots \\ Y_4 \\ Y_5 \\ \vdots \\ Y_7 \\ Y_8 \\ \vdots \\ Y_{10} \end{pmatrix}}_{=\,Y}
=
\underbrace{\begin{pmatrix}
1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 0 \\
0 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 0 & 1 & 0 \\
0 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 0 & 0 & 1
\end{pmatrix}}_{=\,X}
\underbrace{\begin{pmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \end{pmatrix}}_{=\,\mu}
+
\underbrace{\begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_4 \\ \varepsilon_5 \\ \vdots \\ \varepsilon_7 \\ \varepsilon_8 \\ \vdots \\ \varepsilon_{10} \end{pmatrix}}_{=\,\varepsilon},
$$

i.e., a regression model with no constant term, and we get the estimates

$$\hat{\mu} = (X'X)^{-1}X'y.$$

Exercise: Show that this equation gives the ordinary $\mu$-estimators.
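As a quick numerical check of the exercise, here is a minimal MATLAB sketch (with made-up values for $y_1, \ldots, y_{10}$) that builds the dummy design matrix above and compares $(X'X)^{-1}X'y$ with the three group means:

y = [0.25 0.27 0.22 0.30  0.18 0.28 0.21  0.19 0.25 0.27]';   % hypothetical observations

% Dummy design matrix: one indicator column per group, no constant term
X = [ones(4,1)  zeros(4,1) zeros(4,1);
     zeros(3,1) ones(3,1)  zeros(3,1);
     zeros(3,1) zeros(3,1) ones(3,1)];

mu_hat      = (X'*X)\(X'*y)                                   % least squares estimate
group_means = [mean(y(1:4)); mean(y(5:7)); mean(y(8:10))]     % ordinary mu-estimates (equal)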


One-Way ANOVA, cont. 6

A parameterization that is common in regression analysis is to let

Yj = β0 + β1zj1 + β2zj2 + εj ,

where

$$z_{j1} = \begin{cases} 1, & \text{for sample 1,} \\ 0, & \text{otherwise,} \end{cases} \qquad z_{j2} = \begin{cases} 1, & \text{for sample 2,} \\ 0, & \text{otherwise.} \end{cases}$$

Exercise: Write down the X matrix.


One-Way ANOVA, cont. 7

Note that

$$E(Y_j) = \begin{cases} \beta_0 + \beta_1, & \text{for sample 1,} \\ \beta_0 + \beta_2, & \text{for sample 2,} \\ \beta_0, & \text{for sample 3,} \end{cases}$$

where β1 describes the difference between the expectations of sample 1 and sample 3, and β2 describes the difference between the expectations of sample 2 and sample 3.

If we want to compare samples 1 and 2 we should study β1 − β2.


Example - One-Way ANOVA 8

Measurements for the four laboratories from the example in Lecture 1 and Lecture 3.

  A      B      C      D
0.25   0.18   0.19   0.23
0.27   0.28   0.25   0.30
0.22   0.21   0.27   0.28
0.30   0.23   0.24   0.28
0.27   0.25   0.18   0.24
0.28   0.20   0.26   0.34
0.32   0.27   0.28   0.20
0.24   0.19   0.24   0.18
0.31   0.24   0.25   0.24
0.26   0.22   0.20   0.28
0.21   0.29   0.21   0.22
0.28   0.16   0.19   0.21


Example, cont. 9

Model:
Yj = β0 + β1zj1 + β2zj2 + β3zj3 + εj,

where

$$z_{jk} = \begin{cases} 1, & \text{for laboratory no. } k, \\ 0, & \text{otherwise,} \end{cases}$$

for k = 1, 2, 3. Now, we have expectations

$$E(Y_j) = \begin{cases} \beta_0 + \beta_1, & \text{for sample 1,} \\ \beta_0 + \beta_2, & \text{for sample 2,} \\ \beta_0 + \beta_3, & \text{for sample 3,} \\ \beta_0, & \text{for sample 4.} \end{cases}$$


Example, cont. 10

Regression Analysis: y versus z1, z2, z3

The regression equation is
y = 0.250 + 0.0175 z1 - 0.0233 z2 - 0.0200 z3

Predictor       Coef  SE Coef      T      P
Constant     0.25000  0.01134  22.05  0.000
z1           0.01750  0.01604   1.09  0.281
z2          -0.02333  0.01604  -1.46  0.153
z3          -0.02000  0.01604  -1.25  0.219

S = 0.0392809 R-Sq = 16.1% R-Sq(adj) = 10.4%

Analysis of Variance

Source          DF        SS        MS     F      P
Regression       3  0.013006  0.004335  2.81  0.050
Residual Error  44  0.067892  0.001543
Total           47  0.080898


Example, cont. 11

MTB > print m1

Data Display

Matrix XPXI1

 0.0833333  -0.083333  -0.083333  -0.083333
-0.0833333   0.166667   0.083333   0.083333
-0.0833333   0.083333   0.166667   0.083333
-0.0833333   0.083333   0.083333   0.166667
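The printed matrix (named XPXI1) appears to be $(X'X)^{-1}$ for this design: its diagonal, multiplied by $S^2$, reproduces the squared standard errors in the regression table. A minimal MATLAB sketch that redoes the fit by direct matrix algebra (data entered from the table above; the variable names are my own):

% Laboratory data (columns A, B, C, D from the table above)
A = [0.25 0.27 0.22 0.30 0.27 0.28 0.32 0.24 0.31 0.26 0.21 0.28]';
B = [0.18 0.28 0.21 0.23 0.25 0.20 0.27 0.19 0.24 0.22 0.29 0.16]';
C = [0.19 0.25 0.27 0.24 0.18 0.26 0.28 0.24 0.25 0.20 0.21 0.19]';
D = [0.23 0.30 0.28 0.28 0.24 0.34 0.20 0.18 0.24 0.28 0.22 0.21]';
y = [A; B; C; D];
n = length(y);

% Design matrix: constant + dummies z1, z2, z3 for laboratories A, B, C
% (laboratory D is the reference level)
z1 = [ones(12,1);  zeros(36,1)];
z2 = [zeros(12,1); ones(12,1); zeros(24,1)];
z3 = [zeros(24,1); ones(12,1); zeros(12,1)];
X  = [ones(n,1) z1 z2 z3];

b      = (X'*X)\(X'*y)                 % close to 0.250, 0.0175, -0.0233, -0.0200
s2     = sum((y - X*b).^2)/(n - 4)     % residual mean square (MS Error)
XtXinv = inv(X'*X)                     % compare with the matrix printed above
se     = sqrt(s2*diag(XtXinv))         % standard errors of the coefficients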


Example, cont. 12

One-way ANOVA: C5 versus C6

Source  DF       SS       MS     F      P
C6       3  0.01301  0.00434  2.81  0.050
Error   44  0.06789  0.00154
Total   47  0.08090

S = 0.03928 R-Sq = 16.08% R-Sq(adj) = 10.36%

Individual 95% CIs For Mean Based on Pooled StDev

Level   N     Mean    StDev  --------+---------+---------+---------+-
A      12  0.26750  0.03388                    (--------*--------)
B      12  0.22667  0.04097  (--------*--------)
C      12  0.23000  0.03438   (--------*--------)
D      12  0.25000  0.04651           (--------*--------)
                             --------+---------+---------+---------+-
                                 0.225     0.250     0.275     0.300

Pooled StDev = 0.03928


Example, cont. 13


Two-Way ANOVA 14

Suppose now that we have two factors, with two observations per cell:

      B1        B2        B3
A1  y1, y2    y3, y4    y5, y6
A2  y7, y8    y9, y10   y11, y12

Let

$$z_1 = \begin{cases} 1 & \text{for A-level 1} \\ 0 & \text{otherwise,} \end{cases} \qquad u_1 = \begin{cases} 1 & \text{for B-level 1} \\ 0 & \text{otherwise,} \end{cases} \qquad u_2 = \begin{cases} 1 & \text{for B-level 2} \\ 0 & \text{otherwise.} \end{cases}$$

The two-factor model can be written as

$$Y_j = \beta_0 + \alpha_1 z_{j1} + \gamma_1 u_{j1} + \gamma_2 u_{j2} + \delta_{11} z_{j1} u_{j1} + \delta_{12} z_{j1} u_{j2} + \varepsilon_j,$$

which is equivalent to the usual two-factor model.

Exercise: Write down the X matrix (one possible construction is sketched below).
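As a sketch of one possible answer, a short MATLAB snippet that builds the design matrix for this 2 × 3 layout, with the rows ordered as in the table ($y_1, y_2$ in cell A1B1, ..., $y_{11}, y_{12}$ in cell A2B3):

% Indicator (dummy) variables, two observations per cell
z1 = [ones(6,1); zeros(6,1)];          % A-level 1
u1 = repmat([1;1;0;0;0;0], 2, 1);      % B-level 1
u2 = repmat([0;0;1;1;0;0], 2, 1);      % B-level 2

% Columns: constant, z1, u1, u2, z1*u1, z1*u2
X = [ones(12,1) z1 u1 u2 z1.*u1 z1.*u2]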


Two-Way ANOVA, cont. 15

Here, δ11 and δ12 are our interaction parameters. Observe that only dummy variables related to different factors should be multiplied. We obtain (a − 1)(b − 1) parameters that correspond to the interaction between a pair of factors.

The matrix of expectations for the cells is given by

      B1                        B2                        B3
A1    β0 + α1 + γ1 + δ11        β0 + α1 + γ2 + δ12        β0 + α1
A2    β0 + γ1                   β0 + γ2                   β0

We have the additive model if and only if δ11 = δ12 = 0.


16

The regression model in the example above can be fitted even if some y-observations are missing. One can then use the methods for incomplete factorial designs.

When one builds a model with three factors, information about the three-factor interaction is obtained through the coefficient of the product of the three dummy variables that correspond to those different factors.

If one analyzes a factorial design as a regression model, the results are often more difficult to interpret than in the standard analysis. The usual hypotheses must be translated into the new parameters, etc.


Example 1 – Beetle mortality 17

The table below shows the number of beetles dead after five hours of exposure to gaseous carbon disulphide at various concentrations (data from Bliss, 1935).

Dose, xi                Number of       Number
(log10 CS2 mg l^-1)     beetles, ni     killed, yi
1.6907                      59               6
1.7242                      60              13
1.7552                      62              18
1.7842                      56              28
1.8113                      63              52
1.8369                      59              53
1.8610                      62              61
1.8839                      60              60


Example, cont. 18


Binomial distribution 19

A random variable Y follows a binomial distribution, Y ∼ Bin(n, p), if its probability function is given by

$$p_Y(y) = \binom{n}{y} p^y (1-p)^{n-y}, \quad y = 0, 1, \ldots, n.$$

Assume that we have random variables Yi ∼ Bin(ni, pi), where Yi is the number of successes among ni trials, i = 1, . . . , m.

Then one has m different parameters.


Log-likelihood function 20

The log-likelihood function (see Appendix) for the maximal model with m parameters is

$$l(p_1, \ldots, p_m; y_1, \ldots, y_m) = \sum_{i=1}^{m} \left( y_i \log\left(\frac{p_i}{1-p_i}\right) + n_i \log(1-p_i) + \log\binom{n_i}{y_i} \right).$$


Logistic regression 21

We want to explain the proportion of successes in each group, estimated by the maximum-likelihood estimator

$$P_i = \frac{Y_i}{n_i},$$

with the help of a number of explanatory variables. Since the expectations are

$$E(Y_i) = n_i p_i \quad \text{and} \quad E(P_i) = p_i,$$

we can use the following model for the probabilities $p_i$:

$$g(p_i) = x_i'\beta.$$


Link function 22

The simplest case is the linear model

p = x′β.

The problem here is that x′β can become negative or greater than 1, while we obviously need 0 ≤ p ≤ 1.

If we let

$$p = g^{-1}(x'\beta) = \int_{-\infty}^{t} f(z)\,dz, \qquad t = x'\beta,$$

where $f(z)$ is a probability density function, the so-called tolerance distribution, we ensure that $p \in [0, 1]$.


Model: Linear 23

Tolerance function: Re[a, b] (the uniform distribution on [a, b])

$$p = \frac{x-a}{b-a}, \quad a \leq x \leq b$$

Link function:

$$g(p) = p = \frac{x-a}{b-a} = \beta_1 + \beta_2 x,$$

where $\beta_1 = -\dfrac{a}{b-a}$ and $\beta_2 = \dfrac{1}{b-a}$.


Model: Probit 24

Tolerance function: N(µ, σ)

$$p = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(z-\mu)^2}{2\sigma^2}}\,dz = \Phi\left(\frac{x-\mu}{\sigma}\right)$$

Link function:

$$g(p) = \Phi^{-1}(p) = \frac{x-\mu}{\sigma} = \beta_1 + \beta_2 x \qquad \text{Probit (Normit)},$$

where $\beta_1 = -\dfrac{\mu}{\sigma}$ and $\beta_2 = \dfrac{1}{\sigma}$.


Model: Logistic 25

Tolerance function: $f(z) = \dfrac{\beta_2 e^{\beta_1+\beta_2 z}}{(1 + e^{\beta_1+\beta_2 z})^2}$

$$p = \frac{e^{\beta_1+\beta_2 x}}{1 + e^{\beta_1+\beta_2 x}}$$

Link function:

$$g(p) = \log\left(\frac{p}{1-p}\right) = \beta_1 + \beta_2 x \qquad \text{Logit}$$


Model: Extreme value 26

Tolerance function: $f(z) = \beta_2 \exp\left\{\beta_1 + \beta_2 z - e^{\beta_1+\beta_2 z}\right\}$

$$p = 1 - \exp\left\{-\exp(\beta_1 + \beta_2 x)\right\}$$

Link function:

$$g(p) = \log(-\log(1-p)) = \beta_1 + \beta_2 x \qquad \text{Complementary log-log (Gompit)}$$
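For a quick visual comparison of these models, a minimal MATLAB sketch (base MATLAB only, with Φ written via erf) that plots the three inverse link functions $p = g^{-1}(\beta_1 + \beta_2 x)$ against the linear predictor:

% Inverse link functions p = g^{-1}(eta), where eta = beta1 + beta2*x
invlogit  = @(eta) exp(eta)./(1 + exp(eta));     % logistic
invprobit = @(eta) 0.5*(1 + erf(eta/sqrt(2)));   % probit, Phi(eta)
invcll    = @(eta) 1 - exp(-exp(eta));           % complementary log-log

eta = -3:0.1:3;
plot(eta, invlogit(eta), eta, invprobit(eta), eta, invcll(eta))
legend('logit', 'probit', 'complementary log-log', 'Location', 'southeast')
xlabel('\beta_1 + \beta_2 x'), ylabel('p')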


Link functions, cont. 27


Deviance 28

Assume that we have two models, one with p parameters and one with the maximal number of m parameters, where m > p. Let the parameter vectors be β0 : p × 1 and β1 : m × 1. Assume also that the smaller model is a special case of the bigger one. Then we want to test the hypotheses

H0: the smaller model with p parameters is as good as the maximal model with m parameters,
against
H1: the maximal model is better,

and we do this using the analysis of deviance.


Deviance 29

Definition
The deviance is defined as

$$D = 2\left(l(\hat{\beta}_1; y) - l(\hat{\beta}_0; y)\right).$$

One can show that, under H0,

$$D \approx \chi^2(m-p),$$

and we want to reject H0 in favor of H1 for large values of the deviance D.


Deviance - Binomial distribution 30

We have random variables Yi ∼ Bin(ni, pi). The maximal model has m different parameters p1, . . . , pm, with ML-estimates

$$\hat{P}_1 = (\hat{p}_1, \ldots, \hat{p}_m), \quad \text{where} \quad \hat{p}_i = \frac{y_i}{n_i}.$$


Deviance - Binomial distribution, cont. 31

Let $\hat{P}_0$ be the ML-estimator for some other model (with fewer parameters). Then the deviance is

$$
\begin{aligned}
D &= 2\left(l(\hat{P}_1, y) - l(\hat{P}_0, y)\right)
  = 2\sum_{i=1}^{m}\left( y_i \log\frac{\hat{p}_i}{\hat{p}_{0i}} + (n_i - y_i)\log\frac{1-\hat{p}_i}{1-\hat{p}_{0i}} \right) \\
  &= 2\sum_{i=1}^{m}\left( y_i \log\frac{y_i}{n_i \hat{p}_{0i}} + (n_i - y_i)\log\frac{n_i - y_i}{n_i(1-\hat{p}_{0i})} \right) \\
  &= 2\sum_{i=1}^{m}\left( y_i \log\frac{y_i}{\hat{y}_i} + (n_i - y_i)\log\frac{n_i - y_i}{n_i - \hat{y}_i} \right),
\end{aligned}
$$

where $\hat{y}_i = n_i \hat{p}_{0i}$ are the fitted values.


Deviance - Binomial distribution, cont. 32

Again, the deviance

$$D = 2\sum_{i=1}^{m}\left( y_i \log\frac{y_i}{\hat{y}_i} + (n_i - y_i)\log\frac{n_i - y_i}{n_i - \hat{y}_i} \right)$$

has the form

$$D = 2\sum o_i \log\frac{o_i}{e_i},$$

where $o_i$ are the observed values ($y_i$ and $n_i - y_i$) and $e_i$ are the fitted values ($\hat{y}_i$ and $n_i - \hat{y}_i$).
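As a small computational note, a MATLAB sketch of this $2\sum o_i \log(o_i/e_i)$ form, with hypothetical observed and fitted counts and the usual convention $0 \cdot \log 0 = 0$:

o = [6; 53; 13; 47];       % hypothetical observed counts (y_i and n_i - y_i)
e = [7; 52; 12; 48];       % hypothetical fitted counts   (yhat_i and n_i - yhat_i)

t = o .* log(o ./ e);      % one term per cell
t(o == 0) = 0;             % convention: 0 * log(0) = 0
D = 2 * sum(t)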


Example 1 – Beetle mortality 33

The table below shows the number of beetles dead after five hours of exposure to gaseous carbon disulphide at various concentrations (data from Bliss, 1935).

Dose, xi                Number of       Number
(log10 CS2 mg l^-1)     beetles, ni     killed, yi
1.6907                      59               6
1.7242                      60              13
1.7552                      62              18
1.7842                      56              28
1.8113                      63              52
1.8369                      59              53
1.8610                      62              61
1.8839                      60              60


Example, cont. 34

We will analyze the data using the different link functions given above. We start with the logit link function,

$$\log\left(\frac{p}{1-p}\right) = \beta_1 + \beta_2 x.$$

The log-likelihood function with the logit link function is

$$l = \sum_{i=1}^{n}\left( y_i(\beta_1 + \beta_2 x_i) - n_i \log\left(1 + e^{\beta_1+\beta_2 x_i}\right) + \log\binom{n_i}{y_i} \right).$$

We use MINITAB.


Example, cont. 35

Binary Logistic Regression: y_i, n_i versus x_i

Link Function: Logit

Response Information

Variable  Value      Count
y_i       Event        291
          Non-event    190
n_i       Total        481

Logistic Regression Table
Predictor      Coef  SE Coef       Z      P
Constant   -60.7175  5.18071  -11.72  0.000
x_i         34.2703  2.91214   11.77  0.000

Log-Likelihood = -186.235
Test that all slopes are zero: G = 272.970, DF = 1, P-Value = 0.000

Goodness-of-Fit Tests
Method           Chi-Square  DF      P
Pearson             10.0268   6  0.124
Deviance            11.2322   6  0.081
Hosmer-Lemeshow     10.0268   6  0.124
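For readers without MINITAB, a minimal MATLAB sketch of the same fit, obtained by maximizing the log-likelihood on the previous slide directly with fminsearch (the binomial-coefficient term is dropped since it does not depend on β; the starting values are rough guesses near the MINITAB estimates):

% Beetle mortality data (Bliss, 1935)
x = [1.6907 1.7242 1.7552 1.7842 1.8113 1.8369 1.8610 1.8839]';
n = [59 60 62 56 63 59 62 60]';
y = [ 6 13 18 28 52 53 61 60]';

% Negative log-likelihood for the logit model (constant term omitted)
negl = @(b) -sum( y.*(b(1) + b(2)*x) - n.*log(1 + exp(b(1) + b(2)*x)) );

opts = optimset('MaxFunEvals', 1e4, 'MaxIter', 1e4);
b = fminsearch(negl, [-60 35], opts)   % should be close to -60.72 and 34.27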


Example, cont. 36


ML-estimators 37

Maximum likelihood estimators (MLEs) have many good properties. For example, for large n we have

$$\hat{\beta} \approx N(\beta, \mathcal{I}^{-1}),$$

where the information matrix $\mathcal{I}$ is given by

$$\mathcal{I} = (\mathcal{I}_{jk})_{j,k} = \left(E(U_j U_k)\right)_{j,k}, \quad \text{with } U_i = \frac{\partial l}{\partial \beta_i}.$$

One can also prove that

$$\mathcal{I} = \left(-E\left(\frac{\partial^2 l}{\partial \beta_j \partial \beta_k}\right)\right)_{j,k}.$$

Let the elements of the covariance matrix be denoted by $\mathcal{I}^{-1} = (\mathcal{I}^{jk})_{j,k}$.
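As an illustration for the beetle example, a MATLAB sketch of the approximate standard errors. It uses the fact that, for the logit link, the information matrix is $X'\,\mathrm{diag}(n_i p_i(1-p_i))\,X$ (a standard result that is not derived on these slides), evaluated at the ML estimates from the MINITAB output:

x = [1.6907 1.7242 1.7552 1.7842 1.8113 1.8369 1.8610 1.8839]';
n = [59 60 62 56 63 59 62 60]';
b = [-60.7175; 34.2703];            % ML estimates from the MINITAB output

X   = [ones(size(x)) x];
eta = X*b;
p   = exp(eta)./(1 + exp(eta));     % fitted probabilities
I   = X' * diag(n.*p.*(1-p)) * X;   % information matrix for the logit link
se  = sqrt(diag(inv(I)))            % should be close to 5.18 and 2.91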


Example, cont. 38


Example, cont. 39


Example - Embryogenic anthers 40

The data in the table are taken from Sangwan-Norrell (1977). They are the numbers yjk of embryogenic anthers of the plant species Datura innoxia Mill. obtained when numbers njk of anthers were prepared under several different conditions.

                          Centrifuging force (g)
Storage condition           40     150     350
Control      y1k            55      52      57
             n1k           102      99     108
Treatment    y2k            55      50      50
             n2k            76      81      90


Example, cont. 41

We have one factor with two levels: storage at 3°C for 48 hours (treatment) and a control type of storage. There is also a continuous explanatory variable corresponding to the different centrifuging forces. We will investigate how the storage and the centrifuging force affect the number of embryogenic anthers.

If we plot pjk = yjk/njk against the logarithm xk of the different centrifuging forces, we obtain


Example, cont. 42


Example, cont. 43

We will now fit two logistic models for πjk, the probability that an anther is embryogenic. The first model has different constant terms and different slopes for the two groups,

$$\operatorname{logit} \pi_{jk} = \beta_0 + \alpha_0 z_j + \beta_1 x_k + \alpha_1 z_j x_k = \beta_0 + \alpha_0 z_j + (\beta_1 + \alpha_1 z_j)x_k,$$

where zj = 0 for the control group and zj = 1 for the treatment group.

The second model has different constant terms but the same slope for the two groups,

logit πjk = β0 + α0zj + β1xk.

We use MINITAB.


Example, cont. - Model 1 44

Binary Logistic Regression: y, n versus z, x = logcf, zx

Link Function: Logit

Logistic Regression Table
Predictor        Coef   SE Coef      Z      P
Constant     0.233910  0.628418   0.37  0.710
z             1.97721  0.998079   1.98  0.048
x = logcf  -0.0227412  0.126851  -0.18  0.858
zx          -0.318628  0.198881  -1.60  0.109

Log-Likelihood = -374.109
Test that all slopes are zero: G = 10.424, DF = 3, P-Value = 0.015

Goodness-of-Fit Tests
Method           Chi-Square  DF      P
Pearson           0.0276564   2  0.986
Deviance          0.0276407   2  0.986
Hosmer-Lemeshow   0.0276564   4  1.000


Example, cont. - Model 2 45

Binary Logistic Regression: y, n versus z, x = logcf

Link Function: Logit

Logistic Regression Table
Predictor       Coef    SE Coef      Z      P
Constant    0.876775   0.487037   1.80  0.072
z           0.406841   0.174624   2.33  0.020
x = logcf  -0.154596  0.0970260  -1.59  0.111

Log-Likelihood = -375.404
Test that all slopes are zero: G = 7.833, DF = 2, P-Value = 0.020

Goodness-of-Fit Tests
Method           Chi-Square  DF      P
Pearson             2.59800   3  0.458
Deviance            2.61878   3  0.454
Hosmer-Lemeshow     2.59800   4  0.627
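One way to compare the two fitted models is with the deviance machinery above; a small MATLAB sketch using the log-likelihood values reported by MINITAB:

% Model 2 (common slope, 3 parameters) against Model 1 (separate slopes, 4 parameters)
l1 = -374.109;                 % log-likelihood, model 1
l2 = -375.404;                 % log-likelihood, model 2

D = 2*(l1 - l2)                % about 2.59, essentially the difference between the
                               % Deviance rows above (2.61878 - 0.02764)
% Under H0 (common slope is enough), D is approximately chi2(1);
% since D = 2.59 < 3.84 (the 95% quantile), the simpler model is not rejected at the 5% level.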


Example - Poisson regression 49

Assume that we have the following observations

yi    2   3   6   7   8   9  10  12  15
xi   -1  -1   0   0   0   0   1   1   1

and that we want to fit a Poisson regression. We then assume that the data are Poisson distributed.

A random variable Y is Poisson distributed with parameter µ > 0, Y ∼ Po(µ), if its probability function is given by

$$p_Y(y) = e^{-\mu}\frac{\mu^y}{y!}, \quad y = 0, 1, \ldots$$


Example - Poisson regression, cont. 50


Example - Poisson regression, cont. 51

We assume that we have the following model

$$E Y_i = \mu_i = \beta_1 + \beta_2 x_i = x_i'\beta,$$

where $x_i = (1 \;\; x_i)'$ and $\beta = (\beta_1 \;\; \beta_2)'$. We take the link function $g(\mu_i)$ to be the identity function,

$$g(\mu_i) = \mu_i.$$

If we try to maximize the likelihood function, we have to deal with

$$l(\beta_1, \beta_2) = \sum y_i \log(\beta_1 + \beta_2 x_i) - \sum \log(y_i!) - N\beta_1 - \beta_2 \sum x_i,$$

which is difficult to maximize analytically. We use, for example, MATLAB.


Example - Poisson regression, cont. 52

The following MATLAB code can solve the problem:

y = [2 3 6 7 8 9 10 12 15]';
x = [-1 -1 0 0 0 0 1 1 1]';
m = 9;

% Negative log-likelihood (the constant term sum(log(y!)) is dropped)
lnL = @(b) -(y'*log(b(1) + b(2)*x) - m*b(1) - b(2)*sum(x));
[b, value] = fminsearch(lnL, [7 5]);

so the solution is

>> b

b =

7.4516 4.9353


Deviance - Poisson distribution 53

Assume that we have response variables Y1, . . . , Ym and Yi ∼ Po(µi). Assume that the big model that we would like to test is the one where all µi, i = 1, . . . , m, are different. Then we have β1 = (µ1, . . . , µm)′ and the log-likelihood function is

$$l(\beta_1; y) = \sum y_i \log\mu_i - \sum \mu_i - \sum \log y_i!,$$

and the MLE is $\hat{\mu}_i = y_i$, with the value

$$l(\hat{\beta}_1; y) = \sum y_i \log y_i - \sum y_i - \sum \log y_i!.$$


Deviance - Poisson distribution, cont. 54

Assume that the smaller model has p < m parameters, with MLEs $\hat{\lambda}_i$ and the value

$$l(\hat{\beta}_0; y) = \sum y_i \log \hat{\lambda}_i - \sum \hat{\lambda}_i - \sum \log y_i!,$$

for the log-likelihood function.

Now, the deviance is

$$D = 2\left(l(\hat{\beta}_1) - l(\hat{\beta}_0)\right) = 2\left(\sum y_i \log\frac{y_i}{\hat{\lambda}_i} - \sum\left(y_i - \hat{\lambda}_i\right)\right).$$


Deviance - Poisson distribution, cont. 55

The estimated means $\hat{y}_i = \hat{\lambda}_i$ are called the fitted values, and one can show that $\sum(y_i - \hat{y}_i) = 0$ in many cases.

Hence, the deviance is

$$D = 2\sum y_i \log\frac{y_i}{\hat{y}_i} = 2\sum o_i \log\frac{o_i}{e_i},$$

where $o_i$ is the observed value ($y_i$) and $e_i$ is the estimated expected value ($\hat{y}_i$).


Deviance - Pearson's χ2-test 56

If one does a Taylor series approximation of the terms in the deviance, i.e.,

$$o_i \log\frac{o_i}{e_i} = (o_i - e_i) + \frac{1}{2}\frac{(o_i - e_i)^2}{e_i} + \ldots,$$

then the deviance is approximately given by

$$D \approx 2\sum\left( (o_i - e_i) + \frac{1}{2}\frac{(o_i - e_i)^2}{e_i} - (o_i - e_i) \right) = \sum \frac{(o_i - e_i)^2}{e_i} = X^2.$$

Hence, the deviance is closely related to Pearson's χ2 test.


Example - Poisson regression, cont. 57

The fitted values in the numerical example above are $\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i$, with $\hat{\beta}_1 = 7.4516$ and $\hat{\beta}_2 = 4.9353$, which gives the deviance computed below.

xi     yi    ŷi         yi log(yi/ŷi)
-1      2    2.5164     -0.4593
-1      3    2.5164      0.5274
 0      6    7.4516     -1.3000
 0      7    7.4516     -0.4377
 0      8    7.4516      0.5681
 0      9    7.4516      1.6991
 1     10   12.3869     -2.1406
 1     12   12.3869     -0.3808
 1     15   12.3869      2.8711
Σ      72   72            0.9473

D = 2 · 0.9473 = 1.8946. If the small and the maximal model are equally good, then $D \sim \chi^2(9-2) = \chi^2(7)$ approximately. We choose the big model if $D > \chi^2_{0.95}(7) = 14.07$. Hence, we cannot reject the small model!
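The same table and test can be reproduced with a few lines of MATLAB (chi2inv requires the Statistics Toolbox; the quantile 14.07 can of course also be read from a table):

% Recompute the deviance for the Poisson regression example
y    = [2 3 6 7 8 9 10 12 15]';
x    = [-1 -1 0 0 0 0 1 1 1]';
b    = [7.4516 4.9353];              % fitted coefficients from above
yhat = b(1) + b(2)*x;                % fitted values

D = 2*sum(y .* log(y ./ yhat))       % about 1.89
% Compare with chi2inv(0.95, 7) = 14.07: the small model is not rejected.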


Appendix: Estimators 58

There are several ways to obtain point estimates of the parameters of a probability model:

the method of moments,
the least squares method,
the maximum likelihood method.

We now want to look more closely at the maximum likelihood method, since it is one of the methods we use most often.


Likelihood function 59

Let x1, . . . , xn be a random sample with independent observations from a distribution f(x; θ) that depends on the unknown parameters θ.

Definition
The function

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = f(x_1; \theta) \cdot \ldots \cdot f(x_n; \theta)$$

is called the likelihood function.


ML-estimator 60

Definition

A value $\hat{\theta}$ for which the likelihood function L(θ) attains its highest value is called the maximum likelihood estimate (ML estimate) of θ.

Before maximizing, it is often convenient to take the logarithm of the likelihood function,

$$l(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta),$$

and then differentiate with respect to the parameters one wants to estimate.
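For instance (a standard textbook example, not taken from these slides), for a sample $y_1, \ldots, y_n$ from Po(µ) one gets

$$l(\mu) = \sum_{i=1}^{n} y_i \log\mu - n\mu - \sum_{i=1}^{n}\log y_i!, \qquad \frac{dl}{d\mu} = \frac{\sum y_i}{\mu} - n = 0 \;\Longrightarrow\; \hat{\mu} = \bar{y}.$$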


ML-estimator, cont. 61

Some of the properties of maximum likelihood estimators (MLEs) are given below.

If $\hat{\theta}$ is the MLE of θ then, under certain (rather mild) conditions, for large n we have

$$\frac{\hat{\theta} - E\hat{\theta}}{\sqrt{\operatorname{Var}\hat{\theta}}} \approx N(0, 1).$$


ML-estimator, cont. 62

This can be generalized to the multidimensional case, where one can show that for large n we have

$$\hat{\theta} \approx N(\theta, \mathcal{I}^{-1}),$$

where the information matrix $\mathcal{I}$ is given by

$$\mathcal{I} = (\mathcal{I}_{jk}) = \left(E(U_j U_k)\right), \quad \text{with} \quad U_i = \frac{\partial l}{\partial \theta_i}.$$

One can also show that

$$\mathcal{I} = \left(-E\frac{\partial^2 l}{\partial \theta_j \partial \theta_k}\right).$$
