Log-linear analysis Summary

Post on 22-Dec-2015

Page 1: Log-linear analysis Summary. Focus on data analysis Focus on underlying process Focus on model specification Focus on likelihood approach Focus on ‘complete-data

Log-linear analysis

Summary

The approach

• Focus on data analysis

• Focus on underlying process

• Focus on model specification

• Focus on likelihood approach

• Focus on ‘complete-data likelihood’

• Focus on prediction

• Focus on interaction/association

• Link with risk analysis

• Unified perspective on different models

Risk measures

• Count: number of events during a given period (observation window)

• Probability: probability of an outcome, i.e. the proportion of the risk set experiencing a given outcome (event) at least once

• Risk set = all persons at risk at a given point in time.

• Rate: number of events per time unit of exposure (more generally, per unit of any measure of size, e.g. time, space, miles travelled)
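As a worked illustration, the sketch below computes the three measures from the leaving-home totals reported later in this summary (1961 birth cohort: 530 people left home, 53 were censored, 45305 months of exposure). Pooling females and males into a single group is a simplification made for this example only.

```python
# Risk measures for one (pooled) group, using the leaving-home totals of the
# 1961 birth cohort from the occurrence and exposure tables.
events = 530             # count: number leaving home in the observation window
risk_set = 583           # persons at risk: 530 events + 53 censored
exposure_months = 45305  # total months lived at home, censored persons included

count = events
probability = events / risk_set           # proportion with the event at least once
rate_per_month = events / exposure_months
rate_per_year = 12 * rate_per_month       # same rate, expressed per year

print(f"count = {count}")
print(f"probability = {probability:.3f}")
print(f"rate = {rate_per_month:.5f} per month ({rate_per_year:.3f} per year)")
```

The probability uses persons as the denominator, the rate uses time at risk; that difference is why censored persons count fully in the risk set but only partially in the exposure.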

Risk measures

• Difference of probabilities: p1 - p2

• Relative risk: ratio of probabilities (focus: risk factor), i.e. probability of the event in the presence of the risk factor / probability of the event in its absence (control group; reference category): p1 / p2

• Odds: odds on an outcome, i.e. the ratio of favourable to unfavourable outcomes; the chance of one outcome rather than another: p1 / (1 - p1)

The odds are what matter when placing a bet on a given outcome, i.e. when something is at stake. Odds reflect the degree of belief in a given outcome.

Relation between odds and relative risk: see Agresti, 1996, p. 25
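A small worked example of these measures, using the proportions of females and males who left home during the observation window (taken from the leaving-home table later in this summary). Treating females as the "target" group and males as the reference category is an arbitrary choice made for illustration.

```python
# Two groups compared: females (target) vs males (reference category),
# using the proportion who left home (events / persons at risk).
p1 = 278 / 291   # females: 278 left home, 13 censored
p2 = 252 / 292   # males:   252 left home, 40 censored

difference = p1 - p2          # difference of probabilities
relative_risk = p1 / p2       # ratio of probabilities

def odds(p):
    """Odds on an outcome: p / (1 - p), favourable over unfavourable outcomes."""
    return p / (1 - p)

print(f"difference of probabilities = {difference:.3f}")
print(f"relative risk = {relative_risk:.3f}")
print(f"odds (females) = {odds(p1):.2f}, odds (males) = {odds(p2):.2f}")
```

Note how the relative risk stays close to one while the odds differ much more; with probabilities this close to one, the two measures diverge strongly.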

Risk measures

• Odds

• Odds ratio: ratio of odds (focus: risk indicator, covariate), i.e. odds in target group / odds in control group [reference category]: the ratio of favourable to unfavourable outcomes in the target group divided by the same ratio in the control group. The odds ratio measures the ‘belief’ in a given outcome in two different populations or under two different conditions. If the odds ratio is one, the odds are equal in the two populations or conditions.

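$\text{Odds} = \dfrac{p}{1-p}$ (range of the odds scale: $0 \ldots \infty$)

$p = \dfrac{\text{Odds}}{1 + \text{Odds}}, \qquad 1 - p = \dfrac{1}{1 + \text{Odds}}$

$\operatorname{logit}(p) = \ln\dfrac{p}{1-p} = \ln(\text{odds})$ (range: $-\infty \ldots +\infty$)

$p = \dfrac{1}{1 + \exp[-\operatorname{logit}(p)]}$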
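The sketch below computes the odds ratio for the same female/male comparison and checks the logit transformation and its inverse. The helper names `odds`, `logit` and `inv_logit` are illustrative, not from the source.

```python
import math

def odds(p):
    """Odds = p / (1 - p), range 0 ... infinity."""
    return p / (1 - p)

def logit(p):
    """Log-odds ln(p / (1 - p)), defined for 0 < p < 1, range -inf ... +inf."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Back-transform to a probability: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

p_target, p_control = 278 / 291, 252 / 292   # females vs males leaving home

odds_ratio = odds(p_target) / odds(p_control)
log_odds_ratio = logit(p_target) - logit(p_control)   # = ln(odds ratio)

print(f"odds ratio = {odds_ratio:.3f}")
print(f"ln(odds ratio) = {log_odds_ratio:.3f}")
print(f"round trip: {inv_logit(logit(0.3)):.3f}")   # recovers p = 0.3
```

Working on the logit scale turns the odds ratio into a difference of log-odds, which is the form the logit model uses later on.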

Risk analysis

• Probability models:

– Counts → Poisson r.v. → Poisson distribution → Poisson regression / log-linear model

– Probabilities → binomial and multinomial r.v. → binomial and multinomial distribution → logistic regression / logit model

(parameter p, the probability of occurrence, is also called risk; e.g. Clayton and Hills, 1993, p. 7)

– Rates → occurrences/exposure → Poisson r.v. → log-rate model

Analysis of count data

Introduction to log-linear models

The Poisson probability model

Let N be a random variable representing the number of events during a unit interval and let n be a realisation of N (COUNT). N is a Poisson r.v. following a Poisson distribution with parameter λ:

$\Pr\{N = n\} = \dfrac{\lambda^n}{n!}\exp[-\lambda]$

The parameter λ is the expected number of events per unit time interval: λ = E[N]
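A quick numerical check of the Poisson model, assuming an arbitrary λ = 4: the probabilities sum to one and the mean of the distribution equals λ. The pmf is evaluated on the log scale (`math.lgamma(n + 1)` gives ln n!) to avoid overflow for large n.

```python
import math

def poisson_pmf(n, lam):
    """Pr{N = n} = lam**n / n! * exp(-lam), computed via logs for stability."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

lam = 4.0   # hypothetical expected number of events per unit interval
probs = [poisson_pmf(n, lam) for n in range(60)]   # tail beyond 59 is negligible

total = sum(probs)
mean = sum(n * p for n, p in enumerate(probs))
print(f"sum of probabilities = {total:.6f}")
print(f"E[N] = {mean:.6f}  (lambda = {lam})")
```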

Likelihood function

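Probability mass function:

$\Pr\{N = n\} = \dfrac{\lambda^n}{n!}\exp[-\lambda]$

Log-likelihood function:

$l(\lambda; n) = n\ln\lambda - \lambda - \ln n!$

Likelihood equation to determine the ‘best’ value of λ: $\dfrac{dl}{d\lambda} = \dfrac{n}{\lambda} - 1 = 0$, which gives $\hat\lambda = n$.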
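The likelihood equation can be checked numerically. For a hypothetical observed count n = 7, a grid search over λ locates the maximum of l(λ; n) at λ = n, where the score n/λ - 1 is zero.

```python
import math

def log_likelihood(lam, n):
    """l(lambda; n) = n ln(lambda) - lambda - ln(n!) for one Poisson count n."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def score(lam, n):
    """First derivative of the log-likelihood: n / lambda - 1."""
    return n / lam - 1

n_obs = 7                                   # hypothetical observed count
grid = [k / 100 for k in range(1, 2001)]    # candidate lambdas 0.01 ... 20.00
lam_hat = max(grid, key=lambda lam: log_likelihood(lam, n_obs))

print(f"grid maximum at lambda = {lam_hat:.2f}")     # the MLE is n itself
print(f"score at lambda = n: {score(n_obs, n_obs)}")  # likelihood equation holds
```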

The log-linear model

The objective of log-linear analysis is to determine if the distribution of counts among the cells of a table can be explained by a simpler, underlying structure.

Log-linear models specify different structures in terms of the cross-classified variables (rows, columns and layers of the table).

Log-linear models for two-way tables

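Saturated log-linear model:

$\ln\lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$

$\mu$: overall effect (level)

$\mu_i^A$, $\mu_j^B$: main effects (marginal frequencies)

$\mu_{ij}^{AB}$: interaction effect

In case of a 2 x 2 table: 4 observations but 9 parameters (1 + 2 + 2 + 4), so normalisation constraints are needed to identify the parameters.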
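The parameter count can be made concrete. Assuming reference (dummy) coding as the normalisation (every parameter involving category 1 of A or B is fixed at zero; other constraints such as effect coding give different but equivalent values), the saturated model keeps four free parameters, which follow in closed form from the observed table. The sketch uses the leaving-home occurrences by age (i) and sex (j) reported later in this summary.

```python
import math

# Observed 2 x 2 table: rows i = age (<20, >=20), columns j = sex (female, male).
# In the saturated model fitted counts equal observed counts.
n = {(1, 1): 135, (1, 2): 74,
     (2, 1): 143, (2, 2): 178}

# Reference coding (assumed): parameters for category 1 are zero,
# leaving mu, mu_2^A, mu_2^B and mu_22^AB as the 4 free parameters.
mu = math.log(n[1, 1])
mu_A2 = math.log(n[2, 1] / n[1, 1])
mu_B2 = math.log(n[1, 2] / n[1, 1])
mu_AB22 = math.log(n[1, 1] * n[2, 2] / (n[1, 2] * n[2, 1]))   # log odds ratio

def fitted(i, j):
    """exp(mu + mu_i^A + mu_j^B + mu_ij^AB) under reference coding."""
    eta = mu
    if i == 2:
        eta += mu_A2
    if j == 2:
        eta += mu_B2
    if i == 2 and j == 2:
        eta += mu_AB22
    return math.exp(eta)

for cell, count in n.items():
    print(cell, count, round(fitted(*cell), 6))
```

The interaction parameter turns out to be the log odds ratio of the table, which links the log-linear model back to the risk measures.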

Relation log-linear model and Poisson regression model

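$\ln\lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$

$\ln\lambda_{ij} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2j} + \beta_3 x_{3ij}$

$x_{1i}$, $x_{2j}$ and $x_{3ij}$ are dummy variables (0 if i or j is equal to 1 and 1 if i or j is equal to 2), and the interaction variable is $x_{3ij} = x_{1i} \cdot x_{2j}$.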
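A minimal, dependency-free sketch of this equivalence: it fits a Poisson regression on the dummies x_1i, x_2j and x_3ij = x_1i x_2j by Newton-Raphson (a standard way to maximise the Poisson likelihood, assumed here rather than taken from the slides) to the leaving-home occurrences by age (i) and sex (j). Because the model is saturated, the estimates should equal ln n_11, ln(n_21/n_11), ln(n_12/n_11) and the log odds ratio. `solve` and `fit_poisson` are illustrative helpers, not a library API.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(k):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[r][k] for r in range(k)]

def fit_poisson(X, y, iterations=30):
    """Maximise sum(y ln(lam) - lam) with ln(lam) = X beta by Newton-Raphson."""
    k = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)   # start: constant mean
    for _ in range(iterations):
        lam = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
        score = [sum(row[a] * (yi - li) for row, yi, li in zip(X, y, lam))
                 for a in range(k)]
        info = [[sum(row[a] * li * row[c] for row, li in zip(X, lam)) for c in range(k)]
                for a in range(k)]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    return beta

cells = [(1, 1), (1, 2), (2, 1), (2, 2)]
counts = [135, 74, 143, 178]
X = []
for i, j in cells:
    x1, x2 = int(i == 2), int(j == 2)
    X.append([1, x1, x2, x1 * x2])   # intercept, x_1i, x_2j, x_3ij

beta = fit_poisson(X, counts)
print([round(b, 4) for b in beta])
```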

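Design matrix: unsaturated log-linear model

$\ln\lambda_{ij} = \mu + \mu_i^A + \mu_j^B$

$$
\begin{pmatrix} \ln\lambda_{11} \\ \ln\lambda_{12} \\ \ln\lambda_{21} \\ \ln\lambda_{22} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu \\ \mu_1^A \\ \mu_2^A \\ \mu_1^B \\ \mu_2^B \end{pmatrix}
$$

Number of parameters (5) exceeds number of equations (4) → need for additional equations

X'X is singular, so (X'X)^{-1} does not exist → identify the linear dependencies

$Y = X\mu \;\Rightarrow\; \mu = (X'X)^{-1}X'Y$, with Y the vector of $\ln\lambda_{ij}$; this requires X to have full column rank.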
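The rank problem can be checked directly. This sketch builds the 4 x 5 design matrix of the independence model, computes ranks exactly with Python's `fractions` module, and confirms the two linear dependencies: the constant column equals the sum of the two A columns and also the sum of the two B columns. The `rank` and `column` helpers are hand-written for illustration.

```python
from fractions import Fraction

# Columns: mu, mu_1^A, mu_2^A, mu_1^B, mu_2^B; rows: cells 11, 12, 21, 22.
X = [[1, 1, 0, 1, 0],
     [1, 1, 0, 0, 1],
     [1, 0, 1, 1, 0],
     [1, 0, 1, 0, 1]]

def rank(rows):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def column(c):
    return [row[c] for row in X]

XtX = [[sum(row[a] * row[b] for row in X) for b in range(5)] for a in range(5)]

dep_A = column(0) == [a + b for a, b in zip(column(1), column(2))]
dep_B = column(0) == [a + b for a, b in zip(column(3), column(4))]

print("rank of X:", rank(X), "of 5 columns")   # 3, so X'X is singular
print("rank of X'X:", rank(XtX))
print("dependencies:", dep_A, dep_B)
```

Two dependencies mean two constraints are needed; fixing one A and one B parameter (for example at zero) leaves a full-rank 4 x 3 design.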

Hybrid log-linear models

• Hybrid log-linear models contain unconventional effect parameters.

• Interaction effects are restricted in a certain way: restrictions are imposed on the interaction parameters.

Examples of hybrid log-linear models

Diagonals parameter model 1: (main) diagonal effect

$\lambda_{ij} = a_i b_j c_k$

with $c_k = 1$ for $i \neq j$ and $c_k = c$ for $i = j$ (diagonal)

Off-diagonal cells follow the independence model ($\lambda_{ij} = a_i b_j$), and diagonal cells are multiplied by a common factor c.
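One standard way to fit this model (not necessarily the method used in the source) is iterative proportional fitting (IPF): the sufficient statistics are the row totals, the column totals and the diagonal total, so the fitted values are repeatedly rescaled to match each in turn. The 3 x 3 table below is hypothetical, and `ipf_diagonal_model` is an illustrative helper.

```python
import math

# Hypothetical 3 x 3 table (e.g. origin by destination); NOT data from the source.
observed = [[50, 10, 5],
            [12, 40, 8],
            [6, 9, 30]]

def ipf_diagonal_model(obs, sweeps=200):
    """Fit lambda_ij = a_i b_j c_k (c on the diagonal, 1 elsewhere) by IPF."""
    size = len(obs)
    fit = [[1.0] * size for _ in range(size)]       # starting values lie in the model
    row_tot = [sum(row) for row in obs]
    col_tot = [sum(obs[i][j] for i in range(size)) for j in range(size)]
    diag_obs = sum(obs[i][i] for i in range(size))
    off_obs = sum(row_tot) - diag_obs
    for _ in range(sweeps):
        for i in range(size):                       # match row totals
            f = row_tot[i] / sum(fit[i])
            fit[i] = [v * f for v in fit[i]]
        for j in range(size):                       # match column totals
            f = col_tot[j] / sum(fit[i][j] for i in range(size))
            for i in range(size):
                fit[i][j] *= f
        diag_fit = sum(fit[i][i] for i in range(size))
        off_fit = sum(map(sum, fit)) - diag_fit
        for i in range(size):                       # match diagonal / off-diagonal totals
            for j in range(size):
                fit[i][j] *= (diag_obs / diag_fit) if i == j else (off_obs / off_fit)
    return fit

fit = ipf_diagonal_model(observed)
# Under the model lambda_11 lambda_22 / (lambda_12 lambda_21) = c**2.
c_hat = math.sqrt(fit[0][0] * fit[1][1] / (fit[0][1] * fit[1][0]))
for row in fit:
    print([round(v, 2) for v in row])
print(f"estimated diagonal factor c = {c_hat:.3f}")
```

Each rescaling multiplies whole rows, whole columns, or the diagonal and off-diagonal cells by constants, so the fitted table never leaves the model form $a_i b_j c_k$.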

Diagonals parameter model 2: each diagonal element has a separate effect parameter

$\lambda_{ij} = a_i b_j c_k$ with $c_k = 1$ for $i \neq j$ and $c_k = c_i$ for $i = j$ (diagonal)

Diagonal elements are predicted perfectly by the model.

Diagonals parameter model 3: the main diagonal and each minor diagonal have a unique effect parameter

$\lambda_{ij} = a_i b_j c_k$

where k indicates the diagonal: $k = R + i - j$, with R the number of rows (or columns). There are $2R - 1$ values of $c_k$.

Application: age-period-cohort (APC) models
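The diagonal index of model 3 is easy to tabulate. The sketch prints k = R + i - j for an arbitrarily chosen R = 4; the main diagonal gets k = R, and the 2R - 1 distinct values label the main and minor diagonals.

```python
# Diagonal index k = R + i - j for an R x R table with i, j = 1..R.
R = 4   # arbitrary table size chosen for illustration
k = [[R + i - j for j in range(1, R + 1)] for i in range(1, R + 1)]
for row in k:
    print(" ".join(f"{v:2d}" for v in row))

values = sorted({v for row in k for v in row})
print("values of k:", values)                 # 1 .. 2R - 1; main diagonal has k = R
print("number of c_k parameters:", len(values))
```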

The log-rate model

Statistical analysis of occurrence-exposure rates

The log-rate model: the occurrence matrix and the exposure matrix

Occurrences: number leaving home by age and sex, 1961 birth cohort: $n_{ij}$

Age / Sex    Female    Male   Total
<20             135      74     209
>=20            143     178     321
Total           278     252     530
Censored         13      40      53
Total           291     292     583

Exposures: number of months living at home (includes censored observations): $PM_{ij}$

Age / Sex    Female    Male   Total
<20           15113   16202   31315
>=20           4876    9114   13990
Total         19989   25316   45305
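The occurrence-exposure rates $n_{ij}/PM_{ij}$ that the log-rate model describes can be computed straight from the two tables above. The sketch expresses them per 1000 months of exposure, a scaling chosen only for readability.

```python
# Occurrences n_ij and exposures PM_ij (months at home) by age group and sex,
# 1961 birth cohort, copied from the occurrence and exposure tables.
occurrences = {("<20", "female"): 135, ("<20", "male"): 74,
               (">=20", "female"): 143, (">=20", "male"): 178}
exposures = {("<20", "female"): 15113, ("<20", "male"): 16202,
             (">=20", "female"): 4876, (">=20", "male"): 9114}

# Occurrence-exposure rate: events per month of exposure, shown per 1000 months.
for cell in occurrences:
    rate = occurrences[cell] / exposures[cell]
    print(f"{cell[0]:>4} {cell[1]:<6} rate = {1000 * rate:6.2f} per 1000 months")

total_rate = sum(occurrences.values()) / sum(exposures.values())
print(f"overall rate = {1000 * total_rate:.2f} per 1000 months")
```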

The log-rate model

$\ln\dfrac{\lambda_{ij}}{PM_{ij}} = u + u_i^A + u_j^B + u_{ij}^{AB}$

with A = AGE [early (before age 20) = 0; late (at age 20 or later) = 1] and B = SEX [female = 0; male = 1]

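$\ln\lambda_{ij} = \ln PM_{ij} + u + u_i^A + u_j^B + u_{ij}^{AB}$

$\ln\lambda_{ij} = \ln m_{ij} + u + u_i^A + u_j^B + u_{ij}^{AB}$, where $\ln m_{ij}$ (the log exposure) is the offset

$\ln(\lambda_{ij}/PM_{ij})$ = ln(occurrences/exposure) = ln(counts/exposures)

The log-rate model is a log-linear model with an OFFSET (constant term).

$\lambda_{ij} = E[N_{ij}]$; $PM_{ij}$ fixed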
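To show the offset at work, the sketch below fits the log-rate model to the leaving-home data by Newton-Raphson, with $\ln PM_{ij}$ entered as a fixed offset whose coefficient is not estimated. As an assumption for illustration, the interaction $u^{AB}$ is dropped (main-effects model); with the interaction included the model is saturated and the fitted rates simply equal $n_{ij}/PM_{ij}$. The helpers `solve` and `fit_log_rate` are hypothetical, not a library API.

```python
import math

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(k):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [m[r][k] for r in range(k)]

def fit_log_rate(X, events, exposure, iterations=30):
    """Poisson ML for ln(lambda) = ln(exposure) + X beta, by Newton-Raphson."""
    k = len(X[0])
    offset = [math.log(e) for e in exposure]                 # fixed, not estimated
    beta = [math.log(sum(events) / sum(exposure))] + [0.0] * (k - 1)   # overall rate
    for _ in range(iterations):
        lam = [math.exp(o + sum(x * b for x, b in zip(row, beta)))
               for row, o in zip(X, offset)]
        score = [sum(row[a] * (y - lam_i) for row, y, lam_i in zip(X, events, lam))
                 for a in range(k)]
        info = [[sum(row[a] * lam_i * row[c] for row, lam_i in zip(X, lam))
                 for c in range(k)] for a in range(k)]
        beta = [b + s for b, s in zip(beta, solve(info, score))]
    fitted = [math.exp(o + sum(x * b for x, b in zip(row, beta)))
              for row, o in zip(X, offset)]
    return beta, fitted

# Cells (age: 0 = before 20, 1 = 20 or later; sex: 0 = female, 1 = male).
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
events = [135, 74, 143, 178]
exposure = [15113, 16202, 4876, 9114]
X = [[1, age, sex] for age, sex in cells]   # main effects only: u, u^A, u^B

beta, fitted = fit_log_rate(X, events, exposure)
print("rate ratio, age 20+ vs <20:", round(math.exp(beta[1]), 3))
print("rate ratio, male vs female:", round(math.exp(beta[2]), 3))
print("fitted events:", [round(f, 1) for f in fitted])
```

At the maximum the likelihood equations force the fitted event totals by age and by sex to equal the observed totals, even though individual cells differ.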

Log-rate model: rate = events/exposure

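$\lambda_{ij} = E[N_{ij}] = m_{ij}\exp[u + u_i^A + u_j^B + u_{ij}^{AB}]$, so the rate is $\lambda_{ij}/m_{ij} = \exp[u + u_i^A + u_j^B + u_{ij}^{AB}]$

Gravity model

$\lambda_{ij} = \alpha_i \beta_j c_k m_{ij}$

with $c_k = 1$ for $i \neq j$ and $c_k = c$ for $i = j$ (diagonal)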

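$\Pr\{N_{ij} = n_{ij}\} = \dfrac{\lambda_{ij}^{n_{ij}}}{n_{ij}!}\exp[-\lambda_{ij}]$

$\lambda_{ij} = \alpha_i \beta_j c_k m_{ij}$

$l(\alpha, \beta, c; n) = n_{ij}\ln(\alpha_i \beta_j c_k m_{ij}) - \alpha_i \beta_j c_k m_{ij} - \ln n_{ij}!$

(contribution of cell (i, j); the log-likelihood of the whole table is the sum over all cells)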
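The log-likelihood can be evaluated directly. In this sketch the row and column factors $\alpha_i$, $\beta_j$, the size term $m_{ij}$ and the counts are all hypothetical. Holding α and β fixed, setting $\partial l/\partial c = \sum_i (n_{ii}/c - \alpha_i \beta_i m_{ii}) = 0$ gives $c = \sum_i n_{ii} / \sum_i \alpha_i \beta_i m_{ii}$, and the printout checks that this value beats nearby ones.

```python
import math

# Hypothetical 3 x 3 flow table n_ij, size term m_ij and fixed row/column
# factors alpha_i, beta_j; none of these numbers come from the source.
n = [[40, 6, 3], [5, 30, 7], [2, 8, 25]]
m = [[1.0, 2.0, 3.0], [2.0, 1.0, 2.0], [3.0, 2.0, 1.0]]
alpha = [4.0, 3.5, 3.0]
beta = [3.0, 2.5, 2.0]

def lam(i, j, c):
    """lambda_ij = alpha_i beta_j c_k m_ij, with c_k = c on the diagonal, 1 elsewhere."""
    return alpha[i] * beta[j] * (c if i == j else 1.0) * m[i][j]

def log_likelihood(c):
    """Sum over cells of n_ij ln(lambda_ij) - lambda_ij - ln(n_ij!)."""
    return sum(n[i][j] * math.log(lam(i, j, c)) - lam(i, j, c) - math.lgamma(n[i][j] + 1)
               for i in range(3) for j in range(3))

# Likelihood equation for c with alpha and beta held fixed.
c_hat = sum(n[i][i] for i in range(3)) / sum(alpha[i] * beta[i] * m[i][i] for i in range(3))

print(f"c_hat = {c_hat:.4f}")
for c in (0.9 * c_hat, c_hat, 1.1 * c_hat):
    print(f"l(c = {c:.4f}) = {log_likelihood(c):.4f}")
```

In a full fit α, β and c are estimated jointly; the same reasoning then makes the fitted diagonal total equal the observed diagonal total.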

Logit model and log-linear model

A comparison

Log-linear model: $\ln\lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$

Select one variable as the dependent (response) variable, e.g. does voting behaviour differ by sex?

Are females more likely to vote conservative than males?

Logit model: $\ln\dfrac{\lambda_{1j}}{\lambda_{2j}} = \gamma + \gamma_j^B$

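Males voting conservative rather than labour:

$\ln\dfrac{\lambda_{11}}{\lambda_{21}} = (\mu + \mu_1^A + \mu_1^B + \mu_{11}^{AB}) - (\mu + \mu_2^A + \mu_1^B + \mu_{21}^{AB})$

Females voting conservative rather than labour:

$\ln\dfrac{\lambda_{12}}{\lambda_{22}} = (\mu + \mu_1^A + \mu_2^B + \mu_{12}^{AB}) - (\mu + \mu_2^A + \mu_2^B + \mu_{22}^{AB})$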

Are females more likely to vote conservative than males?

Log-odds = logit

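Effect coding (1): $\mu_2^A = -\mu_1^A$ and $\mu_{2j}^{AB} = -\mu_{1j}^{AB}$, so

$\ln\dfrac{\lambda_{11}}{\lambda_{21}} = \mu_1^A - \mu_2^A + \mu_{11}^{AB} - \mu_{21}^{AB} = 2\mu_1^A - 2\mu_{21}^{AB}$

$\ln\dfrac{\lambda_{12}}{\lambda_{22}} = \mu_1^A - \mu_2^A + \mu_{12}^{AB} - \mu_{22}^{AB} = 2\mu_1^A - 2\mu_{22}^{AB}$

With $\theta_j^B = \lambda_{1j}/\lambda_{2j}$, the odds of voting conservative rather than labour for sex j:

$\ln\theta_1^B = \gamma + \gamma_1^B$

$\ln\theta_2^B = \gamma + \gamma_2^B$

where $\gamma = 2\mu_1^A$ and $\gamma_j^B = -2\mu_{2j}^{AB}$.

A = Party; B = Sex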
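A numerical check of the effect-coding derivation on a hypothetical party x sex table (the counts are invented for illustration). The saturated log-linear parameters under effect coding are obtained from the grand, row and column means of $\ln n_{ij}$; the logit parameters $\gamma = 2\mu_1^A$ and $\gamma_j^B = -2\mu_{2j}^{AB}$ should then reproduce each sex's log odds of voting conservative rather than labour.

```python
import math

# Hypothetical 2 x 2 table (NOT from the source): A = party (1 conservative,
# 2 labour), B = sex (1 male, 2 female).
n = {(1, 1): 120, (2, 1): 150,
     (1, 2): 140, (2, 2): 130}
L = {cell: math.log(v) for cell, v in n.items()}

# Saturated log-linear model under effect coding (parameters sum to zero over
# each index), from grand, row and column means of ln n_ij.
mu = sum(L.values()) / 4
mu_A = {i: (L[i, 1] + L[i, 2]) / 2 - mu for i in (1, 2)}
mu_B = {j: (L[1, j] + L[2, j]) / 2 - mu for j in (1, 2)}
mu_AB = {(i, j): L[i, j] - mu - mu_A[i] - mu_B[j] for (i, j) in n}

# Logit parameters implied by the log-linear parameters.
gamma = 2 * mu_A[1]
gamma_B = {j: -2 * mu_AB[2, j] for j in (1, 2)}

for j, label in ((1, "males"), (2, "females")):
    log_odds = math.log(n[1, j] / n[2, j])   # ln(lambda_1j / lambda_2j)
    print(f"{label}: ln odds = {log_odds:.4f}, gamma + gamma_j^B = {gamma + gamma_B[j]:.4f}")
```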

Are women more conservative than men? Do women vote conservative more often than men? The odds ratio.

$\ln\dfrac{\theta_2^B}{\theta_1^B} = (\gamma + \gamma_2^B) - (\gamma + \gamma_1^B) = \gamma_2^B - \gamma_1^B$

If the log odds ratio is positive (odds ratio greater than one), the odds of voting conservative rather than labour are larger for women than for men. In that case, women vote conservative more often than men.

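$\ln\theta_1^B = \gamma + \gamma_1^B + (\gamma_2^B - \gamma_1^B) \cdot 0$

$\ln\theta_2^B = \gamma + \gamma_1^B + (\gamma_2^B - \gamma_1^B) \cdot 1$

Logit model:

$\eta = \operatorname{logit}(p) = \ln\dfrac{p}{1-p} = \ln\dfrac{p_1}{p_2} = a + bx$, with x = 0, 1

with $a = \gamma + \gamma_1^B$: log odds of the reference category (males)

and $b = \gamma_2^B - \gamma_1^B$: log odds ratio (odds females / odds males)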
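The same logit model can be fitted as a binomial regression on x (0 = males, 1 = females). The sketch uses hypothetical counts and a hand-written Newton-Raphson fit (`fit_logit` is an illustrative helper); because the model has one parameter per group, a should equal the log odds for males and b the log odds ratio (females / males).

```python
import math

# Hypothetical grouped data (NOT from the source): x = 0 males, x = 1 females;
# "conservative" plays the role of success, "labour" of failure.
groups = [(0, 120, 150), (1, 140, 130)]   # (x, conservative, labour)

def fit_logit(groups, iterations=25):
    """Binomial logit model logit(p) = a + b x, fitted by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y, f in groups:
            total = y + f
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = total * p * (1 - p)          # binomial variance weight
            g_a += y - total * p             # score for a
            g_b += x * (y - total * p)       # score for b
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab ** 2        # solve the 2 x 2 Newton system
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

a, b = fit_logit(groups)
print(f"a = {a:.4f}  (log odds, males: {math.log(120 / 150):.4f})")
print(f"b = {b:.4f}  (log odds ratio: {math.log((140 / 130) / (120 / 150)):.4f})")
```

Fitting the logit model directly and deriving it from the saturated log-linear model give the same a and b; the logit form simply drops the parameters that only describe the distribution of the explanatory variable (sex).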