Stratifying on time for cohort studies (AS04)

EPM304 Advanced Statistical Methods in Epidemiology

Course: PG Diploma/ MSc Epidemiology

This document contains a copy of the study material located within the computer assisted learning (CAL) session. If you have any questions regarding this document or your course, please contact DLsupport via [email protected]. Important note: this document does not replace the CAL material found on your module CDROM. When studying this session, please ensure you work through the CDROM material first. This document can then be used for revision purposes to refer back to specific sessions. These study materials have been prepared by the London School of Hygiene & Tropical Medicine as part of the PG Diploma/MSc Epidemiology distance learning course. This material is not licensed either for resale or further copying.

London School of Hygiene & Tropical Medicine September 2013 v2.0

Section 1: Stratifying on time for cohort studies

Aims

To learn how to deal with variables that change systematically with time in cohort studies.

Objectives

By the end of this session students will be able to:

recognise variables that change systematically with time, such as current age and calendar period
manipulate data to account for such time changing variables using Lexis expansion
compare rates in different subgroups of a time changing variable
assess confounding and effect modification by a time changing variable
compare rates in two time changing variables, using the example of current age and calendar period
compute and understand standardised mortality ratios (SMRs)

Section 2: Planning your study

The purpose of this session is to introduce methods of data manipulation to deal with variables that change over time. To follow this session, you need to be familiar with classical methods of analysing cohort studies, together with the basics of Poisson regression. If you have completed SME, then you should make sure that you understand the following sessions before starting this one:

Cohort studies SM02
Introduction to Poisson and Cox regression SM11

Section 3: Introduction

In a cohort study a group of people is followed over a period of time to study the occurrence of disease.

Interaction: Hyperlink: cohort

Output (appears in separate window): A study in which subsets of a defined population can be identified that are, have been, or may in the future be exposed, or not exposed, to a factor that is thought to influence the probability of occurrence of a given disease or other outcome. Cohort studies can also be called follow-up, longitudinal, or prospective studies.

'Disease' is used as a general term to refer to the outcome of interest, whether this is disease onset, death or any other well-defined event.

3.1: Introduction

Typically, we are interested in the rate of occurrence of disease in the group and how this rate varies between sub-groups with different patterns of exposure. In the examples that we have looked at previously, the exposure has always been the same through time for each individual and so the sub-groups have been fixed through the duration of the cohort. Can you think of any examples of exposures that are fixed through time?

Interaction: Button: Show

Output: Gender, age at entry to the cohort and country of origin will all be fixed during the duration of the cohort.

3.2: Introduction

We will now consider exposures that can change during the follow up period. We can use the same analysis to calculate the usual estimates of crude, stratum-specific and adjusted rate ratios, but first some data manipulation is necessary. Let's first think about the types of variable that change over time. Can you think of any? Click the button below for examples.

Interaction: Button: Example (1):

Output: What is a time changing variable?

Interaction: Tabs: Fixed :

Output: Some characteristics of a person are fixed and never change, for example 'date of birth', 'colour of eyes', 'place of birth'.

Interaction: Tabs: Random :

Output: Other variables can change randomly.

For example, a person may be a smoker, give up smoking, and start smoking again six months later. Such details can be difficult to obtain but it is sometimes possible.

Interaction: Tabs: Deterministic:

Output: The change in other variables is deterministic. For example, we know that a person ages by a known amount over a given length of time, so we can determine the change in advance.

These are what we call time changing variables. Click below for an example.

Interaction: Button: Example:

Output: Example: A person may enter a study at age 25 and be followed up for 20 years, and so be 45 years old by the end of follow-up. The risk of certain diseases is known to be much greater in older age groups. For this reason, it is important to account for time-changing variables, especially age.

(back to main text on LHS): If the follow-up is fairly short, it is unlikely that exposure variables will change, but for longer studies some variables will change substantially during the follow up period. It is necessary to account for such changes.

Interaction: Button: Example (2):

Output (appears on RHS): Imagine the exposure of interest is being aged 50 years or more, so that the unexposed are those aged under 50 years. Consider the 4 subjects shown in the diagram. During a 10-year follow up period, subject 3 changes from unexposed to exposed.

3.3: Introduction

In the diagram below, subject 3 spends the first half of the follow-up period in the unexposed group and the second half of the follow-up period in the exposed group. To deal with this we can split the follow-up period into 2 parts. In the same way it is possible to split the follow up time of an individual into many parts. The method that splits the individual follow-up times, for example into 5-year age intervals, is called Lexis expansion.

Interaction: Hyperlink: Lexis expansion.:

Output: Lexis Expansion A method that splits the follow-up time of individuals in a cohort study. Using this manipulation of cohort data we can examine the effect of variables which change over time.

3.4: Introduction

Example

Imagine an individual followed for 12 years from the age of 21 years to 32 years inclusive. If we split the follow up time for this individual into the age bands 20 to 24 years, 25 to 29 years and 30 to 34 years, the follow-up time in each interval would be:

(a) 4 years in the ageband 20 to 24 years
(b) 5 years in the ageband 25 to 29 years
(c) 3 years in the ageband 30 to 34 years
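The splitting in this example can be sketched in a few lines of Python. This is only an illustration: the function name `split_by_age` and the choice to represent "ages 21 to 32 inclusive" as the interval from the 21st to the 33rd birthday are assumptions made here, not part of the course material.

```python
def split_by_age(age_in, age_out, cuts):
    """Split follow-up from age_in to age_out at the age-band boundaries in cuts.

    Returns a list of (band_start, band_end, years_in_band) tuples,
    one for each band in which the person spent some time.
    """
    pieces = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        start = max(age_in, lo)   # follow-up in this band starts here
        stop = min(age_out, hi)   # and stops here
        if stop > start:
            pieces.append((lo, hi, stop - start))
    return pieces


# Followed from the 21st birthday up to the 33rd birthday,
# i.e. ages 21 to 32 years inclusive (12 years in total).
pieces = split_by_age(21, 33, [20, 25, 30, 35])
for lo, hi, years in pieces:
    print(f"{lo} to {hi - 1} years: {years} years of follow-up")
```

The three pieces add up to the original 12 years, which is the key property of Lexis expansion: follow-up time is redistributed between bands, never created or lost.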

3.5: Introduction

To study the effect of changing age on cohort mortality rates, the total observation time for each individual is split into age specific intervals. Once this is done, the separate age-specific records for all subjects are treated as independent and age-specific rates estimated.

Lexis expansion assumes that the true rate for the cohort is constant within each age band. This is clearly an approximation. In general, little is gained by using intervals shorter than 5 years unless the rate is changing very rapidly with age. If this occurs, survival analysis should be used. This was looked at in SM03 and SM11 and will be covered in more depth in AS06 and AS07.

3.6: Introduction

To illustrate the basic Lexis principle we will use 3 subjects from a cohort which was followed from time of entry into the study until 01/01/1981. The event of interest is death. For simplicity all dates fall on the first day of the month. Click on Swap below to view these data as a diagram.

Subject  Birth date  Entered     Age at entry  End of follow-up  Age at exit  Outcome
1        01/03/1927  01/07/1966  39.3          01/09/1977        50.5         Alive
2        01/04/1935  01/11/1961  26.6          01/12/1973        38.7         Death
3        01/11/1942  01/02/1970  27.3          01/01/1981        38.2         Alive

Interaction: Button: Swap:

Output:

The follow-up periods for the three subjects are shown in the diagram above, plotted on an age scale (rather than calendar date).

Now, click Split below to split the follow-up time into 5-year intervals.

Interaction: Button: Split:

Output:

3.7: Introduction

Let's first focus on Subject 1. You can now see the observation time and number of outcomes in each interval for this subject.

The total observation time for Subject 1 is 11.2 years (from 01/07/1966 to 01/09/1977). This is equal to the sum of the separate times spent in the different age groups. For each age interval we want the follow-up time and the outcome for each individual. Use the drop-down menu below to show the observation time and outcomes in each interval, for each of the subjects.

Interaction: Pulldown: Subject 1:

Output:



Use Swap to see the original table of data for the subjects.

Interaction: Button: Swap:

Output: the original table of data for the three subjects, as shown in 3.6.

3.8: Introduction

If we add up the follow-up times for each subject within each ageband, we will get the total observation time for each ageband. If we add up the number of events (deaths) for each subject within each ageband, we will get the total number of events for each ageband.

What is the total observation time (in years) for the age-interval 25 to 29 years?

Interaction: Calculation: Total Y =

Incorrect answer: No, you should have added the time each subject spent in the age-interval 25 to 29 years.

0.0 (Subject 1) + 3.4 (Subject 2) + 2.7 (Subject 3) = 6.1 years

Correct answer: Yes, the total observation time for the age-interval 25 to 29 years is the sum of the time each subject spent in that interval.

0.0 (Subject 1) + 3.4 (Subject 2) + 2.7 (Subject 3) = 6.1 years

Interaction: Calculation: Total D =

Incorrect answer: No, none of the subjects died during the 25 to 29 years age interval, so the number of events is zero.

Correct answer: That's correct, none of the subjects died during the 25 to 29 years age interval, so the number of events is zero.
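As a rough check on these totals, the sketch below applies the same splitting to all three subjects and sums the person-years (Y) and deaths (D) in each 5-year age band. It assumes the rounded ages at entry and exit from the subject table (for example 26.6 and 38.7 for Subject 2), and it assigns a death to the band containing the age at exit; the variable names are illustrative only.

```python
# (age at entry, age at exit, died) for Subjects 1 to 3, using the rounded ages
subjects = [
    (39.3, 50.5, 0),
    (26.6, 38.7, 1),
    (27.3, 38.2, 0),
]
cuts = [25, 30, 35, 40, 45, 50, 55]  # 5-year age-band boundaries

# totals[band_start] = [person-years Y, deaths D]
totals = {lo: [0.0, 0] for lo in cuts[:-1]}
for age_in, age_out, died in subjects:
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        start, stop = max(age_in, lo), min(age_out, hi)
        if stop > start:
            totals[lo][0] += stop - start
            # a death belongs to the band in which follow-up ended
            if died and lo <= age_out < hi:
                totals[lo][1] += 1

for lo, (y, d) in totals.items():
    print(f"{lo}-{lo + 4}: Y = {y:.1f}, D = {d}")
print(f"Total: Y = {sum(y for y, _ in totals.values()):.1f}, "
      f"D = {sum(d for _, d in totals.values())}")
```

Working from exact dates rather than rounded ages would give slightly different person-years, so small discrepancies in the last decimal place are to be expected.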


3.9: Introduction

In the table below you can now see the values you just calculated for the 25 - 29 years interval. Click below to do this for all age groups.

Interaction: Button: Show

Output: Now we have Y and D for each age interval. Using these values we can now compute the overall ageband specific rates. Click below to do this.

Interaction: Button: Show

Output:

Rates within age bands for three subjects

Age    Y     D  Rate
25-29  6.1   0  0/6.1
30-34  10.0  0  0/10.0
35-39  7.6   1  1/7.6
40-44  5.0   0  0/5.0
45-49  5.0   0  0/5.0
50-54  0.5   0  0/0.5
Total  34.2  1  1/34.2

In this example we have used only 3 subjects as a simple illustration of the Lexis expansion process. In practice this is done on many subjects.

3.10: Introduction

Once a dataset has been split with Lexis expansion, age-specific rates can be estimated using each person's current age rather than their age at entry. Applying a Lexis expansion to follow-up data updates age throughout the follow-up period. It is important to use Lexis expansion when the follow-up is long term. Click below to apply the Lexis expansion to records for all individuals in the Whitehall dataset.

Interaction: Button: Show

Output (appears on RHS): This table shows the Lexis expansion for all individuals in the Whitehall dataset. The original 1677 individual records were split into 5243 age-specific records.

Estimated rates (per 1000) and lower/upper bounds of 95% confidence intervals

Age-band  D   Y (1000s)  Rate    Lower  Upper
40 - 49   6   3.14       1.91    0.86   4.26
50 - 54   18  4.43       4.07    2.56   6.45
55 - 59   39  6.00       6.50    4.75   8.89
60 - 64   89  6.17       14.43   11.73  17.77
65 - 69   94  4.40       21.37   17.46  26.16
70 - 74   93  2.44       38.11   31.07  46.66
75 - 79   46  0.91       50.55   37.77  67.33
80 - 89   18  0.12       150.00  95.76  241.25

Section 4: Adjusting for changing age

We now know how to update age data throughout a long-term cohort. This is useful because age may be a potential confounder, effect modifier or a risk factor.

Once we have split the data using a Lexis expansion, we will have one line in the dataset for each person in each age category that they were in during the duration of the cohort. Hence, we will typically have many more lines than before we used a Lexis expansion. We can now use current age as if it were any of the other time fixed variables that we have looked at before, because for each line in the dataset, the person and age category is fixed.

Although it may seem as though we must adjust for the fact that several lines (or observations) are actually from the same person, this is not necessary (see Clayton and Hills for an explanation of why this is, if you are interested).

If we now analyse the effect of another exposure and do not include current age in the model, we will get the same results as if we had not done the Lexis expansion. This is because the summed person time and number of events in each category remains the same.

4.1: Adjusting for changing age

As an example, consider whether current age acts as a confounder or effect modifier on the relationship between employment grade and mortality rates in the Whitehall cohort. How do we assess confounding and effect modification by 'current' age?

Interaction: Button: thought bubble button: Output (appears below and on RHS): Now that we have used a Lexis expansion on the data we can assess for confounding and interaction in the usual way.

1. First we stratify by age-band and look at the rate ratio for the employment grade and mortality within each stratum.
2. Then we assess whether there is homogeneity across strata, that is, are the rate ratios similar? Remember we can test this formally using a test for unequal rate ratios, a test for effect modification.
3. If there is no interaction we can present an adjusted Mantel-Haenszel estimate of the rate ratio. This is compared to the crude rate ratio to assess confounding.
4. If there is interaction we should present the stratum specific rate ratios.

4.2: Adjusting for changing age

The rate ratios for the effect of grade on mortality within each 5-year interval of current age are shown below. The crude and adjusted estimates are also shown. The test for unequal rate ratios within strata is given below the table.

Considering these results, is there evidence of confounding or effect modification by current age? Go on to the next page when you have thought about this.

Crude, stratum specific and adjusted rate ratios

Ageband   Rate ratio  Lower CL  Upper CL
40 - 49   1.13        0.13      9.71
50 - 54   2.35        0.88      6.26
55 - 59   1.87        0.96      3.64
60 - 64   1.91        1.25      2.91
65 - 69   1.78        1.19      2.67
70 - 74   0.98        0.65      1.48
75 - 79   1.33        0.74      2.39
80 - 89   1.94        0.64      5.89
Crude     2.30        1.89      2.80
Adjusted  1.52        1.25      1.86

Approximate test for unequal rate ratios (interaction): χ² = 7.72, P = 0.3583

4.3: Adjusting for changing age

Interaction: Tabs: 1 :

Output: The stratum-specific rate ratios are similar, with wide overlapping confidence limits. This suggests no interaction. This is confirmed by the test for interaction, P=0.36. We can therefore use the Mantel-Haenszel estimate, which is adjusted for the effect of changing age. RR(M-H) = 1.52

Interaction: Tabs: 2 : Output: After adjusting for current age, there is still a strong effect of grade on mortality, with a 52% increase in mortality in the lower grades of employment compared to the high grades.

The 95% confidence interval is narrow and does not include 1. We can be 95% confident that the low grade workers have a higher mortality compared to high grade workers in the population of civil servants working in Whitehall. This is after adjusting for the effect of age and men ageing during the cohort.

Interaction: Tabs: 3 : Output: To assess whether the relationship between employment grade and mortality is confounded by current age, we compare the crude and adjusted rate ratios.

RRcrude = 2.30
RR(M-H) = 1.52

The adjusted rate ratio is lower than the crude rate ratio. This shows some evidence of positive confounding: the crude analysis overestimated the increase in mortality rate in the low-grade workers. This is due to the age differential in the different grades.

Section 5: Adjusting for time-changing confounders with Poisson regression

In the classical analysis we saw how to split the data (using a Lexis expansion) for time-changing confounders such as age and then stratify by this new variable to assess potential confounding or effect modification. We can control for such variables in the same way in a Poisson model.

5.1: Adjusting for time-changing confounders with Poisson regression

The parameter estimates (and SEs) from a Poisson model for the effect of Grade adjusted for Ageband (current age) are shown below.

The exposure of interest is grade of employment; we are not really interested in the estimates for the different age bands. Ageband is included in the model to adjust the effect of Grade for the effect of changing age.

Estimates from a Poisson model adjusted for age

            Coefficient  Standard error
Grade1      0.3454       0.1681
Ageband50   1.8433       1.0541
Ageband55   2.1612       1.0291
Ageband60   2.9162       0.0134
Ageband65   3.3047       0.0128
Ageband70   3.5169       1.0184
Ageband75   3.8104       1.0350
Ageband80   4.4880       1.1214
Constant    -8.1115      1.0006

Log likelihood = -831.8421

5.2: Adjusting for time-changing confounders with Poisson regression

What does the coefficient (=0.3454) for Grade1 tell us?

Interaction: Button: thought bubble:

Output (appears in new window): The value of the coefficient for Grade1 is the difference in log rates between low-grade workers and high-grade workers, that is, the log(rate ratio). This estimate is now adjusted for the potential confounding effect of age.

(appears on page) What is the adjusted rate ratio for the effect of Grade, to 2 decimal places?

Interaction: Calculation: Adjusted rate ratio =

Incorrect answer: No, in fact the adjusted rate ratio for the effect of Grade is given by the exponential of the coefficient for Grade1.

Adjusted rate ratio = exp(0.3454) = 1.41

So, the rate in the low-grade workers is 1.41 times that in high-grade workers, after adjusting for the effect of changing age.

Correct answer: Correct. The adjusted rate ratio for the effect of Grade is given by: exp(0.3454) = 1.41.

So, the rate in the low-grade workers is 1.41 times that in high-grade workers, after adjusting for the effect of changing age.


Now, using the correct values from the table of Poisson model estimates in 5.1, calculate a Wald test statistic for the hypothesis of no effect of low-grade employment.

H0: log(rate ratio) = 0 (RR = 1)

Give your answer to 3 decimal places.

Interaction: Calculation: Wald test statistic, z =

Incorrect answer: No, that's not correct. Remember that the Wald test statistic is given by:

coefficient / standard error = 0.3454 / 0.1681 = 2.055.

Correct answer: Yes, the Wald test statistic = coefficient / standard error = 0.3454 / 0.1681 = 2.055.


Referring z=2.055 to the normal distribution gives P = 0.04. Can you select the correct words from the dropdowns in the paragraph below to give the true interpretation of this result?

The Wald test, P = 0.04, suggests that the data are not compatible with the null hypothesis, RR = 1. We can, therefore, say that there is evidence [pulldown 1] the null hypothesis and there is a [pulldown 2] difference in the rate of CHD for low-grade workers and high-grade workers, after adjusting for age.

Interaction: Pulldown 1: We can, therefore, say that there is evidence ... the null hypothesis:

Incorrect Response "to support" (pop up box appears): No, with a P-value = 0.04 we can reject the null hypothesis and say that the data in this study (and inferences that we can draw from the study) do not support the null hypothesis.

Correct Response "against" (pop up box appears): That's correct, P=0.04 is evidence against the null hypothesis.

Incorrect Response "describing" (pop up box appears): No, you cannot say there is evidence describing the null hypothesis. The P-value tells us whether the evidence against the null hypothesis is weak or strong. In this case P=0.04 is moderately strong evidence against the null hypothesis.

Interaction: Pulldown 2: and there is a ... difference in the rate of CHD for low-grade workers and high-grade workers, after adjusting for age:

Correct Response "significant" (pop up box appears): Yes, we can say there is a significant difference in the rate of CHD for low-grade workers compared to high-grade workers after adjusting for the effect of changing age.

Incorrect Response "small" (pop up box appears): No, if there is evidence against the null hypothesis we can say there is a significant difference in the rates of CHD for low grade workers compared to high grade workers, but we cannot say how small or large this difference is from a P-value.

Incorrect Response "variable" (pop up box appears): No, there is not a variable difference. From the P-value, we can conclude that there is a significant difference in the rate of CHD between low grade and high-grade workers.
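The adjusted rate ratio, Wald statistic and P-value discussed in this section can be reproduced from the Grade1 coefficient and its standard error. The sketch below uses only the Python standard library; the two-sided P-value from the normal distribution is obtained with `math.erfc`. The 95% confidence interval is an extra step that the session does not quote, calculated here on the assumption of the usual coefficient ± 1.96 × SE on the log scale.

```python
import math

coef = 0.3454  # Grade1 coefficient = log(rate ratio)
se = 0.1681    # its standard error

rate_ratio = math.exp(coef)  # adjusted rate ratio
z = coef / se                # Wald test statistic
# Two-sided P-value from the standard normal: 2 * (1 - Phi(|z|))
p_value = math.erfc(abs(z) / math.sqrt(2))
# Approximate 95% confidence interval for the rate ratio
ci_low = math.exp(coef - 1.96 * se)
ci_high = math.exp(coef + 1.96 * se)

print(f"Rate ratio = {rate_ratio:.2f}")
print(f"z = {z:.3f}, P = {p_value:.2f}")
print(f"95% CI: {ci_low:.2f} to {ci_high:.2f}")
```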


The table below shows the estimated log rates by grade and age. The "40-" ageband is the baseline group.

Click the highlighted cells to plot these estimates in the graph below.

Current age  High grade  Low grade
40-          -8.1115     -7.7661
50-          -6.2682     -5.9228
55-          -5.9503     -5.6049
60-          -5.1953     -4.8499
65-          -4.8068     -4.4614
70-          -4.5946     -4.2492
75-          -4.3011     -3.9557
80-          -3.6235     -3.2781

Interaction: Hotspot: -7.7661

Output (changes table on RHS):

Note: parallel lines indicate assumption of proportional rates.
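The log rates in the table follow directly from the model coefficients: the constant gives the baseline (high grade, ageband 40-), each Ageband coefficient shifts the log rate for that age band, and Grade1 adds the same amount in every band. A minimal sketch, with the dictionary layout chosen here purely for illustration:

```python
# Coefficients from the Poisson model adjusted for age
coef = {
    "Constant": -8.1115,
    "Grade1": 0.3454,
    "Ageband50": 1.8433,
    "Ageband55": 2.1612,
    "Ageband60": 2.9162,
    "Ageband65": 3.3047,
    "Ageband70": 3.5169,
    "Ageband75": 3.8104,
    "Ageband80": 4.4880,
}

bands = ["40", "50", "55", "60", "65", "70", "75", "80"]
fitted = {}
for band in bands:
    # the 40- band is the baseline, so it has no age coefficient
    age_effect = 0.0 if band == "40" else coef["Ageband" + band]
    high = coef["Constant"] + age_effect  # log rate, high grade
    low = high + coef["Grade1"]           # log rate, low grade
    fitted[band] = (high, low)
    print(f"{band}-  high: {high:.4f}  low: {low:.4f}")
```

Because the gap between the low-grade and high-grade log rates is 0.3454 in every age band, the two lines are parallel when plotted against age, which is the proportional rates assumption in graphical form.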

5.6: Adjusting for time-changing confounders with Poisson regression

The Poisson model for Grade and Ageband is shown below. What is the assumption we make in this model?

Log rate = constant + Grade1 + Ageband50 + Ageband55 + Ageband60 + Ageband65 + Ageband70 + Ageband75 + Ageband80

This is a Poisson model with a separate effect for each age group.

Output: The assumption we make in this model is that the effect of grade is the same in all age groups. We can call this the proportional rates assumption (the same as the proportional odds assumption in logistic regression). This model does not account for potential interaction.

Go on to the next page to consider a model with interaction.

Section 6: Testing for Interaction

So far, the model we have fitted assumes no interaction between Grade and Ageband, i.e., proportional rates.

We can check whether this assumption is valid in a Poisson model, the same way we do in a logistic model. How do we do this?

Interaction: Button: thought bubble: Output (appears on page): To check the assumption of proportional rates we fit a model with interaction between Grade and Ageband and compare it to a model without interaction using a likelihood ratio test. If there is a large difference between the two models then there is significant interaction.

Interaction: Button: note: Output (appears in new window): When examining such lines they may not be exactly parallel. The likelihood ratio test tests whether they are close enough to being parallel that we can produce a model which assumes they are. Remember we should always try to produce the simplest model possible.

(appears on page): The tabs below show simple illustrations of no interaction and interaction.

Interaction: Tabs: Proportional :

Output:

Interaction: Tabs: Interaction 1 :

Output:

Interaction: Tabs: Interaction 2 :

Output:

6.1: Testing for Interaction

In the following models, because of the small number of events in the extreme groups, the lowest age band has been combined with the second lowest and the highest age band has been combined with the second highest. We have 6 age groups in Ageband and thus there are 5 estimated parameters for ageband.

The log-likelihoods for the two models with and without interaction are:

Model with interaction between Grade and Ageband: Log likelihood, L1 = -829.68055
Model without interaction between Grade and Ageband: Log likelihood, L0 = -834.89366

Calculate the LRS to test for interaction, giving your answer to 2 decimal places:

Interaction: Calculation: LRS =

Incorrect answer: No, that's not right. The likelihood ratio statistic is given by:

LRS = 2(L1 - L0) = 2(-829.68055 - (-834.89366)) = 10.43

Correct answer: Yes, the likelihood ratio statistic is LRS = 2(L1 - L0) = 2(-829.68055 - (-834.89366)) = 10.43
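The likelihood ratio statistic can be checked numerically and then referred to a chi-squared distribution. The sketch below assumes 5 degrees of freedom, on the reasoning that a binary Grade variable interacting with 6 age groups adds 5 interaction parameters; the session itself does not state the degrees of freedom or a P-value. The value 11.07 is the standard tabulated 5% critical value of chi-squared on 5 degrees of freedom.

```python
L1 = -829.68055  # log likelihood, model with Grade x Ageband interaction
L0 = -834.89366  # log likelihood, model without interaction

lrs = 2 * (L1 - L0)  # likelihood ratio statistic

df = 5                # assumed: (2 - 1) x (6 - 1) interaction parameters
critical_5pct = 11.07  # tabulated 5% point of chi-squared on 5 df

print(f"LRS = {lrs:.2f} on {df} df")
if lrs > critical_5pct:
    print("LRS exceeds the 5% critical value: evidence of interaction")
else:
    print("LRS is below the 5% critical value: "
          "no evidence of interaction at the 5% level")
```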

Section 7: Adjusting for age and calendar period

Interaction: Tab 1

Think about studies which last 10 years or more; during such long periods rates may vary.

For example, men aged 40-44 in Britain had a different mortality rate in 1940 than men aged 40-44 in 1970. In this situation it is better to divide events and observation time by both age and calendar period.

Interaction: Tab 2

The figure below shows three cohort subjects. The x-axis represents calendar period, the y-axis represents age. Instead of only splitting the total follow-up time for an individual when they change age group, it is also split when they change calendar period. Click below to show this. As before we can then calculate mortality rates using the combined age and calendar period intervals.

Interaction: button: Show

Output:

7.1: Adjusting for age and calendar period

Example

Consider subject 1:

Date        Event
01/07/1966  Entry
01/03/1967  Changed age group
01/01/1970  Changed calendar period
01/03/1972  Changed age group
01/01/1975  Changed calendar period
01/03/1977  Changed age group
01/09/1977  Exit

This splits the total follow-up time for subject 1 into 6 parts, as shown on the diagram below.
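The split dates for subject 1 can be derived from the date of birth, entry date and exit date. The sketch below assumes 5-year age bands starting at ages divisible by 5 and 5-year calendar periods starting on 1 January of years divisible by 5 (1970, 1975, ...), which is consistent with the dates listed above; the function name `lexis_cut_dates` is made up for this illustration.

```python
from datetime import date


def lexis_cut_dates(birth, entry, exit_, width=5):
    """Dates strictly between entry and exit_ at which the age band
    or the calendar period changes (both in steps of `width` years)."""
    cuts = set()
    # Age-band boundaries: birthdays at ages 0, width, 2*width, ...
    # (a 29 February birthday would need special handling; not needed here)
    age = 0
    while True:
        birthday = date(birth.year + age, birth.month, birth.day)
        if birthday >= exit_:
            break
        if birthday > entry:
            cuts.add(birthday)
        age += width
    # Calendar-period boundaries: 1 January of years divisible by width
    year = (entry.year // width) * width
    while date(year, 1, 1) < exit_:
        if date(year, 1, 1) > entry:
            cuts.add(date(year, 1, 1))
        year += width
    return sorted(cuts)


# Subject 1: born 01/03/1927, entered 01/07/1966, exited 01/09/1977
entry, exit_ = date(1966, 7, 1), date(1977, 9, 1)
cuts = lexis_cut_dates(date(1927, 3, 1), entry, exit_)
boundaries = [entry] + cuts + [exit_]
intervals = list(zip(boundaries[:-1], boundaries[1:]))
for start, stop in intervals:
    print(start.strftime("%d/%m/%Y"), "to", stop.strftime("%d/%m/%Y"))
```

Each printed interval lies within a single combination of age band and calendar period, so it can be stored as one record in the expanded dataset.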

7.2: Adjusting for age and calendar period

Once we have done a Lexis expansion on both age and calendar period, we will have separate records for each individual for each combination of current age and calendar period. We can then analyse these exposures as usual. Note that if we do the Lexis expansion and then do not include age or calendar period in the model, we will get the same result for other exposures as if we had not done the Lexis expansion at all. Similarly, if we do not include calendar period (or conversely age) in the model, we will get the same result as if we had only done the Lexis expansion on age (or conversely calendar period).

Section 8: Standardised Mortality Ratios

There are cohort studies conducted in populations that have all experienced an exposure of interest. It can be interesting to compare their mortality rates to the rates in a reference cohort. For example, consider a cohort consisting entirely of the workforce of a factory that manufactures a potentially hazardous chemical. Since the whole workforce has been exposed to the chemical, we would need to compare the mortality rates in this cohort to an external, reference cohort. Can you think of any potential biases in such an analysis?

Interaction: Button: thought bubble: Output (appears on page):

Many studies show that those in an occupational cohort have lower mortality than the general population. This is because those who are very sick, and hence at higher risk of death, often cannot work. This is referred to as the healthy worker effect.

Those who are in the reference cohort may not be truly unexposed. For example, a reference cohort drawn from the communities surrounding a mine may also be exposed to mine dust and may include ex-miners.

The exposed cohort and the unexposed reference population may differ substantially in age and calendar period. For example, if an occupational cohort has been followed for 30 years, it may have experienced changing mortality over three decades, which would not be reflected in a more recent reference cohort.

8.1: Standardised Mortality Ratios

Standardised Mortality Ratios (SMRs) are a way of dealing with these differences in age and calendar period. In essence, the SMR is a stratified rate ratio between the exposed cohort mortality rates and the reference cohort mortality rates, where the strata are categories of age, calendar period and, possibly, sex. The Standardised Incidence Ratio (SIR) has the same definition but is for comparing disease incidence instead of mortality.

8.2: Standardised Mortality Ratios

The SMR is calculated as the number of deaths observed in the cohort (D), divided by the number of deaths expected from the rates in the reference cohort (E), that is, SMR = D / E. The expected number of events is calculated separately for each stratum (as defined by all combinations of age, calendar period and possibly sex) and then summed to give the total number of expected events. The expected number of events in a stratum is the person time in that stratum from the exposed cohort multiplied by the rate in that stratum from the reference cohort, i.e. we compare the observed deaths in the exposed cohort to the number of deaths that would be expected if the reference cohort had the same age/calendar period/sex distribution as the exposed cohort.

What are we assuming about the stratum-specific rate ratios, comparing the rate in exposed individuals with the rate in unexposed individuals?


Output (appears on page): As with all Mantel-Haenszel summary estimates, we assume that all the underlying stratum-specific rate ratios for the effect of the exposure are the same, i.e. any differences between the stratum-specific rate ratios are just random variation. So we are assuming that the effect of the exposure (as measured by the rate ratio) is the same in all combinations of age group, sex, and calendar period.

8.3: Standardised Mortality Ratios

We will now calculate the SMR, controlling for age only, comparing the Whitehall cohort during the period 1970-74 to the reference rates from England and Wales over the same time period.

           Whitehall cohort                                    Reference cohort
Age group  Deaths  Person-years (1000s)  Mortality rate       Mortality rate
                                         (per 1000py)         (per 1000py)
50-54      39      22.2599               1.752                3.487
55-59      87      19.3621               4.493                5.569
60-64      92      14.6177               6.294                8.751
65-69      62      5.9896                10.351               13.777
70-74      15      1.1421                13.134               19.946

Calculate the rate ratio for age group 50-54 years to 2 decimal places.

Interaction: Calculation: rate ratio =

Incorrect answer: No, that's not right. The rate ratio is the rate in the exposed group divided by the rate in the unexposed group. RR = 1.752 / 3.487 = 0.50

Correct answer: Yes, the rate ratio is the rate in the exposed group divided by the rate in the unexposed group. RR = 1.752 / 3.487 = 0.50

On the next page we will see all the rate ratios completed.

8.4: Standardised Mortality Ratios We can see below that the rate ratios vary between 0.50 and 0.81 with no obvious pattern. We will assume that the true rate ratio is the same in each age stratum and hence, calculate the SMR controlling for age. Now calculate the expected number of deaths for age group 50-54 years to 1 decimal place. Interaction: Calculation: expected deaths = Output: Incorrect answer: No, that's not right. The expected number of deaths is the person-years from the exposed cohort multiplied by the rate in the reference cohort. Expected deaths = 22.2599 * 3.487 = 77.6

Correct answer: Yes, the expected number of deaths is the person-years from the exposed cohort multiplied by the rate in the reference cohort. Expected deaths = 22.2599 * 3.487 = 77.6 (back to main text)

On the next page we will see all the expected deaths completed.

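Whitehall cohort compared with the reference cohort, by age group, with rate ratios (person-years in thousands; mortality rates per 1000py)

Age group   Whitehall deaths   Whitehall person-years (1000s)   Whitehall mortality rate   Reference mortality rate   Rate ratio
50-54       39                 22.2599                          1.752                      3.487                      0.50
55-59       87                 19.3621                          4.493                      5.569                      0.81
60-64       92                 14.6177                          6.294                      8.751                      0.72
65-69       62                 5.9896                           10.351                     13.777                     0.75
70-74       15                 1.1421                           13.134                     19.946                     0.66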
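The stratum-specific rate ratios in the table above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `stratum_rate_ratios` and the variable names are illustrative assumptions, not part of the course material. Person-years are entered in thousands, so deaths divided by person-years gives a rate per 1000 person-years directly.

```python
# Whitehall cohort, 1970-74, and England and Wales reference rates,
# copied from the table in the text.
age_groups = ["50-54", "55-59", "60-64", "65-69", "70-74"]
deaths = [39, 87, 92, 62, 15]
pyears_thousands = [22.2599, 19.3621, 14.6177, 5.9896, 1.1421]
reference_rates = [3.487, 5.569, 8.751, 13.777, 19.946]  # per 1000 person-years


def stratum_rate_ratios(deaths, pyears, ref_rates):
    """Return a list of (cohort rate, rate ratio) pairs, one per stratum.

    Hypothetical helper for illustration: the cohort rate is deaths divided
    by person-years, and the rate ratio is the cohort rate divided by the
    reference rate for the same stratum.
    """
    results = []
    for d, py, ref in zip(deaths, pyears, ref_rates):
        rate = d / py  # per 1000 person-years, since py is in thousands
        results.append((rate, rate / ref))
    return results


ratios = stratum_rate_ratios(deaths, pyears_thousands, reference_rates)
for age, (rate, rr) in zip(age_groups, ratios):
    print(f"{age}: rate = {rate:.3f} per 1000py, rate ratio = {rr:.2f}")
```

Keeping the unrounded rate when forming each ratio avoids compounding rounding error; the results agree with the tabulated values to 2 decimal places.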

8.5: Standardised Mortality Ratios We can see below the observed and expected deaths in each age group. Now calculate the SMR, controlling for age, to 2 decimal places. Interaction: Calculation: SMR = Output:

Incorrect answer: No, that's not right. The SMR is the observed divided by the expected number of deaths. SMR = (39+87+92+62+15) / (77.6+107.8+127.9+82.5+22.8) = 295 / 418.7 = 0.70

Correct answer: Yes, the SMR is the observed divided by the expected number of deaths. SMR = (39+87+92+62+15) / (77.6+107.8+127.9+82.5+22.8) = 295 / 418.7 = 0.70

So the Whitehall cohort had a 30% lower mortality than the general population, controlling for the effect of age. SMRs are often quoted to base 100 i.e. this SMR would be quoted as 70. Note this method of age standardisation is often referred to as indirect standardisation.

Observed and expected deaths by age group (person-years in thousands; mortality rates per 1000py)

Age group   Whitehall deaths   Whitehall person-years (1000s)   Whitehall mortality rate   Reference mortality rate   Expected deaths
50-54       39                 22.2599                          1.752                      3.487                      77.6
55-59       87                 19.3621                          4.493                      5.569                      107.8
60-64       92                 14.6177                          6.294                      8.751                      127.9
65-69       62                 5.9896                           10.351                     13.777                     82.5
70-74       15                 1.1421                           13.134                     19.946                     22.8
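The expected deaths and the age-standardised SMR can be checked in the same way. The sketch below assumes the data from the table above; `indirect_smr` is a hypothetical helper name introduced only for illustration. Each expected count is the cohort person-years (in thousands) multiplied by the reference rate (per 1000 person-years), and the SMR is total observed deaths divided by total expected deaths.

```python
# Data copied from the table in the text (Whitehall cohort, 1970-74).
deaths = [39, 87, 92, 62, 15]
pyears_thousands = [22.2599, 19.3621, 14.6177, 5.9896, 1.1421]
reference_rates = [3.487, 5.569, 8.751, 13.777, 19.946]  # per 1000 person-years


def indirect_smr(deaths, pyears, ref_rates):
    """Return (observed total, expected per stratum, expected total, SMR).

    Hypothetical helper for illustration of indirect standardisation:
    expected deaths are computed stratum by stratum and then summed.
    """
    expected = [py * ref for py, ref in zip(pyears, ref_rates)]
    observed_total = sum(deaths)
    expected_total = sum(expected)
    return observed_total, expected, expected_total, observed_total / expected_total


observed_total, expected, expected_total, smr = indirect_smr(
    deaths, pyears_thousands, reference_rates
)
print("Expected deaths by age group:", [round(e, 1) for e in expected])
print(f"Observed = {observed_total}, expected = {expected_total:.1f}")
print(f"SMR = {smr:.2f} (or {100 * smr:.0f} to base 100)")
```

Summing the unrounded expected deaths gives the total of 418.7 quoted in the text; adding the five values already rounded to 1 decimal place gives 418.6, a difference that does not affect the SMR to 2 decimal places.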

8.6: Standardised Mortality Ratios To calculate a 95% confidence interval for an SMR, we divide and multiply the SMR by the error factor, calculated as exp(1.96/√D), where D is the observed number of deaths. To test the hypothesis that the SMR differs significantly from 100, we calculate the test statistic as U²/V, where U is (observed deaths − expected deaths) and V is the expected deaths. This test statistic is then compared to the χ² distribution on one degree of freedom. Calculate the confidence interval for the Whitehall SMR, controlling for age, to 0 decimal places.

Confidence interval = Lower CL

Upper CL

Interaction: Calculation: Lower CL Output: Any incorrect answer: No, that's not right. The error factor is exp(1.96/√295) = 1.12. Lower CL = 70/1.12 = 63

Correct answer: That's right.

Interaction: Calculation: Upper CL Output: Any incorrect answer: No, that's not right. The error factor is exp(1.96/√295) = 1.12. Upper CL = 70*1.12 = 78

Correct answer: That's right. Yes, the error factor is exp(1.96/√295) = 1.12. Lower CL = 70/1.12 = 63 Upper CL = 70*1.12 = 78
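The error factor and the test statistic described in 8.6 can be sketched in Python as follows. The function names are illustrative assumptions, and the inputs (295 observed and 418.7 expected deaths) are taken from the text. Note that this sketch carries full precision throughout, whereas the hand calculation in the text uses the rounded values 70 and 1.12; as a result the upper limit comes out close to 79 here, against 78 by hand.

```python
import math


def smr_confidence_interval(smr, observed, z=1.96):
    """Return (lower limit, upper limit, error factor) for an SMR.

    Hypothetical helper for illustration: the error factor is
    exp(z / sqrt(D)), where D is the observed number of deaths, and the
    limits are the SMR divided and multiplied by that factor.
    """
    error_factor = math.exp(z / math.sqrt(observed))
    return smr / error_factor, smr * error_factor, error_factor


def smr_test_statistic(observed, expected):
    """Return U^2 / V with U = observed - expected and V = expected."""
    u = observed - expected
    return u * u / expected


observed_deaths = 295    # D, from the text
expected_deaths = 418.7  # E, from the text
smr_base_100 = 100 * observed_deaths / expected_deaths

lower, upper, error_factor = smr_confidence_interval(smr_base_100, observed_deaths)
chi2 = smr_test_statistic(observed_deaths, expected_deaths)

print(f"Error factor = {error_factor:.2f}")
print(f"SMR = {smr_base_100:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
print(f"Test statistic = {chi2:.1f} on 1 degree of freedom")
```

The test statistic is compared with the χ² distribution on one degree of freedom, for which the 5% critical value is 3.84.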

Hence, we are 95% confident that the true SMR lies between 63 and 78.

8.7: Standardised Mortality Ratios In some cases we want to compare the rates across many groups. Instead of calculating an SMR for each pair, it is easier to calculate a list of SMRs, one for each comparison of a group with the reference. By comparing two SMRs in the list, we get an indirect comparison between their two groups. These indirect comparisons are valid, provided that the true stratum-specific rate ratios (between each group and the reference population) can be assumed to be constant for each group being compared. If this assumption is valid, the ratio of the corresponding SMRs provides an unbiased estimate of the rate ratio between the two groups. Hence, we can see that if it is appropriate to calculate a series of SMRs in the first place, it is also appropriate to compare them. However, the comparison of SMRs should only be used as a rough guide, and more detailed comparisons between particular pairs should be made directly using the Mantel-Haenszel method.

Section 9: Summary This is the end of AS04. When you are happy with the material covered here please move on to session AS05.

The main points of this session will appear below as you click on the relevant title.

Recognising the importance of time changing variables

Some exposure variables change over time. When this change is deterministic (for example, we know that individuals will age during the period of follow-up), this can be taken into account during analysis. Note that some variables, such as age, can be regarded as time fixed (for example, age at entry to the cohort) or time changing (current age). Choosing which to use depends on the study question and on the duration of the cohort.

Manipulating the data using Lexis expansion

We do this by first splitting the record for each individual by intervals of the exposure variable, for example, by current age groups. This manipulation of the data is called a Lexis expansion. We then have separate exposure-specific records, such that each individual has a separate record for each age group that they were in during the cohort (for the example of current age).

Adjustment for time changing variables

We can then treat these multiple records from one individual as independent records and the usual methods of analysis can be applied, i.e. we can treat current age as though it were a time fixed variable when assessing if it is a risk factor, a confounder or an effect modifier.

Standardised mortality ratios (SMRs)

We have seen that in situations where an entire cohort has been exposed, we can compare the cohort's mortality or incidence rates to those in a reference population. In doing so, we normally need to standardise by age, calendar period and possibly sex. Standardised Mortality Ratios (SMRs) provide an indirect standardisation for this.
