Age–Period–Cohort Models of Chronic Disease Rates. II: Graphical Approaches

STATISTICS IN MEDICINE

Statist. Med. 17, 1325–1340 (1998)

AGE–PERIOD–COHORT MODELS OF CHRONIC DISEASE RATES. II: GRAPHICAL APPROACHES

CHRIS ROBERTSON* AND PETER BOYLE

Division of Epidemiology and Biostatistics, European Institute of Oncology, Via Ripamonti 435, 20141 Milan, Italy

SUMMARY

In a companion article we have reviewed a number of available modelling approaches employed in estimating the influence of age, period and cohort effects on chronic disease rates. Here we review some of the graphical methods for displaying disease rates with a view to extracting information about the separate and joint effects of age, period and cohort. The more traditional displays such as line charts are compared to approaches based on smoothing and two- and three-dimensional plots which have recently been proposed. Other graphical techniques which are principally concerned with displaying interactions, such as biplots and correspondence analysis, are also considered. These techniques are illustrated with examples that reveal their strengths and weaknesses. It is clear that graphical approaches can be useful tools in understanding the behaviour of chronic disease time trends. © 1998 John Wiley & Sons, Ltd.

1. INTRODUCTION

Most disease mortality and incidence data are published as a two-way table of age-specific rates in a sequence of time periods. Generally, there are systematic changes in these rates associated with age, in that for many diseases the rates are higher in older ages. However, of prime importance is the assessment of changes in these rates with time. There are two linked views of time. One is the time at which the disease is recorded, known as the time period, and the second is the year of birth, which defines the birth cohort. These two views are linked in that birth cohort = time period − age.1
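The identity linking the three time scales can be made concrete with a short sketch. The age midpoints and period labels below are assumed values, chosen to match the cervical cancer example used later in the paper, and the function name is hypothetical.

```python
# Illustrative sketch: birth cohort = time period - age.
# The age midpoints and periods are assumptions made for illustration.
ages = list(range(27, 78, 5))          # 27, 32, ..., 77 (11 age groups)
periods = list(range(1960, 1981, 5))   # 1960, 1965, ..., 1980 (5 periods)


def birth_cohort(period, age):
    """Return the central year of birth for a given period and age."""
    return period - age


# Each cell of the age-by-period table belongs to exactly one cohort.
cohort_table = [[birth_cohort(p, a) for p in periods] for a in ages]
cohorts = sorted({c for row in cohort_table for c in row})

print(cohorts[0], cohorts[-1], len(cohorts))  # prints: 1883 1953 15
```

With 11 age groups and 5 periods there are 11 + 5 − 1 = 15 distinct cohorts, which is why the cohort scale is longer than the period scale in any plot of the same table.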

Changes in the rates with time can arise in a number of ways and for a number of reasons. First, there is the possibility of a time period effect which influences all age groups at the same time. This might arise with improvements in treatment leading to a reduction in mortality rates in all age groups at the same time. Or there may be changes in disease classification or diagnostic techniques, again affecting all age groups at the same period. Secondly, there is an effect associated with the birth cohort. This type of effect is common in many cancer sites where long-term exposure to a carcinogen is the major cause of the disease, and different cohorts of individuals have different exposure levels throughout their lives. These are two extreme views of time effects in disease rates, and age–period–cohort models have been used to try to separately estimate the joint effects of these three variables.2 This is futile if the objective is the search for the linear trends,3,4 in view of the linear dependency between age, period and cohort, but not for the non-linear trends such as curvatures. An age–period–cohort model is one specific model for the interaction between age and period.5 Graphical presentations and inspections are a useful way of visualizing and assessing any interaction and trends in the rates.

* Correspondence to: Dr. C. Robertson, Division of Epidemiology and Biostatistics, European Institute of Oncology, Via Ripamonti 435, 20141 Milan, Italy

CCC 0277–6715/98/121325–16$17.50. Received April 1996; accepted October 1997. © 1998 John Wiley & Sons, Ltd.

Figure 1. Line chart of breast cancer incidence in Scotland

The most widely used graphical presentation is the line chart,6 which is illustrated for female breast cancer incidence in Scotland from 1960 to 1989 in Figure 1. In the first plot the rates are plotted on a logarithmic scale against period, with points corresponding to the same age linked. In the second plot, the abscissa is birth cohort. The latter type of plot has been used extensively in the literature.7 It is a less cluttered display than the plot with period as the abscissa. It would also be appropriate to plot age on the abscissa and have separate lines for birth cohort or time period.8 These are not used here as the main interest is in changes in the rates over time, not age.

If all the lines in Figure 1 were parallel to the x-axis then there would be no time trends. If the lines for each age group were parallel then there would be no interaction between age and time (however it is viewed). In the example the age-specific lines are neither parallel to the x-axis nor to each other. Any interpretations are clearer in the plot against birth cohort as the time span has been stretched. Among the older ages the rates have increased with increasing time and from age 47 are reasonably parallel with a slight negative curvature indicating a levelling off of the rates in more recent periods. Among the younger age groups the increase in rates with time is not as steep and in the youngest two age groups they are fairly flat.

Different interpretations can be obtained with different presentations of the same data. The above paragraph is based on the cohort plot. Inspection of the period plot leads to a different conclusion. First, it is difficult to make any headway with the older age groups as the lines are so close together. The general increase in rates with period can be seen. Among the younger age groups it looks as if the rates are fairly flat, certainly at ages 37, 32 and 27. This is in contrast to the impression in the cohort plot where increases were apparent in age groups 32 and 37, in particular. This occurs because of the different scales used for the x-axes in the two plots.

Figure 2. Line chart of testicular cancer mortality in England and Wales

Different scales can also be used on the ordinate. In Figure 1 a log scale is used as the range of rates spans two orders of magnitude. The log scale is also useful to investigate whether proportional changes in rates in older and younger age groups are similar. In this paper, log scales and linear scales will be used as appropriate. If there is not a great difference in the rates then the linear scaling may be used. Compared to the linear scaling, the logarithmic scaling will serve to highlight differences at lower rates and mask differences at higher rates.
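The effect of the choice of ordinate can be illustrated with a small calculation. The rates below are hypothetical and are chosen only so that both age groups show the same proportional increase.

```python
import math

# Hypothetical rates per 100,000 at two successive periods.
young = (2.0, 3.0)      # a 50 per cent increase at a low rate
old = (200.0, 300.0)    # a 50 per cent increase at a high rate


def linear_change(rates):
    """Change between the two periods on the linear scale."""
    return rates[1] - rates[0]


def log_change(rates):
    """Change between the two periods on the logarithmic scale."""
    return math.log(rates[1]) - math.log(rates[0])


# On the linear scale the change in the older group dwarfs that in the younger group.
print(linear_change(young), linear_change(old))
# On the log scale the two changes coincide, so the plotted lines are parallel.
print(log_change(young), log_change(old))
```

Equal proportional changes appear as parallel segments on the log scale, which is the sense in which parallel age-specific lines indicate no interaction between age and time.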

This example illustrates the care needed in the interpretation of either one of these one-dimensional projections of the two-dimensional table of rates. This is one of the reasons why two-dimensional plots should be more useful. These are discussed in Section 2. While much useful information can be obtained from line graphs, a plot of the mortality rate of cancer of the testis in England and Wales,9 Figure 2, is not so amenable to interpretation. Survival improved markedly and abruptly from the mid-1970s10 and the decrease can be seen in the time period plot of Figure 2. There is also an interaction between age and time in the earlier period, but, as the rates are much smaller, the effects of random variation are more considerable in relative terms.

If there is evidence of interaction between age group and time period then it is natural to try to describe what form this takes. Correspondence analysis and biplots can both be used to investigate interactions. These techniques have an underlying statistical model and in Section 3 examples will be shown to illustrate the potential of these methods and to discuss where they can complement graphical displays.

There is nothing in these techniques which would permit the partitioning of trends in the rates to time period or birth cohort. Tarone and Chu11 published a non-parametric method of assessing whether or not the trends in mortality rates could be ascribed to period or to cohort. The results of this method can be displayed graphically and we demonstrate its use, in Section 4, as a supplement to the interpretation of the other graphs in this paper.

All of the graphs presented here were drawn using S-plus functions. Many other statistical packages will have the capability to construct these plots.

2. TWO- AND THREE-DIMENSIONAL PLOTS

Smoothing the rates, perspective plots, image plots and contour plots have been discussed by Cislaghi et al.,12 Jolley and Giles8 and Weinkam and Sterling,13 although all have a slightly different slant. Cislaghi et al.12 suggest using image plots of the rates, the predicted rates, and a residual plot, all within the age–cohort plane. Jolley and Giles8 consider that contour plots within the age–period plane with cohort lines superimposed are superior to perspective plots. Weinkam and Sterling13 suggest using another variation which they term a 'level' plot.

In this section all graphs are based on smoothed rates using cubic interpolation within triangles of points in the age–period, age–cohort or cohort–period planes, as appropriate, using the method described in Akima.14 There is a danger with smoothing using statistical models of oversmoothing and covering up a 'real' short-term fluctuation. This is not a problem with this smoothing technique as it does not smooth out local random variation; it just interpolates between existing data points. Consequently the observed rates are plotted. Using linear interpolation results in spikes at the local peaks and troughs and sharp boundaries whereas cubic interpolation yields a smoother picture but still retains the local peaks and troughs. Coleman et al.15 present incidence rates for cancer at many sites using time as a continuous scale where the data are available for blocks of periods, some overlapping and some non-overlapping.
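The property that interpolation reproduces the observed rates at the data points can be seen in a minimal sketch. Note that this uses linear (barycentric) interpolation within a single triangle, a deliberately simpler stand-in for the cubic method of Akima; the function name, the triangle and the rates are all hypothetical.

```python
def interpolate_in_triangle(vertices, values, point):
    """Linearly interpolate a rate at `point` inside a triangle.

    vertices: three (x, y) pairs, e.g. (age, period) of observed cells
    values:   the three observed rates at those vertices
    """
    (x1, y1), (x2, y2), (x3, y3) = vertices
    x, y = point
    # Barycentric weights of `point` with respect to the three vertices.
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return w1 * values[0] + w2 * values[1] + w3 * values[2]


# Hypothetical observed rates at three neighbouring (age, period) cells.
triangle = [(27, 1960), (32, 1960), (27, 1965)]
rates = [10.0, 20.0, 16.0]

# At a data point the interpolant returns the observed rate itself.
print(interpolate_in_triangle(triangle, rates, (27, 1960)))
# Between data points it returns a weighted average of the neighbours.
print(interpolate_in_triangle(triangle, rates, (29.5, 1960)))
```

Because the surface passes through every observed rate, local peaks and troughs are retained rather than averaged away, in contrast to smoothing by a fitted statistical model.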

While perspective plots using age and period as the axes are the most obvious it is possible to project them into the age–cohort and cohort–period axes also. Perspective plots for the testis mortality rates are presented in Figure 3. These are certainly a great improvement on Figure 2 as the trends in the rates with age and with period can now be seen. This is not just a consequence of the smoothing arising from the use of cubic interpolation; the same pattern is seen using the raw rates without the cubic interpolation as the smoothing preserves the recorded rates. Among young adults the increase in the mortality rates to a peak in 1980 is evident. In middle age the trough in the rates appears to have been at a fairly stable level throughout the period, while among older men the rates were initially high but have lowered slightly since the peak in 1930. The age–cohort projection is not as useful, as the time axis has been lengthened leading to a squashing of the graph, though the path of a cohort as it ages can be traced.

While perspective plots are attractive to look at, portions of the rate surface are often obscured by local peaks. This is the case in Figure 3 as the decline in the rates among young adults in the 1980s is obscured. This can be overcome by looking at the surface from different positions. Jolley and Giles8 criticize these types of plots on two grounds. First, it is difficult to locate a point on the surface and secondly they claim that complex surfaces hide useful information about trends. While these arguments do have some force, the visual impression given by these plots is marked.

Figure 3. Perspective plots for the testis mortality rates: (a) age–period; (b) age–cohort

Figure 4. Contour plots for the breast cancer incidence rates in Scotland

Weinkam and Sterling13 and Jolley and Giles8 both suggest that the use of contour plots in an age–period projection is the best way to see the trends in the rates. The Jolley and Giles8 plot has age and period at 90 degrees and the dotted lines going from bottom left to top right represent the cohorts. Weinkam and Sterling13 use a plot which is symmetric in the two measures of time. Effectively the y-axis is obtained by a shift in the observed time to the time midway between the year of incidence and the year of birth. Thus the dotted lines from top left to bottom right correspond to the period of incidence while the dotted lines from bottom left to top right correspond to the birth cohorts. In both graphs the important criterion is whether or not the contours are parallel to or at right angles to the time dimensions.
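The symmetric time scale can be sketched as follows, assuming that age is plotted on the x-axis and that the y-coordinate is simply the midpoint of the period and the birth cohort; the function name and the example values are hypothetical.

```python
def symmetric_time(period, age):
    """Time midway between the year of incidence and the year of birth."""
    cohort = period - age
    return (period + cohort) / 2.0   # equivalently period - age / 2


# Following one cohort (born 1930) as it ages: the y-coordinate rises with age,
# so cohort lines run from bottom left to top right.
print(symmetric_time(1970, 40), symmetric_time(1980, 50))   # prints: 1950.0 1955.0

# Holding the period fixed at 1980: the y-coordinate falls with age,
# so period lines run from top left to bottom right.
print(symmetric_time(1980, 40), symmetric_time(1980, 50))   # prints: 1960.0 1955.0
```

Under this assumption the period and cohort lines have slopes of equal size and opposite sign, which is the sense in which the plot treats the two measures of time symmetrically.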

The Weinkam and Sterling13 plot is illustrated in Figure 4 for breast cancer incidence; the Jolley and Giles8 plot is very similar. The contours are parallel to the vertical time axis in ages under 45, especially in the later periods from 1970 onwards. This suggests that the rates have been relatively constant over time, both as period and cohort. There is a suggestion of a period effect associated with the kink in the contours as they cross the 1970 period line among the younger age groups, as the contour lines in all age groups shift at the same time. The highest rates are at the top right of the graphs, and among the older age groups (60+) the contours of constant rate are almost at right angles to the cohort lines. This means that as successive cohorts reach a fixed age the more recent cohorts have greater incidence rates. It is tempting to assign these changes to cohort effects as there are no changes to the rates affecting all ages at the same period. The change in the shape of the surface takes place at the ages 45 to 55. There appears to be a short 5 year period in which the contours turn to be approximately parallel to the dotted lines representing cohort. This suggests that as a cohort gets older there is no change in the risk. In terms of the map reading analogy there is a shoulder to the incidence mountain at ages 45 to 55 with a tantalizing gap in the contours. Extrapolation of the contour lines into the future suggests that the hiatus of five years where rates are level may not really exist in the recent years. A finer grading for the contour lines would assist, as would the presentation of the contour lines in colour through an image enhancement routine. This is essentially the presentation of Cislaghi et al.,12 who used an 8 point grey scale.

Image plots for the cervical cancer mortality rates in Birmingham are presented in Plate 1. These data were used by Esteve et al.1 in an illustration of age–period–cohort modelling. There are three projections: (a) the age–period projection, (b) the cohort–period projection, and (c) the age–cohort projection. The age–cohort projection is derived from the age–period projection by a shift in the period scale to year of birth cohort. The cohort–period projection is derived from the age–period projection by a shift in the age scale to year of birth and so the cohort scale is reversed with the younger cohorts at the left hand end of the axis. The age–period projection is the Jolley and Giles8 projection with an image rather than contours, and the age–cohort projection was used by Cislaghi et al.12 The graphs are interpreted by looking for contour lines, corresponding to changes in colour, which are parallel to one axis and perpendicular to the other.

The dominant feature in the plots in Plate 1 is the strong cohort-based temporal trend which is clearly seen in the high rates associated with those born around 1910 to 1925 and the contour lines in plots (b) and (c) which are perpendicular to the cohort axes. For women born before 1880 to 1910 the rates were about 20 to 30 per hundred thousand and did not change too much as the cohort aged. The rates among women born after 1925 were lower than in the preceding cohorts and the low rates among younger women appeared to be staying with them throughout the period of observation. In both plots (a) and (b) the future is towards the top, and the area of greatest concern, which is clear on both plots, is the area of relatively higher rates among younger women in the top left, suggesting that there are some recent cohorts which are possibly at higher risk.

In the level image projections of the testis mortality rates in Plate 2, the clearest pattern is of an emerging band of high mortality rates among young adults in the 1970s, which terminates abruptly in the 1980s with a period effect which affects all age groups at the same time. The rates among older age groups were initially high in the 1930s but appear to be much smaller in the later periods. If anything the plot suggests that there were cohort effects in operation here up to the 1980s as the contours are consistent with increasing risk at younger ages with successive cohorts. The cohort born in 1920 has rates less than 8 for ages 17 to 27, whereupon they rise to 10 to 15 until about 45 years of age, followed by a fall to the original levels before the end of observations on that cohort. Consideration of the cohort born 20 years later reveals a much steeper rise to a larger peak at 30 years of age. This information cannot be seen so clearly in the line charts; some can be distinguished in the perspective plots of Figure 3 but the recent drop in mortality is still somewhat obscured there.

The techniques of perspective plotting and contour plotting have much to recommend them by way of aiding the interpretation of trends in disease incidence rates. Different projections are useful in different circumstances and all methods here have advantages over the line charts. It is clear that contour or image plots are more useful for detailed investigation and the image level plot is a succinct summary.

3. PLOTS FOR THE INTERACTION BETWEEN AGE AND PERIOD

Biplots16 and correspondence analysis17 are both techniques which can be used to display the interaction between the two classifications in two-way tables. They have been applied to disease mortality rates by Osmond9 and Grassi and Visentin,18 respectively. Common to both techniques is a multiplicative model for the disease rates.

The matrix of rates is denoted by the $m \times n$ matrix $R$ and the biplot model can be written as $r_{ij} = a_{1i} p_{1j} + a_{2i} p_{2j} + e_{ij}$, where $a_{ki}$ and $p_{kj}$ are parameters representing the effects of age group $i$ and time period $j$, respectively, and $e_{ij}$ is a random error term. This is similar, but not identical, to the age–period log-linear model which can be written as $\log(r_{ij}) = a_i + b_j + e_{ij}$, where $a_i$ and $b_j$ are the additive age and period effects, respectively. The biplot model has additive errors as well as a second additive term, $a_{2i} p_{2j}$, which models the interaction between age and period, whereas the log-linear model has multiplicative errors. Osmond9 provides details of the estimation of the parameters $a_{ki}$ and $p_{kj}$, which reduces to a singular value decomposition of the matrix $R$ and a plot of the first two row and column eigenvectors. The relative sizes of the eigenvalues can be used to gauge the adequacy of the biplot model.

If age and period were independent of each other then the multiplicative model $r_{ij} = a_{1i} p_{1j} + e_{ij}$ would provide a good description of the rates and the first eigenvalue would be large with the remaining ones small. For the cervix cancer data the first eigenvalue is 77 per cent of the sum of the eigenvalues and the second 9 per cent. The biplot is shown in Figure 5, with dotted lines connecting the origin, marked with a cross, to the locations of the periods. The principal dimension is along the x-axis where the co-ordinates of the first eigenvectors are plotted. Apart from the interchange of 1960 and 1965 the years are in reverse order of distance from the origin on this dimension. There is a separation of the age groups into the younger (27 and 32, possibly with 37) against the rest. The interaction information is portrayed in the second dimension. Bearing in mind the multiplicative form of the $a_{2i} p_{2j}$ component of the biplot model, it can be seen that if both terms are large and have the same sign then this corresponds to an observed rate which is not well described by the product of an age effect and a period one and is underestimated by the biplot independence model, $a_{1i} p_{1j}$. If both terms are large but have opposite signs then this corresponds to an observed rate which is overestimated by the independence model. Thus in Figure 5 the ages and periods most closely associated with the interaction are 42, 47 and 52 in 1965 and 1970 compared to 1980. There is overestimation of the 42–52 rates in 1980 and underestimation of their rates in 1965 and 1970. There is also underestimation of the rates for 27 and 32 year olds and 57–67 year olds in 1980. Essentially these points provide information on where the independence model fits least well.

The biplot also has an interpretation via inner products and projections, which may provide a more intuitive interpretation.9 The inner product is the distance from the origin and the points which are closer to the origin have the lower fitted rates while those furthest away have the higher rates. Thus the rates are lower in age groups 27 to 37 and higher in 47, 57 and 62. The highest rates are to be found in periods 1965 and 1980. Osmond9 also considered the projection of the age points onto the lines connecting the origin with the time periods, and, because of the symmetry in the biplot, the projection of the period points onto lines connecting the origin to the age groups. This means that the biplot can highlight different shapes to the fitted age distribution in different periods and different fitted trends with period in different age groups. Thus in 1965 the order of the rates predicted by the model was 27, 32, 37, with a big gap between the latter two indicating that the rates increased rapidly, then 67, 62, 57, 77, 72, 42, 52 and 47. In 1980 there is a big change in the predicted order, which becomes 47, 42, 52, 77, 27, 37, 72, 32, 67, 57 and 62, from lowest to highest. Among 32 year old women the lowest rates were in 1965 and the highest in 1980; among 47 year olds the reverse was the case. In summary, the biplot analysis tells us about the different patterns of rates in the age groups 27–37, 42–52, 57–67 and 72–77.

Figure 5. Cervical cancer biplot and correspondence plot

The correspondence analysis model is similar and details are given in Greenacre19 and Grassi and Visentin.18 If the rate matrix, $R$, has an associated matrix of fitted values, $M$, obtained from an appropriate model then correspondence analysis is based on a singular value decomposition of the residual matrix $Q_r^T (R - M) Q_c$, where $Q_r$ and $Q_c$ are row and column weight vectors, respectively, whose elements are $\sqrt{q_{ri}}$ and $\sqrt{q_{cj}}$. The individual elements of the residual matrix are $(r_{ij} - m_{ij})/\sqrt{q_{ri} q_{cj}}$, which corresponds to the model

$$r_{ij} = m_{ij} + q_{ri}^{1/2} q_{cj}^{1/2} \left( \frac{x_{i1} y_{j1}}{p_1} + \frac{x_{i2} y_{j2}}{p_2} \right) + e_{ij},$$

assuming only two dimensions. The terms $x_{i1}$, $x_{i2}$, $y_{j1}$ and $y_{j2}$ are elements of the first two right and left eigenvectors, respectively, from the singular value decomposition of the residual matrix, and $p_1$ and $p_2$ are the two largest singular values corresponding to these eigenvectors. Correspondence analysis can be thought of as a means of modelling and displaying the interaction in a two-way contingency table and if $R$ is a matrix of counts then $m_{ij} = r_{i+} r_{+j} / r_{++}$, where the $+$ denotes summing over the margin. The weights are $q_{ri} = r_{i+}/r_{++}$ and $q_{cj} = r_{+j}/r_{++}$, yielding the residual matrix which has as elements the $X^2$ components in the usual test for independence.
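The construction of the weighted residuals for a table of counts can be sketched directly from the definitions above. The function names and the small table of counts are hypothetical, and the sketch stops before the singular value decomposition itself. With the weights as defined, each squared residual works out to $r_{++}$ times the corresponding $X^2$ component, so the two are proportional.

```python
import math


def ca_residuals(counts):
    """Weighted residuals (r_ij - m_ij) / sqrt(q_ri * q_cj) under independence."""
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    grand = sum(row_tot)
    residuals = []
    for i, row in enumerate(counts):
        q_r = row_tot[i] / grand                      # row weight q_ri
        out = []
        for j, r_ij in enumerate(row):
            q_c = col_tot[j] / grand                  # column weight q_cj
            m_ij = row_tot[i] * col_tot[j] / grand    # fitted value under independence
            out.append((r_ij - m_ij) / math.sqrt(q_r * q_c))
        residuals.append(out)
    return residuals


def pearson_chi_square(counts):
    """Usual X^2 statistic for independence in a two-way table of counts."""
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    grand = sum(row_tot)
    total = 0.0
    for i, row in enumerate(counts):
        for j, r_ij in enumerate(row):
            m_ij = row_tot[i] * col_tot[j] / grand
            total += (r_ij - m_ij) ** 2 / m_ij
    return total


# Hypothetical 2 x 2 table of counts.
counts = [[10, 20], [30, 40]]
resid = ca_residuals(counts)
sum_sq = sum(x * x for row in resid for x in row)
grand_total = sum(sum(row) for row in counts)

# The two printed values agree up to rounding error.
print(sum_sq, grand_total * pearson_chi_square(counts))
```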

With a two-way table of rates the fitted values can be calculated through a Poisson regression model, see Grassi and Visentin.18 This requires the population sizes. If only the rates are available, then a linear model for the logarithm of the rates can be used to give $\log(m_{ij}) = \bar{y}_{i.} + \bar{y}_{.j} - \bar{y}_{..}$, where $y_{ij} = \log(r_{ij})$. The same weights as above can be used but Grassi and Visentin18 discuss others.

The relative size of the squares of the singular values can be used to give an indication of the number of dimensions required. Generally with correspondence analysis only two are used, and, like the biplot, it is the distance between the row and column points which gives an indication of the departures from the age–period independence model. Row and column points which are close to each other and are far away from the origin are positively associated with each other, in that the observed rate will be greater than the fitted one; row and column points which are far away from the origin but are in opposite quadrants are negatively associated.
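The fitted values from the linear model for the logarithm of the rates, used when only the rates are available, can be computed from row, column and overall means on the log scale. The sketch below assumes a complete table with no missing cells; the function name and the rates are hypothetical.

```python
import math


def log_additive_fit(rates):
    """Fitted rates m_ij with log(m_ij) = ybar_i. + ybar_.j - ybar_.."""
    y = [[math.log(r) for r in row] for row in rates]
    n_rows, n_cols = len(y), len(y[0])
    row_mean = [sum(row) / n_cols for row in y]
    col_mean = [sum(y[i][j] for i in range(n_rows)) / n_rows
                for j in range(n_cols)]
    grand_mean = sum(row_mean) / n_rows
    return [[math.exp(row_mean[i] + col_mean[j] - grand_mean)
             for j in range(n_cols)]
            for i in range(n_rows)]


# Hypothetical rates with a purely multiplicative age x period structure.
rates = [[10.0, 20.0], [20.0, 40.0], [40.0, 80.0]]
fitted = log_additive_fit(rates)

# With no interaction the fitted values reproduce the rates (up to rounding),
# so the residual matrix passed on to the correspondence analysis is zero.
for observed_row, fitted_row in zip(rates, fitted):
    print(observed_row, [round(m, 6) for m in fitted_row])
```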

The correspondence plot for the cervical cancer data is also presented in Figure 5. The first two canonical variates account for 68 per cent and 20 per cent of the residual variance, respectively. There is a clear ordering of the time periods. The horseshoe shape is very common in correspondence analysis and other multidimensional procedures when there is an order.20 In terms of the associations the fitted rates in 1980 for the 47 and 52 age groups are greater than the observed ones; the positive associations are between age groups 47 and 52 in 1970, and 67 to 77 in 1960. On the first dimension, which is the more important of the two, there is a positive association between 1980 and the two youngest age groups. This is the increased risk among younger women already alluded to in the discussion of the biplot. The one feature that the biplot does not reveal is the higher rates among the oldest age groups in 1960.

Biplots and correspondence analysis serve a similar purpose, namely to provide an interpretation of any non-independence in the rates between age group and time period. If the rates for the age groups are completely independent of the time period the plots will just be a random scatter of points and no valuable interpretation will arise. With biplots, Osmond9 suggests looking at the proportion of variance accounted for by the singular values as a means of gauging the strength of the association between age group and time period. With correspondence analysis the adequacy of the age–period model can be assessed, by a formal significance test, for example. As there are two dimensions to model the interaction with correspondence analysis it is more flexible but may be overinterpreted. However, with both of these methods there is no real feeling for systematic cohort effects as compared to the image plots in Section 2.

4. NON-PARAMETRIC COMPARISON

The major difficulty with all of the plots is in apportioning any trends to birth cohort or time period. Tarone and Chu11 proposed a non-parametric method and demonstrated that changes in breast cancer mortality rates were associated with birth cohort effects. This technique can assist the interpretation of the graphical techniques.

The method is based on a comparison of rates among individuals in the same age group from one time period to the next, or one birth cohort to the next. The matrix of rates, $R$, is rewritten as an $(n + m - 1) \times n$ matrix where the rates are classified by cohort and period, Table I. The changes in the rates over the same age group are then assessed by comparing $r_{kj}$ with $r_{(k+1),(j+1)}$ and recording 1 if there has been an increase in the rates and 0 if there has been a decrease. This leads to a $[(n + m - 1) - 1] \times [n - 1]$ matrix of ones and zeros, where each row is a comparison of cohort $k$ with cohort $(k + 1)$ and each column a comparison of period $j$ with $(j + 1)$, within the same age group. The number of increases in the columns, Table II, and rows, Table III, are the basis of the test statistics.11

Table I. Cervix cancer rates by cohort and period

          Period
Cohort    1960    1965    1970    1975    1980
1883     37.45       -       -       -       -
1888     41.59   27.82       -       -       -
1893     34.12   32.55   26.02       -       -
1898     34.27   30.48   30.08   21.97       -
1903     34.20   27.13   23.72   27.78   20.88
1908     28.46   33.10   32.40   24.60   25.62
1913     30.21   37.70   36.38   36.39   33.45
1918     31.45   41.68   39.92   37.38   34.25
1923     22.58   38.48   40.72   35.96   36.59
1928      8.44   24.57   27.47   29.83   22.72
1933      1.58    8.67   16.11   21.52   22.17
1938         -    2.80   10.80   16.29   21.68
1943         -       -    3.56   13.86   22.79
1948         -       -       -    7.03   19.95
1953         -       -       -       -   13.50

Age groups run along the diagonals from top left to bottom right

Table II. Cervix cancer column increases

Comparison    Total   Expected   Standard    t-value   Probability
of periods                       deviation
1960–1965       6       5.5        1.7         0.3         1
1965–1970       5       5.5        1.7        -0.3         1
1970–1975       6       5.5        1.7         0.3         1
1975–1980       5       5.5        1.7        -0.3         1

In the comparison of the 11 rates in column 1 of Table I with the 11 rates in column 2 there were 6 increases of the rates in the later cohorts. This is a diagonal comparison of both period and cohort, keeping age group fixed. If the rates do not change systematically with period or cohort then the probability of an increase is 0.5, and as there are n age groups there are n comparisons in each column. Thus the number of increases in a column will follow a binomial distribution, assuming independence. The probabilities in Table II are two-tailed p-values; these suggest that in the absence of cohort effects there are no significant changes in cervix cancer mortality associated with period.
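The column comparison can be sketched in a few lines of Python. This is an illustrative reconstruction from the rates in Table I, not the authors' code: the function names are hypothetical, and the exact two-sided binomial probability is an assumption, since the text does not say how the probabilities in Table II were computed.

```python
from math import comb, sqrt

PERIODS = [1960, 1965, 1970, 1975, 1980]

# Table I: cervix cancer rates by birth cohort (rows) and period (columns).
# None marks a cohort-period cell that lies outside the tabulated age range.
RATES = {
    1883: [37.45, None, None, None, None],
    1888: [41.59, 27.82, None, None, None],
    1893: [34.12, 32.55, 26.02, None, None],
    1898: [34.27, 30.48, 30.08, 21.97, None],
    1903: [34.20, 27.13, 23.72, 27.78, 20.88],
    1908: [28.46, 33.10, 32.40, 24.60, 25.62],
    1913: [30.21, 37.70, 36.38, 36.39, 33.45],
    1918: [31.45, 41.68, 39.92, 37.38, 34.25],
    1923: [22.58, 38.48, 40.72, 35.96, 36.59],
    1928: [8.44, 24.57, 27.47, 29.83, 22.72],
    1933: [1.58, 8.67, 16.11, 21.52, 22.17],
    1938: [None, 2.80, 10.80, 16.29, 21.68],
    1943: [None, None, 3.56, 13.86, 22.79],
    1948: [None, None, None, 7.03, 19.95],
    1953: [None, None, None, None, 13.50],
}


def increase_matrix(rates):
    """Rows compare cohort k with k+1, columns period j with j+1, at fixed age."""
    cohorts = sorted(rates)
    matrix = []
    for k in range(len(cohorts) - 1):
        row = []
        for j in range(len(PERIODS) - 1):
            earlier = rates[cohorts[k]][j]
            later = rates[cohorts[k + 1]][j + 1]
            if earlier is None or later is None:
                row.append(None)  # no comparison available for this cell
            else:
                row.append(1 if later > earlier else 0)  # a tie would count as 0
        matrix.append(row)
    return matrix


def two_sided_binomial_p(x, n):
    """Exact P(|X - n/2| >= |x - n/2|) for X ~ Binomial(n, 0.5) (assumed test)."""
    distance = abs(x - n / 2)
    favourable = sum(comb(n, i) for i in range(n + 1) if abs(i - n / 2) >= distance)
    return favourable / 2 ** n


matrix = increase_matrix(RATES)
column_summary = []
for j in range(len(PERIODS) - 1):
    cells = [row[j] for row in matrix if row[j] is not None]
    n, total = len(cells), sum(cells)
    t_value = (total - n / 2) / sqrt(n / 4)  # binomial mean n/2, variance n/4
    column_summary.append(
        (PERIODS[j], PERIODS[j + 1], total, n / 2, round(t_value, 1),
         two_sided_binomial_p(total, n))
    )

for line in column_summary:
    print(line)
```

Each printed tuple holds the two periods compared, the number of increases, its expected value, the t-value and the probability, in the same order as the columns of Table II.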

Within a particular row of Table I, there are 1 to 4 comparisons which can be made for any pair of adjacent cohorts. Tarone and Chu¹¹ suggest combining the cohorts in blocks, to obtain a comparison on a larger number of components. In Table III, blocks consisting of two adjacent pairs of cohort comparisons are used. Thus there are three comparisons within the 1883-1893 block of cohorts: 37.45 with 27.82; 41.59 with 32.55 and 27.82 with 26.02. The exact distribution of the number of increases in the cohort comparison blocks requires enumeration but the expected value and standard deviation can easily be evaluated.¹¹ The probabilities in Table III are obtained from a normal approximation, which is not valid in this example as there are too few components.

Table III. Cervix cancer cohort block increases

Comparison of cohorts   Total   Expected   Standard deviation   t-value   Probability
1883-1893               0       1.5        0.76                 -1.96     0.0495
1888-1898               0       2.5        0.96                 -2.61     0.0090
1893-1903               0       3.5        1.12                 -3.13     0.0017
1898-1908               2       4.0        1.22                 -1.63     0.1025
1903-1913               6       4.0        1.22                  1.63     0.1025
1908-1918               7       4.0        1.22                  2.45     0.0143
1913-1923               4       4.0        1.22                  0.00     1.0000
1918-1928               2       4.0        1.22                 -1.63     0.1025
1923-1933               2       4.0        1.22                 -1.63     0.1025
1928-1938               5       4.0        1.22                  0.82     0.4142
1933-1943               7       3.5        1.12                  3.13     0.0017
1938-1948               5       2.5        0.96                  2.61     0.0090
1943-1953               3       1.5        0.76                  1.96     0.0495
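The block totals and expected values in Table III can be recovered from Table I with a short Python sketch. The helper names are hypothetical and the sketch only counts increases and halves the number of comparisons. It does not reproduce the standard deviations: the tabulated values are smaller than the √N/2 that N independent comparisons would give, so they evidently follow the calculation in Tarone and Chu¹¹, which is not described here.

```python
# Table I as text: cohort followed by the rates for 1960, 1965, 1970, 1975, 1980.
# "-" marks a cohort-period cell outside the tabulated age range.
TABLE_I = """
1883 37.45 - - - -
1888 41.59 27.82 - - -
1893 34.12 32.55 26.02 - -
1898 34.27 30.48 30.08 21.97 -
1903 34.20 27.13 23.72 27.78 20.88
1908 28.46 33.10 32.40 24.60 25.62
1913 30.21 37.70 36.38 36.39 33.45
1918 31.45 41.68 39.92 37.38 34.25
1923 22.58 38.48 40.72 35.96 36.59
1928 8.44 24.57 27.47 29.83 22.72
1933 1.58 8.67 16.11 21.52 22.17
1938 - 2.80 10.80 16.29 21.68
1943 - - 3.56 13.86 22.79
1948 - - - 7.03 19.95
1953 - - - - 13.50
"""


def parse_table(text):
    """Return a list of (cohort, [rate or None per period]) in cohort order."""
    table = []
    for line in text.strip().splitlines():
        cohort, *cells = line.split()
        table.append((int(cohort), [None if c == "-" else float(c) for c in cells]))
    return table


def pair_counts(table):
    """For each adjacent cohort pair: (increases, comparisons) at fixed age."""
    counts = []
    for (_, older), (_, younger) in zip(table, table[1:]):
        increases = comparisons = 0
        for j in range(len(older) - 1):
            # Same age group: older cohort in period j, younger cohort in period j+1.
            if older[j] is not None and younger[j + 1] is not None:
                comparisons += 1
                increases += int(younger[j + 1] > older[j])
        counts.append((increases, comparisons))
    return counts


def block_summary(table, pairs_per_block=2):
    """Sum increases over overlapping blocks of adjacent cohort pairs."""
    counts = pair_counts(table)
    cohorts = [cohort for cohort, _ in table]
    summary = []
    for start in range(len(counts) - pairs_per_block + 1):
        block = counts[start:start + pairs_per_block]
        total = sum(inc for inc, _ in block)
        n_comparisons = sum(n for _, n in block)
        # Expected number of increases when each comparison has probability 0.5.
        summary.append(
            (cohorts[start], cohorts[start + pairs_per_block], total, n_comparisons / 2)
        )
    return summary


blocks = block_summary(parse_table(TABLE_I))
for first, last, total, expected in blocks:
    print(f"{first}-{last}: total={total}, expected={expected}")
```

Setting `pairs_per_block` to a larger value gives wider blocks, in the spirit of the finer-scale analysis of Tarone and Chu¹¹.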

The evidence for cervical cancer would appear to lie on the side of cohort-based increases, with locally increasing rates associated with the blocks of cohorts 1908-1918 and 1938-1948, with more increases in the comparisons than expected, compared to locally decreasing rates in 1883-1903 and 1923-1933. These correspond directly to the sloping regions in the cohort projections in Plate 1.

Tarone and Chu¹¹ used data on a much finer scale and initially had data in two-year age groups and time periods. This gives a larger number of components to the block and column totals, and the distribution of the t-statistics will be approximately normal. A graphical presentation of the statistics used by Tarone and Chu¹¹ is given in Figure 6 for the breast cancer incidence rates, using two-year groups for both age and period, and cohort blocks spanning 16 years (that is, 7 consecutive pairs of cohorts). The changes in the rates are plotted in the cohort-period plane. Black indicates that there has been an increase in the rates over two adjacent periods or cohorts with age kept constant, while grey indicates a decrease. The t-statistics compare the observed number of increases to the expected. There is little evidence of any systematic year-to-year change in the rates associated solely with period, as all the t-values are between ±2, apart from the first comparison in 1962 where there are predominantly increases. The t-statistics for cohort are not independent as the blocks of cohorts overlap. As each block contains 7 pairs of adjacent cohorts, for example 1880 compared to 1882 and 1882 compared to 1884, the block spans 8 cohorts and so every eighth t-value is independent. The most striking feature of the t-values for the cohorts is the strong evidence of an increase in rates associated with the blocks of cohorts born in the period 1916-1940. The vertical band of increases in the rates at 1920 in the cohort-period plane is highly visible. Prior to 1900, there appeared to be cohort-based increases in the rates in later cohorts compared to earlier ones. After the war in 1945 the rates among later cohorts are less than among those born earlier. These graphs certainly suggest that the time trends in breast cancer incidence in Scotland are driven by cohort effects rather than period ones.


Figure 6. Tarone plots for breast cancer yearly rates. Breast cancer cohorts in blocks of 15

5. CONCLUSIONS

It is apparent that different graphs highlight different aspects of the pattern of rates, and this is to be expected. Line plots used on their own may not reveal the pattern, whereas perspective plots contain all of the same information and more. Perspective plots do, however, sometimes require a careful choice of origin for the perspective.

Contour plots allow more detailed interpretation of the changes in the trends and the image plots of all three projections are useful. Weinkam and Sterling¹³ go so far as to claim that their projection avoids the pitfalls of age-period-cohort modelling by using a graph in such a manner that non-trivial age, period and cohort effects can be easily recognized. The level plots are symmetric in period and cohort but require judgement to be made along axis lines which are not perpendicular. This is a harder graphical task than comparing points along perpendicular lines (Cleveland²¹).

A careful graphical analysis can highlight the patterns and interactions in the rates but it cannot partition changes in the rates between period and cohort. The non-parametric analysis of Tarone and Chu¹¹ has some potential in this area but only if there are no systematic changes in one of the time dimensions. Also, this method does not really have much precision or power in small tables such as the ones discussed here, which are typical of many in the literature.

A graphical inspection of the data should always be a preliminary analysis before the routine fitting of any age, period or cohort model. All of the methods discussed can also be used as residual plots based on any model. Cislaghi et al.¹² used an image projection in the age-cohort plane, but there is no reason why perspective plots or level plots cannot be used to investigate systematic departures from a model.

Correspondence plots and biplots are the most difficult to interpret and it is questionable if they give any extra information over and above that in the two-dimensional plots. In the example they did not really do so. Possibly the biplot is the more useful of the two as it provides direct information on the changes in the rate with age and period as well as one dimension for any interaction, while the correspondence plot is solely concerned with interaction.

Little account has been taken of the random variation in these rates, which is generally much greater in the younger age groups than in the older ones. This is a hindrance to the full interpretation of the rates and is much more of a problem in the testis and cervical cancer examples, where the rates are much lower than in the case of breast cancer. Even there, in the area of most importance (young women in the recent periods), the rates are small. The variance of the logarithm of the rate, r, is given by

$$\operatorname{var}[\log(r)] = \frac{1}{r \times P},$$

where P is the population size.¹ If the population figures are available, error bars can be added to the line plots in Figure 1 to convey information about sampling variation, although the overall result would be quite complicated and ‘busy’. It is not such an easy thing to do with the two-dimensional plots. The Tarone and Chu¹¹ method does take random variation into account.
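As a rough illustration of how such error bars might be computed, the sketch below turns the variance formula into an approximate interval on the log scale. Several things here are assumptions rather than part of the text: the function name, the `per` argument that converts a rate quoted per 100,000 into a rate per person before multiplying by P, the choice of a 95 per cent normal interval, and the population figure in the usage line.

```python
from math import exp, sqrt


def rate_interval(rate, population, per=100_000, z=1.96):
    """Approximate interval for a rate, using var[log(r)] = 1 / (r x P).

    rate       -- observed rate, quoted per `per` persons (assumed convention)
    population -- population size P for the age group and period
    z          -- normal quantile; 1.96 gives a nominal 95 per cent interval
    """
    events = rate * population / per  # r x P with r expressed per person
    if events <= 0:
        raise ValueError("the log-scale variance is undefined for a zero rate")
    se_log = sqrt(1.0 / events)  # standard error of log(r)
    return rate * exp(-z * se_log), rate * exp(z * se_log)


# Hypothetical example: a rate of 20 per 100,000 in a population of 500,000.
low, high = rate_interval(20.0, 500_000)
print(f"{low:.2f} to {high:.2f} per 100,000")
```

Because the interval is built on the log scale it is asymmetric about the rate and never falls below zero, which suits the low rates seen in the younger age groups.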

In the time plot compared to the cohort plot in Figure 1 and Figure 2, the change in the scale of the time axis from a time period to a cohort, which must stretch over a longer time span, serves to accentuate the vertical movement in the rates. This is a feature which is common to all cohort plots relative to period plots unless the time scale is kept the same. Consequently the local peaks and troughs stand out. Only the level plot of Weinkam and Sterling¹³ has common scales for the time factors.

Contour and image plots highlight curvatures in the rates. These are the identifiable parameters in the age-period-cohort models⁴ whereas the linear trends are not. Also, some interactions between age and period which are not cohort-based may be highlighted. The illustrations here have shown the potential that the different graphical presentations have. It would seem clear that some two-dimensional plot is to be preferred and, while the three-image projection has much to offer, the level plot of Weinkam and Sterling,¹³ using an image rather than contours, contains the same information in one single presentation. The interpretation can be supplemented by a run through with the Tarone and Chu¹¹ non-parametric technique to add some probability calculations to the interpretation.
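The statement that curvatures are identifiable while linear trends are not can be made concrete with a short derivation. The notation below (effects α, β, γ indexed by age a, period p and cohort c = p − a, and an arbitrary constant λ) is assumed for illustration and is not taken from the text.

```latex
% Additive age-period-cohort model on the log scale, with c = p - a.
\log r_{ap} = \mu + \alpha_a + \beta_p + \gamma_c , \qquad c = p - a .

% Shift a linear trend of slope \lambda between the three sets of effects:
\alpha'_a = \alpha_a + \lambda a , \qquad
\beta'_p  = \beta_p - \lambda p , \qquad
\gamma'_c = \gamma_c + \lambda c .

% The fitted rates are unchanged, because a - p + c = 0:
\alpha'_a + \beta'_p + \gamma'_c
  = \alpha_a + \beta_p + \gamma_c + \lambda (a - p + c)
  = \alpha_a + \beta_p + \gamma_c .

% Second differences (curvatures) do not depend on \lambda:
\alpha'_{a+1} - 2\alpha'_a + \alpha'_{a-1}
  = \alpha_{a+1} - 2\alpha_a + \alpha_{a-1} .
```

The same cancellation holds for the second differences of the period and cohort effects, which is why plots that emphasize curvature show features that do not depend on how the linear trend is apportioned.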

ACKNOWLEDGEMENTS

The individual records of breast cancer incidence in Scotland were supplied by the Information Services Division of the Scottish Cancer Intelligence Unit. The testis mortality data for 1956-1990 were taken from the World Health Organisation Cancer Mortality Database. We are also grateful for the constructive comments of the referees.

REFERENCES

1. Esteve, J., Benhamou, E. and Raymond, L. Statistical Methods in Cancer Research. Volume IV: Descriptive Epidemiology, IARC Publications, Lyon, 1994.

2. Holford, T. R. ‘Analysing the temporal effects of age, period and cohort’, Statistical Methods in Medical Research, 1, 317-337 (1992).

3. Clayton, D. and Schifflers, E. ‘Models for temporal variation in cancer rates. I: Age-period and age-cohort models’, Statistics in Medicine, 6, 449-467 (1987).

4. Clayton, D. and Schifflers, E. ‘Models for temporal variation in cancer rates. II: Age-period-cohort models’, Statistics in Medicine, 6, 469-481 (1987).

5. Kupper, L. L., Janis, J. M., Salama, I. A., Yoshizawa, C. N. and Greenberg, B. G. ‘Age-period-cohort analysis: an illustration of the problems in assessing interaction in one observation per cell data’, Communications in Statistics - Theory and Methods, 12, (23), 2279-2807 (1981).

6. MacMahon, B. and Pugh, T. F. Epidemiology: Principles and Methods, Little Brown, Boston MA, 1970.

7. La-Vecchia, C., Lucchini, F., Negri, E., Boyle, P., Maisonneuve, P. and Levi, F. ‘Trends of cancer mortality in Europe, 1955-1989: I, Digestive sites’, European Journal of Cancer, 28, 132-235 (1992).

8. Jolley, D. and Giles, G. ‘Visualising age-period-cohort trend surfaces: A synoptic approach’, International Journal of Epidemiology, 21, 178-182 (1992).

9. Osmond, C. ‘Biplot models applied to cancer mortality rates’, Applied Statistics, 34, 63-70 (1985).

10. Boyle, P., Kaye, S. B. and Robertson, A. G. ‘Changes in testicular cancer in Scotland’, European Journal of Cancer and Clinical Oncology, 23, 827-830 (1987).

11. Tarone, R. E. and Chu, K. C. ‘Implications of birth cohort patterns in interpreting trends in breast cancer rates’, Journal of the National Cancer Institute, 84, 1402-1410 (1992).

12. Cislaghi, C., Negri, E., La-Vecchia, C. and Levi, F. ‘The application of trend surface models to the analysis of time factors in Swiss cancer mortality’, Soz-Praventivmed, 33, 359-373 (1988).

13. Weinkam, J. J. and Sterling, T. D. ‘A graphical approach to the interpretation of age-period-cohort data’, Epidemiology, 2, 133-137 (1991).

14. Akima, H. ‘A method of bivariate interpolation and smooth surface fitting for irregularly distributed data points’, ACM Transactions on Mathematical Software, 4, 148-164 (1978).

15. Coleman, M., Esteve, J., Damiecki, P., Arslan, A. and Renard, H. Trends in Cancer Incidence and Mortality, IARC, Lyon, 1993.

16. Gabriel, K. R. ‘The biplot graphic display of matrices with application to principal component analysis’, Biometrika, 58, 453-467 (1971).

17. Greenacre, M. J. Theory and Applications of Correspondence Analysis, Academic Press, New York, 1984.

18. Grassi, M. and Visentin, S. ‘Correspondence analysis applied to grouped cohort data’, Statistics in Medicine, 13, 2407-2425 (1994).

19. Greenacre, M. J. ‘Correspondence analysis in medical research’, Statistical Methods in Medical Research, 1, 97-117 (1992).

20. Everitt, B. and Dunn, D. Applied Multivariate Statistical Methods, Edward Arnold, London, 1993.

21. Cleveland, W. S. The Elements of Graphing Data, Wadsworth, Monterey, California, 1985.


Plate 1. Image plots for cervical cancer mortality


Plate 2. Level image plots for testis mortality rates
