

Page 1: Predicting health program participation: a gravity-based ...eprints.qut.edu.au/90762/1/WhiteMengersenJRSSC_main.pdf · gravity-based approaches to health services modelling and a

Predicting health program participation: a gravity-based, hierarchical modelling approach

Nicole White

Queensland University of Technology, Brisbane, Australia.

Cooperative Research Centre for Spatial Information, Melbourne, Australia.

Kerrie Mengersen

Queensland University of Technology, Brisbane, Australia.

Cooperative Research Centre for Spatial Information, Melbourne, Australia.

Summary. Statistical analyses of health program participation seek to address a number of objectives compatible with the evaluation of demand for current resources. In this spirit, a spatial hierarchical model is developed for disentangling patterns in participation at the small area level, as a function of population-based demand and additional variation. For the former, a constrained gravity model is proposed to quantify factors associated with spatial choice and account for competition effects, for programs delivered by multiple clinics. The implications of gravity model misspecification within a mixed effects framework are also explored. The proposed model is applied to participation data from a no-fee mammography program in Brisbane, Australia. Attention is paid to the interpretation of various model outputs and their relevance for public health policy.

Keywords: Bayesian methods, Gravity model, Health services research, Log-linear model, Markov chain Monte Carlo, Random effects

1. Introduction

Population-based screening programs play an integral role in public health policy. Designed to promote health awareness and reduce burden of disease, the success of such programs relies on ongoing participation from the community, in part aided by the distribution of screening resources in line with current use and expected demand. It follows that continual assessment of participation is essential and represents a key contribution to decision making.

The provision of mammographic screening for the early detection of breast cancer has been a long-standing public health initiative in many developed countries. Since its introduction in 1991, participation in the ‘BreastScreen Australia’ program has been associated with reductions in breast cancer mortality and treatment-related morbidity (Department of Health and Ageing, 2009). With mounting evidence for the efficacy of mammography in improving outcomes in other countries (Smith et al., 2012), the need to understand patterns of program participation is apparent to ensure their continued success.

Previous statistical analyses in this area have centered on the determination of risk factors affecting participation. Of these factors, studies have consistently inferred relationships between participation and access-based measures, including distance travelled to be screened (Hyndman et al., 2000; Maheswaran et al., 2006; Brustrom and Hunter, 2001) and the presence of a screening facility in a subject’s region of residence (Elting et al., 2009).


Indicators of socioeconomic status at both the individual and regional level (Zackrisson et al., 2004, 2007) have also been related to screening attendance.

Data collected under this setting are inherently spatial, capturing information on participants’ place of residence and the locations of clinics attended. For administrative and patient confidentiality purposes, residence is commonly only available at the areal level, resulting in data in the form of aggregated numbers of visits to each clinic by areal unit. This data structure presents the opportunity to evaluate the current state of program participation and provision, in a way that encompasses both the influence of observed risk factors and additional variation at both the individual clinic and program-wide levels. For the decision maker, such an analysis aims to provide a number of valuable insights into underlying patterns in participation, including the comparison of the relative use of different clinics and the assessment of small area variation.

A popular modelling strategy for obtaining inferences of this nature is the spatial Generalised Linear Mixed Model (GLMM). Falling under the broad class of hierarchical models, spatial GLMMs provide a flexible framework for partitioning variation across multiple sources, by virtue of their accommodation of fixed and random effects. In health, spatial GLMMs are commonplace in disease mapping (see, for example, Pascutto et al. (2000), Best et al. (2005) and Earnest et al. (2013)), with the objective being the decomposition of spatial trends in health outcomes into components of observable and unexplained or excess risk, as described by the fixed and random effects, respectively.

For programs delivered by multiple clinics, the analysis of participation data must take into account the influence of factors associated with spatial choice. That is, posited models should adequately describe the movement of patients from their places of residence (origins) to their respective chosen clinics (destinations). As the fixed effects component of a GLMM, due consideration must therefore be given to the inclusion of spatial choice factors, as they may impact upon the estimation of random effects. These random effects reflect residual variation in participation rates that is potentially of interest to the policy maker. For this reason, their estimation needs to be carried out with some care.

Similar studies in health services research have utilised gravity models (Bailey and Gattrell, 1995) for the prediction of patient movement. Based on Newton’s Law of Gravitation, gravity models characterise the influence of origin and destination-based factors, and their interaction, on spatial choice. For this reason, they are a popular tool in flow modelling, with examples found in migration (Congdon, 2010), biosecurity (Stanaway et al., 2011), disease mapping (Dreassi and Biggeri, 2003) and Geographic Information Systems (GIS) (Luo and Wang, 2003; Schuurman et al., 2010). In health services research, key studies include Congdon (2000, 2001) and Congdon and Best (2000). The application of gravity models in these studies focused on quantifying spatial interaction with individual destinations. As discussed further in Section 3, it is argued that these models are potentially misspecified as they are unable to account for the effects of competition, and their impact on spatial choice. For the purposes of exposition, competition refers to the inverse relationship between an origin’s propensity to access a single destination and its proximity to alternate destinations. Failure to account for this behaviour, if present, has implications for model estimation.

This paper considers a GLMM for understanding patterns in mammography participation by individual clinic and region, the latter defined as place of residence at the time of screening. Extending previous research, an origin-constrained gravity model (Fotheringham, 1981) is proposed for the fixed effects component to account for competition. To provide a comparison to other formulations, the consequences of gravity model misspecification with respect to competition are also explored and a graphical model diagnostic is developed. Explicitly, it is shown that model misspecification in this regard affects the estimation of areal level random effects.

Table 1. Summary of clinics available in Brisbane for the period January 2007 to December 2008. Summary statistics are the time each clinic was available (in days) and the median and interquartile range (IQR) of visits per day of availability.

Clinic   Time available (days), $T_i$   Median visits per day, $M_i$   IQR (visits per day)
1        480                            54                             (39, 70)
2        569                            52                             (41, 67)
3        101                            39                             (37, 41)
4        119                            40                             (34, 42)
5        351                            30                             (18, 39)
6        418                            23                             (15, 24)
7        391                            21                             (15, 24)
8        82                             33                             (23, 41)
9        331                            22                             (20, 24)
10       16                             14                             (12, 14)

The remainder of this paper is organised as follows. Section 2 introduces the motivating case study and outlines the objectives of analysis, with an emphasis on implications for health policy. In Section 3, the proposed methodology is outlined. An examination of the gravity model and previous approaches to modelling health services data are included to motivate the development of the proposed model. Results from the analysis of the case study are summarised in Section 4. As part of this analysis, the proposed model is compared to a gravity-based GLMM previously applied in the literature in terms of goodness of fit and differences in inferences drawn of relevance to policy. A discussion of results and directions for future work is presented in Section 5.

2. Case study: participation in breast cancer screening in Brisbane, Australia

The motivating case study considers data collected for a government-funded mammography program in Brisbane, Australia, from January 2007 to December 2008. The program offered no-fee, biennial mammograms to Queensland women over forty years of age and was delivered via a combination of fixed and mobile screening sites. During the two-year period, ten clinics were assessed, with their geographic locations depicted in Figure 1. Information on the availability of each clinic is provided in Table 1.

Data were collected on each individual screening visit or episode, including the clinic visited and the patient’s place of residence at the time of screening, coded by Statistical Local Area (SLA) (Australian Bureau of Statistics, 2006). For the purposes of this study, sets of contiguous SLAs were combined to represent 43 suburban areas, henceforth referred to as regions. Visits were then aggregated for each clinic and region combination. It is noted that the islands visible in Figure 1 were not defined by their own region(s). Rather, consistent with their original SLA definition, they were included as part of the geographically closest region on the mainland.

Distances to each clinic were measured using the straight-line geographic distance between the coordinates of each clinic and the population weighted centroid of each region. While the discussion of various measures of accessibility is left to other authors (see Higgs (2004)), geographic distance was chosen as it bypassed potential issues such as mode of transportation and peak versus off-peak travel, as influencing factors of true travel time (Hyndman and Holman, 2000). A caveat of this choice is that, given the distance between the included islands and the mainland, geographic distance may not accurately reflect the whole population within this region. For this reason, inferences drawn for this region should be treated with care.

Fig. 1. Map of Brisbane divided into areal units, based on the aggregation of Statistical Local Areas (SLAs). Superimposed are the locations of available clinics (numbered 1 to 10) during the defined study period.

Analysis of these data sought to address three objectives relevant to the assessment of public health initiatives, to facilitate decisions on their management. The first objective was to determine which spatial factors were predictive for the movement of eligible patient populations from each region to available clinics. Secondly, averaging over clinics, there was substantial interest in the description of small area variation in participation, as a way of identifying potential regions for future program promotion. Thirdly, variation between clinics was also of interest, with differences illustrative of relative under- and over-participation. A hierarchical model was developed in response to these objectives, with details provided in the next Section.

3. Methodology

This Section begins with an introduction to the gravity model for the prediction of patient movement. Explicitly, a constrained version of the gravity model is considered, to account for competition effects. As highlighted in Section 1, the proposed methodology seeks to incorporate gravity modelling principles into the GLMM framework, through their inclusion as fixed effects. A complete description of the GLMM is provided in Section 3.2 and includes rationale for the specification of random effects, in response to the second and third objectives defined in Section 2. This Section concludes with a discussion of previous gravity-based approaches to health services modelling and a description of techniques for model assessment.

3.1. The gravity model

The analysis of health program participation can be recast in terms of modelling spatial choice, with patient movement in part influenced by clinic demand, supply and proximity to places of residence. The gravity model is representative of a class of models suited to this purpose, that seek to predict patterns of movement from origins (regions) to destinations (clinics). An application of Newton’s Law of Gravitation, the gravity model describes movement in terms of the attractiveness between a select origin and destination, which is assumed to be proportional to the product of their masses and inversely proportional to the distance between them.

For a movement of interest between origin $j$ and destination $i$, denoted by $Z_{ij}$, the simplest form of the gravity model is

$$Z_{ij} = C_i P_j f(d_{ij}; \theta), \tag{1}$$

where $P_j$ and $C_i$ represent observed origin and destination-based factors, respectively, and $f(d_{ij}; \theta)$ describes the interaction between $i$ and $j$, as a decreasing function of distance, $d_{ij}$, parameterised by unknown parameter(s) $\theta$.

In its current form, Equation (1) predicts movement from each origin to each destination independently. For this reason, Equation (1) is henceforth referred to as the unconstrained gravity model. A consequence of this specification is that an origin’s concurrent relationships with competing destinations, and their impact on $Z_{ij}$, are ignored. Early work on competition modelling was published by Fotheringham (1981, 1983a,b), leading to the proposal of the origin-constrained gravity model. Building on Equation (1), this variant is given by

$$Z^{(c)}_{ij} = \frac{P_j C_i f(d_{ij}; \theta)}{A_j}, \qquad A_j = \sum_{k=1}^{L} C_k f(d_{kj}; \theta), \tag{2}$$

where $Z^{(c)}_{ij}$ denotes a constrained movement and $A_j$ is defined as the accessibility of origin $j$ to all $L$ destinations. It is noted that $A_j$ depends on both $d_{ij}$ and other parameters; however, for simplicity this dependence is suppressed in our notation. The inclusion of $A_j$ in the denominator implies that predicted movement between $i$ and $j$ will decrease as the origin’s proximity to alternative destinations increases. As a constraint, Equation (2) can also be viewed as a model for distributing each origin-specific mass across destinations. That is, given $P_j$, the terms $C_1 f(d_{1j}; \theta)/A_j, \ldots, C_L f(d_{Lj}; \theta)/A_j$ are seen as weights such that the total predicted movement from origin $j$ is equal to the sum of movements to each destination; i.e. $\sum_{i=1}^{L} Z^{(c)}_{ij} = P_j$.
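The difference between Equations (1) and (2) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the function names, the three hypothetical clinics, the two hypothetical regions and the decay rate of 0.3 are all assumptions chosen for illustration.

```python
import numpy as np

def unconstrained_flows(C, P, F):
    """Equation (1): Z[i, j] = C[i] * P[j] * f(d[i, j])."""
    return C[:, None] * P[None, :] * F

def constrained_flows(C, P, F):
    """Equation (2): Equation (1) divided by the accessibility A[j]."""
    A = (C[:, None] * F).sum(axis=0)  # A[j] = sum_k C[k] * f(d[k, j])
    return C[:, None] * P[None, :] * F / A[None, :]

# Hypothetical inputs: 3 destinations (rows) and 2 origins (columns).
C = np.array([1.0, 2.0, 0.5])          # destination masses
P = np.array([1000.0, 400.0])          # origin masses
d = np.array([[1.0, 6.0],
              [3.0, 2.0],
              [8.0, 1.0]])             # distances d[i, j]
F = np.exp(-0.3 * d)                   # exponential decay, rate chosen arbitrarily

Zu = unconstrained_flows(C, P, F)
Zc = constrained_flows(C, P, F)
column_totals = Zc.sum(axis=0)         # one total per origin
```

Under the constrained model each column of `Zc` sums to the corresponding origin mass in `P`, which is the constraint stated above; the columns of `Zu` carry no such guarantee.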

In the next Section, the components of Equation (2) are incorporated as fixed effects into a GLMM for the prediction of participation rates. As noted in Section 1, the specification of the gravity model in this framework has implications for the estimation of other model parameters. This is explored in more detail in Section 3.4.1, as part of a discussion of techniques for model assessment.

3.2. Generalised Linear Mixed Model

For a single clinic $i$ ($i = 1, \ldots, L$) and region $j$ ($j = 1, \ldots, n$), let $y_{ij}$ denote the observed number of visits by women to clinic $i$ that resided in region $j$ at the time of screening. These counts are assumed to follow a Poisson distribution,

$$y_{ij} \mid \mu_{ij} \sim \text{Poisson}(\mu_{ij})$$
$$\log(\mu_{ij}) = \log(t_i) + \beta_0 + \log\left(Z^{(c)}_{ij}\right) + \log(\lambda_{ij}) \tag{3}$$

such that the Poisson mean, $\mu_{ij}$, consists of a prediction under the constrained model, $Z^{(c)}_{ij}$, as defined in Equation (2) and a relative excess rate, $\lambda_{ij}$. The offset, $t_i = T_i/(2 \times 365)$, was included given differences in service availability noted in Table 1. For service $i$, $T_i$ denoted the total number of available days over the two year period, such that $t_i \in (0, 1)$ was the proportion of time service $i$ was available over the two year period. An overall constant, $\beta_0$, was also included to calibrate the model and was assigned a non-informative prior distribution of the form $\beta_0 \sim N(0, 100)$.
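To make the role of the offset concrete, the sketch below computes $t_i$ from the availability figures in Table 1 and assembles the Poisson mean of Equation (3). It is illustrative only: the function name `poisson_mean` is an assumption, and the placeholder inputs (all gravity predictions and relative rates set to one, $\beta_0 = 0$, three regions) are hypothetical rather than estimates from the case study.

```python
import numpy as np

# Days available per clinic over the study period, taken from Table 1.
T = np.array([480, 569, 101, 119, 351, 418, 391, 82, 331, 16], dtype=float)
t = T / (2 * 365)  # proportion of the two-year period each clinic was open

def poisson_mean(t, beta0, Zc, lam):
    """Equation (3): log(mu) = log(t) + beta0 + log(Zc) + log(lam)."""
    log_mu = np.log(t)[:, None] + beta0 + np.log(Zc) + np.log(lam)
    return np.exp(log_mu)

# Hypothetical placeholders: 10 clinics by 3 regions, no excess variation.
Zc = np.ones((10, 3))
lam = np.ones((10, 3))
mu = poisson_mean(t, beta0=0.0, Zc=Zc, lam=lam)
```

With these placeholder inputs every column of `mu` equals `t`, so the offset alone scales the expected count of each clinic by the fraction of the study period it was open.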

It is noted that Equation (3) could be easily expanded to include a vector of $p$ fixed effects $x_{ij}^{T}\beta$, where $x_{ij} = [x_{ij1}, \ldots, x_{ijp}]^{T}$ are observed covariates and $\beta = [\beta_1, \ldots, \beta_p]^{T}$ are the corresponding regression coefficients. However, initial analyses indicated that available covariates were not predictive of $\log(\mu_{ij})$ (see Section 4 for more details).

Table 2. Summary of different choices for $f(d_{ij}; \theta)$, including prior distributions for key parameters.

Function   Form of $f(d_{ij}; \theta)$                      Parameter/s                                                                        Prior distribution/s
F1         $\exp(-\kappa d_{ij})$                           $\theta = [\kappa]$, $\kappa > 0$                                                  $\log(\kappa) \sim N(0, 1000)$
F2         $(1 + d_{ij})^{-\alpha}$                         $\theta = [\alpha]$, $\alpha > 0$                                                  $\log(\alpha) \sim N(0, 1000)$
F3         $(1 + d_{ij})^{\kappa_1} \exp(-\kappa_2 d_{ij})$ $\theta = [\kappa_1, \kappa_2]$, $\kappa_1 < \kappa_2$, $\kappa_2 > 0$            $\kappa_1 \sim U(-10, \kappa_2)$, $\kappa_2 \sim U(0, 10)$
F4         $(1 + \alpha_2 + d_{ij})^{-\alpha_1}$            $\theta = [\alpha_1, \alpha_2]$, $\alpha_1 > 0$, $0 < \delta < 1$, $\alpha_2 = \delta\alpha_1 > 0$   $\alpha_1 \sim U(0, 10)$, $\delta \sim U(0, 1)$

The gravity model component of Equation (3) requires choices for the masses $C_i$ and $P_j$, in addition to the distance-based function $f(d_{ij}; \theta)$. For the analysis presented in Section 4, $P_j$ was substituted with the eligible population in region $j$ and therefore acted as a Poisson offset to account for varying population sizes. For $C_i$, this measurement can be chosen to reflect the capacity of each clinic, for example, staffing, or more generally, the relative attractiveness of each clinic, compared to others available. For the case study introduced in Section 2, no direct measures of clinic capacity were available. To approximate this, $C_i$ was estimated using the competing destinations model by Fotheringham (1983) and is defined as follows. If $d^{*}_{im}$ is the distance between clinics $i$ and $m$ and $M_i$ is a data-based measure of capacity, then

$$C_i = \frac{M_i}{\sum_{m \neq i} M_m / d^{*}_{im}}, \tag{4}$$

where, for clinic $i$, $M_i$ was equal to the median number of visits per day of availability (Table 1). It follows that, for equal $M_i$, Equation (4) predicts that the attractiveness of clinic $i$ decreases as its proximity to alternative clinics increases. Conversely, isolated clinics are perceived to be more attractive due to a lack of alternatives nearby. Given its construction based on observed data, the unknown parameter $\beta_1$ was introduced. In log-linear form, the gravity model component of Equation (3) is thus given by,

$$\log\left(Z^{(c)}_{ij}\right) = \log(P_j) + \beta_1 \log(C_i) + \log f(d_{ij}; \theta) - \log \sum_{k=1}^{L} \left\{ C_k^{\beta_1} f(d_{kj}; \theta) \right\}.$$
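The behaviour of Equation (4) can be checked with a small example. This is a sketch under stated assumptions: the function name `clinic_attractiveness` and the three-clinic layout (two clinics 1 km apart, a third 20 km from both, all with the same $M_i$) are hypothetical and are not taken from the case study.

```python
import numpy as np

def clinic_attractiveness(M, D):
    """Equation (4): C[i] = M[i] / sum over m != i of M[m] / D[i, m]."""
    M = np.asarray(M, dtype=float)
    D = np.asarray(D, dtype=float)
    off_diag = ~np.eye(len(M), dtype=bool)
    # Replace the zero diagonal by 1 before dividing, then drop those terms,
    # so that the i == m case never enters the sum.
    ratio = np.where(off_diag, M[None, :] / np.where(off_diag, D, 1.0), 0.0)
    return M / ratio.sum(axis=1)

# Hypothetical layout: clinics 0 and 1 are close together, clinic 2 is isolated.
M = np.array([30.0, 30.0, 30.0])
D = np.array([[0.0, 1.0, 20.0],
              [1.0, 0.0, 20.0],
              [20.0, 20.0, 0.0]])
C = clinic_attractiveness(M, D)
```

With equal $M_i$, the isolated clinic receives the largest value of $C_i$ and the two neighbouring clinics receive equal, smaller values, in line with the interpretation given for Equation (4).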

To reflect the assumption that $Z^{(c)}_{ij}$ and $C_i$ are positively associated, a $U(0, 100)$ prior distribution was chosen for $\beta_1$.

The choice of distance-based function $f(d_{ij}; \theta)$ is likened to the representation of risk in relation to a point source, a concept commonly encountered in environmental epidemiology. Early works in this area include Lawson (1993), Diggle et al. (1997) and Wakefield and Morris (2001). In this paper, four choices for $f(d_{ij}; \theta)$ were considered, with a short summary provided in Table 2. Each function was chosen such that it was strictly non-negative, monotonically decreasing and finite at $d_{ij} = 0$. These properties are in line with the inverse relationship between distance and movement predicted under a gravity model. In the analysis presented in Section 4, the choice of $f(d_{ij}; \theta)$ was determined using goodness of fit diagnostics, outlined in Section 3.4.2.

The first two functions in Table 2, F1 and F2, represent the exponential decay and inverse power functions, respectively. In contrast, functions F3 and F4 are more flexible, with the former allowing for a compromise between F1 and F2. Similarly, function F4 can be viewed as a generalised form of F2, achieved with the inclusion of an additional parameter $\alpha_2$ to ‘flatten’ or reduce the rate of distance decay at shorter distances. Justification for the choice of constraint on the parameters of F4 is provided in the Supplementary Material.
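The four candidate functions are simple enough to write down directly. The sketch below follows the forms listed in Table 2; the function names `f1` to `f4`, the distance grid and the parameter values are assumptions made for illustration and are not estimates from the case study.

```python
import numpy as np

def f1(d, kappa):
    """F1: exponential decay."""
    return np.exp(-kappa * d)

def f2(d, alpha):
    """F2: inverse power."""
    return (1.0 + d) ** (-alpha)

def f3(d, kappa1, kappa2):
    """F3: product of a power term and an exponential term."""
    return (1.0 + d) ** kappa1 * np.exp(-kappa2 * d)

def f4(d, alpha1, delta):
    """F4: inverse power with the shift alpha2 = delta * alpha1."""
    alpha2 = delta * alpha1
    return (1.0 + alpha2 + d) ** (-alpha1)

d = np.linspace(0.0, 30.0, 61)  # hypothetical distances in km
curves = {
    "F1": f1(d, kappa=0.2),
    "F2": f2(d, alpha=1.5),
    "F3": f3(d, kappa1=0.1, kappa2=0.2),
    "F4": f4(d, alpha1=1.5, delta=0.5),
}
```

Setting `kappa1 = 0` in `f3` recovers `f1`, which is one way to see F3 as sitting between the first two functions. The derivative of $\log f$ for F3 is $\kappa_1/(1 + d_{ij}) - \kappa_2$, so the constraint $\kappa_1 < \kappa_2$ in Table 2 is what keeps F3 decreasing from $d_{ij} = 0$ onwards.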


Other, more complex, functions not considered here include origin-modified exponential decay (Diggle, 1990; Wakefield and Morris, 2001), step functions (Diggle et al., 1997), and directional effects (Lawson, 1993). For some of these choices, however, their use may not always be conducive to estimation, by virtue of the lack of information in the likelihood function (Diggle et al., 1997).

A combination of random effects was used to describe each $\lambda_{ij}$ in Equation (3). For analyses presented in Section 4, $\lambda_{ij}$ was specified by,

$$\log(\lambda_{ij}) = v_i + s_j + \epsilon_{ij}. \tag{5}$$

The inclusion of random effects was motivated by the desire to infer the extent to which excess variation in rates was attributable to different sources. By including an overall intercept in the Poisson mean ($\beta_0$) in Equation (3), estimates of the random effects bear the interpretation of excess participation compared to the overall log relative rate. At the clinic level, $\exp(v_i)$ represented excess participation assigned to clinic $i$, and was included to quantify excess use or demand. Similarly, the exponential of the region-level effect, $s_j$, represented the excess participation attributable to each region. For the policy maker, this term was of interest as it allowed for the assessment of small area variation and, subsequently, the identification of regions corresponding to relative over-participation ($\exp(s_j) > 1$) versus under-participation ($\exp(s_j) < 1$). Finally, the residual $\epsilon_{ij}$ was included to account for any remaining Poisson overdispersion.

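An intrinsic conditional autoregressive (CAR) prior (Besag et al., 1991) was adopted for the vector of region-level random effects, to promote the sharing of information between neighbouring regions. The prior is defined by the set of full conditional distributions,

$$s_j \mid s_{(-j)}, \sigma^2_s \sim N\left( \frac{\sum_{k=1}^{n} m_{jk} s_k}{\sum_{k'=1}^{n} m_{jk'}}, \; \frac{\sigma^2_s}{\sum_{k'=1}^{n} m_{jk'}} \right), \qquad m_{jk} = \begin{cases} 1 & \text{if } k \text{ and } j \text{ are neighbours, } k \neq j \\ 0 & \text{otherwise} \end{cases} \tag{6}$$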

for $j = 1, \ldots, n$ with $(-j)$ denoting all regions excluding $j$. The indicator variable, $m_{jk}$, is included such that, for a given $j$, the prior expectation of $s_j$ is equal to the average of its neighbours, therefore encoding a priori spatial smoothing. For the case study, first-order neighbours were assumed whereby, for a single region, its neighbours were defined as the subset of regions that shared a common geographical boundary. For a more general discussion of neighbourhood definitions, the reader is directed to Earnest et al. (2007).
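The conditional mean and variance in Equation (6) follow directly from the neighbourhood matrix. The following sketch assumes a hypothetical map of four regions arranged in a line and arbitrary values for the random effects and $\sigma^2_s$; the function name `car_conditional` is also an assumption.

```python
import numpy as np

def car_conditional(j, s, W, sigma2_s):
    """Mean and variance of s[j] given all other regions, as in Equation (6)."""
    n_neighbours = W[j].sum()
    mean = W[j] @ s / n_neighbours   # average of the neighbouring effects
    var = sigma2_s / n_neighbours    # shrinks as the number of neighbours grows
    return mean, var

# Hypothetical map: four regions in a line, 0 - 1 - 2 - 3.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
s = np.array([0.2, -0.1, 0.4, 0.0])

mean_1, var_1 = car_conditional(1, s, W, sigma2_s=0.5)
mean_0, var_0 = car_conditional(0, s, W, sigma2_s=0.5)
```

Region 1 has two neighbours, so its conditional mean is the average of `s[0]` and `s[2]` and its conditional variance is half of $\sigma^2_s$; region 0 has a single neighbour and therefore a larger conditional variance.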

The remaining random effects were assumed independent and identically distributed, with the following prior distributions

$$v_i \mid \sigma^2_v \sim N\left(0, \sigma^2_v\right), \qquad \epsilon_{ij} \mid \sigma^2 \sim N\left(0, \sigma^2\right).$$

The three variance components associated with each set of random effects were assumed unknown, with posterior estimates providing insight into the partitioning of excess variation over different sources. Uncertainty in these parameters was modelled using non-informative, $U(0, 100)$ prior distributions, defined on the standard deviation scale. For the case study, a sensitivity analysis was conducted on select prior distributions, discussed further in Section 4.4.


Substituting Equation (5) into Equation (3), it is noted that the resulting GLMM is overparameterised, with consequences for identifiability of individual parameters. To improve convergence of key parameters, it is common to work with a hierarchically centered form of the GLMM (Gelfand et al., 1996). Given the prior distribution specified for $\epsilon_{ij}$, whose estimate is not of direct interest, this centering is achieved by reparameterising the Poisson mean in terms of the linear predictor:

$$\log(\mu_{ij}) \sim N\left( \log(t_i) + \beta_0 + \log\left(Z^{(c)}_{ij}\right) + v_i + s_j, \; \sigma^2 \right).$$

It is noted that there are other possibilities with respect to the defined random effects specification that are not considered in this paper. In terms of spatial smoothing, one could replace $s_j + \epsilon_{ij}$ with a single interaction effect $u_{ij}$, to allow for clinic-specific spatial smoothing. This could be achieved by the specification of an intrinsic CAR prior for each $u_i = (u_{i1}, \ldots, u_{in})^{T}$, to be used in cases where neighbouring regions are believed to share a similar propensity for using the same clinic. However, there were concerns regarding the implementation of this alternative model. Aside from the loss of a region-level summary of excess variation ($s$), of particular concern was the issue of spatial confounding (Hodges and Reich, 2010), describing the presence of collinearity between residuals and fixed effects with the same spatial structure. Given the implied spatial smoothing on $\mu_{ij}$, there was concern that variation in the Poisson mean would be spuriously assigned to $u_{ij}$, when it may be readily explained by effects contained within $Z^{(c)}_{ij}$, namely $f(d_{ij}; \theta)$, which itself acts as a spatial smoother. In cases where the relationship with distance is thought to vary with clinic, $f(d_{ij}; \theta)$ can be replaced by $f(d_{ij}; \theta_i)$; however, preliminary analysis of the case study data did not support this assumption.

3.3. Existing gravity-based models in health services research

The ability to write the gravity model in log-linear form has led to its use in a number of applications, as a predictive model for $\mu_{ij}$. In this Section, previous applications of the gravity model in health services research are discussed, to differentiate previous work from the proposed methodology.

For the analysis of hospital admissions data, Congdon (2000, 2001) proposed a Poisson regression that incorporated a gravity model with non-proportional effects. The baseline model was written as,

$$\mu_{ij} = \exp(\beta_0)\, C_i^{a_1} P_j^{a_2} f(d_{ij}; \theta).$$

In Congdon and Best (2000), a similar model was considered but extended to also include random effects, similar to those discussed in Section 3.2. Excluding the balancing constant, the above model aligns with the unconstrained gravity model in Equation (1) with destination and origin factors given by $C_i^{a_1}$ and $P_j^{a_2}$, respectively. It is noted that, by allowing a non-proportional effect for $P_j$, population density no longer acts as an offset since there is now an associated coefficient, $a_2$, that must be estimated.

An alternative proposal by Congdon and Best (2000) borrowed from the disease mapping literature to develop a GLMM for the prediction of the Poisson mean. In this case, the Poisson mean comprised an “expected count” and a relative rate, defined in Equation (7),

$$\mu_{ij} = e_{ij} \exp(\beta_0)\, \lambda_{ij}, \qquad e_{ij} = \frac{C_i}{\sum_{k \in K_j} C_k} E_j, \tag{7}$$

where $E_j = r P_j$ and $r$ is a baseline participation rate, calculated using the observed data or an external data source. The corresponding relative rate was defined similarly to Equation (5), but differed by the inclusion of $f(d_{ij}; \theta)$ to retain the main features of the gravity model.

Here, $e_{ij}$ represents the expected or anticipated number of visits from $j$ to $i$, as a proportion of the eligible population. In its denominator, $K_j$ defines the set of clinics associated with region $j$; that is, the set of clinic catchments that contain region $j$. In the absence of catchment definitions, one could approximate each $K_j$ by only including clinics that account for a fixed percentage, say 5%, of total visits observed from region $j$. This approach may not be preferred as it represents a form of internal standardisation, whereby the observed data were used to calibrate the distribution of a priori patterns of participation. This approach also implies $e_{ij} = 0 \;\forall i \notin K_j$, therefore restricting the analysis of observed data to $\{y_{ij} : i \in K_j\}$ only. For these reasons, one may instead set $K_j = \{1, \ldots, L\} \;\forall j$ to reflect a lack of knowledge about regional preferences for different clinics a priori.
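The two treatments of $K_j$ described above can be compared with a short calculation of the expected counts in Equation (7). This is a sketch only: the function name `expected_counts`, the boolean-matrix encoding of the catchments and all numerical values are assumptions introduced for illustration.

```python
import numpy as np

def expected_counts(C, P, r, K=None):
    """Equation (7): e[i, j] = C[i] / (sum of C[k] over k in K_j) * r * P[j].

    K is a boolean matrix with K[i, j] True when clinic i belongs to K_j.
    If K is omitted, every clinic is assumed to belong to every K_j.
    Each region is assumed to have at least one clinic in its set.
    """
    C = np.asarray(C, dtype=float)
    P = np.asarray(P, dtype=float)
    if K is None:
        K = np.ones((len(C), len(P)), dtype=bool)
    share = np.where(K, C[:, None], 0.0)
    share = share / share.sum(axis=0, keepdims=True)
    return share * (r * P)[None, :]

# Hypothetical values: three clinics, two regions, baseline rate of 0.5.
C = [1.0, 2.0, 1.0]
P = [1000.0, 500.0]
e_all = expected_counts(C, P, r=0.5)

# Restrict region 1 to the first two clinics only.
K = np.array([[True, True],
              [True, True],
              [True, False]])
e_restricted = expected_counts(C, P, r=0.5, K=K)
```

In both cases the expected counts for a region sum to $E_j = r P_j$; under the restricted catchment the excluded clinic receives an expected count of zero, which is the restriction on the analysed data noted above.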

In light of the above argument, Equation (7) is a GLMM with an unconstrained gravity model as fixed effects. Comparing this with the GLMM defined in Equation (5), $Z^{(c)}_{ij}$ is replaced with $Z_{ij}$ from Equation (1) and the global intercept is equal to $\beta_0 + \log(r) - \log\left(\sum_{k=1}^{L} C_k\right)$.
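The form of this intercept can be seen by taking logarithms of Equation (7). The short derivation below is a sketch that assumes $K_j = \{1, \ldots, L\}$ for all $j$ and $E_j = rP_j$, as defined above:

$$\log \mu_{ij} = \log e_{ij} + \beta_0 + \log \lambda_{ij} = \underbrace{\beta_0 + \log(r) - \log\Big(\sum_{k=1}^{L} C_k\Big)}_{\text{global intercept}} + \log C_i + \log P_j + \log \lambda_{ij}.$$

The terms $\log C_i$ and $\log P_j$ enter with unit coefficients, that is, as proportional effects, while the distance function $f(d_{ij};\boldsymbol{\theta})$ enters through the relative rate $\lambda_{ij}$.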

3.4. Model assessment

3.4.1. Evaluating the consequences of gravity model misspecification

The impact of misspecifying the gravity model component of the GLMM is now considered in more detail. It is first noted that, given the composition of the Poisson mean in terms of fixed and random effects, the estimation of $\mu_{ij}$ is unlikely to be sensitive to the precise choice of gravity model. This is due to any effects not captured by the gravity model being absorbed into the residual. This observation motivates the development of a heuristic approach to show that misspecification with respect to omitting competition affects the region-level random effects, $s_1, \ldots, s_n$. The argument developed in this Section builds upon results from Fotheringham (1983b), where the effects of misspecification were examined for movements predicted by the gravity model only. The following approach is therefore an extension of this work into a GLMM framework.

Consider two candidate models for the prediction of the Poisson mean, involving the unconstrained and constrained gravity model, respectively. Without loss of generality, proportional effects for $C_i$ are assumed under each model, with implications for non-proportional effects discussed at the end of this Section. Denoting the respective Poisson means by $\mu^{(u)}_{ij}$ and $\mu^{(c)}_{ij}$ leads to the following expressions,

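$$\mu^{(u)}_{ij} = t_i \exp\left(\beta^{(u)}_0\right) C_i P_j\, f\left(d_{ij};\boldsymbol{\theta}^{(u)}\right) \lambda^{(u)}_{ij} \tag{8}$$

$$\mu^{(c)}_{ij} = \frac{t_i \exp\left(\beta^{(c)}_0\right) C_i P_j\, f\left(d_{ij};\boldsymbol{\theta}^{(c)}\right)}{A^{(c)}_j}\, \lambda^{(c)}_{ij} \tag{9}$$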

with all parameters as defined in Section 3.2 and $A^{(c)}_j = \sum_{k=1}^{L} C_k f\left(d_{kj};\boldsymbol{\theta}^{(c)}\right)$. To examine the impact of including competition effects, the first derivative with respect to $d_{ij}$ is evaluated for both Equations (8) and (9). These derivatives are,


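$$\frac{\partial \mu^{(u)}_{ij}}{\partial d_{ij}} = t_i \exp\left(\beta^{(u)}_0\right) C_i P_j\, f'\left(d_{ij};\boldsymbol{\theta}^{(u)}\right) \lambda^{(u)}_{ij} \tag{10}$$

$$\frac{\partial \mu^{(c)}_{ij}}{\partial d_{ij}} = \frac{t_i \exp\left(\beta^{(c)}_0\right) C_i P_j\, f'\left(d_{ij};\boldsymbol{\theta}^{(c)}\right)}{A^{(c)}_j} \left\{ \frac{A^{(c)}_j - C_i f\left(d_{ij};\boldsymbol{\theta}^{(c)}\right)}{A^{(c)}_j} \right\} \lambda^{(c)}_{ij}. \tag{11}$$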

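Under the same choice of $f(d_{ij};\boldsymbol{\theta})$ and assuming differences in $\left(\boldsymbol{\theta}^{(u)}, \boldsymbol{\theta}^{(c)}\right)$ are small such that $f'\left(d_{ij};\boldsymbol{\theta}^{(c)}\right) / f'\left(d_{ij};\boldsymbol{\theta}^{(u)}\right) \approx 1$, Equation (11) becomes,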

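$$\frac{\partial \mu^{(c)}_{ij}}{\partial d_{ij}} \approx \exp\left(\beta^{(c)}_0 - \beta^{(u)}_0\right) \frac{t_i \exp\left(\beta^{(u)}_0\right) C_i P_j\, f'\left(d_{ij};\boldsymbol{\theta}^{(u)}\right)}{A^{(c)}_j} \left\{ 1 - \frac{C_i f\left(d_{ij};\boldsymbol{\theta}^{(c)}\right)}{A^{(c)}_j} \right\} \lambda^{(c)}_{ij}.$$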

Next, assume that $L$ is large and that $A^{(c)}_j$ is not dominated by contributions from a small proportion of clinics. In this case, $\left\{ 1 - C_i f\left(d_{ij};\boldsymbol{\theta}^{(c)}\right) / A^{(c)}_j \right\}$ will be roughly equal to 1 and we can write,

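$$\frac{\partial \mu^{(c)}_{ij}}{\partial d_{ij}} \approx \exp\left(\beta^{(c)}_0 - \beta^{(u)}_0\right) \frac{t_i \exp\left(\beta^{(u)}_0\right) C_i P_j\, f'\left(d_{ij};\boldsymbol{\theta}^{(u)}\right)}{A^{(c)}_j}\, \lambda^{(c)}_{ij}. \tag{12}$$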

Excluding the constant of proportionality, $\exp\left(\beta^{(c)}_0 - \beta^{(u)}_0\right)$, a comparison of Equation (12) with the derivative obtained under the unconstrained model (Equation (10)) shows these expressions to be approximately equivalent upon setting $\lambda^{(u)}_{ij} = \lambda^{(c)}_{ij} / A^{(c)}_j$. Alternatively, on the logarithmic scale, $\log\left(\lambda^{(u)}_{ij}\right) = \log\left(\lambda^{(c)}_{ij}\right) - \log\left(A^{(c)}_j\right)$.

This result may be used to provide insight into potential gravity model misspecification with respect to competition. In the current case, for data generated by a constrained gravity model, Equation (12) suggests that fitting an unconstrained gravity model will lead to excess variation being absorbed by $\lambda_{ij}$, in particular, $s_j$. Given the definition of accessibility at the areal level, one expects a negative, approximately linear, relationship between $s_j$, estimated under the unconstrained model, and $\log\left(A^{(c)}_j\right)$. A similar result can also be obtained for data generated by an unconstrained gravity model: in this case, however, incorrectly fitting a constrained gravity model will lead to a positive slope. Given this result, evidence of gravity model misspecification is explored graphically in Section 4, by constructing plots of $s_j$ against $\log\left(A^{(c)}_j\right)$ for each model fitted. For the case of gravity models with non-proportional effects for $C_i$, the above result holds assuming the fixed effects are similar under each gravity model. For cases that result in different fixed effects, such as the GLMM defined in Section 3.2 (Equation (5)), this will impact on the magnitude of each clinic-level random effect $v_1, \ldots, v_L$. For completeness, estimates of each $v_i$ are also discussed in Section 4.
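The expected negative relationship can be illustrated numerically. The following Python sketch generates noise-free Poisson means from a constrained gravity model and computes, for each region, the average log residual left unexplained by the corresponding unconstrained fixed effects. It is a simplified illustration rather than a fitted GLMM: the exponential form $f(d) = \exp(-\kappa d)$, the value of $\kappa$, the random inputs and all function names are assumptions made here for illustration only.

```python
import math
import random


def accessibility(capacity, dist, kappa):
    """A_j = sum_k C_k * exp(-kappa * d_kj) for each region j (assumed decay form)."""
    n = len(dist[0])
    return [sum(c * math.exp(-kappa * row[j]) for c, row in zip(capacity, dist))
            for j in range(n)]


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


random.seed(1)
L, n, kappa = 10, 40, 0.37  # illustrative sizes and decay parameter
capacity = [random.uniform(1.0, 5.0) for _ in range(L)]
population = [random.uniform(500.0, 5000.0) for _ in range(n)]
dist = [[random.uniform(0.5, 30.0) for _ in range(n)] for _ in range(L)]

A = accessibility(capacity, dist, kappa)

# Region-level residual: what the unconstrained fixed effects C_i * P_j * f(d_ij)
# fail to explain when the means are generated by the constrained model.
resid = []
for j in range(n):
    total = 0.0
    for i in range(L):
        f = math.exp(-kappa * dist[i][j])
        mu_constrained = capacity[i] * population[j] * f / A[j]
        unconstrained_fixed = capacity[i] * population[j] * f
        total += math.log(mu_constrained) - math.log(unconstrained_fixed)
    resid.append(total / L)

slope = ols_slope([math.log(a) for a in A], resid)
print("slope of regional residual on log(A_j):", round(slope, 6))
```

In this idealised setting the residual equals $-\log(A_j)$ exactly, so the slope is $-1$ up to rounding; with real data, random effects and estimation error would blur this relationship into the approximately linear trend described above.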


3.4.2. Other goodness of fit measures

Models were also compared with respect to predictive performance, that is, the ability to generate replicate observations, $y^{rep}_{ij}$, that are consistent with the observed data, or features thereof. These replicate observations can be drawn from the model's posterior predictive distribution, $p\left(y^{rep}_{ij} \mid \boldsymbol{y}\right) = \int_{\Theta} p\left(y^{rep}_{ij} \mid \Theta\right) p\left(\Theta \mid \boldsymbol{y}\right) d\Theta$, with $\Theta$ representing the full set of model parameters. Because draws from $p\left(\Theta \mid \boldsymbol{y}\right)$ depend on the observed data, including $y_{ij}$, the observed value influences the posterior distribution for $\Theta$ and, in turn, $y^{rep}_{ij}$; it has therefore been suggested that this approach to model assessment is conservative (Stern and Cressie, 2000). Instead, a cross-validatory posterior predictive approach may be preferred, whereby the prediction of $y^{rep}_{ij}$ is not directly influenced by its observed value.

An easy-to-implement method, described here, is a mixed predictive approach that was first discussed by Gelman et al. (1996) and applied by Marshall and Spiegelhalter (2003). In the present context, this approach is motivated by noting that one way of sampling from the aforementioned posterior predictive distribution would be to first sample each Poisson mean, $\mu_{ij}$, from its full conditional distribution, $p\left(\mu_{ij} \mid \Theta_{-\mu_{ij}}, \boldsymbol{y}\right)$, where $\Theta_{-\mu_{ij}}$ denotes the set of parameters (and hyperparameters) excluding $\mu_{ij}$. Conditional on each sampled $\mu_{ij}$, a replicate observation $y^{rep}_{ij}$ is then drawn from $\text{Poisson}(\mu_{ij})$. The mixed predictive approach replaces the first of these steps and proceeds as follows. Instead of sampling each $\mu_{ij}$ from its full conditional distribution, an independent replicate $\mu^{rep}_{ij}$ is drawn from $p\left(\mu_{ij} \mid \Theta_{-\mu_{ij}}\right)$ which, given the model presented in Section 3.2, corresponds to sampling $\log\left(\mu^{rep}_{ij}\right)$ from $N\left(\log(t_i) + \beta_0 + \log\left(Z^{(c)}_{ij}\right) + s_j + v_i,\ \sigma^2\right)$. In the second step, $y^{rep}_{ij}$ is sampled from a Poisson distribution with mean $\mu^{rep}_{ij}$ in place of $\mu_{ij}$. By omitting the conditioning on the observed data in the first step, the aforementioned influence of $y_{ij}$ on $y^{rep}_{ij}$ is reduced. It is acknowledged that the parameters $\left(\beta_0, s_j, v_i, \sigma^2\right)$, along with parameters contained in $Z^{(c)}_{ij}$, are still sampled from the posterior distribution and thus remain dependent on $\boldsymbol{y}$. Nonetheless, this dependence is indirect as the full conditional distribution for each of these parameters does not involve $\boldsymbol{y}$. An application of this method for a similar model is outlined in Riebler and Held (2010).
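The two sampling steps of the mixed predictive approach can be sketched in Python as follows. This is a minimal illustration for a single cell $(i, j)$ using only the standard library; the function names, the chunked Poisson sampler and the parameter values standing in for posterior draws are all hypothetical and are not taken from the accompanying WinBUGS code.

```python
import math
import random


def poisson_draw(mu, rng):
    """Draw from Poisson(mu) with Knuth's multiplication method.

    The mean is split into chunks of at most 20 (a sum of independent Poisson
    draws is Poisson) so that exp(-chunk) does not underflow for large mu.
    """
    total = 0
    while mu > 0:
        chunk = min(mu, 20.0)
        limit, prod, k = math.exp(-chunk), rng.random(), 0
        while prod > limit:
            prod *= rng.random()
            k += 1
        total += k
        mu -= chunk
    return total


def mixed_predictive_draw(log_t, beta0, log_z, s_j, v_i, sigma2, rng):
    """One replicate y_rep for cell (i, j), given one posterior draw of the parameters."""
    mean = log_t + beta0 + log_z + s_j + v_i
    # Step 1: draw log(mu_rep) from its prior given the other parameters (no y involved).
    log_mu_rep = rng.gauss(mean, math.sqrt(sigma2))
    # Step 2: draw the replicate count from Poisson(mu_rep).
    return poisson_draw(math.exp(log_mu_rep), rng)


rng = random.Random(42)
# Hypothetical values standing in for successive MCMC draws of (beta0, s_j, v_i, sigma2).
posterior_draws = [
    {"beta0": -1.10, "s_j": 0.20, "v_i": 0.80, "sigma2": 0.30},
    {"beta0": -1.00, "s_j": 0.10, "v_i": 0.70, "sigma2": 0.25},
    {"beta0": -1.05, "s_j": 0.15, "v_i": 0.75, "sigma2": 0.28},
]
log_t, log_z = math.log(2.0), math.log(40.0)  # illustrative offset and gravity term
y_rep = [
    mixed_predictive_draw(log_t, d["beta0"], log_z, d["s_j"], d["v_i"], d["sigma2"], rng)
    for d in posterior_draws
]
print(y_rep)
```

In practice one replicate would be generated per retained MCMC iteration and per cell, giving the sequence of $T$ draws used by the goodness of fit measures.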

Given a sequence of $T$ draws for each $y^{rep}_{ij}$, their concordance with the observed data was assessed using three goodness of fit measures suited to count data. The first two criteria, from Czado et al. (2009), are the mean Ranked Probability Score (RPS),

$$\mathrm{RPS} = \frac{1}{nL} \sum_{i,j} \left\{ \frac{1}{T} \sum_{t=1}^{T} \left| y^{rep(t)}_{ij} - y_{ij} \right| - \frac{1}{T} \sum_{t=1}^{T/2} \left| y^{rep(t)}_{ij} - y^{rep(t+T/2)}_{ij} \right| \right\},$$

where $y^{rep(t)}_{ij}$ is the $t$th replicate of $y_{ij}$, and the mean Dawid-Sebastiani Score (DSS),

$$\mathrm{DSS} = \frac{1}{nL} \sum_{i,j} \left[ \frac{y_{ij} - E\left(y^{rep}_{ij} \mid \boldsymbol{y}\right)}{Var\left(y^{rep}_{ij} \mid \boldsymbol{y}\right)} + 2 \log\left(Var\left(y^{rep}_{ij} \mid \boldsymbol{y}\right)\right) \right].$$

A recent application of these criteria for model choice can be found in Riebler and Held (2010). Smaller values of RPS and DSS indicate a relatively improved model fit.
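Both scores can be estimated directly from the replicates. The Python sketch below is illustrative only: the function names and data layout are hypothetical, the RPS follows the Monte Carlo form given above (with $T$ assumed even), and the DSS is written in the form given by Czado et al. (2009), $\left((y - \mu)/\sigma\right)^2 + 2\log\sigma$, with $\mu$ and $\sigma^2$ estimated by the sample mean and variance of the replicates. The choice of this form for the DSS is an assumption of the sketch.

```python
import math


def mean_rps(y, y_rep):
    """Mean ranked probability score.

    y      -- observed counts, one per cell (i, j)
    y_rep  -- y_rep[c] is the list of T replicates for cell c (T assumed even)
    """
    total = 0.0
    for obs, reps in zip(y, y_rep):
        T = len(reps)
        half = T // 2
        # E|Y_rep - y| estimated from all T replicates
        term1 = sum(abs(r - obs) for r in reps) / T
        # 0.5 * E|Y_rep - Y_rep'| estimated by pairing replicate t with t + T/2
        term2 = sum(abs(reps[t] - reps[t + half]) for t in range(half)) / T
        total += term1 - term2
    return total / len(y)


def mean_dss(y, y_rep):
    """Mean Dawid-Sebastiani score, ((y - mu) / sd)^2 + 2 * log(sd), per Czado et al. (2009).

    Assumes the replicate variance of every cell is strictly positive.
    """
    total = 0.0
    for obs, reps in zip(y, y_rep):
        T = len(reps)
        mu = sum(reps) / T
        var = sum((r - mu) ** 2 for r in reps) / (T - 1)
        total += (obs - mu) ** 2 / var + math.log(var)  # log(var) = 2 * log(sd)
    return total / len(y)


# Toy example: two cells with four replicates each (illustrative values only).
y = [3, 0]
y_rep = [[1, 5, 2, 4], [0, 1, 0, 2]]
print(mean_rps(y, y_rep), mean_dss(y, y_rep))
```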

The third criterion is a model discrepancy measure based on the Poisson deviance adjusted for zero counts (Waller et al., 1997; Gelfand and Ghosh, 1998). An extension of work by Laud and Ibrahim (1995) for linear models, this function represents the expected posterior discrepancy between observed and replicated datasets,

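$$D\left(\boldsymbol{y}, \boldsymbol{y}^{rep(t)}\right) = \frac{1}{T} \sum_{t=1}^{T} d\left(y^{rep(t)}_{ij}, y_{ij} \mid y_{ij}\right)$$

$$d\left(y^{rep(t)}_{ij}, y_{ij} \mid y_{ij}\right) = 2 \sum_{i,j} \left\{ \left(y_{ij} + 0.5\right) \log\left( \frac{y_{ij} + 0.5}{y^{rep(t)}_{ij} + 0.5} \right) - \left(y_{ij} - y^{rep(t)}_{ij}\right) \right\}. \tag{13}$$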

For a given replicate, the discrepancy is the Poisson deviance assuming $E\left[Y^{rep}_{ij}\right] = y_{ij}$. To account for zero counts, 0.5 is added to $y^{rep(t)}_{ij}$ and $y_{ij}$ for all $i$ and $j$. This continuity correction is commonly applied in practice since, similar to many goodness of fit criteria, models are compared in terms of relative fit. In Section 4, Equation (13) is referred to as the Expected Predictive Deviance (EPD).
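A minimal Python sketch of the EPD in Equation (13) is given below. The function name `expected_predictive_deviance` and the data layout (one list of replicated counts per MCMC draw) are hypothetical; the calculation applies the 0.5 correction to the logarithmic term, as in Equation (13), and averages the resulting deviance over replicates.

```python
import math


def expected_predictive_deviance(y, y_rep):
    """Average zero-adjusted Poisson deviance between observed and replicated data.

    y      -- observed counts, flattened over cells (i, j)
    y_rep  -- y_rep[t] is the t-th replicated dataset, same length as y
    """
    total = 0.0
    for rep in y_rep:
        d = 0.0
        for obs, r in zip(y, rep):
            a, b = obs + 0.5, r + 0.5  # continuity correction for zero counts
            d += a * math.log(a / b) - (obs - r)
        total += 2.0 * d
    return total / len(y_rep)


# Toy example with two cells and two replicated datasets (illustrative values only).
y = [0, 2]
y_rep = [[1, 2], [0, 3]]
print(math.log(expected_predictive_deviance(y, y_rep)))  # reported on the log scale
```

Each cell's contribution is non-negative and equals zero only when the replicate matches the observation, so smaller values indicate replicates that sit closer to the observed data.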

4. Results: Screening patterns for no-fee mammography

A total of eight models were fitted to the data described in Section 2. Models were characterised by (i) the use of a constrained versus unconstrained gravity model and (ii) the choice of accessibility function. The unconstrained gravity models fitted resemble the approach proposed by Congdon and Best (2000) (see Section 3.3). For each choice of gravity model, a non-proportional effect ($\beta_1$) was assumed for $C_i$, given its calculation based on observed data. All models were fitted using WinBUGS (Spiegelhalter et al., 1996), with results presented based on 50,000 MCMC iterations following a burn-in period of 50,000 iterations. Accompanying WinBUGS code is provided in the Supplementary Material. As indicated in Section 3, the prior distribution on $\epsilon_{ij}$ enabled the use of hierarchical centering (Papaspiliopoulos et al., 2007), which was adopted to improve MCMC convergence of key parameters. Identifiability of the random effects $\boldsymbol{s}$ and $\boldsymbol{v}$ was achieved through the use of sum-to-zero constraints, applied at the end of each MCMC iteration.

In a preliminary analysis, socioeconomic status was included as a covariate in light of its influence documented in previous studies of no-fee mammography (Zackrisson et al., 2004, 2007). However, it did not appear to be predictive of relative rates and was therefore excluded from the models presented here.

4.1. Estimation of fixed effects and the assignment of uncertainty

Posterior means and 95% credible intervals (CIs) for fixed effects associated with each choice of $f(d_{ij};\boldsymbol{\theta})$ and the overall intercept $\beta_0$ are provided in Table 3. Overall, posterior estimates of $\boldsymbol{\theta}$ were similar when comparing constrained and unconstrained models. An exception was the estimate of $\kappa_1$ in the constrained model with function F3, which suggested that only the exponential decay term was required. Differences in the overall intercept, $\beta_0$, and the fixed effect associated with $C_i$, $\beta_1$, were observed. This was in part expected given constraints on the random effects.

For each model, Table 3 also summarises the percentage of total residual variation assigned to each of the three variance components, by comparing the posterior means of $\sigma^2$, $\sigma^2_v$ and $\sigma^2_s$. Across all fitted models, the percentage of total variation was smallest for $\sigma^2_v$, which suggested that excess participation was least attributable to differences between clinics. For accessibility functions F1 and F3, there was a reduction in the percentage of variation assigned to the spatial random effects, which was indicative of more variation being explained by the fixed effects under the constrained model.

Table 3. Posterior means and 95% credible intervals (CIs) for fixed effects included in each fitted model. Models were differentiated by the choice of gravity model (constrained/unconstrained) and choice of $f(d_{ij};\boldsymbol{\theta})$ (F1-F4).

| Gravity model | $f(d_{ij};\boldsymbol{\theta})$ | Parameter | Posterior mean | 95% CI | % of total variation $(\sigma^2, \sigma^2_v, \sigma^2_s)$ |
|---|---|---|---|---|---|
| Unconstrained | F1 | $\kappa$ | 0.37 | (0.35, 0.39) | (29, 25, 46) |
| | | $\beta_0$ | -0.56 | (-0.85, -0.28) | |
| | | $\beta_1$ | 0.71 | (0.09, 1.77) | |
| | F2 | $\alpha$ | 3.45 | (3.23, 3.66) | (41, 26, 33) |
| | | $\beta_0$ | 3.39 | (2.87, 3.91) | |
| | | $\beta_1$ | 0.58 | (0.06, 1.38) | |
| | F3 | $\kappa_1$ | -1.07 | (-1.69, -0.60) | (30, 24, 46) |
| | | $\kappa_2$ | 0.27 | (0.20, 0.32) | |
| | | $\beta_0$ | 0.79 | (0.13, 1.58) | |
| | | $\beta_1$ | 0.62 | (0.09, 1.36) | |
| | F4 | $\alpha_1$ | 6.60 | (5.93, 7.35) | (32, 24, 44) |
| | | $\alpha_2$ | 6.36 | (5.27, 7.26) | |
| | | $\beta_0$ | 13.96 | (9.35, 17.12) | |
| | | $\beta_1$ | 0.64 | (0.11, 1.30) | |
| Constrained | F1 | $\kappa$ | 0.37 | (0.35, 0.39) | (39, 30, 31) |
| | | $\beta_0$ | -1.09 | (-1.27, -0.91) | |
| | | $\beta_1$ | 0.34 | (0.02, 0.81) | |
| | F2 | $\alpha$ | 3.31 | (3.08, 3.53) | (23, 15, 62) |
| | | $\beta_0$ | -0.92 | (-1.15, -0.69) | |
| | | $\beta_1$ | 0.35 | (0.02, 0.96) | |
| | F3 | $\kappa_1$ | -0.52 | (-1.07, 0.01) | (36, 29, 35) |
| | | $\kappa_2$ | 0.32 | (0.26, 0.37) | |
| | | $\beta_0$ | -1.02 | (-1.22, -0.83) | |
| | | $\beta_1$ | 0.31 | (0.02, 0.80) | |
| | F4 | $\alpha_1$ | 6.42 | (5.75, 7.08) | (30, 22, 47) |
| | | $\alpha_2$ | 6.23 | (5.27, 7.01) | |
| | | $\beta_0$ | -0.84 | (-1.02, -0.64) | |
| | | $\beta_1$ | 0.31 | (0.01, 0.78) | |

Table 4. Selected goodness of fit criteria for each model, described in Section 3.4.2. For each criterion, smaller scores indicate a relatively improved fit. For the Expected Predictive Deviance (EPD), results are given on the logarithmic scale.

| Gravity model | $f(d_{ij};\boldsymbol{\theta})$ | RPS | DSS | log(EPD) |
|---|---|---|---|---|
| Unconstrained | F1 | 71.95 | 11.84 | 11.76 |
| | F2 | 191.64 | 13.18 | 12.83 |
| | F3 | 80.98 | 11.79 | 11.86 |
| | F4 | 96.24 | 11.98 | 12.01 |
| Constrained | F1 | 66.80 | 11.76 | 11.65 |
| | F2 | 109.85 | 13.27 | 12.28 |
| | F3 | 67.27 | 11.72 | 11.66 |
| | F4 | 77.63 | 11.94 | 11.81 |

4.2. Model comparison and goodness of fit

Goodness of fit diagnostics for all models are provided in Table 4. All assessments indicated an improved fit under the constrained gravity model for functions F1 and F3, both of which involved an exponential decay function in $f(d_{ij};\boldsymbol{\theta})$. Comparing constrained and unconstrained models across all choices of $f(d_{ij};\boldsymbol{\theta})$, results were largely in favour of the constrained gravity model, the exception being the DSS for F2. The relatively poor fits of functions F2 and F4 were evident under these criteria, in particular F2, with values markedly higher than for other functions.

Applying results from Section 3.4.1, Figure 2 provides an assessment of gravity model misspecification, displaying the relationship between the posterior means of $s_j$ from each model and $\log\left(A^{(c)}_j\right)$, computed for each constrained model. For all unconstrained models, a clear linear relationship was observed, indicative of the effects of competition being reflected in the residual. This relationship was mitigated under the proposed methodology, in particular for F1 and F3. The result for F2 under the constrained model, however, was not consistent and raised further concerns regarding lack of model fit.

Combining results from all diagnostics and Table 3, the constrained model with exponential decay function (F1) was selected for further inference. In the next Section, selected model outputs are compared to the unconstrained model with the same distance function, to explore the impact of model misspecification with respect to competition on inference.

4.3. Impact of gravity model specification on statistical inference

Estimates of the Poisson means under the chosen constrained and equivalent unconstrained models are summarised in Figure 3, on the logarithmic scale. Overall, there were no systematic differences in the posterior means, which was expected given the trade-off between fixed and random effects. With respect to posterior estimate uncertainty, results showed a gradual increase as estimated log rates became more negative. This was linked to the existence of sparse counts at large distances from each clinic.

Given differences in estimates of the fixed effect for $C_i$, the posterior distributions of each $\exp(v_i)$ were also compared (see Supplementary Figure 1). In this case, there was a lack of evidence to suggest strong differences in estimates of relative excess participation between the two models.

Fig. 2. Plots of regional random effects against $\log\left(A^{(c)}_j\right)$ for the eight fitted models, differing in terms of the type of gravity model and the choice of $f(d_{ij};\boldsymbol{\theta})$. [Figure not reproduced. Panels: F1 to F4 (columns) by Constrained/Unconstrained (rows); x-axis: posterior mean of $\log\left(A^{(c)}_j\right)$; y-axis: posterior mean of $s_j$.]

Fig. 3. Comparison of posterior means and CIs for $\log(\mu_{ij})$ for the constrained (left) versus unconstrained (right) gravity model with exponential decay distance function. Posterior estimates are compared by clinic, with regions sorted in decreasing order with respect to the posterior mean of $\log(\mu_{ij})$ for a given clinic. Solid dot = Posterior mean; Dark grey area = 95% CI; Light grey area = 80% CI. [Figure not reproduced. Panels: Clinic 1 to Clinic 10 (rows) by Constrained/Unconstrained (columns); x-axis: Region (sorted); y-axis: $\log(\mu_{ij})$.]

Consistent with previous results, differences emerged in the posterior means of area-level random effects. This comparison is illustrated in Figure 4, with posterior means and 95% CIs provided for each $\exp(s_j)$ as estimates of excess participation. Accounting for posterior uncertainty, the degree of overlap between estimates for the two models varied. For the unconstrained model, high estimates of excess participation were associated with higher posterior uncertainty.

Fig. 4. Posterior means and 95% CIs for $\exp(s_j)$. Regions are sorted in increasing order with respect to the posterior mean estimated under the constrained model with exponential decay function. The horizontal black line in the centre of each bar denotes the posterior mean. Dark grey = Constrained gravity model; Light grey = Unconstrained gravity model. [Figure not reproduced. x-axis: Region (sorted); y-axis: Excess participation.]

Differences observed in Figure 4 impacted the classification of regions into different categories of excess participation. Given interest in identifying regions for further investigation, Table 5 summarises the number of regions classified into categories of relatively low ($\exp(s_j) < 0.7$) or high ($\exp(s_j) > 1.3$) excess participation. Different classification rules based on regions exceeding a selected posterior probability were considered, ranging from 0.5 to 0.95.

Disagreement in classification between models was greatest for low excess participation, with more regions identified under the unconstrained model. At the smallest posterior probability cut-off, an additional 13 regions were classified, which corresponded to 30% of the total study area. For high excess participation, the number of regions classified under the unconstrained model was mostly consistent, compared with a decrease in numbers for the constrained model, as the cut-off increased.

Table 5. Number of regions classified into low versus high excess participation, as a function of posterior probability. For each selected posterior probability cut-off, the number of identified regions common to both models is given, alongside the number of regions identified by each model. For each model, the number in brackets denotes the number of additional regions identified.

Low excess participation: $\Pr\left(\exp(s_j) < 0.7 \mid \boldsymbol{y}\right) > K$

| Posterior probability cut-off (K) | Common regions identified | Regions identified: Constrained | Regions identified: Unconstrained |
|---|---|---|---|
| 0.5 | 7 | 8 (1) | 20 (13) |
| 0.6 | 6 | 7 (1) | 19 (13) |
| 0.7 | 4 | 4 (0) | 16 (12) |
| 0.8 | 2 | 2 (0) | 12 (10) |
| 0.9 | 1 | 1 (0) | 6 (5) |
| 0.95 | 1 | 1 (0) | 5 (4) |

High excess participation: $\Pr\left(\exp(s_j) > 1.3 \mid \boldsymbol{y}\right) > K$

| Posterior probability cut-off (K) | Common regions identified | Regions identified: Constrained | Regions identified: Unconstrained |
|---|---|---|---|
| 0.5 | 9 | 10 (1) | 12 (3) |
| 0.6 | 7 | 8 (1) | 10 (3) |
| 0.7 | 6 | 7 (1) | 10 (4) |
| 0.8 | 4 | 4 (0) | 10 (6) |
| 0.9 | 4 | 4 (0) | 10 (6) |
| 0.95 | 4 | 4 (0) | 10 (6) |
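The classification rule underlying Table 5 can be sketched in Python as follows. The sketch assumes that MCMC draws of each $s_j$ are available as lists; the function names and toy draws are hypothetical, while the thresholds 0.7 and 1.3 are those used in the text.

```python
import math


def exceedance_prob(draws, threshold, below=True):
    """Estimate Pr(exp(s_j) < threshold | y), or Pr(exp(s_j) > threshold | y), from draws of s_j."""
    if below:
        hits = sum(1 for s in draws if math.exp(s) < threshold)
    else:
        hits = sum(1 for s in draws if math.exp(s) > threshold)
    return hits / len(draws)


def classify(s_draws, cutoff, low=0.7, high=1.3):
    """Return (low_regions, high_regions) as sets of region indices.

    s_draws[j] -- list of posterior draws of s_j for region j
    cutoff     -- posterior probability cut-off K
    """
    low_regions = {j for j, d in enumerate(s_draws)
                   if exceedance_prob(d, low, below=True) > cutoff}
    high_regions = {j for j, d in enumerate(s_draws)
                    if exceedance_prob(d, high, below=False) > cutoff}
    return low_regions, high_regions


# Toy draws for three regions under two models (illustrative values only).
constrained = [[-1.0, -1.0, -1.0, 0.0], [0.5, 0.5, 0.5, 0.5], [0.0, 0.0, 0.0, 0.0]]
unconstrained = [[-1.0, -1.0, -1.0, -1.0], [0.5, 0.5, 0.1, 0.5], [-0.5, -0.5, -0.5, 0.0]]

low_c, high_c = classify(constrained, cutoff=0.7)
low_u, high_u = classify(unconstrained, cutoff=0.7)
common_low = low_c & low_u  # regions flagged as low by both models
print(sorted(common_low), sorted(low_c - common_low), sorted(low_u - common_low))
```

Set intersection and difference then give the "common" and "additional" counts reported for each cut-off.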

4.4. Evaluating the current state of health program provision

In light of its improved fit, the selected constrained model was further explored with respect to the second and third objectives outlined in Section 2. Based on the hierarchical model developed, these objectives relate to the estimation and comparison of random effects $\boldsymbol{s}$ and $\boldsymbol{v}$, defined at the region and clinic levels respectively.

Prior to the compilation of results in this Section, a sensitivity analysis was conducted on the prior distributions assumed for the variance components, $\left(\sigma^2, \sigma^2_v, \sigma^2_s\right)$, in light of their direct influence on the random effects. Briefly, the prior distributions defined in Section 3.2 were compared with two alternatives: a left-truncated Normal distribution on the standard deviation, $N(0, 1)^{+}$ (Li et al., 2012), and a proper Inverse Gamma distribution on the variance, $IG(1, 0.01)$ (Banerjee et al., 2004). The results of this analysis, provided in the Supplementary Material, indicated sensitivity with respect to the clinic-level random effects, $\boldsymbol{v}$. Therefore, a secondary analysis was carried out comparing inferences relating to these effects for the above defined prior distributions with an independent random effects prior, $v_l \sim N(0, 1000)$; $l = 1, \ldots, L$. The latter prior on $\boldsymbol{v}$ was motivated by concerns about excessive hierarchical shrinkage towards zero. Overall, while differences in random effects estimates were observed, this did not impact on consequent posterior summaries regarding relative excess participation at the clinic level. For this reason, we restrict the presentation of results in this Section to those obtained assuming the prior distributions as outlined in Section 3.2.

Fig. 5. Posterior means of $\exp(s_j)$ under the constrained gravity model with exponential decay function (F1). [Map not reproduced. Legend categories: [0, 0.7), [0.7, 0.9), [0.9, 1.1), [1.1, 1.3), [1.3, 10].]

The posterior mean of each regional random effect is mapped in Figure 5, as a visual means of identifying spatial trends in excess regional participation. In general, regions classified as having relative excess under-participation were concentrated in the centre of the study area, corresponding to suburbs in close proximity to the city centre. From a policy perspective, under-participation in these regions, irrespective of clinic, could motivate future program promotion, and is discussed in Section 5. Regions with posterior estimates in excess of 1 were mostly located near the limits of the study area, particularly in the southeast and southwest directions.

The relative excess under- and over-participation in different clinics was investigated by summarising the posterior distributions for each $v_l$, with results presented in Table 6. Overall, posterior mean estimates and 95% CIs indicated relative excess over-participation in three clinics (1, 2 and 9).

By sorting these random effects from smallest to largest after each MCMC iteration, clinics were able to be ranked, with uncertainty in this ranking expressed in terms of quantile distributions. This additional output showed that, out of all clinics, clinics 6, 7 and 8 were consistently ranked in the lowest 25% and experienced relative excess under-participation. In contrast, the random effect for clinic 9 was among the largest, with greatest membership in the top 25% of all ranks, providing strong evidence of relative excess over-participation. Given the geographic location of this clinic in the city centre, this result was not unexpected, with participation in this clinic likely to be from patients with workplaces nearby.

Table 6. Posterior summary of clinic-level random effects under the constrained model with exponential decay function (F1). Posterior distributions for each $v_i$ are first summarised in terms of their posterior mean and 95% CI. For each clinic, the distribution of ranks (smallest to largest $v_i$) is described by the proportion of MCMC iterations in which the estimated random effect falls into each quantile. For example, 0-25% characterises the proportion of iterations in which a clinic's rank corresponded to the lowest 25% of clinics.

| Random effect | Posterior mean | 95% CI | Ranks 0-25% | Ranks 26-50% | Ranks 51-75% | Ranks 76-100% |
|---|---|---|---|---|---|---|
| $v_1$ | 0.77 | (0.22, 1.23) | 0 | 0 | 0.06 | 0.94 |
| $v_2$ | 0.61 | (0.11, 1.04) | 0 | 0.01 | 0.15 | 0.84 |
| $v_3$ | 0.42 | (-0.17, 0.54) | 0 | 0.13 | 0.81 | 0.05 |
| $v_4$ | -0.02 | (-0.47, 0.42) | 0.05 | 0.47 | 0.46 | 0.01 |
| $v_5$ | -0.36 | (-0.65, -0.08) | 0.25 | 0.73 | 0.02 | 0 |
| $v_6$ | -0.94 | (-1.32, -0.54) | 1.00 | 0 | 0 | 0 |
| $v_7$ | -0.58 | (-0.97, -0.14) | 0.79 | 0.20 | 0.01 | 0 |
| $v_8$ | -0.68 | (-1.09, -0.30) | 0.90 | 0.10 | 0 | 0 |
| $v_9$ | 0.91 | (0.55, 1.32) | 0 | 0 | 0.01 | 0.99 |
| $v_{10}$ | 0.12 | (-0.41, 0.73) | 0.01 | 0.35 | 0.47 | 0.17 |
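The ranking procedure summarised in Table 6 can be sketched as follows. The Python function below is illustrative only: its name and the toy draws are hypothetical, and the rule used to map a rank to one of the four quantile bins (rank $r$ out of $L$ falls in bin $\lceil 4r/L \rceil$) is an assumption, since the exact binning used for Table 6 is not stated.

```python
def rank_quartile_distribution(v_draws):
    """Proportion of MCMC iterations in which each clinic's rank falls in each quartile bin.

    v_draws[t] -- list of clinic-level random effects (v_1, ..., v_L) at iteration t
    Returns props[i][q] for clinic i and bin q = 0 (lowest 25%) to 3 (highest 25%).
    """
    L = len(v_draws[0])
    counts = [[0] * 4 for _ in range(L)]
    for draw in v_draws:
        order = sorted(range(L), key=lambda i: draw[i])  # clinics from smallest to largest v_i
        for rank, clinic in enumerate(order, start=1):
            q = (4 * rank - 1) // L  # equals ceil(4 * rank / L) - 1 (assumed binning rule)
            counts[clinic][q] += 1
    T = len(v_draws)
    return [[c / T for c in row] for row in counts]


# Toy example: 4 clinics, 2 iterations (illustrative values only).
v_draws = [
    [0.1, 0.2, 0.3, 0.4],
    [0.4, 0.2, 0.3, 0.1],
]
for clinic, props in enumerate(rank_quartile_distribution(v_draws), start=1):
    print("clinic", clinic, props)
```

Because ranks are recomputed within each iteration, the resulting proportions carry the posterior uncertainty in the ordering of clinics rather than relying on a single ranking of posterior means.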

5. Discussion

This paper has considered a hierarchical modelling framework for understanding patterns in health program use, with an emphasis on the specification of the gravity model as fixed effects. Building on work from previous authors, a constrained gravity model was proposed, to account for the effects of competition. To assess evidence of gravity model misspecification with respect to competition, a graphical diagnostic was proposed. The analysis of spatial trends in relative excess participation was also of interest, with a focus on the partitioning of residual variation between sources.

Using a combination of model assessment diagnostics, the application of the proposed methodology in Section 4 resulted in an improved model fit, with greatest evidence for the use of an exponential decay accessibility function. The consequences of model misspecification were also demonstrated, with inferences relating to small area variation affected. In practice, it is therefore recommended that the choice of model be determined by a selection of goodness of fit diagnostics, in the absence of knowledge of the existence of competition. With this in mind, future work may consider strategies for unsupervised learning of the true accessibility function, for example model averaging, and the possibility that this function may vary between clinics. Related work in spatial model choice was recently developed by Li et al. (2012).

Under the selected model, the estimation and interpretation of random effects provided additional insights into the current state of program provision. At the regional level, the results summarised in Section 4.4 indicated evidence of geographic inequalities in clinic use, deserving of further investigation. By identifying small areas corresponding to under-participation, a possible application of these results is targeted program promotion. However, given the classification of such regions based on random effects, care should be taken in their interpretation, particularly since this identification has been made relative to participation in other regions. Furthermore, their estimation does not preclude partial explanation by other covariates unavailable at the time of analysis, for example, regional preferences for privately-funded mammography. Similarly, acknowledging residual differences at the clinic level, inference was gained about relative excess participation in clinics available during the study period. Potential drivers of these differences may be truly latent or able to be explained by additional measures of clinic capacity, or indeed a combination of the two. In either case, the utility of random effects in this setting should not be understated, and they should be viewed as a means of prompting further investigation into the drivers of inferred inequalities.

The methodology in this paper was developed to address gravity model misspecification with respect to competition defined at each origin. With this in mind, there exist a number of additional gravity model variants that could be further explored as part of the GLMM framework. An example of this is misspecification with respect to competition effects among destinations, irrespective of the origin. In terms of gravity model specification, this would involve more than one destination-based mass, compared to this paper, which assumed a single factor. Future work in this area could therefore extend the misspecification diagnostic from Section 3.4.1 to assess the impact of ignoring these additional aspects of behaviour. Further to this, in terms of the origin-based factor, $P_j$, one could also assess the impact of proportional versus non-proportional effects. In this paper, a proportional effect for $P_j$ was assumed for both constrained and unconstrained gravity models. This was primarily due to the desire to retain the interpretation of the gravity model as a mechanism for predicting the distribution of each eligible population across clinics. The chosen unconstrained model, similar to the approach by Congdon and Best (2000), also assumed a proportional effect.

In their current form, the models outlined in this paper assume that the time during which different clinics were available did not affect a participant’s choice of clinic. Although the length of time each clinic is available was included as an offset, this is a simplifying assumption, and precludes inference about changes in participation behaviour as a function of availability. Future work by the authors will be focused on extending this methodology to both spatial and temporal domains.

Acknowledgments

This work was supported by the Cooperative Research Centre for Spatial Information, whose activities are funded by the Australian Commonwealth’s Cooperative Research Centres Programme. The authors thank Dr. Stephen Ball (Institute for Child Health Research, Perth, Western Australia), for his assistance in providing maps used in this paper and Nathan Dunn (Queensland Health) for his assistance with the case study data. We also thank the Joint Editor, Associate Editor and two anonymous referees for their generous feedback that contributed greatly to revisions of this manuscript.

References

Australian Bureau of Statistics (2006). Technical Report. Australian Standard Geographical Classification.

Bailey, T. and Gatrell, A. (1995). Interactive Spatial Data Analysis. London: Longman.


Banerjee, S., Carlin, B. and Gelfand, A. (2004). Hierarchical Modeling and Analysis for Spatial Data. Boca Raton: Chapman and Hall/CRC.

Besag, J., York, J. and Mollié, A. (1991). Bayesian image restoration, with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics, 43, 1–20.

Best, N., Richardson, S. and Thomas, A. (2005). A comparison of Bayesian spatial models for disease mapping. Statistical Methods in Medical Research, 14, 35–59.

Brustrom, J. E. and Hunter, D. C. (2001). Going the distance: How far will women travel to undergo free mammography? Military Medicine, 166, 347–349.

Congdon, P. (2000). A Bayesian approach to prediction using the gravity model, with an application to patient flow modeling. Geographical Analysis, 32, 205–224.

Congdon, P. (2001). The Development of Gravity Models for Hospital Patient Flows under System Change: A Bayesian Modelling Approach. Health Care Management Science, 4, 289–304.

Congdon, P. (2010). Random-effects models for migration attractivity and retentivity: a Bayesian methodology. Journal of the Royal Statistical Society: Series A (Statistics in Society), 173, 755–774.

Congdon, P. and Best, N. (2000). Small area variation in hospital admission rates: Bayesian adjustment for primary care and hospital factors. Journal of the Royal Statistical Society: Series C (Applied Statistics), 49, 207–226.

Czado, C., Gneiting, T. and Held, L. (2009). Predictive model assessment for count data. Biometrics, 65, 1254–1261.

Department of Health and Ageing (2009). Technical Report, Commonwealth of Australia. Evaluation of the BreastScreen Australia Program: Evaluation Final Report.

Diggle, P. (1990). A point process modelling approach to raised incidence of a rare phenomenon in the vicinity of a prespecified point. Journal of the Royal Statistical Society: Series A (Statistics in Society), 153, 349–362.

Diggle, P., Morris, S., Elliott, P. and Shaddick, G. (1997). Regression modelling of disease risk in relation to point sources. Journal of the Royal Statistical Society: Series A (Statistics in Society), 160, 491–505.

Dreassi, E. and Biggeri, A. (2003). Incorporating gravity model principles into disease mapping. Biometrical Journal, 45, 207–217.

Earnest, A., Morgan, G., Mengersen, K., Ryan, L., Summerhayes, R. and Beard, J. (2007). Evaluating the effect of neighbourhood weight matrices on smoothing properties of Conditional Autoregressive (CAR) models. International Journal of Health Geographics, 6, 54–65.

Earnest, A., Cramb, S. M. and White, N. M. (2013). Disease Mapping Using Bayesian Hierarchical Models. In Case Studies in Bayesian Statistical Modelling and Analysis (eds Alston, C., Mengersen, K. and Pettitt, A.), pp. 221–239. Chichester: Wiley.

Elting, L., Cooksley, C., Bekele, B., Giordano, S., Shih, Y., Lovell, K., Avritscher, E., Theriault, R. et al. (2009). Mammography capacity impact on screening rates and breast cancer stage at diagnosis. American Journal of Preventive Medicine, 37, 102–108.

Fotheringham, A. (1981). Spatial structure and distance-decay parameters. Annals of the Association of American Geographers, 71, 425–436.

Fotheringham, A. (1983a). A new set of spatial interaction models: the theory of competing destinations. Environment and Planning A, 15, 15–36.

Fotheringham, A. (1983b). Some theoretical aspects of destination choice and their relevance to production-constrained gravity models. Environment and Planning A, 15, 1121–1132.

Gelfand, A. E. and Ghosh, S. K. (1998). Model choice: A minimum posterior predictive loss approach. Biometrika, 85, 1–11.

Gelfand, A. E., Sahu, S. K. and Carlin, B. P. (1996). Efficient Parameterizations for Generalized Linear Mixed Models. In Bayesian Statistics 5 (eds Bernardo, J. M., Berger, J. O., Dawid, A. P. and Smith, A. M.), pp. 479–488. Oxford: Oxford University Press.

Gelman, A., Meng, X. and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733–760.

Higgs, G. (2004). A literature review of the use of GIS-based measures of access to health care services. Health Services and Outcomes Research Methodology, 5, 119–139.

Hodges, J. and Reich, B. (2010). Adding spatially-correlated errors can mess up the fixed effect you love. The American Statistician, 64, 325–334.

Hyndman, J. and Holman, C. (2000). Differential effects on socioeconomic groups of modelling the location of mammography screening clinics using Geographic Information Systems. Australian and New Zealand Journal of Public Health, 24, 281–286.

Hyndman, J., Holman, C. and Dawes, V. (2000). Effect of distance and social disadvantage on the response to invitations to attend mammography screening. Journal of Medical Screening, 7, 141–145.

Laud, P. and Ibrahim, J. (1995). Predictive model selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 57, 247–262.

Lawson, A. (1993). On the analysis of mortality events associated with a prespecified fixed point. Journal of the Royal Statistical Society: Series A (Statistics in Society), 156, 363–377.

Li, G., Best, N., Hansell, A., Ahmed, I. and Richardson, S. (2012). BaySTDetect: detecting unusual temporal patterns in small area data via Bayesian model choice. Biostatistics, 13, 695–710.

Luo, W. and Wang, F. (2003). Measures of spatial accessibility to health care in a GIS environment: synthesis and a case study in the Chicago region. Environment and Planning B, 30, 865–884.

Maheswaran, R., Pearson, T., Jordan, H. and Black, D. (2006). Socioeconomic deprivation, travel distance, location of service, and uptake of breast cancer screening in North Derbyshire, UK. Journal of Epidemiology and Community Health, 60, 208–212.

Marshall, E. and Spiegelhalter, D. (2003). Approximate cross-validatory predictive checks in disease mapping models. Statistics in Medicine, 22, 1649–1660.

Papaspiliopoulos, O., Roberts, G. O. and Sköld, M. (2007). A general framework for the parametrization of hierarchical models. Statistical Science, 22, 59–73.

Pascutto, C., Wakefield, J., Best, N., Richardson, S., Bernardinelli, L., Staines, A. and Elliott, P. (2000). Statistical issues in the analysis of disease mapping data. Statistics in Medicine, 19, 2493–2519.

Riebler, A. and Held, L. (2010). The analysis of heterogeneous time trends in multivariate age-period-cohort models. Biostatistics, 11, 57–69.

Schuurman, N., Berube, M. and Crooks, V. (2010). Measuring potential spatial access to primary health care physicians using a modified gravity model. The Canadian Geographer, 54, 29–45.

Smith, R., Duffy, S. and Tabár, L. (2012). Breast cancer screening: the evolving evidence. Oncology, 26, 471–486.

Spiegelhalter, D. J., Thomas, A., Best, N. G. and Gilks, W. R. (1996). BUGS: Bayesian Inference Using Gibbs Sampling, Manual Version 0.50. User Manual, Medical Research Council Biostatistics Unit.

Stanaway, M. A., Reeves, R. and Mengersen, K. L. (2011). Hierarchical Bayesian modelling of plant pest invasions with human-mediated dispersal. Ecological Modelling, 222, 3531–3540.

Stern, H. and Cressie, N. (2000). Posterior predictive model checks for disease mapping models. Statistics in Medicine, 19, 2377–2397.

Wakefield, J. C. and Morris, S. E. (2001). The Bayesian modeling of disease risk in relation to a point source. Journal of the American Statistical Association, 96, 77–91.

Waller, L. A., Carlin, B. P., Xia, H. and Gelfand, A. E. (1997). Hierarchical spatio-temporal mapping of disease rates. Journal of the American Statistical Association, 92, 607–617.

Zackrisson, S., Andersson, I., Manjer, J. and Janzon, L. (2004). Non-attendance in breast cancer screening is associated with unfavourable socio-economic circumstances and advanced carcinoma. International Journal of Cancer, 108, 754–760.

Zackrisson, S., Lindstrom, M., Moghaddassi, M., Andersson, I. and Janzon, L. (2007). Social predictors of non-attendance in an urban mammographic screening programme: a multilevel analysis. Scandinavian Journal of Public Health, 35, 548–554.