
STATISTICS IN MEDICINE, VOL. 13, 889-903 (1994)

EMPIRICAL BAYES METHODS FOR ESTIMATING HOSPITAL-SPECIFIC MORTALITY RATES

NEAL THOMAS, NICHOLAS T. LONGFORD
Educational Testing Service, Princeton, NJ 08541, U.S.A.

AND

JOHN E. ROLPH
RAND Corporation, Santa Monica, CA 90407-2138, U.S.A.

SUMMARY

We present alternative methods for estimating hospital-level mortality rates to those used by the Health Care Financing Administration for Medicare patients. We use an empirical Bayes model to represent the different sources of variation in observed hospital-specific mortality rates, and we use a logistic regression model to adjust for severity differences (in patient mix) across hospitals. In addition to providing a principled derivation of a standard error for the commonly used estimator, our fully model-based formulation produces much more accurate estimates and resolves the severe multiple comparisons problem that arises when extreme estimates are used to identify exceptional hospitals. We estimate models for each of four disease conditions using both the national Medicare mortality data base, which does not contain patient severity descriptors, and mortality data from national samples, which do include patient severity descriptors. We find substantial between-hospital variation in the unadjusted death rates from the national data base. Mortality rates differ substantially with patient severity in our models, but the sample sizes are too small to yield reliable estimates of the between-hospital variation in adjusted mortality rates.

1. INTRODUCTION

Release of hospital-level mortality data by the Health Care Financing Administration (HCFA), state organizations, and private groups in recent years has focused attention on how hospital-level mortality rate data can be used to evaluate the effectiveness of hospital care. There is some controversy about the value of hospital mortality data for measuring the quality of care.¹,² The principal objection is that mortality rates obtained from administrative data sources are difficult to interpret because there has been little adjustment for the clinical condition of patients on admission.³

We use the data and reporting methods of the Medicare Mortality Predictor System (MMPS)⁴,⁵ as an illustrative example. The MMPS is a computerized medical record abstraction system that allows a hospital to compare its own raw mortality rate for each of four diseases to an adjusted rate based on the estimated national mortality rate for patients with similar clinical conditions. We focus, however, on the common statistical issues involved in the analysis of hospital-specific severity-adjusted mortality rates and present methods that are substantial improvements over those currently in use.

CCC 0277-6715/94/090889-15
© 1994 by John Wiley & Sons, Ltd.

Received October 1992
Revised June 1993

We use an empirical Bayes model to represent the different sources of variation in observed mortality rates at different hospitals. This fully parametric approach provides a coherent framework for evaluating the relative importance of different sources of variation in the observed death rates. As a limiting case, the Bayesian framework contains the estimator now commonly used in mortality reporting systems (for example Reference 6). It also provides standard inferential procedures for computing confidence or high-probability coverage intervals, in contrast to existing methods, which have not been derived within a rigorous analytic framework. The most important benefits of this approach are:

1. The empirical-Bayes methods provide more accurate, stable estimates of hospital-specific mortality rates.

2. Identifying exceptional hospitals is perhaps the most important use of hospital-specific estimates of mortality rates. The confidence or high-probability coverage intervals produced by the Bayesian methods account for the severe multiple comparisons problem that arises when identifying exceptional hospitals from among thousands of hospitals.

3. The fully model-based approach provides a principled assessment of the uncertainty in the severity-adjusted mortality rate estimator currently in common use. Simulation studies show that the model-based standard error estimate is somewhat more accurate than several alternative standard error estimates.

4. The variation in hospital-specific mortality rates not attributable to small-sample fluctuations or measurable differences in patient severity can be explicitly quantified. Previous studies, such as Park et al., have focused on establishing the existence of such variation.

5. We can assess the value of reporting the hospital-specific death rates based on different sample sizes in terms of knowledge gained. We can answer, for example, how much information about a hospital’s underlying death rate we would expect to obtain from a sample of 50 of its patients beyond that obtained from the national distribution of hospital mortality rates (adjusted for patient severity differences). Similar calculations could be used for establishing minimum sample sizes for reporting purposes. These calculations may inform decisions not to collect expensive severity information.

6. The empirical Bayes models can be extended in a natural way to improve the accuracy of reporting by appropriately pooling information across several medical conditions or time periods.

In Section 2 we describe the three relevant data sources for the study. Section 3 covers statistical methods. A Bayesian approach produces shrinkage estimates of individual hospital mortality rates and provides a principled derivation of the currently used MMPS hospital mortality rate estimator. In Section 4, we present results on the variation in hospital-specific mortality rates with and without adjustment for patient severity, and in Section 5 we report the results of simulation studies comparing the performance of the Bayes shrinkage estimator and the current MMPS estimator.

2. DATA SOURCES

The Prospective Payment System (PPS) RAND Quality of Care Study⁸ designed a data collection method for patient severity at admission that was used by both the PPS and MMPS studies.⁴,⁸ It includes 60 to 80 disease-specific variables from the medical record that describe the patient's acute and chronic, morbid and comorbid conditions. We use the severity adjustment models developed by the MMPS⁴ and the PPS in our hospital mortality models.

Our analyses make use of several different data sets. We now briefly describe each data set.


2.1. MMPS national sample

The Medicare Mortality Predictor System (MMPS)⁴,⁵ collected a nationally representative sample of 5888 hospital discharges for the four medical conditions of stroke, pneumonia, myocardial infarction and congestive heart failure. The mortality outcome of each patient 30 days after hospital admission and the patient's severity characteristics were abstracted from medical records. The clinical variables used in the MMPS are described in Daley et al.⁴

Estimates of the parameters of the national survival rate function, equation (1) in Section 3, and the asymptotic covariance matrix of these estimates are computed using the MMPS data. The severity characteristics of subsets of the MMPS data are also used to form the simulated hospitals presented in Section 5.

2.2. Prospective Payment System (PPS) evaluation sample

The MMPS sample included many different hospitals, each represented by a very small number of patients. Although this is a good sampling design for estimating severity adjustments, it is a very poor design for estimating the amount of non-sampling variability between hospitals that can be attributed to differences in the severity of the patients they treat. Because its design is better suited to estimating variability between hospitals, we used the Prospective Payment System (PPS) data.⁸ The PPS project evaluated the change in the quality of care resulting from the introduction of the Prospective Payment System based on diagnosis-related groups (DRGs). The PPS evaluated the same medical conditions as the MMPS study, plus two additional medical conditions not considered here.

Here is a brief description of the sampling plan employed by the PPS project; see Draper et al. for details. Approximately nine patients were sampled for each medical condition from each of 297 hospitals. The number of patients sampled at each hospital varied owing to the lack of eligible patients at some hospitals. Two features of the sampling design cause complications for the analyses in this paper. First, the patients from each hospital were selected from two time periods: half were admitted during 1981-2, before the introduction of the PPS, and half during 1985-6, after its introduction. Second, hospitals were selected through a sampling plan that stratified on hospital variables representing hospital size, urbanicity, and the poverty rate of each hospital's patients. Very small hospitals were excluded from the sample. See Sections 4.2 and 4.3 for a discussion of this sampling design.

2.3. Medicare national mortality data

Hospital-specific mortality data for the entire population of Medicare patients (from approximately 5500 hospitals) in each of the four MMPS medical conditions for fiscal year 1986 are used to give accurate estimates of the between-hospital variation in the death rates for each condition. The median numbers of qualifying patients treated at hospitals for stroke, pneumonia, myocardial infarction and congestive heart failure are 33, 56, 30 and 54, respectively.

2.4. Hospital-specific MMPS data

An important feature of the MMPS is that each hospital collects its own data on a voluntary basis with no centralized reporting. As a consequence, the MMPS reports cannot reference or use the MMPS results from other hospitals. Because there is currently no centralized reporting, we do not have hospital-specific MMPS data.


3. STATISTICAL METHODS

3.1. Statistical formulation

This section presents a statistical framework for quantifying the variation in 30-day mortality rates across hospitals. The severity information for an individual patient is denoted by $X$, and $Y$ is a dichotomous variable taking the value one if the patient dies within 30 days of hospital admission and zero otherwise. The proportion of patients who die among all patients in the nation with the same severity $X$ is denoted by $P_N(Y = 1 \mid X)$. The (hypothetical) death rate for all patients in the nation with the same severity characteristics, if they could be observed receiving treatment at a particular hospital, is denoted by $P_H(Y = 1 \mid X)$. A common notation is used for each medical condition when estimating death rates.

The MMPS⁴ used a linear logistic regression model,

$$\operatorname{logit}\{P_N(Y = 1 \mid X)\} = \alpha + \beta X, \qquad (1)$$

to represent the national death rate, where $\operatorname{logit}(p) = \ln\{p/(1 - p)\}$. They computed maximum likelihood estimates of the parameters $(\alpha, \beta)$ using the large national sample collected by the MMPS. We denote their maximum likelihood estimates by $(\hat\alpha_{\mathrm{mle}}, \hat\beta_{\mathrm{mle}})$ and the asymptotic variance-covariance matrix by $\hat\Sigma_{\mathrm{mle}}$. This model is commonly used for mortality prediction.

To assess a hospital's performance with respect to mortality, the severity measures are collected from the medical records of all patients entering the hospital with a qualifying condition during the preceding year. We represent the 30-day mortality outcomes and severity characteristics for a hospital's qualifying patients by $Y_1, \dots, Y_n$ and $X_1, \dots, X_n$.

We assume throughout that each patient in the hospital has a conditionally independent mortality outcome. This is a defensible assumption because our medical conditions are not contagious. We also assume that the assignment of patients to hospitals may depend on the measured patient characteristics X, but does not depend on any other unmeasured patient characteristics that are related to Y after conditioning on X. Note that differences in estimated hospital mortality rates may be due in part or whole to remaining unmeasured differences in severity.

We compare the observed death rate among the patients admitted to a hospital, $\bar Y = (1/n)\sum_{j=1}^{n} Y_j$, to the average of the estimated national death rates for patients with the same severity, $\bar p_{N,\mathrm{mle}} = (1/n)\sum_{j=1}^{n} p_{j,N,\mathrm{mle}}$, where $p_{j,N,\mathrm{mle}} = \operatorname{logit}^{-1}(\hat\alpha_{\mathrm{mle}} + \hat\beta_{\mathrm{mle}} X_j)$ and the subscript mle indicates that the estimates are based on maximum likelihood estimates of the national death rate from the MMPS data.
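The comparison just described can be sketched in a few lines of Python. Severity is collapsed to a single hypothetical score per patient, and the coefficients are illustrative placeholders rather than the MMPS estimates.

```python
import math

def expit(z: float) -> float:
    """Inverse logit: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def observed_vs_expected(y, x, alpha_mle, beta_mle):
    """Return (Ybar, pbar_N) for one hospital.

    y holds 0/1 outcomes (1 = death within 30 days of admission) and x holds one
    scalar severity score per patient; alpha_mle, beta_mle are national estimates.
    """
    n = len(y)
    ybar = sum(y) / n
    # p_{j,N,mle} = logit^{-1}(alpha_mle + beta_mle * X_j), averaged over patients
    pbar_n = sum(expit(alpha_mle + beta_mle * xj) for xj in x) / n
    return ybar, pbar_n

# Hypothetical five-patient hospital with illustrative coefficients.
ybar, pbar_n = observed_vs_expected(y=[0, 1, 0, 0, 1], x=[0.2, 1.5, -0.3, 0.0, 0.8],
                                    alpha_mle=-1.5, beta_mle=1.0)
print(f"observed {ybar:.3f}, expected {pbar_n:.3f}, difference {ybar - pbar_n:.3f}")
```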

Although there is little argument about what to compare, there are many competing statistical conceptualizations for measuring the uncertainty in the difference between the adjusted national death rate and a particular hospital's death rate. We use the estimand

$$\bar P_H - \bar P_N = \frac{1}{n}\sum_{j=1}^{n} P_H(Y = 1 \mid X_j) - \frac{1}{n}\sum_{j=1}^{n} P_N(Y = 1 \mid X_j) \qquad (2)$$

and condition on $X_1, \dots, X_n$ throughout.

3.2. Between-hospital variation

Let $\bar Y_i$ be the observed death rate at hospital $i$, and $P_i$ be the underlying death rate at the hospital (without regard to severity). Our model for the number of deaths at hospital $i$ is binomial conditional on $P_i$ and the number of patients at the hospital, $n_i$:

$$n_i \bar Y_i \sim \operatorname{Bin}(n_i, P_i). \qquad (3)$$

We quantify the variation in hospital-specific death rates beyond that expected from small-sample binomial fluctuations by assuming that the $P_i$ vary among hospitals according to a logit-normal distribution,

$$\operatorname{logit}(P_i) \sim N(\mu, \sigma_\delta^2), \qquad (4)$$

with $n_i$ carrying no information about $P_i$. Maximum likelihood estimates of $\sigma_\delta$ in (4) are given in Section 4.1. The logit-normal formulation given here is very similar to the beta-binomial distribution used by Jencks et al.
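The marginal likelihood implied by (3) and (4) integrates the binomial likelihood over the normal distribution of $\operatorname{logit}(P_i)$. As a rough Python sketch, assuming simulated rather than Medicare data, the code below evaluates that likelihood with a fixed-grid quadrature and maximizes it by a grid search over $(\mu, \sigma_\delta)$. These numerical choices are ours, made for a dependency-free illustration; they are not the computational method of Thomas et al.

```python
import math
import random
from collections import Counter

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

# Fixed grid for a standard normal variable u, with Riemann-sum weights phi(u) * step.
STEP = 0.25
NODES = [-5.0 + STEP * k for k in range(41)]
WEIGHTS = [STEP * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi) for u in NODES]

def marginal_loglik(counts, mu, sigma):
    """Log-likelihood of hospital death counts under (3) and (4).

    counts maps (n_i, deaths_i) -> number of hospitals with that pair; binomial
    coefficients are omitted because they do not depend on (mu, sigma).
    """
    probs = [expit(mu + sigma * u) for u in NODES]
    total = 0.0
    for (n, d), mult in counts.items():
        lik = sum(w * p ** d * (1.0 - p) ** (n - d) for w, p in zip(WEIGHTS, probs))
        total += mult * math.log(lik)
    return total

def fit_logit_normal(counts):
    """Crude grid-search maximum likelihood for (mu, sigma_delta)."""
    grid = [(-2.0 + 0.05 * a, 0.05 * b) for a in range(25) for b in range(17)]
    return max(grid, key=lambda ms: marginal_loglik(counts, *ms))

# Simulated data: 500 hypothetical hospitals, true mu = logit(0.2), sigma = 0.25.
rng = random.Random(1)
true_mu, true_sigma = math.log(0.2 / 0.8), 0.25
counts = Counter()
for _ in range(500):
    n = rng.choice([25, 50, 100])
    p = expit(rng.gauss(true_mu, true_sigma))
    deaths = sum(rng.random() < p for _ in range(n))
    counts[(n, deaths)] += 1

mu_hat, sigma_hat = fit_logit_normal(counts)
print(round(mu_hat, 2), round(sigma_hat, 2))
```

A finer quadrature (for example Gauss-Hermite) and a proper optimizer would replace the grids in any serious use.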

3.3. Incorporating severity

A simple expansion of the model in (1) allows underlying death rates at hospitals to vary as a function of severity. The expanded model including patient severity is

$$\operatorname{logit}\{P_H(Y_{ij} = 1 \mid X_{ij})\} = \alpha + \beta X_{ij} + \delta_i, \qquad (5)$$

with the subscripts $i$ and $j$ indicating patient $j$ treated at hospital $i$, and $\delta_i$ representing how much the death rate (on the logit scale) for patients at hospital $i$ differs from the national rate. Analogously to (4), the $\delta_i$ vary according to a normal distribution, so that the hospital death rates follow a logit-normal model:

$$\delta_i \sim N(0, \sigma_\delta^2). \qquad (6)$$

Maximum likelihood estimates of $\sigma_\delta$ in (6) are given in Section 4.2. Computational details of the maximum likelihood estimators are given in Thomas et al.¹⁶

The model in (5) and (6) has been used by many authors; see Longford¹⁷ for references. There are more elaborate formulations that allow the hospital differences to vary with the values of $X$, and more general forms for the distribution of $\delta_i$.
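To make the link between $\delta_i$ in (5) and the estimand (2) concrete, the Python sketch below draws $\delta_i$ from (6), simulates 0/1 outcomes from (5), and evaluates $\bar P_H - \bar P_N$ for the hospital's own patients. The coefficients, $\sigma_\delta = 0.3$ and the scalar severity scores are hypothetical.

```python
import math
import random

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def estimand(x, alpha, beta, delta):
    """Estimand (2), Pbar_H - Pbar_N, for one hospital's patients under model (5)."""
    n = len(x)
    p_h = sum(expit(alpha + beta * xj + delta) for xj in x) / n
    p_n = sum(expit(alpha + beta * xj) for xj in x) / n
    return p_h - p_n

def simulate_hospital(rng, x, alpha, beta, sigma_delta):
    """Draw delta_i ~ N(0, sigma_delta^2) as in (6), then 0/1 outcomes from (5)."""
    delta = rng.gauss(0.0, sigma_delta)
    y = [int(rng.random() < expit(alpha + beta * xj + delta)) for xj in x]
    return delta, y

rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(40)]      # hypothetical severity scores
delta, y = simulate_hospital(rng, x, alpha=-1.5, beta=0.8, sigma_delta=0.3)
print(round(delta, 3), sum(y), round(estimand(x, -1.5, 0.8, delta), 4))
```

Because $\delta_i$ enters (5) on the logit scale, the same $\delta_i$ translates into different death-rate differences for hospitals with different patient mixes, which is why (2) is evaluated at the hospital's own $X_j$.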

The logit-normal model given by (3) and (4) is a special case of the model in (5) and (6) with no covariates, and it is part of a nested sequence of models measuring the reduction in variation in underlying hospital death rates after adjustment for measured severity. The standard deviation $\sigma_\delta$ must be interpreted carefully, however, because $\sigma_\delta$ does not necessarily decrease when additional predictor variables are added.

3.4. Bayes shrinkage estimators

We use the Bayesian ideas from Sections 3.2 and 3.3 to calculate estimates of the individual hospital mortality rates based on the posterior distribution of $\delta_i$. The Bayesian approach 'shrinks' the maximum likelihood estimates (described in Section 3.3) and produces a more stable, improved estimator.¹⁸

A prior distribution for $(\alpha, \beta, \delta_i)$ can be constructed in two steps, where $\delta_i$ raises or lowers the log odds of death at a specific hospital $i$. The prior distribution for $(\alpha, \beta)$ uses the maximum likelihood estimates, the information matrix, and the approximate sampling distribution from the MMPS data:

$$(\alpha, \beta) \sim N\{(\hat\alpha_{\mathrm{mle}}, \hat\beta_{\mathrm{mle}}), \hat\Sigma_{\mathrm{mle}}\}. \qquad (7)$$

Because we have no reason to believe that the quality of care at a particular hospital is related to the relative predictive strength of patient characteristics from a national sample, we assume that $\delta_i$ is independent of $(\alpha, \beta)$. Using the normal prior distribution for $\delta_i$ in (6) yields a logit-normal model consistent with the model in (5) and (6).


Note that the prior or mixing distribution for $\delta_i$ depends on $\sigma_\delta$, which is also unknown but can be estimated from the PPS data. Thus, to obtain a prior distribution for $\delta_i$, we specify a prior distribution for $\sigma_\delta$ and then integrate it out to obtain

$$p(\delta_i) = \int_0^\infty p(\delta_i \mid \sigma_\delta)\, p(\sigma_\delta)\, d\sigma_\delta, \qquad (8)$$

where $p(\delta_i \mid \sigma_\delta)$ is the $N(0, \sigma_\delta^2)$ density in (6).

In Section 4.4, we use a scaled inverse chi distribution with three degrees of freedom, $\chi_3^{-1}$, to approximate the substantial uncertainty about $\sigma_\delta$ that remains after it has been estimated using the MMPS and the PPS samples. The scaled $\chi_3^{-1}$ distribution for $\sigma_\delta$ implies a scaled $t_3$ prior distribution for $\delta_i$ in (8), and is used in the simulation studies in Section 5. Thomas et al.¹⁶ describe the numerical methods used to evaluate the point estimates (posterior means) and high posterior probability intervals. Recent developments based on Monte Carlo methods and asymptotic corrections may be useful in further refining the Bayesian calculations.
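The posterior mean of $\bar P_H - \bar P_N$ under the scaled $t_3$ prior can be approximated numerically. The Python sketch below is ours, not the numerical method of Thomas et al.: it holds $(\alpha, \beta)$ fixed at hypothetical values instead of integrating over the prior (7), reduces severity to a scalar score, and uses a simple grid over $\delta$. The scale 0.30 anticipates the value chosen in Section 4.4.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def bayes_shrinkage(y, x, alpha, beta, scale=0.30, df=3, step=0.01, bound=4.0):
    """Posterior mean and equal-tailed 95% interval for Pbar_H - Pbar_N.

    Assumes model (5), a scaled-t prior delta ~ scale * t_df, and (alpha, beta)
    fixed at known values (the prior uncertainty in (7) is ignored here).
    The posterior of delta is evaluated on a uniform grid over [-bound, bound].
    """
    n = len(y)
    eta = [alpha + beta * xj for xj in x]
    pbar_n = sum(expit(e) for e in eta) / n
    grid = [-bound + step * k for k in range(int(round(2 * bound / step)) + 1)]
    log_post, effect = [], []
    for d in grid:
        # Unnormalised log density of the scaled-t prior.
        lp = -0.5 * (df + 1) * math.log1p((d / scale) ** 2 / df)
        probs = [expit(e + d) for e in eta]
        # Bernoulli log-likelihood of the hospital's 30-day outcomes.
        lp += sum(math.log(p) if yj else math.log1p(-p) for p, yj in zip(probs, y))
        log_post.append(lp)
        effect.append(sum(probs) / n - pbar_n)
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    weights = [w / total for w in weights]
    mean = sum(w * e for w, e in zip(weights, effect))
    # The effect is increasing in delta, so delta quantiles map to effect quantiles.
    cdf, lo, hi = 0.0, None, None
    for w, e in zip(weights, effect):
        cdf += w
        if lo is None and cdf >= 0.025:
            lo = e
        if hi is None and cdf >= 0.975:
            hi = e
    return mean, (lo, hi)

# Hypothetical hospital: 50 patients of average severity (X = 0), 20 deaths.
y = [1] * 20 + [0] * 30
x = [0.0] * 50
mean, (lo, hi) = bayes_shrinkage(y, x, alpha=-1.5, beta=0.8)
mmps = sum(y) / len(y) - expit(-1.5)   # Ybar - pbar_N, since every X_j = 0
print(round(mmps, 3), round(mean, 3), round(lo, 3), round(hi, 3))
```

The posterior mean lies between zero and the raw difference $\bar Y - \bar p_{N,\mathrm{mle}}$, which is the shrinkage described in this section.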

3.5. A Bayesian derivation of the estimator of hospital mortality rates currently used by the MMPS

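The current MMPS estimator is $\bar Y - \bar p_{N,\mathrm{mle}}$, with standard error given by

$$(\mathrm{SE}_c)^2 = \frac{1}{n^2}\sum_{j=1}^{n} p_{j,N,\mathrm{mle}}\,(1 - p_{j,N,\mathrm{mle}}) + \operatorname{var}(\bar p_{N,\mathrm{mle}}),$$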

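where $\operatorname{var}(\bar p_{N,\mathrm{mle}})$ is computed using the delta method applied to $(\hat\alpha_{\mathrm{mle}}, \hat\beta_{\mathrm{mle}})$. Rather than the normal prior distribution in (6), consider a flat, improper prior distribution for $\delta$. We reuse the normal prior distribution of $(\alpha, \beta)$ in (7) based on the large MMPS national sample. Denote the mode of the resulting posterior distribution of $(\alpha, \beta, \delta)$ by $(\tilde\alpha, \tilde\beta, \tilde\delta)$, and define the corresponding estimate of $\bar P_H - \bar P_N$:

$$\tilde P_H - \tilde P_N = \frac{1}{n}\sum_{j=1}^{n} \operatorname{logit}^{-1}(\tilde\alpha + \tilde\beta X_j + \tilde\delta) - \frac{1}{n}\sum_{j=1}^{n} \operatorname{logit}^{-1}(\tilde\alpha + \tilde\beta X_j).$$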

The $(\tilde\alpha, \tilde\beta)$ are based on the new data at a specific hospital and differ from $(\hat\alpha_{\mathrm{mle}}, \hat\beta_{\mathrm{mle}})$, which are based on the previously analysed data from the MMPS national sample. The difference is very small, however, because the prior distribution of $(\alpha, \beta)$ obtained from the large MMPS national sample is much more concentrated than the information about $(\alpha, \beta)$ contained in the data from a single hospital. The resulting hybrid Bayes/maximum-likelihood estimator, $\tilde P_H - \tilde P_N$, is a very close approximation to $\bar Y - \bar p_{N,\mathrm{mle}}$, the estimator currently used in the MMPS and other hospital-level mortality reporting systems. We formalize this assertion in the following result, which is easily proved using the fact that the sum of the estimated probabilities from a logistic regression equals the total number of cases with $Y_j = 1$.

Result. Consider the estimator $\tilde P_H - \tilde P_N$ computed using the mode $(\tilde\alpha, \tilde\beta, \tilde\delta)$ of the posterior distribution formed with the normal prior distribution for $(\alpha, \beta)$ in (7) and the flat prior distribution for $\delta$, and suppose that there is at least one death and one survival in the hospital data. Then, as the prior variance of $(\alpha, \beta)$ approaches 0,

$$\tilde P_H - \tilde P_N \to \bar Y - \bar p_{N,\mathrm{mle}}.$$
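The key step of the Result can be checked numerically. With a flat prior on $\delta$, the derivative of the log posterior with respect to $\delta$ is $\sum_j (Y_j - p_j)$, whatever the prior on $(\alpha, \beta)$, so the mode satisfies $\tilde P_H = \bar Y$. The Python sketch below takes the limiting case in which $(\alpha, \beta)$ are fixed at hypothetical "national" values, solves for $\tilde\delta$ by Newton's method (our choice of solver), and confirms that $\tilde P_H$ reproduces $\bar Y$. The patient data are invented for illustration.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def flat_prior_mode(y, x, alpha, beta, tol=1e-12, max_iter=100):
    """Mode of delta under a flat prior with (alpha, beta) held fixed.

    Solves the score equation sum_j (Y_j - p_j) = 0 by Newton's method; a finite
    solution requires at least one death and one survival.
    """
    if not 0 < sum(y) < len(y):
        raise ValueError("need at least one death and one survival")
    eta = [alpha + beta * xj for xj in x]
    delta = 0.0
    for _ in range(max_iter):
        p = [expit(e + delta) for e in eta]
        score = sum(y) - sum(p)                   # first derivative of log-likelihood
        info = sum(pj * (1.0 - pj) for pj in p)   # minus the second derivative
        step = score / info
        delta += step
        if abs(step) < tol:
            break
    return delta

alpha_mle, beta_mle = -1.5, 0.8                          # hypothetical national estimates
x = [-1.0, -0.5, 0.0, 0.0, 0.3, 0.5, 0.8, 1.0, 1.2, 2.0]  # hypothetical severity scores
y = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
delta_tilde = flat_prior_mode(y, x, alpha_mle, beta_mle)
eta = [alpha_mle + beta_mle * xj for xj in x]
p_h_tilde = sum(expit(e + delta_tilde) for e in eta) / len(x)
p_n_tilde = sum(expit(e) for e in eta) / len(x)
ybar = sum(y) / len(y)
print(round(delta_tilde, 4), round(p_h_tilde - p_n_tilde, 4), round(ybar - p_n_tilde, 4))
```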

Table I. Estimates of $\sigma_\delta$ without severity conditioning, based on the national data base for 1986

Medical condition           Death rate   $\hat\sigma_\delta$   SE($\hat\sigma_\delta$)
Stroke                      0.20         0.26                  0.01
Pneumonia                   0.19         0.28                  0.01
Myocardial infarction       0.26         0.21                  0.01
Congestive heart failure    0.15         0.21                  0.01

Table II. Estimates of $\sigma_\delta$ with severity adjustment, based on the PPS data

                            1981-2                      1985-6
Medical condition           $\hat\sigma_\delta$ (SE)    $\hat\sigma_\delta$ (SE)
Stroke                      0.50 (0.20)                 0.00 (0.22)
Pneumonia                   0.33 (0.40)                 0.00 (0.44)
Myocardial infarction       0.39 (0.20)                 0.07 (0.79)
Congestive heart failure    0.50 (0.29)                 0.48 (0.23)

The model-based derivation of $\bar Y - \bar p_{N,\mathrm{mle}}$ motivates alternative estimators and alternative estimands. Hospital death rates can be standardized to a common population to make estimates for different hospitals directly comparable, and the estimate of $\delta$ can also be used to produce predictions for future patients. A model-based standard error for $\bar Y - \bar p_{N,\mathrm{mle}}$ can be obtained by applying the delta method using the observed information matrix for $(\alpha, \beta, \delta)$. Simulation results in Thomas et al.¹⁶ show that confidence intervals with more accurate coverage result from using the model-based standard errors than from using $\mathrm{SE}_c$.
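A minimal Python sketch of the current MMPS estimator and $\mathrm{SE}_c$ from Section 3.5, with the delta-method term written out: the gradient of $\bar p_{N,\mathrm{mle}}$ with respect to $(\alpha, \beta)$ is $(1/n)\sum_j p_j(1 - p_j)(1, X_j)$. The covariance matrix, coefficients and patient data are hypothetical, severity is reduced to one score, and the 1.96 multiplier assumes a normal approximation.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def mmps_estimate_and_se(y, x, alpha_mle, beta_mle, sigma_mle):
    """MMPS estimator Ybar - pbar_N and its standard error SE_c.

    sigma_mle is the 2x2 covariance matrix of (alpha_mle, beta_mle); var(pbar_N)
    uses the delta method with gradient
    d pbar_N / d(alpha, beta) = (1/n) * sum_j p_j (1 - p_j) * (1, X_j).
    """
    n = len(y)
    p = [expit(alpha_mle + beta_mle * xj) for xj in x]
    estimate = sum(y) / n - sum(p) / n
    binomial_var = sum(pj * (1.0 - pj) for pj in p) / n ** 2
    g = (sum(pj * (1.0 - pj) for pj in p) / n,
         sum(pj * (1.0 - pj) * xj for pj, xj in zip(p, x)) / n)
    var_pbar = sum(g[r] * sigma_mle[r][c] * g[c] for r in range(2) for c in range(2))
    se_c = math.sqrt(binomial_var + var_pbar)
    return estimate, se_c

y = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
x = [0.2, 1.5, -0.3, 0.0, 0.8, -1.0, 0.4, 0.1, 2.0, -0.5]
sigma_mle = [[0.004, 0.001], [0.001, 0.002]]   # hypothetical covariance matrix
est, se_c = mmps_estimate_and_se(y, x, -1.5, 1.0, sigma_mle)
print(f"estimate {est:.3f}, 95% interval ({est - 1.96 * se_c:.3f}, {est + 1.96 * se_c:.3f})")
```

Setting the covariance matrix to zero recovers the purely binomial term, which shows how small the national-estimate contribution is when the MMPS sample is large.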

4. RESULTS

4.1. Estimates of $\sigma_\delta$ without severity adjustment based on the Medicare national mortality data

The estimates of $\sigma_\delta$ for the logit-normal model in (3) and (4) based on the national data base (with no severity information) are given in Table I. The standard errors of 0.01 show that the sample sizes in the national data base are sufficient to estimate $\sigma_\delta$, the between-hospital variability without severity adjustment, accurately. The values of $\hat\sigma_\delta$ differ more than is consistent with all medical conditions having identical $\sigma_\delta$. However, the narrow range of $\hat\sigma_\delta$ across the four medical conditions suggests that a common value of $\sigma_\delta$ is a good approximation after the covariates measuring severity are included, because the covariates are only moderately strong predictors of survival and we anticipate only a moderate amount of patient selection among hospitals.

4.2. Estimates of $\sigma_\delta$ with severity adjustments based on the PPS data

Table II displays the maximum likelihood estimates of $\sigma_\delta$ for the logit-normal model in (5) and (6) based on the PPS data. Our regression parameter estimates are consistent with those given in Daley et al.⁴ and Keeler et al. and hence are not presented. Details of the PPS predictor variables are given in Thomas et al.¹⁶ The severity characteristics are predictive of mortality.


From the wide variation and large standard errors of $\hat\sigma_\delta$, we conclude that the PPS data are uninformative about the between-hospital variation in the death rates. Only the 1985-6 estimate for congestive heart failure rules out the absence of between-hospital variation ($\sigma_\delta = 0$) on statistical grounds ($p < 0.05$). However, the numerous comparisons make the significance probabilities difficult to interpret.

The PPS data are from a stratified sample of hospitals that over-represented hospitals treating a high proportion of Medicaid patients. Thus, the hospital mortality rates in the PPS sample will tend to vary more than those of the population of all hospitals, resulting in larger values of $\hat\sigma_\delta$. Nonetheless, the estimates $\hat\sigma_\delta$ computed using the PPS data without the severity covariates are consistent with those from the national mortality data. Moreover, adding the stratifying variables as regressors left the estimates of $\sigma_\delta$ unchanged.

4.3. Combining information about estimates of $\sigma_\delta$ from different data sources

If the medical conditions and time periods are roughly exchangeable, pooling the estimates of $\sigma_\delta$ across time periods and medical conditions can improve precision. We combined the various conditions using simple averages rather than more elaborate model-based pooling of the data. We averaged the estimates of $\sigma_\delta$ in Table II across the two time periods for the four medical conditions to obtain: stroke (0.25), pneumonia (0.17), myocardial infarction (0.23), and congestive heart failure (0.49). The overall average of $\hat\sigma_\delta$ is 0.28, consistent with the estimates of $\sigma_\delta$ from the national data fitted without severity covariates, and also consistent with the average estimate of 0.22 based on the PPS data fitted without severity covariates.
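The simple averages quoted above can be reproduced from the Table II estimates; the Python below is only an arithmetic check of that pooling step.

```python
# sigma_delta estimates from Table II: (1981-2, 1985-6) for each condition.
table2 = {
    "stroke": (0.50, 0.00),
    "pneumonia": (0.33, 0.00),
    "myocardial infarction": (0.39, 0.07),
    "congestive heart failure": (0.50, 0.48),
}

# Average over the two time periods, then over the four conditions.
by_condition = {name: (a + b) / 2 for name, (a, b) in table2.items()}
overall = sum(by_condition.values()) / len(by_condition)

for name, value in by_condition.items():
    print(f"{name}: {value:.3f}")
print(f"overall: {overall:.3f}")
```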

In summary, our estimates, combinations of the estimates under varying exchangeability assumptions, and plots of the likelihood functions¹⁶ led us to conclude that values of $\sigma_\delta$ larger than 0.6 are increasingly unlikely and values larger than 1.0 are extremely unlikely, but the relatively small hospital sample sizes in the PPS data do not allow us to distinguish $\sigma_\delta$ values in the range 0.0 to 0.6.

4.4. Constructing a prior for $\delta$

The national data base gives very accurate estimates of between-hospital variation in death rates in the logit-normal model with no severity adjustment. Adjustment should produce only moderate changes in $\sigma_\delta$, although the changes could be in either direction.

The PPS data provide direct, but weak, information about the between-hospital variation remaining after adjustment for severity. From these data we conclude only that $\sigma_\delta$ with severity adjustment included is likely to be no greater than 0.60 for any of the medical conditions. That is, even after the analysis of the PPS data, we cannot make an informed assessment of the variance in hospital mortality rates after severity adjustment.

We use a prior distribution that reflects our vague information about $\sigma_\delta$ to construct our proposed Bayes shrinkage estimator. Based on familiarity and analytic tractability, we chose a scaled $\chi_{m-1}^{-1}$ distribution,

$$\frac{(m - 1)^{1/2}\,\hat\sigma_\delta}{\sigma_\delta} \sim \chi_{m-1}.$$

The similar values of $\hat\sigma_\delta$ without severity adjustment, and the absence of any substantive reason to think that adjustment affects $\sigma_\delta$ differently across the medical conditions, motivate us to assume a common prior distribution. Based on Section 4.3, $\hat\sigma_\delta = 0.30$ is a reasonable estimate of $\sigma_\delta$ in the severity adjustment model. After some exploration, we chose $m - 1 = 3$, which represents substantial uncertainty. Standard Bayesian calculations²³ show that the prior distribution for $\delta$ is $\delta \sim \hat\sigma_\delta t_3$. This is the largest uncertainty about $\sigma_\delta$ within this framework that produces a prior distribution for $\delta$ with a finite mean and variance (a $t_\nu$ distribution has finite variance only for $\nu > 2$).
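The claim that the scaled $\chi_3^{-1}$ prior on $\sigma_\delta$ yields $\delta \sim \hat\sigma_\delta t_3$ can be checked by simulation. The Python sketch below draws $\sigma_\delta = \hat\sigma_\delta (3/\chi^2_3)^{1/2}$, then $\delta \mid \sigma_\delta \sim N(0, \sigma_\delta^2)$, and compares the Monte Carlo estimate of $P(|\delta| < \hat\sigma_\delta)$ with the closed-form $t_3$ value. $\hat\sigma_\delta = 0.30$ is the value chosen above; the number of draws and the seed are arbitrary.

```python
import math
import random

def sample_delta(rng, sigma_hat=0.30, df=3):
    """Draw delta from the two-stage prior: sigma_delta from the scaled inverse chi,
    then delta | sigma_delta ~ N(0, sigma_delta^2)."""
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))   # chi-square with df d.f.
    sigma_delta = sigma_hat * math.sqrt(df / chi2)
    return rng.gauss(0.0, sigma_delta)

def t3_cdf(t):
    """Closed-form CDF of Student's t distribution with 3 degrees of freedom."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi

rng = random.Random(2024)
draws = [sample_delta(rng) for _ in range(100_000)]
mc = sum(abs(d) < 0.30 for d in draws) / len(draws)
exact = 2.0 * t3_cdf(1.0) - 1.0    # P(|t_3| < 1)
print(round(mc, 3), round(exact, 3))
```

Changing `df` to 2 would give a prior for $\delta$ with infinite variance, the boundary referred to in the text.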

5. SIMULATION STUDIES COMPARING THE BAYES SHRINKAGE ESTIMATOR AND THE ESTIMATOR CURRENTLY USED BY THE MMPS

We simulate the statistical properties of the estimators of severity-adjusted hospital-specific mortality rates. The simulated data vary across medical conditions, patient mix, hospital sizes and underlying hospital mortality rates. Two estimators are compared: the Bayes posterior mean shrinkage estimator presented in Sections 3.4 and 4.4, and the current MMPS estimator $\bar Y - \bar p_{N,\mathrm{mle}}$. Outcomes include confidence interval coverage, mean square error of the estimators, bias, and the selection of exceptional hospitals. All results are presented for estimates computed on the death rate scale, $\bar P_H - \bar P_N$, because this allows direct comparison of the Bayes shrinkage estimator with $\bar Y - \bar p_{N,\mathrm{mle}}$.

The simulation studies cover only the idealized conditions of (5) and (1). They do not evaluate the robustness of the modelling assumptions; however, the parameter values do cover the outer limits likely to be encountered in practice. The difference between the hospital and national death rates given in (2) is used as the target estimand to evaluate each of the estimators.

We report on two simulation studies. The first generates a large population of hypothetical hospitals to simulate our best information about the current hospital-level mortality distribution and its sources. This study summarizes the statistical properties of the proposed estimators. In the second study, many replications are generated from each of a small number of fixed hypothetical hospitals to evaluate the estimators' performance on specific types of hospitals (that is, patient severity mix, hospital size, quality, etc.). Some extreme hospital types are included to highlight trends in the behaviour of the estimators.

5.1. Design of the simulation study with repeated sampling from a small number of hypothetical hospitals

The simulation is arranged in a factorial design with four factors:

1. four medical conditions (stroke, pneumonia, acute myocardial infarction, and congestive heart failure);

2. three hospital severity levels (low, medium, high);

3. three hospital sizes (25, 50, 100);

4. five hospital quality indices ($\delta$ = -0.6, -0.3, 0.0, 0.3, 0.6).

For each of the four medical conditions, hypothetical hospitals with differing levels of severity in their patient populations were constructed using data from the MMPS national sample (see Appendix I of Reference 16 for details).

Survival data were simulated for patients at each hypothetical hospital using the model in (5). The underlying mortality rates at the hospital were simulated at five different levels (as measured by $\delta$), which were selected to represent the outer limits of plausible differences in hospital mortality as determined in Section 4. The maximum likelihood estimates from the MMPS sample, $(\hat\alpha_{\mathrm{mle}}, \hat\beta_{\mathrm{mle}})$, were used as the population values $(\alpha, \beta)$ in the simulation.

One hundred simulated replications of the estimation procedures were produced for each of the 240 settings in the factorial design. Each replication included a simulated MMPS national estimate and simulated outcomes for each patient at the hypothetical hospital (details are in Appendix II of Reference 16).
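A stripped-down Python version of one cell of this design may help fix ideas. It is our own sketch, not the code behind Reference 16: severity is a single hypothetical score, the simulated national estimate is drawn from a bivariate normal with a hypothetical covariance matrix, and only the coverage of the MMPS interval $\bar Y - \bar p_{N,\mathrm{mle}} \pm 1.96\,\mathrm{SE}_c$ for the estimand (2) is computed.

```python
import math
import random

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def one_replication(rng, x, alpha, beta, delta, cov):
    """Simulate a national estimate and one hospital's outcomes; return True if
    the nominal 95% interval Ybar - pbar_N +/- 1.96 SE_c covers estimand (2)."""
    n = len(x)
    # National estimate (alpha_hat, beta_hat) ~ N((alpha, beta), cov) via a 2x2 Cholesky factor.
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    a_hat, b_hat = alpha + l11 * z1, beta + l21 * z1 + l22 * z2
    # Patient outcomes from model (5) with hospital effect delta.
    p_h = [expit(alpha + beta * xj + delta) for xj in x]
    y = [rng.random() < pj for pj in p_h]
    # MMPS estimator and SE_c (binomial term plus delta-method term for var(pbar_N)).
    p = [expit(a_hat + b_hat * xj) for xj in x]
    est = sum(y) / n - sum(p) / n
    g0 = sum(pj * (1.0 - pj) for pj in p) / n
    g1 = sum(pj * (1.0 - pj) * xj for pj, xj in zip(p, x)) / n
    var_pbar = g0 * g0 * cov[0][0] + 2.0 * g0 * g1 * cov[0][1] + g1 * g1 * cov[1][1]
    se_c = math.sqrt(g0 / n + var_pbar)
    # Estimand (2) evaluated at the true parameter values.
    target = sum(p_h) / n - sum(expit(alpha + beta * xj) for xj in x) / n
    return abs(est - target) <= 1.96 * se_c

rng = random.Random(7)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]   # hypothetical severity scores, n = 50
cov = [[0.004, 0.001], [0.001, 0.003]]         # hypothetical covariance of (alpha_hat, beta_hat)
coverage = {}
for delta in (-0.6, 0.0, 0.6):
    hits = sum(one_replication(rng, x, -1.5, 0.8, delta, cov) for _ in range(1000))
    coverage[delta] = hits / 1000
print(coverage)
```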


Table III. Coverage probabilities for nominal 95 per cent intervals for each $\delta$ value, averaged across the simulation settings

                                        Hospital quality $\delta$
Estimator                               -0.6    -0.3    0.0     0.3     0.6
Bayes                                   0.93    0.99    0.99    0.92    0.73
$\bar Y - \bar p_{N,\mathrm{mle}}$      0.97    0.96    0.95    0.93    0.92

Simulation standard errors range from 0.002 to 0.006.

Table IV. Coverage probabilities for nominal 95 per cent intervals when the $\delta$ are randomly generated

                                        Standard deviation of the $\delta$ distribution
Estimator                               0.15    0.25    0.50
Bayes                                   0.98    0.97    0.91
$\bar Y - \bar p_{N,\mathrm{mle}}$      0.95    0.95    0.95

Simulation standard errors are approximately 0.005.

5.2. Design of the simulation study with a large collection of hospitals

We performed additional simulations using the same design, but with $\delta$ randomly selected from distributions chosen to represent the likely range determined in Section 4. Three $\delta$ distributions representing low, medium and high variation were generated from normal distributions with mean zero and standard deviations of 0.15, 0.25 and 0.50. Recall that the prior distribution used for the Bayes-based procedure is $(0.30)t_3$.

The size of the sample at each hospital was also randomly selected (independently of $\delta$) from 25, 50, 100 and 200 with probabilities 0.35, 0.35, 0.25 and 0.05, respectively, yielding an average hospital size of 61, in rough agreement with the national data. The high, medium and low patient severity levels at each hospital were also selected randomly (independently of $\delta$ and hospital size) with probabilities 0.25, 0.50 and 0.25, respectively.

Because the performance of the estimators did not appear to vary with the different medical conditions except through severity (marginal death rates), only myocardial infarction was simulated. Two thousand hospitals were simulated, indicative of the large number of hospitals available for extreme value screening.
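A minimal Python sketch of how one simulated population of the kind described above could be drawn. It assumes independent draws of $\delta$, sample size and severity level, as stated, and records only those three attributes; generating patients within each hospital is omitted. The function name and seed are ours. The expected sample size, $25(0.35) + 50(0.35) + 100(0.25) + 200(0.05) = 61.25$, matches the average of 61 quoted above.

```python
import random

SIZES, SIZE_PROBS = [25, 50, 100, 200], [0.35, 0.35, 0.25, 0.05]
LEVELS, LEVEL_PROBS = ["low", "medium", "high"], [0.25, 0.50, 0.25]

# Expected number of patients per hospital under the size distribution.
expected_size = sum(s * p for s, p in zip(SIZES, SIZE_PROBS))

def sample_hospitals(rng, n_hospitals=2000, sd_delta=0.25):
    """Draw (delta, sample size, severity level) independently for each hospital."""
    return [(rng.gauss(0.0, sd_delta),
             rng.choices(SIZES, weights=SIZE_PROBS)[0],
             rng.choices(LEVELS, weights=LEVEL_PROBS)[0])
            for _ in range(n_hospitals)]

rng = random.Random(11)
hospitals = sample_hospitals(rng)
mean_size = sum(h[1] for h in hospitals) / len(hospitals)
print(expected_size, round(mean_size, 1))
```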

5.3. Overview of simulation results

By construction, the empirical Bayes procedure works well for hospitals with underlying mortality rates near the national average, and produces a large estimated difference only when the data from a hospital strongly support a large difference. Thus, confidence interval coverage for the empirical Bayes procedure should be high for small values of $|\delta|$ and degrade for large values of $|\delta|$. Table III gives the coverage properties of the nominal 95 per cent intervals of the two procedures estimated from the simulation replications with fixed hospital types. The results are averaged across all simulation settings except the level of $\delta$, because the coverage probabilities varied little across the other simulation factors.

Table IV shows that both estimation procedures produce good confidence intervals when the coverage probabilities are taken over a large population of hospitals. For specific hospital types, the empirical Bayes shrinkage procedure tends to overcover hospitals with small $\delta$ and undercover hospitals with large $\delta$. The confidence interval coverage for the current MMPS procedure is satisfactory and varies little across hospital types. The Bayesian intervals are much shorter: the average lengths of the Bayesian intervals in Table IV are 0.146, 0.148 and 0.155 (from left to right), while the corresponding value for the current MMPS procedure is 0.223 for each simulation setting in Table IV (the equality across simulation settings is a consequence of the simulation design).

Table V. Ratio of the MSE of $\bar Y - \bar p_{N,\mathrm{mle}}$ to the MSE of the Bayes posterior mean

                        Hospital size
Value of $\delta$       25       50      100
-0.6                    1.84     1.19    1.02
-0.3                    4.51     2.86    2.19
 0.0                   10.32     5.82    3.85
 0.3                    6.07     2.96    2.09
 0.6                    2.10     1.24    1.00

The empirical Bayes estimator is much more accurate and stable. In the study with $\delta$ generated randomly to simulate a large collection of hospitals, the ratios of the mean square error (MSE) of $\bar{Y} - p_{N,\mathrm{MMPS}}$ to that of the empirical Bayes estimator are 3.96, 3.18 and 1.68 when the standard deviation of the distribution used to generate $\delta$ is 0.15, 0.25 and 0.50, respectively. The Bayes estimator has a smaller MSE than $\bar{Y} - p_{N,\mathrm{MMPS}}$ for all hospital types except those with the most extreme $\delta$, where it performs at least as well. Table V shows the MSE ratios for fixed hospital types, averaged across the disease and severity-level factors, which produce little variation.
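The bias-variance trade-off behind Table V can be sketched with the same kind of simplified normal approximation on the log-odds scale (an assumption for illustration, not the paper's model). The raw difference is unbiased with variance $s^2 \approx 1/(n p (1-p))$ for a hospital with $n$ patients and death rate $p$, while the shrunken estimate $w d$ has variance $w^2 s^2$ and bias $-(1-w)\delta$. The death rate $p = 0.25$ and $\sigma_\delta = 0.25$ below are hypothetical, and the national rate is treated as known.

```python
def mse_ratio(delta, n, p=0.25, sigma_delta=0.25):
    """MSE of the raw log-odds difference divided by MSE of the shrunken estimate,
    for a fixed true delta, under a hypothetical normal approximation."""
    s2 = 1.0 / (n * p * (1.0 - p))              # delta-method variance of a log-odds from n patients
    w = sigma_delta**2 / (sigma_delta**2 + s2)  # shrinkage weight toward zero
    mse_raw = s2                                # raw estimate is unbiased: MSE = variance
    mse_shrunk = w**2 * s2 + (1.0 - w) ** 2 * delta**2  # variance + squared bias
    return mse_raw / mse_shrunk


for delta in (-0.6, -0.3, 0.0, 0.3, 0.6):
    print(delta, [round(mse_ratio(delta, n), 2) for n in (25, 50, 100)])
```

The sketch reproduces the direction of the pattern in Table V (largest gains at $\delta = 0$, shrinking with hospital size and with $|\delta|$), not its values; in this idealised model the ratio can fall below 1 for very extreme hospitals.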

The two procedures produce similar rankings of the extreme hospitals, and both have difficulty detecting extreme hospitals under likely scenarios. The empirical Bayes intervals, however, have the correct coverage properties when applied to samples of hospitals identified as extreme by their estimated death rates. This is because the Bayes intervals, through the prior distribution, explicitly account for the large number of hospitals being estimated. The current MMPS estimation procedure, in contrast, produces misleading significance levels and confidence intervals for these hospitals because it fails to account for the large multiple-comparison problem.

To demonstrate these claims, $\bar{Y} - p_{N,\mathrm{MMPS}}$ was used to select the 50 worst hospitals out of the 2000 hospitals simulated with the standard deviation of the distribution of $\delta$ equal to 0.25 (consistent with our current knowledge of this value). Figure 1 displays histograms of the values of $P_H - P_N$ for the hospitals selected using $\bar{Y} - p_{N,\mathrm{MMPS}}$, and of the 50 most extreme values of $P_H - P_N$ among the 2000 simulated hospitals. The 50 worst hospitals selected using $Z = (\bar{Y} - p_{N,\mathrm{MMPS}})/\mathrm{SE}$ have a very similar distribution.

The nominal 95 per cent confidence intervals computed using SE for the 50 hospitals selected using $\bar{Y} - p_{N,\mathrm{MMPS}}$ cover only 24/50 = 48 per cent of the generating population values, whereas the Bayes 95 per cent intervals cover 48/50 = 96 per cent of these hospitals. The poor coverage of the MMPS procedure for selected samples occurs despite its good coverage for unselected samples. In practice, most attention is focused on apparently extreme hospitals and medical conditions, so this is the situation in which the coverage properties of the statistical procedures matter most.
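The effect of selection on coverage can be reproduced in a small Monte Carlo sketch. It assumes a simplified normal-normal model on the log-odds scale with equal-sized hospitals and a known $\sigma_\delta$, which is not the paper's simulation design: each hospital has a true effect $\delta \sim N(0, \sigma_\delta^2)$ and a raw estimate $d \sim N(\delta, s^2)$, where $d$ plays the role of $\bar{Y} - p_{N,\mathrm{MMPS}}$. The 50 largest values of $d$ out of 2000 are selected, and the coverage of the naive interval $d \pm 1.96 s$ is compared with that of the posterior interval. The function name and the values of $s$, the seed and the number of replications are hypothetical.

```python
import random
from math import sqrt


def selection_coverage(n_hosp=2000, n_select=50, sigma_delta=0.25, s=0.4,
                       reps=100, z=1.96, seed=1):
    """Coverage of naive and posterior 95% intervals among the hospitals with the
    largest raw estimates, in a hypothetical normal-normal simulation."""
    rng = random.Random(seed)
    w = sigma_delta**2 / (sigma_delta**2 + s**2)   # shrinkage weight
    post_sd = sqrt(w) * s                          # posterior standard deviation
    naive_hits = bayes_hits = total = 0
    for _ in range(reps):
        hospitals = []
        for _ in range(n_hosp):
            delta = rng.gauss(0.0, sigma_delta)    # true hospital effect
            d = delta + rng.gauss(0.0, s)          # raw estimate with sampling error
            hospitals.append((d, delta))
        # "worst" = largest raw estimates of excess mortality
        worst = sorted(hospitals, reverse=True)[:n_select]
        for d, delta in worst:
            naive_hits += abs(d - delta) <= z * s
            bayes_hits += abs(w * d - delta) <= z * post_sd
            total += 1
    return naive_hits / total, bayes_hits / total


naive, bayes = selection_coverage()
print(f"naive interval coverage: {naive:.2f}, posterior interval coverage: {bayes:.2f}")
```

Under this model the posterior intervals keep their nominal coverage after selection, because selection depends only on the observed estimates and the posterior already conditions on them; the naive intervals do not.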

The worst 50 hospitals were also selected using the Bayes posterior means and the Bayes-based ranking method of Laird and Louis.²⁴ These selections performed very similarly to the selection based on $\bar{Y} - p_{N,\mathrm{MMPS}}$ shown in Figure 1.

[Figure 1: two stacked histograms of $P_H - P_N$; horizontal axis from -0.05 to 0.20, frequency axis from 0 to 20.]

Figure 1. The upper plot is a histogram of the 50 values of $P_H - P_N$ for the hospitals selected using $\bar{Y} - p_{N,\mathrm{MMPS}}$. The lower plot is a histogram of the 50 most extreme values of $P_H - P_N$ from the simulation sample of 2000 hospitals.

6. DISCUSSION

The analysis of the national data in Section 4.1, together with the results of Park et al.⁷ and Jencks et al.⁵, provides very strong evidence that hospitals vary in their underlying death rates after accounting for sampling variation. Our logit-normal model fitted to the PPS data confirms the earlier findings from the MMPS data⁵ that individual patient outcomes vary with patient severity, and is consistent with other analyses of the PPS data. Although the PPS design is superior to the MMPS design for estimating the severity-adjusted between-hospital variation in hospital death rates, $\sigma_\delta$ is still poorly estimated. Our data suggest different values of $\sigma_\delta$ for the four medical conditions and the two time periods. If we allow $\sigma_\delta$ to differ across conditions and periods, the PPS data only constrain its values to lie between 0 and 0.6. Indeed, only the estimates of $\sigma_\delta$ for congestive heart failure differ significantly from zero ($p < 0.05$).
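Why $\sigma_\delta$ is hard to pin down can be seen with a simple method-of-moments estimator: the variance of the hospitals' raw log-odds estimates minus their average sampling variance, truncated at zero. This is a swapped-in technique for illustration, not the maximum-likelihood fit of the logit-normal model used in the paper, and it relies on the same hypothetical normal approximation as before. The number of hospitals $k$, the hospital size $n$, the death rate $p$ and the true $\sigma_\delta = 0.25$ are all hypothetical values.

```python
import random
from math import sqrt
from statistics import variance


def moment_sigma_delta(d, s2):
    """Method-of-moments estimate of sigma_delta: spread of the hospital estimates
    in excess of their average sampling variance, truncated at zero."""
    excess = variance(d) - sum(s2) / len(s2)
    return sqrt(max(0.0, excess))


def simulate_estimates(k, n, p=0.25, sigma_delta=0.25, reps=2000, seed=7):
    """Sorted moment estimates from `reps` simulated samples of k hospitals,
    each with n patients (hypothetical normal approximation on the log-odds scale)."""
    rng = random.Random(seed)
    s2 = 1.0 / (n * p * (1.0 - p))     # sampling variance of each hospital's log-odds
    estimates = []
    for _ in range(reps):
        d = [rng.gauss(0.0, sigma_delta) + rng.gauss(0.0, sqrt(s2)) for _ in range(k)]
        estimates.append(moment_sigma_delta(d, [s2] * k))
    return sorted(estimates)


est = simulate_estimates(k=20, n=50)
lo, hi = est[len(est) // 20], est[len(est) - len(est) // 20 - 1]
print("5th and 95th percentiles of the estimate:", round(lo, 2), round(hi, 2))
```

With few, small hospitals the sampling distribution of the estimate is wide relative to the true value, and it narrows only as the number of hospitals grows.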


6.1. The need for an improved national sample

Improved estimates of $\sigma_\delta$ would serve three important policy goals.

First, if $\sigma_\delta$ is very close to zero, that is, if differences in the adjusted mortality rates are small, then there is no demonstrated need for any mortality reporting system. The evidence on even this basic question is not definitive; the strongest evidence is the substantial quality differences found after severity adjustment across different types of hospitals.

Second, even if $\sigma_\delta$ is moderate, suggesting possible quality differences, collecting severity information at each hospital may not be cost-effective, because the relatively small samples at most hospitals carry so little information. Thomas et al.¹⁶ show that in a typical disease/severity setting with 50 patients in a medical condition at a hospital (more than the number of such patients at most hospitals), collecting severity and outcome data at the hospital reduces the standard error by only 12 per cent. This dire situation may be improved by pooling data from several years and across medical conditions. The Bayes/empirical-Bayes approach presented here can be extended to pool information across several medical conditions.

The third goal served by an improved estimate of $\sigma_\delta$ would be to provide a stronger empirical basis for specifying the prior (mixing) distribution of the Bayes procedure, which yields improved estimates of individual hospital rates.

6.2. Implications for individual hospital reporting

HCFA currently reports Medicare death rates and standard errors for each hospital in 17 diagnostic categories from national data. HCFA also supplies the MMPS to help hospitals calculate their own severity-adjusted death rates and corresponding standard errors. For all but the largest hospitals and most prevalent medical conditions, the hospital-specific mortality data are relatively uninformative compared with national data sources, which indicate the likely range of hospital mortality differences. In short, the current practice of using a flat prior distribution (on the log-odds scale) is a poor choice, whether the distribution is invoked explicitly or implicitly.
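The contrast between a flat prior and a normal prior can be made concrete with a short sketch, again under a hypothetical normal approximation on the log-odds scale rather than the paper's logit-normal computation. With a $N(0, \sigma_\delta^2)$ prior the posterior mean of $\delta$ is $w d$, where $w = \sigma_\delta^2/(\sigma_\delta^2 + s^2)$ is the weight given to the hospital's own data; a flat prior corresponds to $w = 1$, so the raw estimate is used unchanged. The death rate of 0.25 is hypothetical, and $\sigma_\delta = 0.25$ is the value used in the simulation of Section 5.3.

```python
from math import sqrt


def logit_se(n, p):
    """Delta-method standard error of an observed log-odds from n patients with death rate p."""
    return 1.0 / sqrt(n * p * (1.0 - p))


def shrinkage_weight(n, p, sigma_delta):
    """Weight given to the hospital's own data under a N(0, sigma_delta^2) prior."""
    s2 = logit_se(n, p) ** 2
    return sigma_delta**2 / (sigma_delta**2 + s2)


def posterior_mean(d, n, p, sigma_delta=None):
    """Posterior mean of delta given the raw log-odds difference d.
    sigma_delta=None stands for a flat prior, which leaves d unshrunk."""
    if sigma_delta is None:
        return d
    return shrinkage_weight(n, p, sigma_delta) * d


print([round(shrinkage_weight(n, 0.25, 0.25), 2) for n in (25, 100, 1000)])
# prints [0.23, 0.54, 0.92]
```

Under these assumptions a hospital's own data receive most of the weight only once it treats hundreds of patients with the condition; for smaller hospitals the national information dominates.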

The simulation comparisons of the MMPS and the Bayes-based procedures, although based on hypothetical hospital populations, lead to some interesting conclusions. As expected, the Bayes shrinkage estimator performs well for typical hospitals (small $|\delta|$) and relatively less well for more extreme ones (large $|\delta|$). This pattern holds for both coverage probabilities and mean square error of estimation. The Bayes estimator produces biased estimates of hospital differences but has a lower variance than the MMPS estimator. By shrinking large estimates of hospital differences strongly towards zero, the Bayes estimator gains stability and gives much better estimates for the hospitals that are not demonstrably extreme.

An important use of estimators of hospital quality is to identify exceptional hospitals. The Bayes procedure is more conservative than the MMPS estimator in judging that a hospital is extreme. Overall, the two procedures perform similarly, and rather poorly, at identifying the best and worst hospitals. The current MMPS estimation procedure gives incorrect and misleading significance levels and confidence intervals for hospitals selected on their estimated mortality rates, and it exaggerates the evidence that these hospitals are extreme. The Bayes-based procedure, in contrast, gives correct significance levels and confidence intervals for the hospitals identified by their extreme estimates.

ACKNOWLEDGEMENTS

We acknowledge David Draper’s contribution to writing the proposal for funding this work and his assistance with some data processing tasks. Ellen Harrison and Daniel Relles made substantial contributions to the data management necessary to carry out this research. Charles Lewis and Donald Rubin provided helpful advice about the numerical evaluation of the empirical Bayes estimates. Emmett Keeler’s excellent review of an earlier draft and the comments of two anonymous reviewers led to a much improved presentation. This work was supported by a cooperative agreement between the Health Care Financing Administration of the U.S. Department of Health and Human Services and the RAND Corporation.

REFERENCES

1. Kahn, K., Brook, R., Draper, D., Keeler, E., Rubenstein, L., Rogers, W. and Kosecoff, J. ‘Interpreting hospital mortality data: how can we proceed?’, Journal of the American Medical Association, 260(24), 3625-3628 (1988).

2. Dubois, R., Brook, R. and Rogers, W. ‘Adjusted hospital death rates: a potential screen for quality of medical care’, American Journal of Public Health, 77(9), 1162-1166 (1987).

3. Green, J., Wintfeld, N., Sharkey, P. and Passman, L. ‘The importance of severity of illness in assessing hospital mortality’, Journal of the American Medical Association, 263(2), 241-246 (1990).

4. Daley, J., Jencks, S., Draper, D., Lenhart, G., Thomas, N. and Walker, J. ‘Predicting hospital-associated mortality for Medicare patients’, Journal of the American Medical Association, 260(24), 3617-3624 (1988).

5. Jencks, S., Daley, J., Draper, D., Thomas, N., Lenhart, G. and Walker, J. ‘Interpreting hospital mortality data: the role of clinical risk adjustment’, Journal of the American Medical Association, 260(24), 3611-3616 (1988).

6. Health Care Financing Administration. Medicare Hospital Mortality Information, 1988, Government Printing Office, Washington D.C., 1989.

7. Park, E., Brook, R., Kosecoff, J., Keesey, J., Rubenstein, L., Keeler, E., Kahn, K., Rogers, W. and Chassin, M. ‘Explaining variations in hospital death rates: randomness, severity of illness, quality of care’, Journal of the American Medical Association, 264(4), 484-490 (1990).

8. Kahn, K., Rubenstein, L., Draper, D., Kosecoff, J., Rogers, W., Keeler, E. and Brook, R. ‘The effects of the DRG-based Prospective Payment System on quality of care for hospitalized Medicare patients’, Journal of the American Medical Association, 264(15), 1953-1955 (1990).

9. Kahn, K., Keeler, E., Sherwood, M., Rogers, W., Draper, D., Bentow, S., Reinisch, E., Rubenstein, L., Kosecoff, J. and Brook, R. ‘Comparing outcomes of care before and after implementation of the DRG-based Prospective Payment System’, Journal of the American Medical Association, 264(15), 1984-1988 (1990).

10. Kahn, K., Rogers, W., Rubenstein, L., Sherwood, M., Reinisch, E., Keeler, E., Draper, D., Kosecoff, J. and Brook, R. ‘Measuring quality of care with explicit process criteria before and after implementation of the DRG-based Prospective Payment System’, Journal of the American Medical Association, 264(15), 1969-1973 (1990).

11. Kosecoff, J., Kahn, K., Rogers, W., Reinisch, E., Sherwood, M., Rubenstein, L., Draper, D., Roth, C., Chew, C. and Brook, R. ‘Prospective Payment System and impairment at discharge’, Journal of the American Medical Association, 264(15), 1980-1983 (1990).

12. Rubenstein, L., Kahn, K., Reinisch, E., Sherwood, M., Rogers, W., Kamberg, C., Draper, D. and Brook, R. ‘Changes in quality of care for five diseases measured by implicit review, 1981 to 1986’, Journal of the American Medical Association, 264(15), 1974-1979 (1990).

13. Keeler, E., Kahn, K., Draper, D., Sherwood, M., Rubenstein, L., Reinisch, E., Kosecoff, J. and Brook, R. ‘Changes in sickness at admission following the introduction of the Prospective Payment System’, Journal of the American Medical Association, 264(15), 1962-1968 (1990).

14. Draper, D., Kahn, K., Reinisch, E., Sherwood, M., Carney, M., Kosecoff, J., Keeler, E., Rogers, W., Savitt, H., Allen, H., Wells, K., Reboussin, D. and Brook, R. ‘Studying the effects of the DRG-based Prospective Payment System on quality of care: design, sampling, and fieldwork’, Journal of the American Medical Association, 264(15), 1956-1961 (1990).

15. Lemeshow, S., Teres, D., Avrunin, J. and Pastides, H. ‘Predicting the outcome of intensive care unit patients’, Journal of the American Statistical Association, 83, 348-356 (1988).

16. Thomas, N., Longford, N. and Rolph, J. A Statistical Framework for Severity Adjustment of Hospital Mortality, the RAND Corporation, Santa Monica, CA, N-3501-HCFA, 1992.

17. Longford, N. ‘Logistic regression with random coefficients’, to appear in Computational Statistics and Data Analysis (1993).


18. Morris, C. ‘Parametric empirical Bayes inference: theory and applications’, Journal of the American Statistical Association, 78, 47-65 (1983).

19. Gelfand, A. E. and Smith, A. F. M. ‘Sampling-based approaches to calculating marginal densities’, Journal of the American Statistical Association, 85, 398-409 (1990).

20. Wong, W. and Li, B. ‘Laplace expansion for posterior densities of nonlinear functions of parameters’, Biometrika, 79, 393-398 (1992).

21. Hosmer, D. and Lemeshow, S. Applied Logistic Regression, Wiley, New York, 1989.

22. Hartz, A., Krakauer, H., Kuhn, E., Young, M., Jacobsen, S., Gay, G., Muenz, L., Katzoff, R., Bailey, C. and Rimm, A. ‘Hospital characteristics and mortality rates’, New England Journal of Medicine, 321(25), 1720-1725 (1989).

23. Box, G. and Tiao, G. Bayesian Inference in Statistical Analysis, Addison-Wesley, New York, 1973.

24. Laird, N. and Louis, T. ‘Empirical Bayes ranking methods’, Journal of Educational Statistics, 14, 29-46 (1989).

25. Rogers, W., Draper, D., Kahn, K., Keeler, E., Rubenstein, L., Kosecoff, J. and Brook, R. ‘Quality of care before and after implementation of the DRG-based Prospective Payment System’, Journal of the American Medical Association, 264(15), 1989-1994 (1990).

26. Keeler, E., Rubenstein, L., Kahn, K., Draper, D., Harrison, E., McGinty, M., Rogers, W. and Brook, R. ‘Hospital characteristics and quality of care’, Journal of the American Medical Association, 268(13), 1709-1714 (1992).
