Predicting the Outcome of Intensive Care Unit Patients
Author(s): Stanley Lemeshow, Daniel Teres, Jill Spitz Avrunin, and Harris Pastides
Source: Journal of the American Statistical Association, Vol. 83, No. 402 (Jun., 1988), pp. 348-356
Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association
Stable URL: http://www.jstor.org/stable/2288849




Predicting the Outcome of Intensive Care Unit Patients

STANLEY LEMESHOW, DANIEL TERES, JILL SPITZ AVRUNIN, and HARRIS PASTIDES*

Statisticians are being asked with increasing frequency to develop models for occurrences in medical environments. Until recently, only subjective models were available to predict mortality for patients in an intensive care unit (ICU). These models were based on variables and associated weights determined by panels of medical "experts." This article shows how multiple logistic regression (MLR) can be used to develop an objective model for prediction of hospital mortality among ICU patients. An MLR model to be applied when a patient is admitted to the ICU was developed on 737 ICU patients. The final model is based on the following variables: presence of coma or deep stupor at admission, emergency admission, cancer part of present problem, probable infection, cardiopulmonary resuscitation (CPR) prior to admission, age, and systolic blood pressure at admission. To validate this model, a new cohort of 1,997 consecutive ICU patients was entered into the study. Information was collected for the variables in the MLR model and, in addition, the variables necessary to evaluate the "subjective" models. The admission mortality prediction model [MPM0(CPR)] was validated on this new cohort of patients using goodness-of-fit tests. It was found that this model had excellent fit (p = .74). In addition, the overall correct classification for this model in the validation data set was 86.1%. The predictive values for dying and surviving, sensitivity, and specificity were 71.3%, 88.5%, 50.2%, and 95.0%, respectively. The direct comparison of MPM0(CPR) and the subjective systems based on the new cohort demonstrated that although all methods considered demonstrated comparable sensitivity, specificity, predictive values, and total correct classification rates, goodness-of-fit tests suggest that the probabilities of hospital mortality as produced by the statistical model best fit the observed mortality experience. One of the commonly used subjective models tended to overestimate the probabilities of hospital mortality, and the other tended to underestimate these probabilities. To be useful, a severity index should be based on simple calculations using readily available data, and be independent of medical treatment in the ICU. Given the widespread availability of microcomputers, the newly developed statistical model seems to satisfy these criteria.

KEY WORDS: Multiple logistic regression; Maximum likelihood estimate; Mortality prediction model; Goodness-of-fit test; Sensitivity; Specificity; Relative risk.

1. INTRODUCTION

Classification and aggregation of patients according to differential care needs have been accepted as normal features of contemporary hospital practice. The intensive care unit (ICU) is a site where these classifications may be useful.

Patients admitted to an ICU are either extremely ill or considered to be at great risk of serious complications requiring the special technology and highly skilled care available in an ICU. The criteria for ICU admission vary widely, however, depending on patient mix and hospital type. In some hospitals without house staff training programs and with limited intermediate or telemetry options, many patients are admitted for (a) observation (especially cardiac, neurological, or neurosurgical patients), (b) noninvasive monitoring, or (c) either more intensive or specific nursing services not available in general units. In other hospitals, such as tertiary care centers that have greater general ward capability, intermediate units, and a well-trained critical-care house and attending staff, more discriminating decisions can be made about admission.

There are no widely accepted criteria for distinguishing between patients who should be admitted to an ICU and those for whom admission to other hospital units would be appropriate. Thus among different ICUs there are wide ranges in a patient's chances of survival. When studies are done comparing different treatment modalities or the effectiveness of ICU care, therefore, it is critical to have a reliable means of assessing the comparability of the different patient populations.

* Stanley Lemeshow is Professor of Biostatistics, Jill Spitz Avrunin is Research Associate, and Harris Pastides is Associate Professor of Epidemiology, Division of Public Health, School of Health Sciences, University of Massachusetts, Amherst, MA 01003. Daniel Teres is Director of Critical Care Services, Baystate Medical Center, Springfield, MA 01199. This work was supported by National Center for Health Services Research Grant HS 04833. Portions of this article appeared in Critical Care Medicine and are reproduced here by permission of the publisher.

Several classification systems have been devised for ICU patients. They can be divided into two main groups, depending on the strategy used for their derivation. In one group a panel of experts identifies and assigns weights to clinical variables believed to be associated with outcomes of interest. This method has been described as a multiple attribute utility model (see Gustafson et al. 1983), or a scoring system.

The second strategy uses statistical modeling to relate empirical data for many patient variables to outcomes of interest. This is sometimes referred to as an actuarial approach (see Gustafson et al. 1983). Typically, the first step in the statistical-modeling process is data reduction; from all available variables, only those most associated with outcome are selected for inclusion in the final model. For binary-outcome variables (e.g., mortality), multiple logistic regression analysis is appropriate for describing the relationship of the possible predictors to the outcome. With this technique, the maximum likelihood method is used to determine objectively the statistical weights to be assigned to each variable.

© 1988 American Statistical Association
Journal of the American Statistical Association
June 1988, Vol. 83, No. 402, Applications and Case Studies


The research presented here represents the first study explicitly designed to compare a scoring-system technique with a statistical-modeling technique for critically ill patients. In this article, we present results of the following:

1. Generation and validation of a statistical model for predicting hospital mortality of ICU patients, based on information available at the time of ICU admission. This information is independent of the treatment the patient receives in the ICU.

2. Comparison of the predictive efficacy of this model, known as the mortality prediction model (MPM), to two currently available and widely used scoring systems using a common cohort of ICU patients. The scoring systems are the acute physiology score (APS) from the acute physiology and chronic health evaluation (APACHE) system introduced by Knaus, Zimmerman, Wagner, and Draper (1981) and the simplified acute physiology score (SAPS) derived by Le Gall et al. (1984).

2. GENERATING AND VALIDATING THE MORTALITY PREDICTION MODEL

2.1 Methods

Data were initially collected on 755 consecutive patients admitted to the adult general medical/surgical ICU at Baystate Medical Center in Springfield, Massachusetts, between February 1 and August 15, 1983. Baystate Medical Center, the second-largest treatment facility in Massachusetts, is a 700-bed hospital with 33 general medical/surgical ICU beds serving a population of 500,000. Coronary care, cardiac surgery, and burn patients were excluded from the study, as were patients under 14 years of age. These patients have special problems that make it inappropriate to combine their data with data collected on the typical patient treated in a general medical/surgical ICU. For example, although coronary bypass surgery patients appear extremely unstable immediately following surgery, they tend to improve rapidly, and the overwhelming majority survive to hospital discharge. Inclusion of such patients would distort the profile of general medical/surgical ICU patients and would make prediction of outcome more difficult. Specific models have been developed to assess severity of illness and/or to predict outcome within these subgroups (e.g., Feller, Tholen, and Cornell 1980; Kennedy, Kaiser, Fisher, Maynard, and Fritz 1980; Pierpont, Kruse, Ewald, and Weir 1985; Pollack, Ruttimann, Getson, and members of the Multi-Institutional Study Group 1987; Pozen, D'Agostino, Selker, Sytkowski, and Hood 1984).

Data were collected by trained critical-care nurses working exclusively on this project. Information was collected at five different times: (a) ICU admission, (b) 24 hours, (c) 48 hours, (d) ICU discharge, and (e) hospital discharge. Data collected at admission included demographics; information on prior ICU admissions; numerous condition variables, such as specific organ-system failures, measures of functional status, cancer-related variables, blood gases,

and renal, neurologic, and respiratory functions; and a few treatment variables, such as units of blood transfused prior to ICU admission and inspired oxygen fraction at time of admission (if applicable). In all, there were 137 background, condition, and treatment variables in the admission form. At 24 hours, and then again at 48 hours, there were 75 condition and treatment variables. At ICU discharge these same 75 variables were collected if the patient had been in the ICU less than 48 hours. In addition, information on length of ICU stay, reason for discharge, and vital status at ICU discharge was routinely collected. At hospital discharge variables included vital status, length of hospital stay, and number of ICU admissions. The data-collection forms used in this study contained 377 possible variables. A patient could have anywhere from 227 to 302 of these variables measured, depending on length of time spent in the ICU.

Though all variables were reviewed for accuracy and reliability, subsequent quality-control activities focused on the variables requiring the most judgment on the part of the data collectors. These included determination of the primary organ system in failure, primary precipitating factor for ICU admission, and shock category. All data collectors were asked to abstract records that had previously been abstracted for the study by a primary data collector. This interrater reliability was judged to be high; differences observed were not statistically significant as measured by the kappa statistic (Fleiss 1981), which ranged from .7 to 1.0, depending on the variable. To assess intrarater reliability, data collectors were also asked to abstract medical records of a sample of patients they had reviewed more than three months before. Again, reliability was high, and differences were not statistically significant for each data collector (kappa ranged from .85 to 1.0, depending on the variable). Reliability testing was done regularly, and discrepancies were discussed in meetings with the doctors and data collectors. These meetings helped identify several variables that required subjective judgment on the part of data collectors and were ultimately excluded from the model-building process.
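The kappa calculation behind these reliability checks is straightforward to reproduce. The sketch below is a plain-Python implementation of Cohen's kappa for two raters; the ratings shown are purely illustrative, not study data.

```python
def cohens_kappa(a, b):
    """Cohen's kappa: chance-corrected agreement between two raters.

    a and b are equal-length lists of category labels, one entry per
    record abstracted by both raters.
    """
    assert len(a) == len(b) and a
    n = len(a)
    categories = sorted(set(a) | set(b))
    # observed proportion of agreement
    p_obs = sum(1 for x, y in zip(a, b) if x == y) / n
    # expected agreement if the raters were independent
    p_exp = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (p_obs - p_exp) / (1 - p_exp)

# illustrative ratings for six records (e.g., shock category present/absent)
kappa = cohens_kappa([1, 1, 0, 0, 1, 0], [1, 1, 0, 0, 0, 0])
```

Values near 1 indicate near-perfect agreement; the study's reported range of .7 to 1.0 corresponds to high agreement on all audited variables.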

Tests of association of each of the study variables to vital status at hospital discharge were carried out, using Student's t test and the Wilcoxon rank-sum test with continuous variables and the chi-squared test of independence with categorical variables.
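As a sketch of the categorical screening step, the following computes the Pearson chi-squared statistic by hand; the example counts are reconstructed from Table 2 (survivor counts obtained by subtraction), so treat them as an illustration rather than the paper's own computation.

```python
def chi2_independence(table):
    """Pearson chi-squared statistic for an r x c contingency table
    (list of rows of observed counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# counts reconstructed from Table 2: type of admission vs. hospital mortality
# rows: elective (226 patients, 11 died), emergency (517 patients, 139 died)
admission = [[215, 11], [378, 139]]   # columns: lived, died
chi2 = chi2_independence(admission)   # df = 1; the .001 critical value is 10.83
```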

For the model presented here, only demographic variables and condition variables available at the time of ICU admission were used in the modeling process. Two different multivariate techniques were performed to reduce this large set of characteristics to a smaller group that, when considered together, had the highest possible degree of accuracy in predicting hospital survival and mortality. First, a stepwise linear discriminant function analysis with forward stepping was used to differentiate broadly between those who survive to hospital discharge and those who die in the hospital. The statistical software program BMDP7M (Dixon 1983) was used for this purpose. This substantially reduced the number of significant variables. Next,


a multiple logistic regression model was developed using BMDPLR (Dixon 1983). This model has the general form

Pr(Y = 1 | X1, X2, ..., Xk)

    = exp(β0 + β1X1 + β2X2 + ... + βkXk) / [1 + exp(β0 + β1X1 + β2X2 + ... + βkXk)],

where Y = 1 if the patient dies in the hospital and Y = 0 otherwise. Use of a forward stepwise approach allows each variable to be considered sequentially in relation to other potentially significant variables. Only significant variables were kept in the model. Final estimates of the weights were based on maximum likelihood solutions. Estimated relative risks and 95% confidence intervals were calculated from the coefficients (Lemeshow and Hosmer 1984).
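Concretely, the fitted model turns a linear predictor into a mortality probability, and a relative-risk estimate with 95% CI comes from exponentiating a coefficient and its interval endpoints. The sketch below uses the standard Wald-type interval, which we assume matches the Lemeshow-Hosmer calculation; the coma coefficient and standard error are the values reported later in Table 3.

```python
import math

def logistic_prob(beta0, betas, xs):
    """Pr(Y = 1 | x) under the multiple logistic regression model above."""
    eta = beta0 + sum(b * x for b, x in zip(betas, xs))
    return math.exp(eta) / (1 + math.exp(eta))

def rr_with_ci(beta, se, z=1.96):
    """Relative risk (odds ratio) and 95% CI implied by a logistic
    coefficient; standard Wald construction, assumed here to match the
    Lemeshow and Hosmer (1984) calculation."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# coma/deep stupor coefficient from Table 3: beta = 2.44, SE = .485
rr, lower, upper = rr_with_ci(2.44, 0.485)
```

For the coma coefficient this reproduces the tabled RR of 11.47 with CI (4.43, 29.68).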

A classification table comparing predicted outcome to actual vital status at hospital discharge was generated using .50 as the predictive cut point; that is, all patients with mortality probabilities ≤50% were predicted to live, and those with probabilities >50% were predicted to die. This level was chosen to be consistent with comparable research in the medical literature and was acceptable for these data, since .50 was within 2 percentage points of "optimal" (i.e., the highest possible total correct classification). Sensitivity (correct classification of patients who died), specificity (correct classification of patients who lived), predictive value of positive (proportion of patients predicted to die who died), predictive value of negative (proportion of patients predicted to live who lived), and total correct classification were assessed.
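A minimal sketch of these classification measures, assuming mortality is coded 1 and survival 0:

```python
def classification_summary(probs, died, cut=0.50):
    """Classification measures at a probability cut point: patients with
    predicted mortality > cut are predicted to die, the rest to live."""
    tp = sum(1 for p, d in zip(probs, died) if p > cut and d)       # predicted die, died
    fn = sum(1 for p, d in zip(probs, died) if p <= cut and d)      # predicted live, died
    fp = sum(1 for p, d in zip(probs, died) if p > cut and not d)   # predicted die, lived
    tn = sum(1 for p, d in zip(probs, died) if p <= cut and not d)  # predicted live, lived
    return {
        "sensitivity": tp / (tp + fn),        # correct among patients who died
        "specificity": tn / (tn + fp),        # correct among patients who lived
        "pv_positive": tp / (tp + fp),        # predicted to die who did die
        "pv_negative": tn / (tn + fn),        # predicted to live who did live
        "total_correct": (tp + tn) / (tp + fn + fp + tn),
    }

# illustrative probabilities and outcomes for six hypothetical patients
summary = classification_summary([0.9, 0.8, 0.2, 0.1, 0.6, 0.4], [1, 0, 0, 0, 1, 1])
```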

2.2 Results

Of the 755 patients in the study, 12 had medical records that could not be retrieved. Another six were excluded because of missing data. This left 737 patients for developing the multiple logistic regression (MLR) model. Tables 1 and 2 present demographic variables and ICU admission condition variables that were significantly related to hospital mortality.

Stepwise linear discriminant function and MLR analyses reduced the 26 possible admission condition and demographic variables to 7: (a) presence of coma or deep stupor at admission, (b) emergency admission, (c) cancer part of present problem, (d) probable infection, (e) number of

Table 1. Results at ICU Admission (cohort 1): Continuous Variables

  Variable                              Vital status at       Mean     No. (%)
                                        hospital discharge
  Age                                   Alive                  56.9    592 (79.8)
                                        Dead                   68.6*   150 (20.2)
  Systolic blood pressure               Alive                 139.2    593 (79.9)
                                        Dead                  118.1*   149 (20.1)
  Heart rate                            Alive                  95.7    574 (80.6)
                                        Dead                  106.2*   138 (19.4)
  Number of organ systems in failure    Alive                   1.4    593 (79.8)
                                        Dead                    2.3*   150 (20.2)

  * p ≤ .001.

Table 2. Results at ICU Admission (cohort 1): Discrete Variables

  Variable                  Variable coding                No. of     Hospital mortality
                                                           patients   no. (%)
  Service at admission      Medical                        297         90 (30.3)
                            Surgical                       446         60 (13.5)a
  Infection at admission    No                             454         54 (11.9)
                            Probable                       289         96 (33.2)a
  CPR prior to admission    No                             599        119 (17.0)
                            Yes                             44         31 (70.5)a
  Type of admission         Elective                       226         11 (4.9)
                            Emergency                      517        139 (26.9)a
  PO2 (torr)                >60                            683        128 (18.7)
                            ≤60                             60         22 (36.7)b
  Bicarbonate (mEq/L)       ≥18                            705        136 (19.3)
                            <18                             38         14 (36.8)c
  Creatinine (mg/dl)        ≤2.0                           709        136 (19.2)
                            >2.0                            34         14 (41.2)b
  Level of consciousness    Neither coma nor deep stupor   690        106 (15.4)
                            Coma or deep stupor             49         40 (81.6)a

  a p ≤ .001.  b p ≤ .01.  c p ≤ .05.

organ systems in failure at admission, (f) age, and (g) systolic blood pressure at admission (see Lemeshow, Teres, Pastides, Avrunin, and Steingrub 1985).

In the second phase of data collection, 1,997 consecutive new ICU patients were entered into the study. Information was collected for the same variables as in the initial data collection (admission, 24 hours, 48 hours, ICU discharge, and hospital discharge) plus APACHE and SAPS variables at 24 hours and 48 hours.

The admission model was validated on this new cohort of patients (Teres, Lemeshow, Avrunin, and Pastides 1987) using goodness-of-fit tests (Hosmer and Lemeshow 1980; Lemeshow and Hosmer 1982). This model had excellent fit (p = .74). The overall correct classification (i.e., correctly predicting mortality and survival) for the admission MPM in the validation data set was 86.1%. The predictive values for dying and surviving, sensitivity, and specificity were 71.3%, 88.5%, 50.2%, and 95.0%, respectively.

Preliminary testing of this model in the Baystate Medical Center ICU, using ICU nursing staff instead of our trained data collectors, suggested that the variable "number of organ systems in failure" was relatively subjective and difficult to collect. In addition, this single variable is really eight variables in one, making our model a 14-variable model rather than a 7-variable model. To simplify the data-collection process we used the original cohort to generate an admission model with no organ-system-failure variables. All possible demographic and admission condition variables, other than those pertaining to organ-system failures, were allowed to enter discriminant function and MLR analyses. The result was a seven-variable model having the same variables as the original model, with the exception of "cardiopulmonary resuscitation (CPR) prior to ICU admission" entering the model instead of "number of organ systems in failure."


Table 3. Admission MPM Containing "CPR Prior to ICU Admission"

                                                                95% Confidence interval
  Variable                          β̂         SE(β̂)     RR      Lower     Upper
  Level of consciousness            2.44       .485      11.47    4.43     29.68
    (0: no coma/deep stupor; 1: coma or deep stupor)
  Type of admission                 1.81       .403       6.11    2.77     13.46
    (0: elective; 1: emergency)
  Cancer part of present problem    1.49       .431       4.44    1.91     10.33
    (0: no; 1: yes)
  CPR prior to ICU admission         .974      .464       2.65    1.07      6.58
    (0: no; 1: yes)
  Infection                          .965      .239       2.62    1.64      4.19
    (0: no or not probable; 1: probable)
  Age                                .0368     .0071      1.44†   1.26      1.66
  Systolic blood pressure (SBP)    -.0606      .0191
  SBP squared                        .000175   6.73E-05
  Constant                         -1.370

  † 10-year odds ratio.

For ease of reference, we refer to this model as MPM0(CPR), where 0 designates time of admission. For each variable Table 3 gives (a) the estimated logistic coefficients (β̂), (b) estimated standard errors (SE), (c) estimated relative risks (RR), and (d) 95% confidence intervals (CI) for the relative risks. The RR estimates the likelihood that a patient with a given factor will die in the hospital relative to a patient without that factor, controlling simultaneously for all of the other variables in the model. For example, a patient admitted to the ICU in a coma or deep stupor was 11 times as likely to die in the hospital as a patient without coma or deep stupor, after controlling for the other six variables in MPM0(CPR). For continuous variables such as age, RR estimates the risk of dying relative to an incremental change in that variable, such as a rise of 10 years in age.
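As an illustration of how the Table 3 coefficients combine into a probability, the sketch below evaluates the model for a hypothetical patient: the linear predictor sums the constant and the coefficient-weighted variables, including both the SBP and SBP-squared terms. The example patient is our invention, not a case from the study.

```python
import math

def mpm0_cpr(coma, emergency, cancer, cpr, infection, age, sbp):
    """Probability of hospital mortality from the MPM0(CPR) coefficients
    reported in Table 3 (indicator variables are 0/1; age in years,
    systolic blood pressure in mm Hg)."""
    logit = (-1.370                    # constant
             + 2.44 * coma             # coma or deep stupor at admission
             + 1.81 * emergency        # emergency admission
             + 1.49 * cancer           # cancer part of present problem
             + 0.974 * cpr             # CPR prior to ICU admission
             + 0.965 * infection       # probable infection
             + 0.0368 * age
             - 0.0606 * sbp
             + 0.000175 * sbp ** 2)    # SBP-squared term
    return 1 / (1 + math.exp(-logit))

# hypothetical patient: 70-year-old emergency admission with probable
# infection, SBP 110, and no coma, cancer, or prior CPR
p = mpm0_cpr(0, 1, 0, 0, 1, 70, 110)
```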

The MPM0(CPR) performed comparably to the original admission model when validated in the second group of ICU patients (see Teres, Lemeshow, Avrunin, and Pastides 1987). The p value for the goodness-of-fit test was .90; classification rates were 84.8%, 68.8%, 87.0%, 42.4%, and 95.2% for overall correct classification, predictive value for dying, predictive value for surviving, sensitivity, and specificity, respectively.

3. COMPARISON OF PREDICTIVE EFFICACY OF MPM, APS, AND SAPS

3.1 Description of APS and SAPS

The original APACHE system developed by Knaus et al. (1981) contained 33 variables reflecting the extent of a patient's physiologic or chemical derangement within the first 24 hours in the ICU. A panel of experts identified the variables to be measured and assigned a score ranging from 0 to 4 to quantify the given abnormality. The APS is determined by summing the assigned points over the 33 variables. Clearly, patients with high scores have worse prognoses than patients with low scores. The APS and a chronic health component constitute the APACHE method. The APS has been validated in multihospital settings (Knaus et al. 1982a).

Le Gall and his colleagues, concerned about the number of variables necessary for calculating APS, developed SAPS. This system is based on 13 of the 33 original variables (plus age) and uses essentially the same weights or point allocations for those variables selected as the original.

3.2 Methods for Comparing the Prediction Models

Between August 16, 1983, and January 10, 1985, we studied a second cohort of 2,028 consecutive patients entering the adult general medical/surgical ICU at Baystate Medical Center. Of these, 31 had to be eliminated from the study because of missing records or missing values on key variables, leaving 1,997 patients for comparing the predictive models. For each patient, sufficient data were collected to permit computation of the APS, SAPS, and MPM0(CPR) probabilities. (This cohort of 1,997 patients also served as the validation data set for the MPM0(CPR); see Sec. 2.2.)

To compare the methods on a common scale of measurement it was necessary to convert the scores obtained from the APS and SAPS to probabilities of hospital mortality. Converting the APS to a probability was accomplished with an MLR equation of the ICU research team at George Washington University (Knaus and Wagner, personal communication, May 1986). Unlike similar models in the literature (Knaus et al. 1982a,b), this equation was


Table 4. Converting APS to Probability of Hospital Mortality

  Constant                                       -3.5605
  APS33                                            .1184
  Age over 40 (patients under 40 = 0)              .0304
  Emergency surgery                                .8034
  Sepsis                                           .0383
  Cardiac arrest                                   .6952

  If nonoperative patient
    Allergy                                      -2.1748
    Respiratory cancer                            1.0559
    Hypertension                                 -1.3571
    Congestive heart failure                      -.3471
    Multiple trauma                              -1.005
    Cranial hemorrhage                            1.207
    Overdose                                     -2.9009
    Diabetes                                     -2.3623
    Gastrointestinal bleed                         .0782
    Respiratory infection                         0

  If none of the above, select primary organ-system failure
    Renal, metabolic, or hematologic              -.8884
    Respiratory                                   -.1146
    Neurologic                                    -.1696
    Cardiovascular                                 .4519
    Gastrointestinal                               .425

  If postoperative patient
    Peripheral vascular surgery                  -1.5632
    Surgery for multiple trauma                  -1.7032
    Valve surgery                                -1.6554
    Renal surgery                                 -.9986
    Gastrointestinal perforation                  -.2727
    Other neurological surgery (e.g., laminectomy, craniotomy)   -.7452
    Other cardiovascular surgery                  -.9929
    Other respiratory surgery                     -.7413
    Other gastrointestinal surgery                -.6364
    Other renal, metabolic, or hematologic surgery               -.3517

appropriate for all general medical/surgical ICU patients and did not require the incorporation of chronic health status. Table 4 gives the details of this model. Converting the SAPS to probabilities of hospital mortality was accomplished with a procedure suggested by Le Gall, whereby patients are stratified into three groups: (a) medical, (b) emergency surgical, or (c) elective surgical. Observed death rates were provided for specific SAPS intervals within these strata (Le Gall, personal communication, May 1986). This process was used because no statistical models were available to transform these scores into probabilities. Table 5 gives these empirical death rates.
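The Table 4 equation can be sketched as a logistic transform of the summed coefficients. Note two assumptions on our part: "Age over 40" is treated as a 0/1 indicator (Table 4 does not say whether it is instead scaled by years), and the diagnostic-category coefficient is passed in directly rather than selected by the branching rules.

```python
import math

def aps_to_prob(aps33, over40, emergency_surgery, sepsis, cardiac_arrest,
                diagnosis_weight):
    """Mortality probability from the Table 4 conversion equation.

    diagnosis_weight is the coefficient for the patient's diagnostic
    category (e.g., -2.9009 for a nonoperative overdose).  over40 is
    read here as a 0/1 indicator, which is our assumption.
    """
    logit = (-3.5605
             + 0.1184 * aps33
             + 0.0304 * over40
             + 0.8034 * emergency_surgery
             + 0.0383 * sepsis
             + 0.6952 * cardiac_arrest
             + diagnosis_weight)
    return 1 / (1 + math.exp(-logit))

# hypothetical nonoperative overdose patient over 40 with APS = 20
p = aps_to_prob(20, 1, 0, 0, 0, -2.9009)
```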

Statistical analysis included descriptive as well as inferential methods. Distributions of probabilities of mortality over the cohorts were described, with particular attention to differences in probabilities for patients who actually lived to hospital discharge as compared with those who died. The hypothesis that the three methods produce comparable probabilities of hospital mortality was tested with Friedman's nonparametric analysis of variance. The probabilities produced by the various methods were ranked from smallest to largest for each patient. Averaging these ranks over all patients provided the basis for the statistical test. This analysis was done separately for patients who lived and patients who died.
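The ranking step just described can be sketched directly; within each patient the k methods' probabilities are ranked, and the usual chi-squared-type Friedman statistic is computed from the rank sums (no tie correction in this sketch).

```python
def friedman_statistic(rows):
    """Friedman chi-squared statistic (no tie correction).

    rows[i] holds one patient's probabilities of mortality from the
    k methods being compared; ranks 1..k are assigned within each patient.
    """
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        # indices of the methods ordered from smallest to largest probability
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

# illustrative data: four patients for whom method 3 always ranks highest
stat = friedman_statistic([[0.1, 0.2, 0.3]] * 4)
```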

The fit of the probabilities of hospital mortality to the

Table 5. Converting SAPS to Probability of Hospital Mortality

                            Surgical
            Medical         Unscheduled     Scheduled
  SAPS      Pr(Mortality)   Pr(Mortality)   Pr(Mortality)
  0-4        1.8%            6.8%             .0%
  5-9        7.9%            8.3%             .9%
  10-14     14.5%           16.8%            3.1%
  15-19     34.9%           38.1%           13.3%
  20-24     50.3%           61.0%           13.0%
  25-29     76.1%           88.9%           66.7%
  30+       82.4%           77.8%

observed mortality experience in the cohort was assessed through the use of the Hosmer-Lemeshow goodness-of-fit tests. These tests compute expected numbers of deaths by grouping the patients into strata defined by the estimated probability of mortality. By summing the probabilities of mortality for all patients in a particular stratum, an expected number of deaths is determined. By comparing observed and expected numbers of patients who survive and who die in each stratum, an assessment is made of the fit of the predictive model. Clearly, a model is effective if the actual number of deaths is close to the predicted number of deaths in each stratum and is inadequate if there is little correspondence.
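A sketch of this grouping-based comparison follows; it shows the general construction (equal-size strata by sorted predicted probability), while the paper's exact stratum boundaries are not specified here.

```python
def hosmer_lemeshow(probs, died, g=10):
    """Hosmer-Lemeshow-style goodness-of-fit statistic (a sketch).

    Patients are sorted by predicted probability of mortality and split
    into g roughly equal strata; within each stratum the observed numbers
    of deaths and survivors are compared with the expected numbers
    (the sum of the predicted probabilities in that stratum).
    """
    pairs = sorted(zip(probs, died))
    size = len(pairs) // g
    stat = 0.0
    for i in range(g):
        group = pairs[i * size:(i + 1) * size] if i < g - 1 else pairs[(g - 1) * size:]
        n = len(group)
        exp_dead = sum(p for p, _ in group)
        obs_dead = sum(d for _, d in group)
        if 0 < exp_dead < n:   # guard against degenerate strata in this sketch
            stat += (obs_dead - exp_dead) ** 2 / exp_dead
            stat += ((n - obs_dead) - (n - exp_dead)) ** 2 / (n - exp_dead)
    return stat
```

A well-calibrated model yields a small statistic (observed close to expected in every stratum); large values signal the over- or underestimation discussed in Section 3.3.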

Classification rates of the methods were assessed with a probability of .50 as the cut point for predicting hospital mortality. Log-linear analysis was used to compare classification rates of the three methods simultaneously, and McNemar's test was used to compare in greater detail the classification rates produced by two of the methods.
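McNemar's test for paired classification rates depends only on the discordant counts; a minimal sketch with the standard continuity correction:

```python
def mcnemar_statistic(b, c):
    """McNemar's chi-squared statistic with continuity correction.

    b and c are the discordant counts: patients classified correctly by
    one method but not the other (df = 1; compare with 3.84 for p = .05).
    """
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)

# illustrative discordant counts for two competing methods
stat = mcnemar_statistic(30, 10)
```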

Since comparison of these methods was clearly tied to the ability of the data collectors to abstract the relevant information from patient charts consistently and accurately, the same careful attention was given to quality control as in the first phase of data collection.

3.3 Results of the Comparison

In Figure 1 the proportion of patients with probabilities of hospital mortality less than or equal to particular cut points (i.e., .05, .10, .15, etc.) is plotted for MPM0(CPR), APS, and SAPS. It is clear from these cumulative distributions that the highest probabilities of mortality are produced with APS, and the lowest probabilities are produced with SAPS. The MPM0(CPR) probabilities were typically between these other methods.

Tables 6 and 7 present this finding in a somewhat different way. In Table 6, for both the 1,601 patients who lived and the 396 patients who died the mean probabilities of mortality were highest with APS and lowest with SAPS. The difference in mean probabilities between the highest and lowest method for the nonsurvivors was sizable, with APS yielding probabilities that averaged 15.8 percentage points higher than SAPS.

Table 7 gives the results of Friedman's two-way analysis of variance. Separate analyses were performed for patients who lived and patients who died. Results suggest that the methods differ significantly with respect to mean ranks,


Figure 1. Cumulative Distributions of Probability of Hospital Mortality for MPM, APS, and SAPS. [Figure omitted: cumulative percentage of patients (y axis, 0 to 100) plotted against probability of hospital mortality (x axis, .00 to 1.00), with separate curves for MPM0(CPR), APS, and SAPS.]

with APS having significantly higher probabilities than MPMO(CPR), and MPMO(CPR) having significantly higher probabilities than SAPS. This is true for both patients who lived and patients who died.
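Friedman's test treats each patient as a block and ranks the three methods' probabilities within that patient; the mean ranks in Table 7 and the chi-squared statistics in its footnotes come from those within-patient ranks. A sketch in Python of the computation, using made-up probabilities for four hypothetical patients rather than the study data:

```python
# Friedman's two-way analysis of variance by ranks: each patient is a
# block, and the three methods' predicted probabilities are ranked 1-3
# within each patient.  The probabilities below are illustrative only.
probs = [  # columns: MPMO(CPR), APS, SAPS
    (0.30, 0.40, 0.20),
    (0.10, 0.15, 0.05),
    (0.50, 0.45, 0.40),
    (0.25, 0.35, 0.20),
]
n, k = len(probs), 3

# Rank within each patient (1 = lowest probability); no ties occur here.
ranks = [[sorted(row).index(v) + 1 for v in row] for row in probs]

rank_sums = [sum(col) for col in zip(*ranks)]
mean_ranks = [r / n for r in rank_sums]          # analogue of Table 7 entries

# Friedman chi-squared statistic for k related samples over n blocks.
chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
print(mean_ranks, chi2)  # -> [2.25, 2.75, 1.0] and 6.5
```

With tied values within a patient, a tie-correction factor is needed; `scipy.stats.friedmanchisquare` handles that case.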

Not all of the models fit the validation data set equally well. Table 8 details the results of goodness-of-fit tests for MPMO(CPR), APS, and SAPS. It is clear from this table that for all methods, patients with the highest probabilities of dying tended to die in the hospital, and those with the lowest probabilities tended to survive. Comparison of observed numbers of patients who survived and died to the expected numbers using the MPMO(CPR) suggests an extremely high level of fit, as measured by the Hosmer-Lemeshow goodness-of-fit test (p = .90). Of the 1,997 patients, 1,601 lived and 396 died. The model predicted 1,589 and 408, respectively. Comparisons within strata of observed-to-expected frequencies are, typically, very close.

The same cannot be said for the other methods: For example, with the APS method 469 patients were expected to die and only 396 actually died. This complements the previous observation that the probabilities produced by APS on this cohort of patients were too high. Comparison of observed frequencies with expected frequencies within strata suggests again that APS typically underestimates the number of survivors and overestimates the number of patients who will die. For SAPS, the opposite observation may be made. Table 8 shows that the number of survivors is generally overestimated, and the number who die is generally underestimated. Again, this is not surprising, because the probabilities of mortality with this method were lower than with the other methods. The p values associated with APS and SAPS suggest that these probabilities do not accurately portray the mortality experience in this cohort of patients.

Table 6. Comparison of MPMO(CPR), APS, and SAPS

                    Patients who lived        Patients who died
Model               Mean    SD     n          Mean    SD     n
MPMO(CPR)           .139    .160   1,601      .470    .291   396
APS probability     .152    .181   1,601      .571    .265   396
SAPS probability    .114    .131   1,601      .413    .232   395*

NOTE: SD represents standard deviation.
*The SAPS could not be computed for one patient who was missing information on service at admission.

Table 7. Friedman's Analysis of Variance for MPMO(CPR), APS, and SAPS

                    Mean ranks
Model               Patients who lived(a)    Patients who died(b)
MPMO(CPR)           2.08                     1.95
APS probability     2.20                     2.43
SAPS probability    1.72                     1.62

(a) n = 1,601, p < .0001, and chi-squared (2 df) = 197.89.
(b) n = 395, p < .0001, and chi-squared (2 df) = 133.43.

Table 9 compares MPMO(CPR) to APS and SAPS with respect to sensitivity, specificity, predictive value for dying, predictive value for surviving, and total correct classification, using a .50 predictive cut point. Sensitivity ranged from 42.4% with MPMO(CPR) to 58.8% with APS, and specificity ranged from 93.4% with APS to 97.0% with SAPS. Results of the log-linear analysis performed to compare the overall correct classification rates suggest no difference between MPM, APS, and SAPS. The MPMO(CPR) had somewhat lower overall correct classification than the other two methods. Although this may seem to be a contradiction of the goodness-of-fit results, it is a consequence of APS overestimating the probability of hospital mortality (yielding higher sensitivity than the other methods) and SAPS underestimating the probabilities (yielding higher specificity).
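The MPMO(CPR) entries in Table 9 follow directly from its marginal counts in Table 10: at the .50 cut point, 168 of the 396 deaths and 1,525 of the 1,601 survivors were classified correctly. A sketch in Python of the arithmetic (small discrepancies in the last digit, e.g. 68.9 here vs. 68.8 in Table 9, reflect rounding in the published table):

```python
# Classification rates for MPMO(CPR) at the .50 cut point, computed
# from its marginal counts in Table 10.
true_pos = 168     # died, predicted to die
false_neg = 228    # died, predicted to live
true_neg = 1525    # lived, predicted to live
false_pos = 76     # lived, predicted to die

sensitivity = 100 * true_pos / (true_pos + false_neg)    # ~42.4%
specificity = 100 * true_neg / (true_neg + false_pos)    # ~95.3% (reported 95.2%)
pv_dying = 100 * true_pos / (true_pos + false_pos)       # ~68.9% (reported 68.8%)
pv_surviving = 100 * true_neg / (true_neg + false_neg)   # ~87.0%
total_correct = 100 * (true_pos + true_neg) / 1997       # ~84.8%
```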

The method used to convert SAPS to probabilities resulted in discrete rather than continuous probabilities (see Table 5). We felt that the resulting probabilities were not comparable to those produced by the logistic APS or MPM's. For this reason we performed a more focused analysis, comparing the MPMO(CPR) to the APS probabilities, first for patients who survived and then for patients who died. As is seen in Table 10, for patients who lived there is strong agreement between methods with respect to predictions, using a probability of .50 as the predictive cut point. That is, of the 1,601 patients who lived to hospital discharge, the two methods predicted the same outcome for 1,472 (92%) of them. There was disagreement on only 129 patients (8%); of these, MPMO(CPR) correctly predicted that the patient would survive for 79 patients (61%), and APS predicted correctly for 50 patients (39%). This represents a significant difference (p < .05) in predictive efficacy of the MPM as compared with the APS probabilities, for patients for whom there was a difference in prediction between the two methods.

For patients who ultimately died, there was less agreement between the methods. Of the 396 patients who did not survive to hospital discharge, the two methods predicted the same outcome for only 243 (61%) of them. There was disagreement on 153 patients (39%); of these, MPMO(CPR) correctly predicted the patient would die for 44 patients, and APS predicted correctly for 109 patients. This difference is highly significant (p < .001), as reflected in McNemar's test.
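McNemar's test uses only the discordant pairs, i.e., the patients on whom the two methods disagreed. A sketch in Python of the continuity-corrected form, which reproduces the chi-square values footnoted in Table 10:

```python
# Continuity-corrected McNemar chi square on the discordant cells of
# Table 10: b = patients MPMO(CPR) classified correctly, c = patients
# APS classified correctly, among those on whom the methods disagreed.
def mcnemar(b, c):
    return (abs(b - c) - 1) ** 2 / (b + c)

chi2_lived = mcnemar(79, 50)   # patients who lived -> 6.08
chi2_died = mcnemar(44, 109)   # patients who died  -> 26.77
print(round(chi2_lived, 2), round(chi2_died, 2))
```

Each statistic is referred to a chi-squared distribution with 1 df, giving the p = .014 and p < .001 reported in the table notes.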

Table 8. Goodness-of-Fit Tests for MPM, APS, and SAPS

             MPMO(CPR)                     APS                          SAPS
             Survive        Die            Survive        Die           Survive        Die
Pr(Dying)(a) O      E       O      E      O      E       O      E      O      E       O      E
.000-.099    929    925.1   38     41.9   913    890.3   15     37.7   1,023  1,005.2 25     42.8
.100-.199    313    312.8   52     52.2   285    264.0   24     45.0   363    377.6   80     65.4
.200-.299    144    145.8   50     48.2   145    136.5   37     45.5
.300-.399    92     90.2    47     48.8   94     91.5    46     48.6   167    180.7   113    99.3
.400-.499    47     49.0    41     39.0   59     54.8    41     45.2
.500-.599    30     26.0    26     30.0   40     37.4    43     45.6   34     65.1    97     65.9
.600-.699    21     20.1    35     35.9   27     22.7    38     42.3   5      5.0     8      8.0
.700-.799    13     12.0    35     36.0   22     19.3    54     56.7   7      13.6    50     43.4
.800-.899    9      5.5     26     29.5   8      7.9     46     46.2   2      3.5     22     20.5
.900-.999    3      2.3     46     46.7   8      3.3     52     56.8
Total        1,601  1,588.8 396    408.2  1,601  1,527.7 396    469.3  1,601  1,650.7 395(b) 345.3
H statistic  4.94                         38.10                        48.91
df           10                           10                           7
p value      .90                          <.001                        <.001

NOTE: O represents the observed number of patients. E represents the expected number of patients based on estimated probability of hospital mortality.
(a) Pr(Dying) is the estimated probability of hospital mortality and will vary for a given patient according to the procedure used.
(b) The SAPS could not be computed for one patient who was missing information on service at admission.

Despite these differences, each of the methods (MPM, APS, and SAPS) would be effective in analyzing the case mix in a given ICU. As is seen in Table 8, as the estimated probabilities of hospital mortality increased, the percentage of patients who died increased as well, with relatively little uncertainty as to outcome for patients with extreme probabilities.
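The Hosmer-Lemeshow statistic in Table 8 sums (O − E)²/E over the survive and die cells of every stratum. A sketch in Python using the rounded MPMO(CPR) entries from Table 8; because the published expected counts are rounded to one decimal, the result (about 4.95) matches the reported 4.94 only approximately:

```python
# Hosmer-Lemeshow goodness-of-fit statistic for MPMO(CPR), recomputed
# from the rounded observed (O) and expected (E) counts in Table 8.
# Each tuple: (O_survive, E_survive, O_die, E_die) for one stratum.
strata = [
    (929, 925.1, 38, 41.9),
    (313, 312.8, 52, 52.2),
    (144, 145.8, 50, 48.2),
    (92, 90.2, 47, 48.8),
    (47, 49.0, 41, 39.0),
    (30, 26.0, 26, 30.0),
    (21, 20.1, 35, 35.9),
    (13, 12.0, 35, 36.0),
    (9, 5.5, 26, 29.5),
    (3, 2.3, 46, 46.7),
]

# Sum (O - E)^2 / E over both outcome cells in each of the 10 strata;
# the result is referred to a chi-squared distribution (10 df in Table 8).
H = sum((os - es) ** 2 / es + (od - ed) ** 2 / ed
        for os, es, od, ed in strata)
print(round(H, 2))
```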

Analyses comparing the original MPM (containing the number of organ systems in failure) with APS and SAPS produced similar results (see Lemeshow, Teres, Avrunin, and Pastides 1987).

4. DISCUSSION

Severity-of-illness classification systems have been suggested for improving staging of disease for case-mix analysis, comparison of ICU treatment and performance (quality assessment), and possibly clinical decision making by providing an objective estimate of hospital mortality. Although numerous classification systems have been suggested (Bertram et al. 1982; Fetter, Shin, Freeman, Averill, and Thompson 1980; Gonnella, Hornbrook, and Louis 1984; Horn, Sharkey, and Bertram 1983; Jelinek, Haussmann, Hegyvary, and Newman 1974; Young 1984), very few are applicable to general medical/surgical ICU patients. These patients typically have complex conditions involving numerous organ systems and hemodynamic, neurological, and respiratory instability.

Table 9. Comparison of MPMO(CPR), APS, and SAPS Classification Rates

                                  MPMO(CPR)   APS      SAPS
Sensitivity                       42.4%       58.8%    44.8%
Specificity                       95.2%       93.4%    97.0%
Predictive value for dying        68.8%       68.9%    87.7%
Predictive value for surviving    87.0%       90.2%    78.7%
Total correct classification      84.8%       86.6%    86.7%

NOTE: All classifications are based on 1,997 patients in the validation data set. MPMO(CPR) is the MPM admission model containing CPR prior to ICU admission. APS is based on the multiple logistic regression model provided by Knaus and associates. SAPS is based on observed death rates associated with SAPS scores as provided by Le Gall.

Models that reflect the seriousness of illness or predict mortality could be very useful for comparing the performance of various ICU's, because they make it possible to control for, or stratify on, severity of illness. The performances could then be analyzed for determinants of differential performance. Although APACHE collected at 24 hours has been used successfully for this purpose (Knaus, Draper, Wagner, and Zimmerman 1986), it would seem that MPM, which provides an estimate of hospital mortality at the time of ICU admission and is independent of medical treatment in the ICU, should be seriously considered for this quality-assessment approach.

Before an ICU severity system can be accepted on a widespread basis, it is important that it undergo careful scrutiny with respect to validation (Wasson, Sox, Neff, and Goldman 1985) and comparison with other available systems. The results of this study demonstrate that differences between the three methods, with respect to standard criteria, such as sensitivity, specificity, and total correct classification, are not major.

Most of the analyses performed, including determination of sensitivity and specificity, involved identifying a particular threshold for predicting outcome. We feel, however, that the truest assessment of adequacy of a predictive model is through goodness-of-fit tests, because this technique compares expected frequencies with observed frequencies. Results of the goodness-of-fit tests suggest that, in this population of 1,997 patients in a single institution, APS overestimated and SAPS underestimated the probability of hospital mortality. The MPMO(CPR) most closely matched the observed outcomes. Note, however, that this comparison is somewhat biased, since the development of MPM and comparison of the methods took place in the same institution.

Table 10. Comparison of Discrepancies in Classification Between APS and MPM

                                        APS
              Patients who lived to                Patients who died
              hospital discharge(a)                in the hospital(b)
MPMO(CPR)     Predict live  Predict die  Total     Predict live  Predict die  Total
Predict live  1,446         79           1,525     119           109          228
Predict die   50            26           76        44            124          168
Total         1,496         105          1,601     163           233          396

(a) n = 1,601, p = .014, and McNemar's chi square is 6.08.
(b) n = 396, p < .001, and McNemar's chi square is 26.77.

The results of this study do not suggest that any one method will always perform better than the others if tested in a wide range of intensive-care settings. Each of the available methods has particular characteristics that may influence the choice of one method over another, depending on its intended application.

Potential users of predictive models should be cautioned that it may be inappropriate to use these methods in situations where they have not been carefully tested. For example, we do not advocate using the MPM admission model beyond the time of admission to the ICU, nor do we suggest using this model for predicting which patients will experience long ICU stays. It was designed for adult general medical/surgical patients and may not work in specialized critical-care units. Similar cautions should be made for other systems.

Besides the admission model, a 24-hour model and a 48-hour model have been developed, but these necessarily incorporate ICU therapy, making them less applicable as quality-assessment tools. In addition, they require a higher level of training for data collection than the admission model. Using the probabilities sequentially, however, may enhance prediction. Knowing a patient's changing probabilities of hospital mortality at three successive points of observation should provide the clinician with more information than a probability supplied a single time.

The science of predicting outcome for ICU patients has been rapidly changing, even during the time of this study. The APACHE has been modified to APACHE II (Knaus et al. 1985). This new system is based on 12 of the original APS variables plus a chronic health variable and age. The APACHE II is the most popular scoring system presently in use, with its reduced number of variables and increased weights for coma (Teres, Brown, and Lemeshow 1982) and acute renal failure (Sweet, Glenney, Fitzgibbons, Friedmann, and Teres 1981). The correct classification rate reported for APACHE II is similar to that reported for APS and MPM. The MPM system has also evolved, with the construction of a "combined" admission model based on more than 2,600 patients. A direct comparison of this new MPM and APACHE II, both collected at the time of admission to the ICU, might resolve which method is superior for quality-assessment purposes (both being independent of ICU treatment).

All of the methods compared have a high error rate that makes them unacceptable for individual-patient clinical decision making. Continuing studies of MPM data include a detailed clinical evaluation of patients predicted to live who ultimately died. We hope that an understanding of why these patients were misclassified will lead to a reduction of the current 15% error rate. We hypothesize that the higher-than-anticipated number of deaths may be due to catastrophic events that occurred later in the ICU stay and were not apparent early in the ICU stay. They could not, therefore, be foreseen by our models or any other predictive system based on admission data. We are in the process of testing this hypothesis on the patients in our study who were incorrectly predicted to live.

Note that a comparison between the 24-hour MPM and APACHE II or APS would be of great clinical interest, because these systems use information up to 24 hours in the ICU. It was not possible for us to do such a comparison, because the 24-hour MPM was generated on the two cohorts combined. Thus there was not an independent data set on which to perform the comparisons.

This study has shown that it is possible to develop a statistical multivariate model that accurately reflects the mortality experience of ICU patients. The method used reduced many potentially important variables to a significant subset and objectively assigned a weight to each variable. The admission model resulting from this approach performed comparably to APS and SAPS with respect to overall classification rates (i.e., sensitivity and specificity) and performed better with respect to goodness-of-fit (i.e., correspondence between observed and expected number of deaths within deciles of risk) on a common cohort of patients at Baystate Medical Center, even though the latter two methods were collected over a longer time period (the first 24 hours of ICU stay). The same statistical methodology holds great promise for providing insights into patient prognosis after 24 or 48 hours in the ICU.

[Received January 1987. Revised August 1987.]

REFERENCES

Bertram, D. A., Schumacher, D. N., Horn, S. D., Clopton, C. J., Lord, J. G., and Chan, C. (1982), "Hospital Case Mix Groupings and Generic Algorithms," Quality Review Bulletin, 1, 24-30.

Dixon, W. J. (ed.) (1983), BMDP Statistical Software, Berkeley: University of California Press.

Feller, I., Tholen, D., and Cornell, R. G. (1980), "Improvements in Burn Care, 1965 to 1979," Journal of the American Medical Association, 244, 2074-2078.

Fetter, R. B., Shin, Y., Freeman, J. L., Averill, R. F., and Thompson, J. D. (1980), "Case Mix Definition by Diagnosis-Related Groups," Medical Care, 18 (supp.), 1-53.

Fleiss, J. L. (1981), Statistical Methods for Rates and Proportions (2nd ed.), New York: John Wiley.

Gonnella, J. S., Hornbrook, M. C., and Louis, D. Z. (1984), "Staging of Disease: A Case-Mix Measurement," Journal of the American Medical Association, 251, 637-646.

Gustafson, D. H., Fryback, D. G., Rose, J. H., Prokop, C. T., Detmer, D. E., Rossmeissl, F. C., Taylor, C. M., Alemi, F., and Carnazzo, A. J. (1983), "An Evaluation of Multiple Trauma Severity Indices Created by Different Index Development Strategies," Medical Care, 21, 674-691.

Horn, S. D., Sharkey, P. D., and Bertram, D. A. (1983), "Measuring Severity of Illness: Homogeneous Case Mix Groups," Medical Care, 21, 14-25.

Hosmer, D. W., and Lemeshow, S. (1980), "Goodness of Fit Tests for the Multiple Logistic Regression Model," Communications in Statistics, Part A-Theory and Methods, 9, 1043-1069.

Jelinek, R. C., Haussmann, R. K. D., Hegyvary, S. T., and Newman, J. F., Jr. (1974), A Methodology for Monitoring Quality of Nursing Care, Bethesda, MD: U.S. Department of Health, Education, and Welfare (publication HRA 76-25).




Kennedy, J. W., Kaiser, G. C., Fisher, L. D., Maynard, C., and Fritz, J. K. (1980), "Multivariate Discriminant Analysis of the Clinical and Angiographic Predictors of Operative Mortality From the Collaborative Study in Coronary Artery Surgery (CASS)," Journal of Thoracic and Cardiovascular Surgery, 80, 876-887.

Knaus, W. A., Draper, E. A., Wagner, D. P., and Zimmerman, J. E. (1985), "APACHE II: A Severity of Disease Classification," Critical Care Medicine, 13, 818-829.

——— (1986), "An Evaluation of Outcome From Intensive Care in Major Medical Centers," Annals of Internal Medicine, 104, 410-418.

Knaus, W. A., Draper, E. A., Wagner, D. P., Zimmerman, J. E., Birnbaum, M. L., Cullen, D. J., Kohles, M. K., Shin, B., and Snyder, J. V. (1982a), "Evaluating Outcome From Intensive Care: A Preliminary Multihospital Comparison," Critical Care Medicine, 10, 491-496.

Knaus, W. A., Le Gall, J.-R., Wagner, D. P., Draper, E. A., Loirat, P., Campos, R. A., Cullen, D. J., Kohles, M. K., Glaser, P., Granthil, C., Mercier, P., Nicolas, F., Nikki, P., Shin, B., Snyder, J. V., Wattel, F., and Zimmerman, J. E. (1982b), "A Comparison of Intensive Care in the U.S.A. and France," Lancet, 2, 642-646.

Knaus, W. A., Zimmerman, J. E., Wagner, D. P., and Draper, E. A. (1981), "APACHE-Acute Physiology and Chronic Health Evaluation: A Physiologically Based Classification System," Critical Care Medi- cine, 9, 591-597.

Le Gall, J.-R., Loirat, P., Alperovitch, A., Glaser, P., Granthil, C., Mathieu, D., Mercier, P., Thomas, R., and Villers, D. (1984), "A Simplified Acute Physiologic Score for ICU Patients," Critical Care Medicine, 12, 975-977.

Lemeshow, S., and Hosmer, D. W. (1982), "A Review of Goodness of Fit Statistics for Use in the Development of Logistic Regression Models," American Journal of Epidemiology, 115, 92-106.

——— (1984), "Estimating Odds Ratios With Categorically Scaled Covariates in Multiple Logistic Regression Analysis," American Journal of Epidemiology, 119, 147-151.

Lemeshow, S., Teres, D., Avrunin, J. S., and Pastides, H. (1987), "A Comparison of Methods to Predict Mortality of Intensive Care Unit Patients," Critical Care Medicine, 15, 715-722.

Lemeshow, S., Teres, D., Pastides, H., Avrunin, J. S., and Steingrub, J. S. (1985), "A Method for Predicting Survival and Mortality of ICU Patients Using Objectively Derived Weights," Critical Care Medicine, 13, 519-525.

Pierpont, G. L., Kruse, M., Ewald, S., and Weir, E. K. (1985), "Practical Problems in Assessing Risk for Coronary Artery Bypass Grafting," Journal of Thoracic and Cardiovascular Surgery, 89, 673-682.

Pollack, M. M., Ruttimann, U. E., Getson, P. R., and members of the Multi-Institutional Study Group (1987), "Accurate Prediction of a Pediatric Intensive Care Outcome: A New Quantitative Method," New England Journal of Medicine, 316, 134-139.

Pozen, M. W., D'Agostino, R. B., Selker, H. P., Sytkowski, P. A., and Hood, W. B., Jr. (1984), "A Predictive Instrument to Improve Coronary-Care-Unit Admission Practices in Acute Ischemic Heart Disease: A Prospective Multicenter Clinical Trial," New England Journal of Medicine, 310, 1273-1278.

Sweet, S. J., Glenney, C. U., Fitzgibbons, J. P., Friedmann, P., and Teres, D. (1981), "Synergistic Effect of Acute Renal Failure and Respiratory Failure in the Surgical Intensive Care Unit," American Journal of Surgery, 141, 492-496.

Teres, D., Brown, R. B., and Lemeshow, S. (1982), "Predicting Mortality of Intensive Care Unit Patients: The Importance of Coma," Critical Care Medicine, 10, 86-95.

Teres, D., Lemeshow, S., Avrunin, J. S., and Pastides, H. (1987), "Validation of the Mortality Prediction Model for ICU Patients," Critical Care Medicine, 15, 208-213.

Wasson, J. H., Sox, H. C., Neff, R. K., and Goldman, L. (1985), "Clinical Prediction Rules: Applications and Methodological Standards," New England Journal of Medicine, 313, 793-799.

Young, W. W. (1984), "Incorporating Severity of Illness and Comorbidity in Case-Mix Measurement," Health Care Financing Review, 6 (annual supp.), 23-31.
