
The Sickness Impact Profile

Development of an Outcome Measure of Health Care

BETTY S. GILSON, MD
JOHN S. GILSON, MD
MARILYN BERGNER, PhD
RUTH A. BOBBITT, PhD
SHIRLEY KRESSEL, MPH
WILLIAM E. POLLARD, PhD
MICHAEL VESSELAGO, MD

The Sickness Impact Profile, a behaviorally based measure of sickness-related dysfunction, is being developed to provide an appropriate and sensitive measure of health status for use in assessing the effects of health care services.

Introduction

The development of methods for evaluating health care services is one of the most urgent concerns in the field of health services research. Among the major stimuli for this concern are the public accountability that accompanies increased government participation in health care financing and the growing public interest in the quality of increasingly costly services. The proliferation of innovative organizational patterns for providing health services makes it necessary to obtain data demonstrating the relative benefits of available alternatives.

Dr. Betty Gilson is Associate Professor and Associate Dean, Department of Health Services, School of Public Health and Community Medicine, University of Washington, Seattle, Washington 98195. Dr. John Gilson is Director of Medical Education, Group Health Cooperative of Puget Sound, Seattle, Washington. Dr. Bergner is Assistant Professor, Department of Health Services, School of Public Health and Community Medicine, University of Washington, Seattle, Washington. Dr. Bobbitt is Research Professor, Department of Health Services, School of Public Health and Community Medicine, University of Washington, Seattle, Washington. Ms. Kressel is Senior Administrative Analyst, Health Policy Program, San Francisco, California. Dr. Pollard is a postdoctoral fellow, Department of Psychology, Northwestern University, Evanston, Illinois. Dr. Vesselago's address is: 2012 Tenth Avenue East, Seattle, Washington. This investigation was supported by the HMO Service of the Health Services and Mental Health Administration, Contract HSM 110-72-420. This paper was presented, in abbreviated form, at the American Public Health Association Annual Meeting, San Francisco, 1974. It was accepted for publication July 21, 1975.

Evaluators use three types of measures to assess health care services: measures of structure, measures of process, and measures of outcome.1,2 Measures of structure or process assess factors that are presumably directly related to outcome. Measures of outcome are designed to assess the effects of the health care services on the population served. Often, structure or process measures are used because no adequate or efficient measure of outcome is available. While it has been assumed that these three types of evaluation measures are highly related and that structure and process measures can serve as proxies for outcome measures, the substitution will be legitimate only when the relationship between structure or process and outcome has been established. For example, one can assess the outcome of a program such as polio immunization by examining the number of immunizations administered (a process measure), since it has been demonstrated that such immunization leads to less polio (an outcome measure). On the other hand, since it is not known whether the number of clinician visits decreases illness, measuring numbers of visits does not provide knowledge of outcome.

1304 AJPH DECEMBER, 1975, Vol. 65, No. 12


TABLE 1-Categories and Selected Items of the Sickness Impact Profile

| Category | Items Describing Behaviors Involved in or Related to | Selected Items | Scale Values |
|---|---|---|---|
| A | Social Interaction | I make many demands, for example, insist that people do things for me, tell them how to do things | 7.7 |
| | | I am going out less to visit people | 5.2 |
| B | Ambulation or Locomotion Activity | I am walking shorter distances | 3.3 |
| | | I do not walk at all | 9.2 |
| C | Sleep and Rest Activity | I lie down to rest more often during the day | 4.6 |
| | | I sit around half asleep | 8.1 |
| D | Taking Nutrition | I am eating no food at all, nutrition is taken through tubes or intravenous fluids | 12.3 |
| | | I am eating special or different food, for example, soft food, bland diet, low salt, low fat foods | 5.6 |
| E | Usual Daily Work | I often act irritable toward my work associates, for example, snap at them, give sharp answers, criticize easily | 7.1 |
| | | I am not working at all | 8.6 |
| F | Household Management | I have given up taking care of personal or household business affairs, for example, paying bills, banking, working on budget | 6.9 |
| | | I am doing less of the regular daily work around the house that I usually do | 3.9 |
| G | Mobility and Confinement | I stay within one room | 9.9 |
| | | I stop often when traveling because of health problems | 4.2 |
| H | Movement of the Body | I am in a restricted position all the time | 13.6 |
| | | I sit down, lie down, or get up only with someone's help | 10.4 |
| I | Communication Activity | I communicate only by gestures, for example, moving head, pointing, sign language | 11.3 |
| | | I often lose control of my voice when I talk, for example, my voice gets louder, starts trembling, changes pitch | 6.4 |
| J | Leisure Pastimes and Recreation | I am doing more physically inactive pastimes instead of my other usual activities | 3.9 |
| | | I am going out for entertainment less often | 2.8 |
| K | Intellectual Functioning | I have difficulty reasoning and solving problems, for example, making plans, making decisions, learning new things | 8.3 |
| | | I sometimes behave as if I were confused or disoriented in place or time, for example, where I am, who is around, directions, what day it is | 11.2 |
| L | Interaction with Family Members | I isolate myself as much as I can from the rest of the family | 8.9 |
| | | I am not doing the things I usually do to take care of my children or family | 6.8 |
| M | Emotions, Feelings, and Sensations | I act irritable and impatient with myself, for example, talk badly about myself, swear at myself, blame myself for things that happen | 5.4 |
| | | I laugh and cry suddenly for no reason | 8.1 |
| N | Personal Hygiene | I dress myself, but do so very slowly | 4.6 |
| | | I do not have control of my bowels | 11.2 |


The importance of developing widely applicable outcome measures of health care is evident. The few well designed studies of health care that have used outcome measures to evaluate programs have shown for the most part no difference between control and experimental groups or have shown inconclusive or contradictory results.* These results would be acceptable if one were confident that the outcome measures used were appropriate to the program goal and were sufficiently sensitive to discern differences where differences occur over a relatively short time span. New developments should be directed toward overcoming the inadequacies of outcome measures used in the past: inappropriateness and insensitivity. This is especially important if the measures are to be used to assess the effects of comprehensive health care programs having relatively global objectives with regard to patient welfare and serving populations with heterogeneous health problems.

The Sickness Impact Profile

The Sickness Impact Profile (SIP), a behaviorally based measure of sickness-related dysfunction, is being developed in an effort to provide an appropriate, valid, and sensitive measure of health status that will aid in assessing the outcome of health care services.

A measure of behavioral dysfunction in usual daily activities could provide a valid and practical indicator of health status and be of potential use in health care outcome evaluation. Obviously, no single measure can provide a complete assessment of the quality of health services, and the SIP would be applied in conjunction with other measures appropriate to a particular study situation.

Before this measure of health status was developed, an assumption was made regarding the desired outcomes of comprehensive health care programs, thus defining what was to be measured. It was assumed that the ultimately sought product of health services is the reduction of sickness. Disease was differentiated from sickness. Disease was used to denote a professional or provider definition of illness based on clinical or clinically related observations of a patient or a population. Sickness was used to denote the nonprofessional definition based on lay observations.

For example, an individual may define certain signs or symptoms as sickness-related. Observation of such signs and symptoms in himself may result in his perceiving himself as sick. Having done so, he may or may not seek medical care. If he does not and his sickness persists, he nonetheless experiences impacts of sickness. If he does seek medical care, he enters the medical care process and his sickness may then be defined as disease by a clinician. The impact the sickness has on the individual is influenced by the health care provider's definition of the illness, the medical care process, and his own sickness perception.

* Elinson,3 Wilner et al.,4 and Cumming and Cumming5 provide examples of experimentally designed studies that were unable to show differences between experimental and control groups. More recently, Brook and Appel2 cast doubt on the appropriateness of outcome measures currently considered acceptable, as well as of process measures.

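TABLE 2-SIP Item Scaling: Correlation of Each Judge's Ratings with the Mean Ratings for 312 Items

| Judge No. | Correlation |
|---|---|
| 1 | 0.83 |
| 2 | 0.73 |
| 3 | 0.80 |
| 4 | 0.74 |
| 5 | 0.81 |
| 6 | 0.82 |
| 7 | 0.71 |
| 8 | 0.76 |
| 9 | 0.84 |
| 10 | 0.77 |
| 11 | 0.58 |
| 12 | 0.61 |
| 13 | 0.65 |
| 14 | 0.76 |
| 15 | 0.73 |
| 16 | 0.77 |
| 17 | 0.66 |
| 18 | 0.79 |
| 19 | 0.63 |
| 20 | 0.83 |
| 21 | 0.81 |
| 22 | 0.76 |
| 23 | 0.71 |
| 24 | 0.74 |
| 25 | 0.75 |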

Whether or not an individual seeks medical care, the impacts of sickness, as perceived by him, form the basis for his response to the SIP.

Although sickness can be assessed in terms of clinical indices or on the basis of subjective or "feeling state" descriptions,6 the behavioral or performance dimension of sickness as perceived by the individual provides a singularly appropriate basis for an outcome measure of health care for several reasons.

First, the behavior of an individual is a manifestation, at a given time, of the overall impact of illness, reflecting the effects of both the clinical and subjective dimensions, as well as their interactive effects on daily life activities.

Second, the effect of sickness on an individual's social, mental, and physical activities is perceived and appreciated by both providers and consumers of health services. If a single evaluation measure is to be useful, the inclusion of both perspectives is essential. Such a measure must be acceptable to those providing services and yet must be responsive to consumers who demand a larger share in the quality assessment processes applied to health services.

Third, a measure of behavioral dysfunction, independent of clinical examination, is particularly appropriate in evaluation of health care systems that are responsible for the health maintenance of heterogeneous population groups over extended periods of time. It is more feasible and economical when applied to a whole population than are most clinical measures, more reliable than measures of feelings and emotions, and more sensitive to the comprehensive impact of a health care system than are morbidity and mortality measures. An instrument measuring level of dysfunction can be used to classify individuals "at a time when knowledge about cause and pathogenesis is not advanced enough to permit measurement in the latter terms."7 In addition, such an instrument can be used as a common basis for comparing persons in diagnostically homogeneous groups or across selected diagnostic groups. It can be applied whether or not medical care has been received or clinical information is available.

Some behaviorally based instruments are currently used to evaluate very specific and circumscribed patient groups but are not designed to be applicable to a segment of the general population such as might be served by a comprehensive health care program.7,8 Certain other behaviorally based instruments are widely applicable, but do not provide sufficient detail to be useful in evaluation and planning of services.9,10 A recent refinement by researchers has been the focus on performance of usual daily activities in function assessment measures.11-13

Methodology of SIP Instrument Construction

The aim of instrument construction was to incorporate both professional and lay perspectives of the impacts of sickness in the content of the Sickness Impact Profile. A procedure was devised to obtain statements describing behavioral dysfunction from patients, health care professionals, individuals caring for patients, and the apparently healthy. These descriptions were obtained by using an open ended request form to elicit specific statements that describe sickness-related changes in behavior. Over 1000 completed request forms were collected. In addition, function assessment instruments that have been designed for the evaluation of circumscribed patient groups were reviewed for statements of behavioral dysfunction. From these sources, 1250 specific statements of behavioral change were obtained. These statements were subjected to standard grouping techniques according to a set of criteria. This process yielded 312 unique statements or items, each item describing a behavior or activity and specifying a dysfunction. A standard sorting procedure yielded 14 groups or categories of items, each of which appears to describe dysfunction in an area of living or a type of activity. These categories and selected items in each are shown in Table 1.

Before the items and categories were field tested, a standardized and structured interview instrument was developed. It included the 312 items, several questions about the personal characteristics of the subject, and a request that the subject list any additional changes in his behavior that were related to his health and were not covered by the items.

Subjects were asked to respond positively only to those items that they were sure described them and were related to their health. The total pattern of positive responses, or dysfunctions, provided a detailed profile of sickness impacts or, as it became known, a protocol.

In order to: (1) interrelate individual items and to compare them, (2) provide a base for scoring profile patterns within and across categories, (3) validate the construct of dysfunction, and (4) determine the extent to which SIP scores relate to a more global assessment of dysfunction, two separate approaches to scaling were applied to the SIP. They were item scaling and protocol scaling.

TABLE 3-SIP Protocol Scaling: Median Correlation and Range of Correlation of Each Judge's Ratings with the Mean Ratings for Each Group of Judges*

| Category | Group 1 Median | Group 1 Range | Group 2 Median | Group 2 Range | Group 3 Median | Group 3 Range | Group 4 Median | Group 4 Range |
|---|---|---|---|---|---|---|---|---|
| A | 0.86 | 0.69-0.96 | 0.85 | 0.18-0.92 | 0.89 | 0.65-0.96 | 0.91 | 0.59-0.98 |
| B | 0.91 | 0.76-0.99 | 0.88 | 0.72-0.97 | 0.96 | 0.88-0.99 | 0.90 | 0.75-0.97 |
| C | 0.90 | 0.70-0.98 | 0.87 | 0.67-0.93 | 0.86 | 0.71-0.94 | 0.91 | 0.55-0.95 |
| D | 0.87 | 0.42-0.97 | 0.89 | 0.79-0.95 | 0.91 | 0.58-0.97 | 0.91 | 0.65-0.98 |
| E | 0.96 | 0.79-0.99 | 0.95 | 0.41-0.99 | 0.98 | 0.89-1.00 | 0.97 | 0.83-1.00 |
| F | 0.90 | 0.70-0.97 | 0.92 | 0.57-0.97 | 0.93 | 0.78-0.98 | 0.93 | 0.71-0.99 |
| G | 0.92 | 0.73-0.97 | 0.93 | 0.78-0.97 | 0.94 | 0.74-0.99 | 0.92 | 0.73-0.96 |
| H | 0.95 | 0.62-0.99 | 0.88 | 0.78-0.96 | 0.93 | 0.71-0.98 | 0.93 | 0.83-0.97 |
| I | 0.85 | 0.41-0.95 | 0.80 | 0.48-0.95 | 0.90 | 0.67-0.97 | 0.90 | 0.79-0.97 |
| J | 0.82 | 0.49-0.96 | 0.87 | 0.68-0.95 | 0.82 | 0.57-0.95 | 0.83 | 0.55-0.96 |
| K | 0.89 | 0.82-0.96 | 0.85 | 0.62-0.97 | 0.93 | 0.53-0.97 | 0.93 | 0.62-0.98 |
| L | 0.94 | 0.66-0.97 | 0.77 | 0.24-0.96 | 0.83 | 0.61-0.95 | 0.88 | 0.59-0.96 |
| M | 0.77 | 0.45-0.95 | 0.85 | 0.64-0.94 | 0.76 | 0.56-0.95 | 0.93 | 0.71-0.98 |
| N | 0.93 | 0.72-0.99 | 0.94 | 0.79-0.97 | 0.95 | 0.85-0.99 | 0.92 | 0.77-0.98 |
| Overall | 0.91 | 0.75-0.97 | 0.92 | 0.81-0.96 | 0.92 | 0.81-0.97 | 0.92 | 0.80-0.95 |

* Each group of judges rated the protocols of 50 subjects.

The item scaling procedure was employed to interrelate individual items and compare them, and to provide a basis for scoring. The SIP items were rated by a group of 25 judges. The judges were seven graduate nursing students, eight medical students, six health services administration students, and four physicians. The scaling consisted of two steps. In the first step, using the method of equal appearing intervals, judges rated each item within each category on an 11-point scale, ranging from "minimally dysfunctional" to "severely dysfunctional." Judges were asked to rate the severity of the dysfunction described by an item without regard for what might be causing it, i.e., without regard for any specific health condition, prognosis, or personal characteristics, in the context of which the behavior might seem more or less dysfunctional. As is shown below, the mean scale values of the items were stable and there was high agreement among judges. Since the items in all categories had been rated in terms of the same concept of dysfunction, the 25 judges were asked in the second step of item scaling to place those items that had been judged to be the most dysfunctional and least dysfunctional within each category on a single 15-point scale. Again, there was high agreement among the judges. The average scale value for each of these items was calculated. This process provided a set of commonly scaled endpoints within which the 15-point scale value for each of the remaining items in each category could be mathematically derived.
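The text does not say how the remaining 15-point values were "mathematically derived" from the commonly scaled endpoints. One plausible reading, offered here only as an assumption, is a linear interpolation between each category's least and most dysfunctional items. The function name and the sample numbers in this sketch are hypothetical and are not taken from the SIP data.

```python
def rescale_to_common(value_11, low_11, high_11, low_15, high_15):
    """Linearly map an item's within-category 11-point value onto the
    common 15-point scale, anchored on the category's two endpoint items.

    ASSUMPTION: linear interpolation; the source only says the values
    were "mathematically derived" from the endpoints.
    """
    if high_11 == low_11:
        raise ValueError("endpoint items must differ on the 11-point scale")
    fraction = (value_11 - low_11) / (high_11 - low_11)
    return low_15 + fraction * (high_15 - low_15)


# Hypothetical category: its endpoint items were rated 2.0 and 10.0 within
# the category, and 3.0 and 13.0 when placed on the common 15-point scale.
print(rescale_to_common(6.0, 2.0, 10.0, 3.0, 13.0))  # 8.0
```

Under this assumption the endpoint items keep their directly judged 15-point values, and every other item keeps its relative position between them.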

The protocol scaling procedure was employed to validate the construct of dysfunction and determine the extent to which SIP scores relate to a more global assessment of dysfunction. Four groups of 25 judges* each rated 50 protocols of subjects obtained in a field trial of the SIP. This provided 25 ratings on 200 protocols. In the first step of this procedure, the judges were asked to rate each subject's protocol of responses in each category on an 11-point scale. The points on the scale ranged from "minimally dysfunctional" to "severely dysfunctional." As in item scaling, judges were asked to make their ratings without regard to the cause of the dysfunction. The mean scale value assigned to subject protocols was stable and there was high agreement among the judges. In the second step, the judges were asked to rate each subject's complete protocol on a 15-point scale.

An analysis of each of these scaling procedures indicated that there was a high level of agreement among judges on both the ratings of items and the ratings of protocols. With respect to item scaling, results were analyzed in two ways. First, the correlation of each judge's ratings of 312 items with the mean of the 25 judges' ratings of these items was generally high and indicated that this approach produced reliable scale values (Table 2).

* None of these judges was involved in the item scaling; however, all judges were chosen from the same population subgroups. There were fewer physicians and nurses for protocol scaling than for item scaling.

TABLE 4-SIP Protocol Scaling: Mean Standard Deviation of Judgments within Judging Groups for Category and Overall Judgments

| | Judging Group 1 | Judging Group 2 | Judging Group 3 | Judging Group 4 |
|---|---|---|---|---|
| Category Scaling: Mean SD | 1.6 | 1.7 | 1.6 | 1.5 |
| Category Scaling: SD of SD | 0.5 | 0.4 | 0.4 | 0.4 |
| Overall Scaling: Mean SD | 2.0 | 1.9 | 1.9 | 1.9 |
| Overall Scaling: SD of SD | 0.8 | 0.7 | 0.7 | 0.7 |

Second, the agreement among the judges on each item scaled was generally high. Items were scaled with a mean standard deviation of 2.0 scale points with a standard deviation of the standard deviations of 0.45. The largest 95 per cent confidence interval for the mean scale value of any item retained was approximately two scale points. Twenty-nine items were omitted from subsequent scoring analyses because the 95 per cent confidence interval for these items was greater than two scale points. Results of the scaling of the endpoint items of each category were comparable.
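The retention rule can be illustrated with a short sketch. Several points are assumptions, since the paper does not give them: the interval is taken as a normal-approximation interval for the mean, "two scale points" is read as the full width of the interval, and the ratings below are invented. With 25 judges and a standard deviation of 2.0, this reading gives a width of about 1.6 scale points.

```python
from math import sqrt
from statistics import stdev

Z_95 = 1.96  # ASSUMPTION: normal approximation; the source does not say


def ci_width(ratings):
    """Full width of an approximate 95% confidence interval for the mean
    of one item's ratings across judges."""
    return 2 * Z_95 * stdev(ratings) / sqrt(len(ratings))


def retain_item(ratings, max_width=2.0):
    """Keep the item only if its interval is no wider than max_width."""
    return ci_width(ratings) <= max_width


# Hypothetical ratings by 25 judges.
typical_item = [4] * 12 + [6] + [8] * 12    # sample SD = 2.0
disputed_item = [1] * 12 + [6] + [11] * 12  # sample SD = 5.0

print(round(ci_width(typical_item), 3), retain_item(typical_item))
print(round(ci_width(disputed_item), 3), retain_item(disputed_item))
```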

Third, an analysis was made of the difference between judges considered to be differentially sophisticated in health matters (clinically experienced nursing students and physicians, as opposed to medical and health services administration students). t-tests showed no significant difference between the two types of judges; this suggests that the health care backgrounds of the judges may not be so influential a factor as to invalidate the obtained scale values for use in developing the instrument.† A validation of the scaling using consumer judges is planned.

With respect to protocol scaling, the results for each of the four groups of judges were analyzed in two ways. First, the correlations of each judge's ratings with mean ratings of subject protocols were high for each of the groups of judges. The median correlation for each group of judges is shown in Table 3. Second, the agreement among judges on each protocol scaled was consistently high for each of the four groups of judges. The mean standard deviation for the four groups of judges ranged from 1.6 to 1.7 in judging subjects' protocols by category on the 11-point scale, and from 1.9 to 2.0 in judging overall protocols on the 15-point scale (Table 4).

Four scoring methods were tested7, in order to select a method that would best reflect factors that may have been taken into account by protocol judges while being sufficiently simple to allow interpretation and disaggregation of scores. Each of the methods reflects differently the pattern of dysfunction and the item scale values represented in a protocol. Each permitted calculation of a score for each category and for the overall SIP. The methods tested were:

* A mean of the scale values of items checked. A mean score represents an average of the dysfunction weights of the items checked in a protocol;

* A mean of the squared scale values of items checked. This represents an average of the dysfunction weights of items checked, but increases the relative weight of items that have high scale values;

* A percentage of total possible dysfunction, which is the sum of the scale values of items checked divided by the sum of the scale values for all items, multiplied by 100. This method of scoring provides a relative frequency that is weighted by the magnitude of the scale value as well as by the number of items checked;

* A profile indicating the number of items checked within one of four scale-point groupings. The determination of scale-point groupings is based on the distribution of items scaled across 15 points. While the profile represents a frequency distribution, its size as a number or score may also relate in some systematic way to the method of judging that takes into account protocol patterning from maximum to minimum as well as number of items.

† It should be noted that no difference was found by Patrick, Bush, and Chen between nurses and students in their ratings of case descriptions (personal communication with James W. Bush, MD, January, 1974). A published account of their work incorrectly reported a significant difference between nurses' ratings and students' ratings.12

All scoring methods related sufficiently well to protocol judging to give evidence of the validity of the values derived in item scaling and of the construct of dysfunction.

The profile score showed the best relationship with protocol judgments. Since further analysis indicated that judges took into account both number of items checked and the item checked with the highest scale value in making their ratings, both the profile scoring method and the per cent scoring method were retained (Table 5).
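The four scoring methods that were tested can be sketched for a single category as follows. The first three follow the definitions given in the text. For the profile, the paper says only that the four scale-point groupings were based on the distribution of items across the 15 points, so the boundaries used here (4, 8, and 12) are placeholders chosen for illustration. The function names and the example scale values are likewise hypothetical.

```python
def mean_score(checked):
    """Mean of the scale values of the items checked."""
    return sum(checked) / len(checked) if checked else 0.0


def mean_squared_score(checked):
    """Mean of the squared scale values; weights severe items more heavily."""
    return sum(v * v for v in checked) / len(checked) if checked else 0.0


def percent_score(checked, all_values):
    """Sum of checked scale values as a percentage of the sum for all items."""
    return 100 * sum(checked) / sum(all_values)


def profile_score(checked, boundaries=(4.0, 8.0, 12.0)):
    """Number of checked items in each of four scale-point groupings.

    ASSUMPTION: the boundaries are placeholders; the source does not
    report the groupings it used.
    """
    counts = [0, 0, 0, 0]
    for value in checked:
        group = sum(value > b for b in boundaries)  # index 0..3
        counts[group] += 1
    return counts


# Hypothetical category with four items, two of which the subject checked.
all_values = [3.3, 5.0, 9.2, 12.5]
checked = [3.3, 9.2]

print(mean_score(checked))
print(mean_squared_score(checked))
print(percent_score(checked, all_values))
print(profile_score(checked))
```

The percentage score rises with both the number and the severity of the items checked, while the profile keeps the two apart by showing where on the scale the checked items fall.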

TABLE 5-Correlation of SIP Percentage and Profile Scores with Mean Protocol Ratings by Category and Overall

| Category | % Score* | Profile Score† |
|---|---|---|
| A | 0.90 | 0.95 |
| B | 0.82 | 0.92 |
| C | 0.89 | 0.96 |
| D | 0.86 | 0.93 |
| E | -0.29‡ | 0.58 |
| F | 0.82 | 0.96 |
| G | 0.75 | 0.97 |
| H | 0.86 | 0.97 |
| I | 0.90 | 0.94 |
| J | 0.79 | 0.86 |
| K | 0.83 | 0.94 |
| L | 0.85 | 0.95 |
| M | 0.88 | 0.88 |
| N | 0.86 | 0.98 |
| Overall | 0.93 | 0.94 |

* Pearson product-moment correlation.
† Spearman rank-difference correlation.
‡ A positive response to "I am not working at all" precluded response to any of the remaining items in Category E. The percentage score in this case does not accurately reflect the degree of dysfunction relative to other response patterns. This will be taken into account in revising and refining the SIP.

Pilot Study of the SIP

The limited field trial conducted as part of the initial development of the SIP provided useful albeit preliminary data about the feasibility, reliability, and validity of the SIP.

In the field trial, 246 group practice enrollees were interviewed. This represented 71 per cent of a sample of 357 drawn from five categories of medical care: inpatients, home care patients, walk-in clinic patients, outpatients, and nonpatients. Analysis of the completed interviews showed that 98 per cent of the items were used at least once, that 225 subjects of the 246 interviewed found at least one item that described them, and that the mean number of items checked per subject was 30. In view of the sampling design, this provides evidence of the broad applicability of the instrument. In addition, no subject refused to complete the interview once begun, and of the 357 people contacted and the 246 interviewed, no complaints of any kind were registered with the group practice or the SIP project. Of the refusals, 36 per cent were not interviewed because of health reasons. Most of these were among the elderly, the inpatients, and the home care patients. These refusals came not only from subjects but from medical care personnel whose permission was requested before inpatients were interviewed. Interviewers were instructed not to schedule or reschedule interviews for a time when the subject's health would be improved. Since this was a pilot study of feasibility, and patients' tolerance of the interview process was unknown, interviewers did not urge participation.

Reliability estimates based on two administrations of the SIP to 31 subjects showed that overall scores were highly reliable. Test-retest correlations using the various scoring methods ranged from 0.80 to 0.88.

When subjects were grouped by the category of medical care they were receiving at the time of the field trial (as indicated by their patient classification as described above), SIP scores were related to these categories in the expected direction. These data, along with the positive relationship between a self-assessment of sickness obtained from each subject and his SIP score, provide preliminary evidence of validity.

(Subsequently, the SIP was revised on the basis of data obtained in the pilot study. A long form of the SIP, consisting of 235 items, and a short form, consisting of 146 items, were developed. A second field trial was conducted that provided a more definitive evaluation of the reliability and validity of the SIP. The results of this field trial are reported elsewhere.14,15 In general, the results were positive and warrant further revision and refinement of the SIP.)

Summary and Discussion

The SIP is a scaled measure of health-related dysfunction. It is being designed for use, in conjunction with other kinds of assessment, in evaluation of health care services and particularly of comprehensive health care programs. It is a behavioral measure, independent of diagnostic criteria, which relies solely on an individual's perception of the impacts of sickness on his usual daily activities. It is intended to provide a quantitatively sensitive and qualitatively detailed measure without imposing the limitations and uncertainties of diagnostic classification. It should be noted that no considerations relating to prognosis have been incorporated into the SIP. The imperfect and constantly changing state of medical science makes prognostication so uncertain that however attractive it may seem conceptually to include future risk in a health indicator, this is methodologically feasible in very few situations. Further, since the SIP is designed to measure "sickness" in a population at a given point in time, change in SIP from one administration to another should in itself be a valid and useful indicator of change in the health of the population under consideration. The examination of health in terms of separate dimensions, such as function, diagnosis, and prognosis, facilitates the study of each and its relationship to the others while allowing for the meaningful combination of these into a unified health status index.

Since one of the objectives in developing the SIP was to construct a measure capable of detailed and comprehensive description of dysfunction, a comment is appropriate on the completeness of the SIP dysfunction catalog. There is evidence that the method used for eliciting descriptions of behavior related to sickness, and the adaptation of numerous descriptions found in existing instruments, have produced an adequately extensive catalog of dysfunction descriptions: (1) the yield of new, useful items from the open-ended questionnaires used to collect descriptions of health-related dysfunction decreased sharply toward the end of the data collection period; (2) continuing review of the literature revealed no new descriptions for inclusion in the SIP; (3) field trial subjects, who were asked at the close of the interview to list any changes in their behavior that were not covered by the items in the SIP, added no new dysfunction descriptions to the SIP compendium. Thus, it is evident that the preliminary SIP contained a relatively complete inventory of items from which to revise and refine the instrument.

Although the primary concern with regard to SIP content has been the inclusion of dysfunction descriptions in a wide variety of activity areas spanning a broad range of severity, the important issue of the level of detail desirable in such an instrument has not yet been dealt with in a systematic way. For practical reasons, the shortest possible instrument is desirable, yet condensation of content must not reduce qualitative and quantitative discriminative capability or reliability of results. Since the discriminative capacity of the SIP will depend in part on the descriptive detail retained, this issue will be a major focus of the refinement process. Statistical results must be interpreted in terms of the use of the instrument and in terms of the descriptive information desired. Since there are no tested and documented guidelines, the finalization of the SIP will require the expertise of health care practitioners, as well as of consumers and other evaluation researchers.

REFERENCES

1. Donabedian, A. Evaluating the Quality of Medical Care. Milbank Mem. Fund Q. 44:166-206, 1966.

2. Brook, R. H., and Appel, F. A. Quality of Care Assessment: Choosing a Method for Peer Review. N. Engl. J. Med. 288:1323-1329, 1973.

3. Elinson, J. Effectiveness of Social Action Programs in Health and Welfare. In Assessing the Effectiveness of Child Health Services, edited by Bergman, A. B., pp. 77-81. Ross Laboratories, Columbus, 1967.

4. Wilner, D. M., et al. The Housing Environment and Family Life: A Longitudinal Study of the Effects of Housing on Morbidity and Mental Health. Johns Hopkins Press, Baltimore, 1962.

5. Cumming, E., and Cumming, J. Closed Ranks. Harvard University Press, Cambridge, 1957.

6. Baumann, B. Diversities in Conceptions of Health and Physical Fitness. J. Health Hum. Behav. 2:39-46, 1960.

7. Katz, S., Downs, T. D., Cash, H. R., and Grotz, R. C. Progress in Development of the Index of ADL. Gerontologist 10:20-30, 1970.

8. Kelman, H. R., and Willner, A. Problems in Measurement and Evaluation of Rehabilitation. Arch. Phys. Med. Rehabil. 43:172-181, 1962.

9. Belloc, N., Breslow, L., and Hochstein, J. R. Measurement of Physical Health in a General Population Survey. Am. J. Epidemiol. 93:328-336, 1970.

10. Sullivan, D. F. Conceptual Problems in Developing an Index of Health. Vital and Health Statistics: Data Evaluation and Methods Research. National Center for Health Statistics, Series 2, No. 17. U.S. Government Printing Office, Washington, DC, 1966.

11. Fanshel, S., and Bush, J. W. A Health Status Index and Its Application to Health Services Outcomes. Operations Res. 18:1021-1066, 1970.

12. Patrick, D. L., Bush, J., and Chen, M. Toward an Operational Definition of Health. J. Health Soc. Behav. 14:6-23, 1973.

13. Spitzer, W. O., Sackett, D. L., Sibley, J. C., Roberts, R. S., Gent, M., Kergin, D. J., Hackett, B. C., and Olynich, A. The Burlington Randomized Trial of the Nurse Practitioner. N. Engl. J. Med. 290:251-256, 1973.

14. Pollard, W. E., Bobbitt, R. A., Bergner, M., Martin, D. P., and Gilson, B. S. The Sickness Impact Profile: Reliability of a Health Status Measure. Med. Care, in press, February, 1976.

15. Bergner, M., Bobbitt, R. A., Pollard, W. E., Martin, D. P., and Gilson, B. S. The Sickness Impact Profile: Validation of a Health Status Measure. Med. Care, in press, January, 1976.

1310 AJPH DECEMBER, 1975, Vol. 65, No. 12