Clinical Rehabilitation (http://cre.sagepub.com/)

Alan Tennant, Joanna ML Geddes and M Anne Chamberlain. The Barthel Index: an ordinal score or interval level measure? Clin Rehabil 1996; 10: 301. DOI: 10.1177/026921559601000407

The online version of this article can be found at: http://cre.sagepub.com/content/10/4/301

Published by: http://www.sagepublications.com

Citations: http://cre.sagepub.com/content/10/4/301.refs.html

Downloaded from cre.sagepub.com at Sheffield Hallam University on October 18, 2010


The Barthel Index: an ordinal score or interval level measure?

Alan Tennant, Joanna ML Geddes and M Anne Chamberlain
Rheumatology and Rehabilitation Research Unit, University of Leeds, Leeds

Address for correspondence: Alan Tennant, Rheumatology and Rehabilitation Research Unit, University of Leeds, 36 Clarendon Road, Leeds LS2 9NZ, UK.

The Barthel Index is one of the most widely used activities of daily living (ADL) measures in stroke rehabilitation and there has been some debate recently about whether or not the Index is an ordinal score or an interval level measure. An audit of 192 consecutive patients undergoing inpatient rehabilitation following stroke has provided the opportunity to examine this question with recently developed mathematical techniques based on the work of George Rasch. Rasch models define the criteria which data must follow to produce an interval level measure. It thus becomes possible to test the data derived from the audit against the Rasch model.

Calibration of the 10 items in the Index shows considerable differences in the degree of difficulty (weight), and these differences are not compensated for by the current scoring. Thus adding together the items produces a scale whose intervals vary considerably, particularly between intervals at the lower or upper ends of the scale, and those at the centre. This can give rise to considerable differences between the change score based on the Rasch transformation (taking into account item difficulty) and the change score based on raw scores. These findings confirm the ordinal nature of the Barthel Index. Further questions are raised about the unidimensionality of the Index, and the context in which it should be used.

Introduction

For many years the Barthel Index[1] has been the mainstay of measuring functional ability in rehabilitation. Developed to indicate the extent of nursing care required by patients in institutions, it is essentially a measure of dependency in self-care. It has nevertheless been utilized as a ubiquitous measure of activities of daily living (ADL) used both in the management of individual patients (for example to determine when to discharge home), and, at the aggregate level, to show the efficacy of various rehabilitation programmes.[2-6] Recently Wade argued that it was ‘probably the most widely used and best standard measure of ADL’.[7]

The Barthel Index has 10 items, and in the original version, used in this study, each item is scored 0 (unable to perform the task) and then a variable range of points, scoring 5, 10 or 15 to reflect independence or the intervals on the way to independence (Table 1). Sometimes these scores are simplified by dividing by 5 to give an overall score of 20.[8] The items are weighted so, for example, in the original version, independence on chair/bed transfer scores 15, whereas independence in bathing scores only 5 points. Such scores, whether 5, 10 or 15, or 1, 2 or 3, are characteristic of many ADL scales,[9] and are ordinal in their level of measurement.

Table 1 Items and scoring for the Barthel Index
Footnote a: score only if ambulation = 0.

The characteristics of ordinal scales are such that increasing scores reflect a trend, for example in the level of independence. Intervals between the points are, however, not equal in value, simply indicative of the underlying trend. Thus an increase in independence on a single item from 0 to 5 (or 0 to 1) is not necessarily of the same magnitude as a change from 10 to 15 (or 2 to 3).

Such inequality in the intervals within items on ordinal scales is usually complicated by the lack of equality in the degree of difficulty between items. This is not to say that the items do not all contribute to a single underlying construct (i.e. unidimensionality), but rather that identical scores on different items cannot be considered equal. Yet despite this, most ADL scales add together item scores, as though a score of 5 on one item was the same as a score of 5 on another. Even scales which are based on a set of yes/no items face this problem. Adding such scores together implies equality in item difficulty. The Barthel Index implicitly acknowledges differences in its own items in that some are given a greater weight (score for achieving independence) than others.

There has been some criticism of the Barthel Index of late,[10-12] one aspect of which has centred on whether or not the Index is an ordinal score or interval level measure. This is important for it determines the way in which the Barthel can be used. For example, can the Index (say on admission) be used as an independent predictor for length of stay in a multiple regression model? Such criticism has led to a robust defence[13] of the Index, in which it has been suggested that the distance between a score of 10 and 20 is approximately the same as the distance between a score of 80 and 90. From this it should follow that the scaling is approximately evenly spaced and consistent with requirements for interval level measurement; ‘thus all statistical functions carried out will be valid’.[13] In other words, this argument suggests that the Index does indeed have the property of an interval level measure.

However, clinical experience suggests that intervals at opposite ends of the scale are not equivalent. For example, some would consider that achieving an improvement in score from 70 to 90 is a lot harder than, say, an improvement from 10 to 30. Wright and Stone[14] tell us that such ‘boundary effects’ (i.e. where, for example, it is increasingly difficult to accrue further points as one approaches the limit of the scale) cause any fixed differences in points to vary in meaning over the score range of the test.

Some attempt has been made to improve the sensitivity of the Barthel Index.[15] So, for example, while individual item scores for achieving independence remain fixed, the intervals between dependence and independence have been increased to reflect clinical experience; rather than [0, 5, 10], intervals are increased to [0, 2, 5, 8, 10]. However, improving sensitivity to change does not necessarily convert an ordinal score into an interval level measure and may not address the difference in degree of difficulty of the different items in a scale. Achieving independence in one task, resulting in a gain of 10 points, may imply a greater change in ability than an increase of 10 points on another item. Thus we return to the question, are the intervals on the Barthel Index equal?
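The raw scoring described above can be sketched in a few lines of Python. This is only an illustration: the text states the maximum for chair/bed transfer (15) and bathing (5), while the remaining item names and maxima are assumed here from the commonly tabulated original index, since Table 1 is not reproduced. The two patients are hypothetical.

```python
# Maximum (fully independent) score per item, original 0-100 version.
# ASSUMPTION: only transfer (15) and bathing (5) are stated in the text;
# the other names and maxima follow the commonly tabulated original index.
MAX_SCORE = {
    "feeding": 10,
    "chair/bed transfer": 15,
    "personal hygiene": 5,
    "toilet use": 10,
    "bathing": 5,
    "ambulation": 15,
    "stairs": 10,
    "dressing": 10,
    "bowel control": 10,
    "bladder control": 10,
}


def barthel_total(item_scores):
    """Sum item scores after checking each is a multiple of 5 within range.

    Items missing from `item_scores` count as 0 (unable to perform the task).
    """
    for item, score in item_scores.items():
        if item not in MAX_SCORE:
            raise KeyError(f"unknown item: {item}")
        if score % 5 != 0 or not 0 <= score <= MAX_SCORE[item]:
            raise ValueError(f"invalid score {score} for {item}")
    return sum(item_scores.values())


def to_20_point(total):
    """Simplified version: divide the 0-100 total by 5."""
    return total // 5


# Two hypothetical patients with very different profiles.
patient_a = {"bowel control": 10, "bladder control": 10, "feeding": 10,
             "chair/bed transfer": 15, "personal hygiene": 5}
patient_b = {"ambulation": 15, "stairs": 10, "bathing": 5,
             "dressing": 10, "toilet use": 10}

print(barthel_total(patient_a), barthel_total(patient_b))
print(to_20_point(barthel_total(patient_a)))
```

Both hypothetical patients receive the same total even though they are independent on entirely different items, which is the sense in which simple addition treats a point on one item as equal to a point on any other.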

Design and setting

Patients undergoing inpatient rehabilitation at the National Demonstration Centre for Rehabilitation in Leeds have Barthel scores recorded at two-weekly intervals. This practice has been established for five years and forms part of a comprehensive audit of rehabilitation. During this time 192 consecutive patients undergoing rehabilitation following stroke were routinely assessed, most recently with the modified index proposed by Shah et al.[15] For the purposes of this article, the data are presented as scores of 0, 5, 10, 15. Both the original and modified versions have a range from 0-100.

The 192 patients had a mean age on admission of 57 years (95% CI ± 1.5) and mean stay of 67 days (95% CI ± 5.0). Mean Barthel score on admission was 54.4 (median 50.0) and upon discharge was 72.8 (median 80.0). Thus the overall mean change in Barthel score was 18.4 (median 30.0).

Rasch analysis

George Rasch was a Danish mathematician whose work in the 1950s has been described as the point ‘at which psychometrics moved from being purely descriptive to become a science of objective measurement’.[16] Much use has been made of the Rasch models in the field of educational testing,[16,17] and more recently in the field of rehabilitation.[18,19] The Rasch model defines the criteria which data must follow to qualify for making (interval level) measures.[20] If the data fit the model it is possible to see how the raw score compares with the transformed measure arising from the model, and it is this comparison which will indicate whether the raw score is at the interval level.

The principle behind the approach is that it is possible to determine the probability that a person with a particular level of ability will ‘pass’ (or become independent at) a certain item. Both ‘person ability’ and ‘item difficulty’ can be addressed in this way. The type of distribution that one looks for on item difficulty is akin to the Guttman scaling[21] principle, which has occasionally been adopted for hierarchical ADL scales,[22] including the Barthel Index itself.[23] In these scales achieving independence on one difficult item means that independence must have been achieved on all easier items. Clinical experience, however, shows that there are always patients who cannot do some specific tasks when their overall level of ability would suggest otherwise. This is perhaps why a recent application of Guttman scaling to the Barthel Index found that the scale failed to meet the necessary (rigid) criteria for scalability.[23] Probability, the basis of the Rasch model, offers a more elegant approach, and allows for unexpected results. It can be viewed as an ‘imperfect Guttman scale’ where ‘the probability of responding increases gradually with more of the trait, rather than jumping from a probability of 0 to 100 percent’.[24]

Linacre at the University of Chicago has described the process of working out person ability and item difficulty by using the analogy of an archery competition in Sherwood Forest.[25] Using this example, imagine that Robin Hood, Little John and Will Scarlet are shooting at three trees. Robin hits the first tree 11 out of 12 times, Little John 8 out of 12, Will Scarlet 4 out of 12. From this some idea about the relative ability of the three archers can be seen by computing their probability of success. Then Robin, who hit the first tree 11 out of 12 times, only scores 8 out of 12 on a second tree which is further away, and 4 out of 12 on a third tree even further away. In this way the different degrees of difficulty of the targets can be perceived.

This is essentially what the Rasch analysis is doing - calibrating the ability of patients to achieve independence on a given item, and calibrating the degree of difficulty of the items. In practice a computer algorithm[26] looks at the scale items, and the range of person abilities, and through an iterative procedure seeks to combine the two by their difference. Ideally, this difference should govern the probability of what is supposed to happen when a person of a given ability uses that ability against a given task.[14] According to Wright and Stone[14] there is no alternative mathematical formulation that allows estimation of person measures and item calibration to be independent of one another. This is important for we would not want, for example, a temperature measure to be dependent on the particular thermometer used, or on whether the patient is a child or adult.
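The archery analogy and the idea that ability and difficulty combine 'by their difference' can be sketched in Python. The hit counts are those given in the text; the function names are illustrative, and the model shown is the simple dichotomous (pass/fail) Rasch form, which is a simplification of the multi-step scoring analysed in this study.

```python
import math


def log_odds(successes, trials):
    """Natural log-odds (logit) of success: ln(successes / failures)."""
    failures = trials - successes
    return math.log(successes / failures)


def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: the probability of success depends only
    on the difference between person ability and item difficulty (logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Three archers shooting at the first tree (hits out of 12, from the text).
archers = {"Robin Hood": 11, "Little John": 8, "Will Scarlet": 4}
for name, hits in archers.items():
    print(f"{name}: {log_odds(hits, 12):+.2f} logits")

# Robin Hood on three trees at increasing distance: a lower log-odds of a
# hit indicates a harder target.
robin = {"tree 1": 11, "tree 2": 8, "tree 3": 4}
for tree, hits in robin.items():
    print(f"{tree}: {log_odds(hits, 12):+.2f} logits")

# When ability equals difficulty the chance of success is 50%.
print(rasch_probability(ability=1.0, difficulty=1.0))
```

The first loop ranks the archers on a fixed target and the second ranks the targets for a fixed archer. A full Rasch calibration estimates both sets of values at once by iteration, which this sketch does not attempt.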

Results of the Rasch transformation are reported in ‘logits’ which are the natural log-odds for succeeding on items of the kind chosen to define ‘zero’ (the mid-point) on the scale. The key aspect of using Rasch analysis is that the data must ‘fit’ the mathematical model. Various statistics are provided for this purpose, but an important one is the standardized information weighted-fit statistic (INFIT), which is distributed as a t-value. This determines whether or not items belong to a single underlying construct, that is, whether or not the scale is unidimensional. Items which have values outside the range ± 2.0 (5% significance) need to be examined for misfit. The recent demise of Cronbach’s alpha as a measure of unidimensionality emphasizes the importance of this alternative approach.[27]

Results

Model fit

The Rasch analysis reported an INFIT with mean of -0.4 and a standard deviation of 2.4 for the 10 items comprising the Barthel Index. This indicates a fair fit of the data to the model, but suggests that there may be some lack of fit to the model. One item, bladder control, had a standardized INFIT value of +3.0. This would indicate that there is not always a strong association between bladder control and the ‘physical dependency’ construct determined by the Barthel Index. If a new index were being developed with these items it would be worth considering omitting bladder control, as it weakens the unidimensional construct. However, it is important to remember that no data fit a mathematical model perfectly, and that under the normal distribution items at the margins of the distribution would be expected (i.e. 5 in every 100).

Another fit statistic, the adjusted test standard deviation, was reported as 3.09, some 13 times greater than the root-mean-square calibration of 0.23, indicating satisfactory separation of items along the underlying construct. This is also important for it shows that the items are measuring different points on the underlying construct of dependency - 13 strata in this case. A lower separation factor might indicate redundant items measuring the same point on the construct, or the possible lack of discrimination for the underlying construct.
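The 'some 13 times' figure above is the ratio of the two reported values. A short Python check of that arithmetic, using only the numbers given in the text (the variable names are illustrative):

```python
# Values reported in the text.
adjusted_sd = 3.09       # adjusted test standard deviation (logits)
rms_calibration = 0.23   # root-mean-square calibration (logits)

# Separation: how many calibration-error units the items spread across.
separation = adjusted_sd / rms_calibration
print(round(separation, 1))
```

A ratio close to 1 would mean the spread of item difficulties is no larger than the uncertainty in locating each item, so the items could not be told apart along the construct.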

Item calibration

Figure 1 shows the item calibration for the 10 items of the Barthel Index for the 192 patients under study on discharge from the rehabilitation ward. Items are identified along the side (y-axis) and the logit scale along the bottom (x-axis). The logit scale runs from the easiest item (that is the one with the greatest probability of achieving independence), which in this group of patients is independence in bowel control, to the most difficult, independence in bathing. Achieving independence in personal hygiene is the item of average difficulty for this group of patients, reflected by its score of zero. Although not applicable to the current analysis, it has recently been suggested that wherever higher communication skills are being assessed (for example with the Functional Independence Measure Scale [FIMS]),[28] discharge calibration is more reliable, as shy patients may appear less functional on admission than their true underlying ability.[29]

Items such as ambulation, and particularly stairs and bathing, are shown to be much more difficult than other items, and bladder and bowel control, and feeding, are easier than other items, that is independence is likely to be achieved (if not already present on admission) much sooner than on other items. The order of item calibration derived from this dataset is remarkably similar to that identified in the attempt to create a hierarchical index using Guttman scaling.[23] This would be expected, as the basic approach is the same in that frequency of steps achieved on each item is calculated and used as the basis for a hierarchical ordering. However, the Rasch approach then uses a logistic function of the odds ratio (log-odds) to determine its person and item weights. Unfortunately the original weighting of items (the score for independence is shown on Figure 1) in no way reflects their degree of difficulty (the Rasch-derived weighting) for this group of patients.

Figure 1 Calibration of Barthel items on discharge

Using the raw score and the Rasch transformation

What happens when items of different difficulty are added together? Figure 2 shows the change in Barthel score for patients during their stay on the rehabilitation ward, set against the change in the logit measure. The latter has been set so that its range is similar to that of the raw score change and, in each case, a negative score reflects a deterioration.

Figure 2 Comparison between changes in Barthel raw score and Rasch measure

It is clear that a single change score on the Barthel Index masks a broad range of change on the logit measure derived from the Rasch transformation. Most patients have improved during their rehabilitation programme, but, for example, those who had improved by just 15 points on the raw score show a range of change on the logit measure (recalibrated to the same range as the raw score change) from 5 to 25.

How can this come about? When a five-point reduction in one item matches a five-point increase in another, the resultant change score is zero. What though if those items represent considerable differences in degree of difficulty, not compensated for by the ordinal weighting? Here the Rasch transformation shows the change in ability level when the raw score fails to do so.

There is a clinical significance to this discrepancy: rehabilitation staff see differences between the (lack of) change implied by the raw score, and their own perception of the patient’s change in ability. While improving the sensitivity of the Index by expanding the range of steps on any item may help, it will not overcome the problem as long as there is a discrepancy between the item weighting vis-à-vis the true difficulty of the task. Thus the raw score may mask an underlying decrease or increase in ability, and this can reduce acceptability of the instrument for rehabilitation staff.
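One way in which the same raw change can correspond to different changes on the logit measure is sketched below in Python. It is a simplified illustration, not the analysis reported here: the ten item difficulties are hypothetical values (not the calibrations in Figure 1), items are treated as pass/fail, and the logit measure for a raw score is taken as the ability at which the expected score under the dichotomous Rasch model equals that raw score.

```python
import math

# HYPOTHETICAL item difficulties in logits, chosen only for illustration.
DIFFICULTIES = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0]


def expected_score(ability):
    """Expected number of items passed at a given ability (logits)."""
    return sum(1.0 / (1.0 + math.exp(-(ability - b))) for b in DIFFICULTIES)


def ability_for_raw_score(raw, lo=-10.0, hi=10.0):
    """Find by bisection the ability whose expected score equals `raw`.

    Only defined for non-extreme scores (0 < raw < number of items).
    """
    if not 0 < raw < len(DIFFICULTIES):
        raise ValueError("raw score must lie strictly between 0 and n items")
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if expected_score(mid) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Logit measure for each non-extreme raw score.
measures = {raw: ability_for_raw_score(raw) for raw in range(1, 10)}

# The same one-point raw gain, taken in the middle and near the top.
step_middle = measures[6] - measures[5]
step_upper = measures[9] - measures[8]
print(f"5 -> 6: {step_middle:.2f} logits")
print(f"8 -> 9: {step_upper:.2f} logits")
```

In this sketch a one-point gain near the top of the range spans more logits than the same gain in the middle, so patients with identical raw change scores but different starting points can differ on the transformed measure.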

Discussion

The Barthel Index is used extensively in rehabilitation as a measure of functional outcome. It is commonly used in our own rehabilitation facility to indicate when a patient is ready to go home; it has been shown elsewhere to predict functioning at home,[30] and the ability to live independently.[31]

Criticism about using ordinal scales as interval level measures needs to be addressed directly and Rasch analysis provides one way in which this task can be approached. By doing so evidence is found to suggest that the items which comprise the Index represent different degrees of difficulty that are not compensated for by the original weighting. The Rasch analysis shows that a five-point change in score at the upper end of the scale has a logit distance three times greater than a similar change in the middle of the scale. This indicates that the change in ability at the upper level of the scale is three times greater, unit for unit, than an apparent similar change in the middle of the scale. Thus the Barthel Index is an ordinal score, and should not be used as an interval level measure. These results do not imply that the Barthel should be discarded but it must be used as an ordinal scale and appropriate nonparametric statistics applied. If necessary it can be transformed (e.g. through Rasch analysis) into an interval level measure and parametric statistics applied.

However, two other important implications arise from these findings. First, there is the question of unidimensionality. If items are added together to produce a single score then it needs to be demonstrated that they do belong to a single underlying trait or construct. Bladder control appears to lie outside the construct expressed by other items in the scale. Similar lack of fit for incontinence items has been observed with the Functional Independence Measure,[32] and it may not be coincidence that these items are impairments, while the rest of the items in the scale are disabilities.[33] Further work needs to be done to examine the dimensionality of the scale.

We have also shown that the original weighting gives fewer points for achieving independence on the hardest items. While the existing weighting system may be fully compatible with indicating the need for nursing care, which is in itself an important input in the rehabilitation process, if there is any association between item difficulty and therapy input, then it must be considered whether the current weighting will give rise to change scores which reflect that input. In other words, is the Barthel Index a valid instrument for measuring the efficacy of a rehabilitation programme?
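The recommendation above to treat raw Barthel scores as ordinal and apply nonparametric statistics can be illustrated with a small Python sketch. No particular test is named in the text; the exact sign test is used here as an assumed example because it relies only on the direction of each patient's change, and the Wilcoxon signed-rank test would be another common choice. The admission and discharge scores are hypothetical.

```python
from math import comb
from statistics import median


def sign_test(before, after):
    """Two-sided exact sign test for paired ordinal scores.

    Returns (improved, deteriorated, p_value); unchanged pairs are dropped.
    """
    improved = sum(a > b for b, a in zip(before, after))
    deteriorated = sum(a < b for b, a in zip(before, after))
    n = improved + deteriorated
    if n == 0:
        return improved, deteriorated, 1.0
    k = min(improved, deteriorated)
    # Probability of k or fewer in the smaller group under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return improved, deteriorated, min(1.0, 2 * tail)


# HYPOTHETICAL raw Barthel scores for eight patients.
admission = [20, 45, 50, 55, 60, 70, 35, 80]
discharge = [45, 60, 50, 85, 75, 65, 70, 95]

improved, deteriorated, p_value = sign_test(admission, discharge)
print(improved, deteriorated, p_value)
print(median(admission), median(discharge))
```

Summaries such as medians and tests based on signs or ranks use only the ordering of the scores, so they do not depend on the intervals between scores being equal.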

Conclusion

The Rasch analysis confirms that the intervals of the overall score are not equal. These results suggest that the raw score should be treated with caution, and, without transformation, should only be subjected to nonparametric statistics. Furthermore, although the Rasch model is robust in its tolerance of deviation, there is an indication that unidimensionality is compromised by the ‘bladder’ item.

It is also important to remember that these results apply only to stroke patients. Fit of Barthel Index data to the Rasch model, examination of unidimensionality and resulting item calibration should be investigated for other diagnoses, and may not be the same as presented here.

Finally, the lack of relationship between the current weighting and item difficulty, as determined by the Rasch model, suggests the need for caution in application. This brings to mind the strictures of Silverstein and colleagues[32]: ‘The question of validity should not be "Is an instrument valid or not?" It is more properly phrased, "How valid is it for a given purpose?"’

References

1 Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Md State Med J 1965; 14: 61-65.

2 Granger CV, Albrecht GL, Hamilton BB. Outcome of comprehensive rehabilitation: measurement by the PULSES Profile and the Barthel Index. Arch Phys Med Rehabil 1979; 60: 145-51.

3 Granger CV, Dewis LS, Peters NC, Sherwood CC, Barrett JC. Stroke rehabilitation: analysis of repeated Barthel Index measures. Arch Phys Med Rehabil 1979; 60: 14-17.

4 Wade DT, Skilbeck CE, Langton-Hewer R, Wood VA. Therapy after stroke: amounts, determinants and effects. Int Rehabil Med 1984; 6: 105-10.

5 Shah S, Cooper B, Maas F. The Barthel Index and ADL evaluation in stroke rehabilitation in Australia, Japan, the UK and the USA. Aust J Occup Ther 1992; 39: 5-13.

6 Geddes JML, Chamberlain MA. Outcome of stroke rehabilitation - observing current practice: a prerequisite for targets and standards. Clin Rehabil 1992; 6: 253-60.

7 Wade DT. Measurement in neurological rehabilitation. Oxford: Oxford University Press, 1992.

8 Collin C, Wade DT, Davies S, Horne V. The Barthel ADL Index; a reliability study. Int Disabil Stud 1988; 10: 61-63.

9 Law M, Letts L. A critical review of scales of ADL. Am J Occup Ther 1989; 43: 522-28.

10 Murdock C. A critical evaluation of the Barthel Index, part 1. Br J Occup Ther 1992; 55: 109-11.

11 Murdock C. A critical evaluation of the Barthel Index, part 2. Br J Occup Ther 1992; 55: 153-56.

12 Bowling A. Measuring health. Milton Keynes: Open University Press, 1991.

13 Shah S, Cooper B. Commentary on ‘A critical evaluation of the Barthel Index’. Br J Occup Ther 1993; 56: 70-72.

14 Wright BD, Stone MH. Best test design. Chicago, IL: MESA Press, 1979.

15 Shah S, Vanclay F, Cooper B. Improving the sensitivity of the Barthel Index for stroke rehabilitation. J Clin Epidemiol 1989; 42: 703-709.

16 Rasch G. Probabilistic models for some intelligence and attainment tests. Chicago, IL: University of Chicago Press, 1980.

17 Boone WJ. Using item calibration to improve teacher education. Rasch Measurement Trans 1992; 5: 180-81.

18 McArthur DL, Cohen MJ, Schandler SL. Rasch analysis of functional assessment scales: an example using pain behaviours. Arch Phys Med Rehabil 1991; 72: 296-304.

19 Granger CV, Hamilton BB, Linacre JM, Heinemann AW, Wright BD. Performance profiles of the Functional Independence Measure. Am J Phys Med Rehabil 1993; 72: 84-89.

20 Wright B. IRT in the 1990s: Which models work best? Rasch Measurement Trans 1992; 6: 196-200.

21 Guttman L. The basis of scalogram analysis. In: Stouffer SA, Osborne F eds. Measurement and prediction. New York: Wiley, 1950.

22 Nouri FM, Lincoln NB. An extended activities of daily living scale for stroke patients. Clin Rehabil 1987; 1: 301-305.

23 Barer DH, Murphy JJ. Scaling the Barthel: a 10-point hierarchical version of the activities of daily living index for use with stroke patients. Clin Rehabil 1993; 7: 271-77.

24 Streiner DL, Norman GR. Health measurement scales. Oxford: Oxford University Press, 1994.

25 Linacre JM. Log-odds in Sherwood Forest. Rasch Measurement Trans 1991; 5: 162-63.

26 Wright BD, Linacre JM. A user’s guide to BIGSTEPS. Chicago, IL: MESA Press, 1992.

27 Cortina JM. What is coefficient alpha? An examination of theory and applications. J Appl Psychol 1993; 78: 98-104.

28 Granger CV, Hamilton BB, Sherwin FS. Guide for the use of the uniform data set for medical rehabilitation. New York: Uniform Data System for Medical Rehabilitation Project Office, Buffalo General Hospital, 1986.

29 Magalhaes L, Velozo C, Pan A-W, Weeks D. Medical multidimensionality. Rasch Measurement Trans 1993; 7: 265-66.

30 Wade DT, Leigh-Smith J, Hewer RL. Social activities after stroke: measurement and natural history using the Frenchay Activities Index. Int Disabil Stud 1985; 7: 176-81.

31 DeJong G, Branch LG. Predicting the stroke patient’s ability to live independently. Stroke 1982; 13: 648-55.

32 Silverstein B, Fisher WP, Kilgore KM, Harley JP, Harvey RF. Applying psychometric criteria to functional assessment in medical rehabilitation: II. Defining interval measures. Arch Phys Med Rehabil 1992; 73: 507-18.

33 World Health Organization. The International Classification of Impairments, Disabilities and Handicaps. Geneva: WHO, 1980.
