
Allergy 1997: 52: 944-949. Printed in UK - all rights reserved

Copyright © Munksgaard 1997

ALLERGY ISSN 0105-4538

Short communication

Severity scoring of atopic dermatitis: a comparison of three scoring systems

Sprikkelman AB, Tupker RA, Burgerhof H, Schouten JP, Brand PLP, Heymans HSA, van Aalderen WMC. Severity scoring of atopic dermatitis: a comparison of three scoring systems. Allergy 1997; 52: 944-949. © Munksgaard 1997.

In studies on atopic dermatitis (AD), different scoring systems are used to evaluate the severity of the disease. The objective of this study was to investigate agreement between observers in the assessment of the overall severity of AD, and interobserver variation in the assessment of the severity of AD for each scoring item separately, using the Simple Scoring System (SSS), the Scoring Atopic Dermatitis (SCORAD) index, and the Basic Clinical Scoring System (BCSS), and, furthermore, to investigate agreement between these three scoring systems in the assessment of the overall severity of AD. Eighty-two patients (42 male) with AD, mean age 13.4 years (range 0.2-67.0), were included. Agreement between observers in assessing the overall AD severity scores, and interobserver variation in assessing AD severity for each scoring item separately, were determined in 34 of these 82 patients by two physicians scoring the severity of AD with the three scoring systems. To determine agreement between the scoring systems, one physician scored the severity of AD in all patients with the three scoring systems. Agreement between observers and agreement between the three scoring systems were calculated by Cohen's kappa (κ) and by the measure of agreement according to Bland & Altman. κ>0.4 represents fair agreement; κ>0.75 excellent agreement. In addition, interobserver variation for each scoring item separately was calculated by the Wilcoxon signed rank test. The mean differences (d) and the limits of agreement (d±2 SD of the differences) between observers by the SSS and the SCORAD were -0.82±5.58 and -0.28±7.49, respectively; κ between observers for the BCSS was 0.90 (95% CI 0.79-1.03). By the SSS, significant interobserver variation was found in assessing the severity of excoriations (P=0.02) and scales (P=0.02). By the SCORAD, significant interobserver variation was found in assessing the severity of edema/papulation (P=0.04), erythema (P=0.04), and excoriations (P=0.01). No significant interobserver variation was found in assessing the extent of AD. The mean difference and the limits of agreement between the SSS and the SCORAD were -4.17±9.52. κ between the SSS and the BCSS was 0.21 (95% CI 0.09-0.33), and κ between the SCORAD and the BCSS was 0.38 (95% CI 0.26-0.51). We found good agreement between observers assessing the overall severity of AD in the lower and higher score ranges by the SSS and the SCORAD, and excellent agreement by the BCSS. Significant interobserver variation was found for the isolated intensity items scales, excoriations, edema/papulation, and erythema. We found poor agreement between the three scoring systems in assessing the overall severity of AD, indicating that the SSS, the SCORAD, and the BCSS cannot be used interchangeably to assess the overall severity of AD.

A. B. Sprikkelman, R. A. Tupker, H. Burgerhof, J. P. Schouten, P. L. P. Brand, H. S. A. Heymans, W. M. C. van Aalderen
Beatrix Children's Hospital and Department of Dermatology, University Hospital, Groningen; Department of Epidemiology and Statistics, University of Groningen, Groningen, The Netherlands

Key words: atopic dermatitis; severity scoring.

A. B. Sprikkelman, MD, Department of Pediatric Pulmonology, Beatrix Children's Hospital, University Hospital Groningen, Hanzeplein 1, P.O. Box 30.001, 9700 RB Groningen, The Netherlands

Accepted for publication 1 April 1997

In 1980, Hanifin & Rajka outlined the diagnostic criteria of atopic dermatitis (AD), and their outline has proven to be useful to practitioners and investigators (1). However, for research on AD, especially in drug-effect studies, a more objective, quantitative, and accurate assessment of the severity of AD and its clinical course was needed. Therefore, other assessment methods have been developed. At present, different scoring systems are being used to evaluate AD. For several years, the Simple Scoring System (SSS) of Costa et al. has been used to assess the severity of AD and has proved to be easy to perform, quick, and reproducible (2). Recently, the Scoring Atopic Dermatitis (SCORAD) index was introduced. This scoring system claims to be comprehensive and reliable (3). However, several features of the SCORAD index, including its validity, accuracy, sensitivity, and biologic relevance, are not yet fully established (3). Both the SSS and the SCORAD consist of item scores and overall severity scores. At our university outpatient clinic and in primary health-care settings in the province of Groningen, The Netherlands, the Basic Clinical Scoring System (BCSS) is being used. This scoring system is fast and easy to execute, with little interobserver variation (4).

It seems obvious that there are variations in severity ratings between the three scoring systems, since they are intended to measure different things. Nevertheless, it is not always clear from scientific papers using a certain scoring system for AD why investigators chose that particular system. We hope with this study to contribute to a more informed choice. Moreover, little is known about the interobserver variation of each scoring system and the agreement between the scoring systems.

In this study, we investigated agreement between observers in the assessment of the overall severity of AD, and interobserver variation in the assessment of severity of AD for each scoring item separately, with the SSS, the SCORAD, and the BCSS. Furthermore, we investigated agreement between these three scoring systems in the assessment of the overall severity of AD.

Material and methods

Subjects

Eighty-two AD patients (42 males, 40 females) were studied. Their mean age was 13.4 years (range 0.2-67 years). Patients were recruited from the outpatient clinics of the Beatrix Children's Hospital and the department of dermatology at our University Hospital. The severity of AD was assessed by the SSS, the SCORAD, and the BCSS in all these patients by a single physician (A.B.S.). Agreement between observers in the assessment of the overall AD severity and interobserver variation in the assessment of severity of AD for each scoring item separately were assessed in 34 of these 82 patients by having two physicians (A.B.S., R.A.T.) score the severity of AD simultaneously. No communication between the two physicians was allowed during the examination until all study forms were completed. Both physicians had received specific training in assessing the severity of AD. The use of the SCORAD was practiced on patients, and the scorers reviewed slides from the atlas photographs published in the Consensus Report of the European Task Force on Atopic Dermatitis, and followed the recommendations published in this consensus report (3).

[Figure 1: Bland-Altman plot against mean overall AD severity scores (x axis, 10-40), with horizontal lines at 4.757 (d+2 SD), -0.824 (d), and -6.404 (d-2 SD).]

Fig. 1. Agreement between observers in assessment of overall severity of AD by Simple Scoring System. d±2 SD = -0.82±5.58. d = mean difference; 2 SD = two times standard deviation, limits of agreement.

[Figure 2: Bland-Altman plot against mean overall AD severity scores (x axis, 10-50), with horizontal lines at 7.211 (d+2 SD), -0.282 (d), and -7.775 (d-2 SD).]

Fig. 2. Agreement between observers in assessment of overall severity of AD by SCORAD. d±2 SD = -0.28±7.49. d = mean difference; 2 SD = two times standard deviation, limits of agreement.

Scoring methods

The SSS scores 10 intensity criteria and 10 topographic sites. The intensity criteria are erythema, edema, vesicles, crusts, excoriations, scales, lichenification, pigmentation/depigmentation, pruritus, and loss of sleep; each is scored from 0 (no lesion) to 7 (extremely severe). The most severely affected areas are chosen for evaluation of each criterion. In addition, 10 topographic sites are scored from 0 to 3 according to the extent of the involvement. These include five symmetric areas (feet, knees, legs, hands, and arms; one value for both sides) and five asymmetric areas (face, scalp, buttocks, anterior trunk, and posterior trunk). The maximum score for severity and extent is 100 (2).
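As an illustration of the SSS arithmetic only, the following Python sketch totals one patient's item scores under the rules just described: 10 intensity criteria at 0-7 plus 10 sites at 0-3, giving a maximum of 70 + 30 = 100. The function name `sss_total` and the dictionary layout are hypothetical conveniences, not part of the published system.

```python
# Hypothetical sketch of the Simple Scoring System (SSS) total, assuming the
# rules described above: 10 intensity criteria (0-7, scored on the most
# severely affected areas) plus 10 topographic sites (0-3 for extent).
INTENSITY_CRITERIA = (
    "erythema", "edema", "vesicles", "crusts", "excoriations", "scales",
    "lichenification", "pigmentation/depigmentation", "pruritus", "loss of sleep",
)
TOPOGRAPHIC_SITES = (
    "feet", "knees", "legs", "hands", "arms",          # symmetric: one value for both sides
    "face", "scalp", "buttocks", "anterior trunk", "posterior trunk",  # asymmetric
)


def sss_total(intensity: dict[str, int], extent: dict[str, int]) -> int:
    """Sum the 10 intensity scores (0-7 each) and the 10 extent scores (0-3 each)."""
    if set(intensity) != set(INTENSITY_CRITERIA):
        raise ValueError("expected exactly the 10 SSS intensity criteria")
    if set(extent) != set(TOPOGRAPHIC_SITES):
        raise ValueError("expected exactly the 10 SSS topographic sites")
    for name, value in intensity.items():
        if not 0 <= value <= 7:
            raise ValueError(f"{name}: intensity must be 0-7, got {value}")
    for name, value in extent.items():
        if not 0 <= value <= 3:
            raise ValueError(f"{name}: extent must be 0-3, got {value}")
    return sum(intensity.values()) + sum(extent.values())
```

Scoring every criterion at its maximum, `sss_total(dict.fromkeys(INTENSITY_CRITERIA, 7), dict.fromkeys(TOPOGRAPHIC_SITES, 3))`, reproduces the stated maximum of 100.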

The SCORAD scores the extent, six intensity criteria, and subjective symptoms of AD. The extent (0-100) is assessed by the rule of nine applied to a front-back drawing of children's inflammatory lesions; dry skin is not taken into account. The intensity criteria are erythema, edema/papulation, oozing/crust, excoriation, lichenification, and dryness, each item being scored from 0 (absent) to 3 (severe). The subjective symptoms, pruritus and sleep loss, are evaluated with regard to the last 3 days and nights, and both are scored by the patients or, in the case of children less than 7 years of age, by the parent(s) on an analog scale (0-10). Finally, the total score for the severity of AD is mathematically assessed as the sum of the scores for extent, intensity, and subjective symptoms. The maximum score is 100 (3).

[Figure 3: Bland-Altman plot against mean overall AD severity scores (x axis, 10-40), with horizontal lines at 5.348 (d+2 SD), -4.172 (d), and -13.692 (d-2 SD).]

Fig. 3. Agreement between Simple Scoring System and SCORAD in assessment of overall severity of AD. d±2 SD = -4.17±9.52. d = mean difference; 2 SD = two times standard deviation, limits of agreement.
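The SCORAD description above calls the total the sum of the extent, intensity, and subjective scores. The sketch below instead assumes the weighting published in the European Task Force consensus report (3), SCORAD = A/5 + 7B/2 + C, where A is the extent (0-100), B the sum of the six intensity items (0-18), and C the sum of the two subjective scales (0-20); under that weighting the theoretical maximum is 20 + 63 + 20 = 103. The function `scorad_total` and its argument names are hypothetical, and the code is offered only as a sketch of the calculation.

```python
# Hypothetical sketch of the SCORAD total, assuming the consensus-report
# weighting A/5 + 7B/2 + C (the weights are not spelled out in this paper).
SCORAD_INTENSITY_ITEMS = (
    "erythema", "edema/papulation", "oozing/crust",
    "excoriation", "lichenification", "dryness",
)


def scorad_total(extent_pct: float, intensity: dict[str, int],
                 pruritus: float, sleep_loss: float) -> float:
    """Return A/5 + 7B/2 + C for one patient."""
    if not 0 <= extent_pct <= 100:
        raise ValueError("extent (A) must be 0-100")
    if set(intensity) != set(SCORAD_INTENSITY_ITEMS):
        raise ValueError("expected exactly the six SCORAD intensity items")
    if any(not 0 <= v <= 3 for v in intensity.values()):
        raise ValueError("each intensity item must be 0-3")
    if not (0 <= pruritus <= 10 and 0 <= sleep_loss <= 10):
        raise ValueError("each subjective analog scale must be 0-10")
    a = extent_pct                  # rule-of-nine extent, percent of body surface
    b = sum(intensity.values())     # intensity sum, 0-18
    c = pruritus + sleep_loss       # subjective symptoms, 0-20
    return a / 5 + 7 * b / 2 + c
```

Because A/5 and 7B/2 need not be whole numbers, SCORAD totals under this weighting can be non-integer.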

The BCSS scores the extent of AD by counting the sites with lesions (head/neck, anterior trunk, posterior trunk, arms, and legs). Each site is scored 0 (no lesion) or 1 (lesion) (4). The BCSS ranges from 0 (no lesion) to 5 (lesions on all sites).
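The BCSS is simply a count of affected sites. A minimal Python sketch, assuming the five site labels listed above are used as set members (the name `bcss_score` is hypothetical):

```python
# Hypothetical sketch of the Basic Clinical Scoring System (BCSS):
# one point per site with lesions, five sites, range 0-5.
BCSS_SITES = frozenset({"head/neck", "anterior trunk", "posterior trunk", "arms", "legs"})


def bcss_score(affected_sites: set[str]) -> int:
    """Return the number of BCSS sites with lesions (0-5)."""
    sites = set(affected_sites)
    unknown = sites - BCSS_SITES
    if unknown:
        raise ValueError(f"not a BCSS site: {sorted(unknown)}")
    return len(sites)
```

Because intensity plays no part in the count, a patient with mild lesions at all five sites scores the BCSS maximum of 5, a point the Discussion returns to.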

Statistics

Agreement between observers using the SSS and the SCORAD, and agreement between these scoring systems in the assessment of overall AD severity, were calculated by the measurement of agreement according to Bland & Altman (5). Agreement between observers using the BCSS, and agreement between the BCSS and the other scoring systems in the assessment of overall AD severity, were calculated by Cohen's kappa (κ) (6). κ>0.4 represents fair agreement; κ>0.75 excellent agreement. The maximum value of κ is 1. To calculate kappa as a measure of agreement, we adapted the scoring scales of the SSS and the SCORAD, recoding the AD severity scores in six categories. In the SSS and in the SCORAD, scores of 0-5 received the value 0, scores of 6-10 the value 1, scores of 11-15 the value 2, scores of 16-20 the value 3, scores of 21-25 the value 4, and scores of 26 and higher the value 5. In addition, interobserver variation for each scoring item separately was calculated by the Wilcoxon signed rank test.
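The recoding into six categories and the kappa statistic can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis code: it computes the unweighted form of Cohen's kappa (the text does not say whether a weighted form was used), assigns non-integer scores that fall between bands to the higher band, and labels κ values with the thresholds quoted above, calling anything at or below 0.4 "poor" as the Results do. The names `recode`, `cohen_kappa`, and `interpret_kappa` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of categories 0-4 (scores 0-5, 6-10, 11-15, 16-20, 21-25);
# any score above 25 falls in category 5.
CATEGORY_UPPER_BOUNDS = (5, 10, 15, 20, 25)


def recode(score: float) -> int:
    """Map an overall SSS or SCORAD score onto the six categories 0-5."""
    # bisect_left counts the bounds strictly below the score: 5 -> 0, 6 -> 1, 26 -> 5.
    return bisect_left(CATEGORY_UPPER_BOUNDS, score)


def cohen_kappa(ratings_a: list[int], ratings_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two paired lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the two raters' marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (observed - expected) / (1 - expected)


def interpret_kappa(kappa: float) -> str:
    """Label kappa with the thresholds used in this study."""
    if kappa > 0.75:
        return "excellent"
    if kappa > 0.4:
        return "fair"
    return "poor"
```

For a comparison with the BCSS, whose scores are already 0-5, one would pair the BCSS scores with `[recode(s) for s in sss_scores]` from the same patients and pass both lists to `cohen_kappa`.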

Results

Agreement in the assessment of the overall AD severity between observers using the SSS and the SCORAD is illustrated in Figs. 1 and 2, respectively. By the SSS, the mean difference (d) was -0.82 and the limits of agreement (d±2 SD of the differences) were -0.82±5.58. By the SCORAD, the mean difference and the limits of agreement were -0.28±7.49. The mean differences and the limits of agreement are depicted in Figs. 1-3. Good agreement in the assessment of the overall AD severity was found between observers in the lower and higher score ranges using the SSS and the SCORAD, whereas more variation in severity scores was found in the middle score ranges (Figs. 1 and 2). Excellent agreement (κ 0.90; 95% CI 0.79-1.03) in the assessment of the overall AD severity was found between observers using the BCSS.
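The Bland & Altman limits reported here are the mean of the paired differences plus or minus two standard deviations of those differences. A minimal Python sketch of that calculation, assuming the sample standard deviation and differences taken as first score minus second score (the sign convention is not stated in the text); the function names are hypothetical and any numbers passed to them in testing are invented, not study data.

```python
from statistics import mean, stdev


def bland_altman(scores_a: list[float], scores_b: list[float]) -> tuple[float, float, float]:
    """Return (d, d - 2 SD, d + 2 SD) for paired scores, using 2 SD as in this study."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    d = mean(diffs)               # mean difference (systematic bias)
    two_sd = 2 * stdev(diffs)     # twice the sample SD of the differences
    return d, d - two_sd, d + two_sd


def bland_altman_points(scores_a: list[float], scores_b: list[float]) -> list[tuple[float, float]]:
    """Per-patient plot coordinates: (mean of the pair, difference of the pair)."""
    return [((a + b) / 2, a - b) for a, b in zip(scores_a, scores_b)]
```

Plotting the `bland_altman_points` output with horizontal lines at d and at the two limits gives plots of the kind summarized in Figs. 1-3. Many later descriptions of the method use 1.96 SD rather than 2 SD; the sketch follows the 2 SD stated in this paper.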

Agreement in the assessment of the overall severity of AD between the SSS, the SCORAD, and the BCSS is shown in Figs. 3-5. Poor agreement, with a mean difference of -4.17 and limits of agreement of -4.17±9.52, was found between the SSS and the SCORAD (Fig. 3).
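Written out as intervals, the three sets of limits of agreement reported above are as follows (simple arithmetic on the reported values, d - 2 SD to d + 2 SD):

```latex
\begin{aligned}
\text{SSS, between observers:}\quad & -0.82 \pm 5.58 = [-6.40,\ 4.76]\\
\text{SCORAD, between observers:}\quad & -0.28 \pm 7.49 = [-7.77,\ 7.21]\\
\text{SSS vs.\ SCORAD:}\quad & -4.17 \pm 9.52 = [-13.69,\ 5.35]
\end{aligned}
```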

Poor agreement (κ 0.21; 95% CI 0.09-0.33) was also found between the BCSS and the SSS (Fig. 4). Likewise, poor agreement (κ 0.38; 95% CI 0.26-0.51) was found between the BCSS and the SCORAD (Fig. 5).

Using the SSS and considering each scoring item separately, we found statistically significant interobserver variation in assessing excoriations (P=0.02) and scales (P=0.02). For the SCORAD, statistically significant interobserver variation was found in assessing edema/papulation (P=0.04), erythema (P=0.04), and excoriations (P=0.01). No significant interobserver variation was found in assessing the extent of AD.
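These item-level comparisons rest on the Wilcoxon signed rank test for paired scores. The text does not state whether exact or large-sample P values were used; the following Python sketch assumes the large-sample normal approximation, drops zero differences, gives tied absolute differences their average rank, and applies the usual tie correction to the variance. With small samples and coarse item scales (0-3 or 0-7) this approximation is rough, and the sketch is offered only to show how such a test works, not to reproduce the reported P values. The function name is hypothetical.

```python
from math import erfc, sqrt


def wilcoxon_signed_rank(x: list[float], y: list[float]) -> tuple[float, float]:
    """Two-sided Wilcoxon signed rank test (normal approximation).

    Returns (W+, P). Zero differences are dropped; tied absolute
    differences receive average ranks, with a tie-corrected variance.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # the two observers agreed on every patient
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Extend j over the run of equal absolute differences starting at i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / sqrt(var_w)
    p = erfc(abs(z) / sqrt(2))  # two-sided P from the standard normal
    return w_plus, p
```

Applied to one intensity item, `x` and `y` would hold the two observers' scores for the same patients, in the same order.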

Discussion

This study shows good agreement between observers assessing the overall severity of AD in the lower and higher score ranges by the SSS and the SCORAD. However, more variation between observers in assessing the severity of AD was found in the middle score ranges. Excellent agreement in the assessment of the overall AD severity was found between observers using the BCSS. This agrees with other studies (2, 3). However, interobserver variation was found in the severity assessment of isolated intensity items by the SSS and the SCORAD. This suggests that these scoring systems are not suitable for follow-up of isolated intensity items. We are aware that these results were obtained from a relatively small subject population, with mainly mild to moderate AD severity scores. To extend these results to the whole population of patients with AD, a study including subjects with more severe AD is needed.

[Figure 4: plot of overall AD severity scores (y axis, 0-40) against the overall AD severity score by the Basic Clinical Scoring System (x axis).]

Fig. 4. Agreement between Basic Clinical Scoring System and Simple Scoring System in assessment of overall severity of AD. κ=0.21 (95% CI 0.09-0.33). κ = kappa; CI = confidence interval.

Another conclusion that can be drawn from our data is that the agreement between the three scoring systems in the assessment of the overall severity of AD is poor. This lack of agreement can be explained by a difference in approach of the different scoring systems. On the one hand, the SSS and the SCORAD are more elaborate than the BCSS. In the former two scoring systems, the extent of the AD lesions, the intensity of the lesions, and the subjective symptoms, pruritus and sleep loss, are taken into account. On the other hand, the BCSS scores only the extent of AD. It is therefore possible for a patient's AD to attain the maximum BCSS score, because AD is present at all sites, whereas the scores by the SCORAD and the SSS are low, because the intensity score of AD at each site is low. The lack of agreement between the SSS and the SCORAD is probably explained by the fact that the intensity of the lesions is evaluated for each item on the most severely affected areas in the SSS, while in the SCORAD each item is evaluated at its average intensity. These differences imply that the three scoring systems cannot be used interchangeably, and that it is not possible to compare results of studies on the overall severity or follow-up data of AD when different scoring systems have been used.

[Figure 5: plot of overall AD severity scores (y axis, 0-50) against the overall AD severity score by the Basic Clinical Scoring System (x axis).]

Fig. 5. Agreement between Basic Clinical Scoring System and SCORAD in assessment of overall severity of AD. κ=0.38 (95% CI 0.26-0.51). κ = kappa; CI = confidence interval.

Thus, it appears that each scoring system has its own application in clinical practice and research. When only a rough estimation of the severity of AD is needed, as in long-term follow-up studies on the development of allergic symptoms, a more limited scoring system such as the BCSS would suffice. Because AD is a chronically relapsing disease, wide variation of the disease state at the time of follow-up is possible. A simple, rapid, and reproducible scoring system such as the BCSS may be sufficient for this purpose. However, for detection of subtle modifications in the severity of AD, as in drug-effect studies, or for follow-up of the progression of AD, a more elaborate scoring system, such as the SSS or the SCORAD, is more appropriate. In these cases, one is interested not only in the absence or presence of AD lesions, but also in variation in the different intensity items and variation in the extent of the lesions. In the current study, we found that one should be aware of significant interobserver variation in the assessment of isolated intensity items such as scales, excoriations, edema/papulation, and erythema when using these more elaborate systems. Therefore, for research purposes, it is advisable to have these scores assessed by a single observer as much as possible.

In conclusion, the BCSS is an excellent and simple score to detect the development of AD, whereas the SSS and the SCORAD are suitable to follow up the severity of AD. There is significant interobserver variation in scoring isolated items of these elaborate scoring systems. Therefore, for studies assessing modifications in the severity of AD over time, a single observer is essential. Because poor agreement was found between the three scoring systems in assessing the overall severity of AD, they should not be used interchangeably to assess the severity of AD.

Acknowledgment

This study was supported by a research grant from The Netherlands Asthma Foundation.

References

1. Hanifin JM, Rajka G. Diagnostic features of atopic dermatitis. Acta Derm Venereol Suppl (Stockh) 1980;92:44-7.
2. Costa C, Rilliet A, Nicolet M, Saurat JH. Scoring atopic dermatitis: the simpler the better? Acta Derm Venereol 1989;69:41-5.
3. Stalder JF, Taïeb A. Severity scoring of atopic dermatitis: the SCORAD index. Dermatology 1993;186:23-31.
4. Verwimp JJM, Bindels JG, Barents M, Heymans HSA. Symptomatology and growth in infants with cow's milk protein intolerance using two different whey-protein hydrolysate based formulas in a primary health care setting. Eur J Clin Nutr 1995;49:S39.
5. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.
6. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960;20:37-46.