the neer classification system for proximal humeral fractures


ML Sidor, JD Zuckerman, T Lyon, K Koval, F Cuomo and N Schoenberg. The Neer classification system for proximal humeral fractures. An assessment of interobserver reliability and intraobserver reproducibility. J Bone Joint Surg Am. 1993;75:1745-1750.


Copyright 1993 by The Journal of Bone and Joint Surgery, Incorporated

VOL. 75-A, NO. 12, DECEMBER 1993

The Neer Classification System for Proximal Humeral Fractures

AN ASSESSMENT OF INTEROBSERVER RELIABILITY AND INTRAOBSERVER REPRODUCIBILITY*

BY MICHAEL L. SIDOR, M.D.†, JOSEPH D. ZUCKERMAN, M.D.†, TOM LYON, B.S.†, KENNETH KOVAL, M.D.†, FRANCES CUOMO, M.D.†, AND NORMAN SCHOENBERG, M.D.†, NEW YORK, N.Y.

Investigation performed at the Shoulder Service, Hospital for Joint Diseases Orthopaedic Institute, New York City

ABSTRACT: The radiographs of fifty fractures of the proximal part of the humerus were used to assess the interobserver reliability and intraobserver reproducibility of the Neer classification system. A trauma series consisting of scapular anteroposterior, scapular lateral, and axillary radiographs was available for each fracture. The radiographs were reviewed by an orthopaedic shoulder specialist, an orthopaedic traumatologist, a skeletal radiologist, and two orthopaedic residents, in their fifth and second years of postgraduate training. The radiographs were reviewed on two different occasions, six months apart.

Interobserver reliability was assessed by comparison of the fracture classifications determined by the five observers. Intraobserver reproducibility was evaluated by comparison of the classifications determined by each observer on the first and second viewings. Kappa (κ) reliability coefficients were used.

All five observers agreed on the final classification for 32 and 30 per cent of the fractures on the first and second viewings, respectively. Paired comparisons between the five observers showed a mean reliability coefficient of 0.48 (range, 0.43 to 0.58) for the first viewing and 0.52 (range, 0.37 to 0.62) for the second viewing. The attending physicians obtained a slightly higher kappa value than the orthopaedic residents (0.52 compared with 0.48). Reproducibility ranged from 0.83 (the shoulder specialist) to 0.50 (the skeletal radiologist), with a mean of 0.66. Simplification of the Neer classification system, from sixteen categories to six more general categories based on fracture type, did not significantly improve either interobserver reliability or intraobserver reproducibility.

Fractures of the proximal part of the humerus are most commonly classified with use of the system introduced by Neer in 1970¹³,¹⁴. This system is based on the presence of displacement of at least one of the four anatomical parts of the proximal part of the humerus. Decisions regarding treatment are determined mainly by the type of fracture that is present.

*No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article. No funds were received in support of this study.

†Shoulder Service, Hospital for Joint Diseases Orthopaedic Institute, 301 East 17th Street, New York, N.Y. 10003.

For any system for the classification of fractures, excellent reliability and reproducibility among all reviewers in the interpretation of the radiographs and the classification of the injuries are desirable features. Interobserver reliability refers to the level of agreement between different observers for the classification of a specific fracture. Reproducibility, or intraobserver reliability, indicates the level of agreement for one observer for the classification of a specific fracture on separate occasions. Interobserver reliability and intraobserver reproducibility of even the most widely accepted classification systems have been assessed only infrequently. Recently, classification systems for fractures of the proximal part of the femur¹,⁶, the ankle¹⁵,¹⁷, and the carpal scaphoid⁴ have been reviewed, and, in general, the degree of interobserver reliability has been disappointing. Kristiansen et al. assessed the Neer classification system in a similar manner and found the results to be highly dependent on the level of experience of the observer⁹. However, they used only anteroposterior and lateral radiographs rather than the standard trauma series, and they classified the fractures according to a simplified system. They did not assess reproducibility.

The purpose of the current study was to assess the degree of interobserver reliability and intraobserver reproducibility of the Neer classification system for proximal humeral fractures with use of the standard trauma series of radiographs.

Materials and Methods

The radiographs of fifty proximal humeral fractures or fracture-dislocations in adults who had been seen in the emergency room or offices at our institution were chosen for inclusion in the study. A standard trauma series (scapular anteroposterior, scapular lateral, and axillary radiographs) of good quality was available for each patient. The radiographs were made with use of a standardized technique. In nearly all patients, the proximal humeral fracture was an isolated injury. Therefore, the scapular anteroposterior and scapular lateral radiographs were made with the patient standing, with the arm on the chest (usually in a sling or shoulder-immobilizer). This allowed the patient to be positioned oblique to the x-ray beam. The axillary radiograph was made with the patient supine, with the arm abducted approximately 60 to 70 degrees in the plane of the scapula but remaining in the same position of rotation as for the scapular anteroposterior and scapular lateral radiographs. In most situations, because of the acute nature of the injury, positioning of the patient for the axillary radiograph was performed by an orthopaedic surgeon. If the patient was unable to stand, all radiographs were made with the patient supine.

The acceptability of the quality of the radiographs (for both projection and clarity) was determined by two orthopaedic surgeons who did not serve as observers for this study. They agreed that the fifty fractures included most of the different patterns of proximal humeral fractures. All identifying data (other than labels indicating the right or left side) were obscured on the radiographs. The series was arranged in random order and numbered as Cases 1 through 50.

The radiographs were reviewed by five observers: an orthopaedic surgeon subspecializing in problems of the shoulder (a shoulder specialist), an orthopaedic surgeon subspecializing in traumatology (a traumatologist), a radiologist specializing in imaging of the musculoskeletal system (a skeletal radiologist), and two orthopaedic residents (one in the fifth year of postgraduate training and one in the second year). None of the observers were informed of the other participants in the study. This was done to avoid any discussion of the radiographs after the testing sequence.

Each observer was familiar with the Neer classification system and had used it clinically. However, to standardize the information available to the observers at the time of testing, we provided each one with a typed summary of the classification system. The Neer scheme for the classification of proximal humeral fractures comprises four segments: the articular segment, the greater tuberosity, the lesser tuberosity, and the humeral shaft. For a segment to be considered displaced, it must be displaced by more than 1.0 centimeter or angulated more than 45 degrees. Displaced fractures are classified as two, three, or four-part fractures on the basis of the number of displaced segments. There are separate categories for fracture-dislocations (anterior or posterior) and for fractures of the articular surface (so-called impression fracture or head-splitting fracture). During testing, each observer was given a diagram of the fracture classification system that has appeared often in the literature¹³ (Fig. 1). Each observer indicated a classification for each fracture on the diagram by choosing one of the sixteen different possibilities.
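The displacement criterion quoted above can be written as a small predicate. The following is a minimal sketch, assuming each segment has already been measured for displacement (in centimeters) and angulation (in degrees); the function names and the example measurements are hypothetical and are not taken from the study.

```python
from typing import Iterable, Tuple

# Thresholds as stated in the text: more than 1.0 cm or more than 45 degrees.
DISPLACEMENT_THRESHOLD_CM = 1.0
ANGULATION_THRESHOLD_DEG = 45.0


def is_displaced(displacement_cm: float, angulation_deg: float) -> bool:
    """Return True when a segment exceeds either threshold."""
    return (displacement_cm > DISPLACEMENT_THRESHOLD_CM
            or angulation_deg > ANGULATION_THRESHOLD_DEG)


def count_displaced(measurements: Iterable[Tuple[float, float]]) -> int:
    """Count the segments that meet the displacement criterion."""
    return sum(is_displaced(d, a) for d, a in measurements)


# Hypothetical measurements (displacement in cm, angulation in degrees).
example_segments = {
    "articular segment": (0.2, 10.0),
    "greater tuberosity": (1.4, 0.0),
    "lesser tuberosity": (0.0, 0.0),
    "humeral shaft": (0.5, 50.0),
}
displaced_count = count_displaced(example_segments.values())
```

The sketch only counts displaced segments; assigning one of the sixteen categories also depends on which segments are displaced and on any dislocation or articular-surface involvement, which is not modeled here.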

The radiographs were reviewed by each observer on two separate occasions, six months apart. The observers were not provided with any feedback after the first testing. The radiographs were not available to any of the observers between the first and second viewings. In addition, the observers’ classification choices made at the first testing were not available during the second testing. During the first review, the observers did not know that they would be retested.

During the first testing, the observers were given the typed summary of the Neer classification system and were allotted a maximum of five minutes to read and review it. They were not allowed to ask questions concerning the information contained in the summary. They were then given a data form, which included the diagram of the classification system mentioned earlier. A metric ruler and a goniometer were available for use during testing. The trauma series (scapular anteroposterior, scapular lateral, and axillary radiographs) was reviewed for each fracture. Decisions about classification were made on the basis of the entire trauma series, with the observers indicating their choices on the diagram. The observers were given an unlimited amount of time to make their decisions. After a decision had been made, the radiographs for the next fracture were presented. The observers were not permitted to ask questions of the proctor during or after review of the radiographs.

The second testing was performed in an identical manner, except that the series of radiographs was shown in reverse order to inhibit the observers’ recall of the decisions made during the first testing.

The interobserver reliability was assessed by comparison of the classifications decided on by the five different observers for each of the fifty fractures. The intraobserver reproducibility was determined by comparison of the classifications decided on by each individual observer for the first and second testing sessions.

Statistical Analysis

Computer-generated kappa statistics (PC-Agree, version 2.5; McMaster University, Hamilton, Ontario, Canada) were used to analyze interobserver reliability and intraobserver reproducibility. This analysis involves adjustment of the observed proportion of agreement between or among observers by correction for the proportion of agreement that could have occurred by chance. Hence, the adjusted values are almost always lower than the observed values for the proportion of agreement. The kappa coefficients range from +1.0 (complete agreement) through 0 (chance agreement) to less than 0 (less agreement than expected by chance). The lower boundary of kappa in this study, with use of five observers, was -0.25⁵.
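The chance correction described above can be illustrated with the standard two-rater (Cohen) form of kappa. This is a minimal sketch and not the PC-Agree software used in the study; the function name and the example ratings are hypothetical, and the study's analysis among five observers may have used a different multi-rater formulation.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(ratings_a: Sequence[Hashable], ratings_b: Sequence[Hashable]) -> float:
    """Two-rater kappa: (observed - expected) / (1 - expected)."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must classify the same non-empty set of cases")
    n = len(ratings_a)
    # Observed proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical classifications of eight fractures by two raters.
rater_a = [1, 1, 2, 2, 3, 3, 1, 2]
rater_b = [1, 1, 2, 3, 3, 3, 2, 2]
kappa = cohen_kappa(rater_a, rater_b)  # observed 0.75, expected 21/64
```

In this example the raters agree on 75 per cent of cases, but the chance-corrected value is lower (27/43, approximately 0.63), which shows why adjusted values fall below the observed proportion of agreement.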

We used the guidelines proposed by Landis and Koch for interpretation of these values to categorize the kappa coefficients. Values of less than 0.00 indicated poor reliability; 0.00 to 0.20, slight reliability; 0.21 to 0.40, fair reliability; 0.41 to 0.60, moderate reliability; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, excellent or almost perfect agreement. The kappa coefficients for agreement among the two orthopaedic residents were compared with those among the three attending physicians with use of a Student t test that incorporated the standard errors of kappa for these two groups.
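The interpretation bands listed above can be expressed as a simple lookup. This is a minimal sketch; the function name is hypothetical, and it assumes that kappa values are reported to two decimal places, so that a value falling between two printed bands (for example, 0.205) is assigned to the higher band.

```python
def landis_koch_category(kappa: float) -> str:
    """Map a kappa coefficient to the band names used in the text."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie between -1 and 1")
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Values reported in this study.
overall_interobserver = landis_koch_category(0.50)
mean_reproducibility = landis_koch_category(0.66)
shoulder_specialist = landis_koch_category(0.83)
```

Applied to the values reported here, 0.50 falls in the moderate band, 0.66 in the substantial band, and 0.83 in the almost-perfect band.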

FIG. 1

Diagram showing the Neer classification system for proximal humeral fractures. (Modified from Neer, C. S., II: Displaced proximal humeral fractures. Part I. Classification and evaluation. J. Bone and Joint Surg., 52-A: 1079, Sept. 1970.)

We did not attempt to assess accuracy (that is, how close an experimental observation lies to a true value) because that would have required a known correct classification for each fracture that was assessed. These data were not available because the classification of each fracture is a matter of interpretation by observers. Rather, we assessed the level of agreement among different observers. It is important to note that agreement (reliability and reproducibility) does not necessarily reflect accuracy, for the reasons just stated.

Results

Interobserver Reliability

All five observers agreed on the classification for 32 per cent of the fractures during the first testing and for 30 per cent during the second testing. At least four observers agreed on the classification for 54 per cent of the fractures during the first testing and for 62 per cent during the second testing. When we decreased the extent of agreement to at least three of the five observers, there was 88 per cent agreement for both the first and second testings. (Examples of fractures for which there was poor agreement are shown in Figures 2 and 3.)
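Percentages of this kind can be tallied by finding, for each fracture, the size of the largest group of observers who chose the same category. The following is a minimal sketch; the function names are hypothetical and the four example cases are invented for illustration, not drawn from the study data.

```python
from collections import Counter
from typing import Sequence


def largest_agreeing_group(classifications: Sequence[int]) -> int:
    """Size of the largest set of observers who chose the same category."""
    return Counter(classifications).most_common(1)[0][1]


def percent_with_agreement(cases: Sequence[Sequence[int]], minimum_observers: int) -> float:
    """Percentage of cases in which at least `minimum_observers` agreed."""
    hits = sum(largest_agreeing_group(case) >= minimum_observers for case in cases)
    return 100.0 * hits / len(cases)


# Hypothetical classifications (one tuple per fracture, one entry per observer).
hypothetical_cases = [
    (2, 2, 2, 2, 2),   # all five agree
    (2, 2, 2, 2, 8),   # four agree
    (3, 3, 3, 8, 12),  # three agree
    (1, 2, 3, 8, 12),  # no two observers agree
]
all_five = percent_with_agreement(hypothetical_cases, 5)
at_least_four = percent_with_agreement(hypothetical_cases, 4)
at_least_three = percent_with_agreement(hypothetical_cases, 3)
```

Because each threshold includes the cases that satisfy the stricter ones, the percentages can only rise as the required number of agreeing observers falls, as in the reported sequence of 32, 54, and 88 per cent for the first testing.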

We analyzed the results on the basis of a pairwise comparison among the five observers, which produced ten paired analyses, and according to the over-all agreement among the group. For the first viewing, the interobserver reliability coefficients (kappa) ranged from 0.43 to 0.58, with an over-all value of 0.48. For the second viewing, the kappa values ranged from 0.37 to 0.62, with an over-all value of 0.52. There were no significant differences between the results of the first and second viewings. For the first viewing, the best paired comparison was between the traumatologist and the second-year orthopaedic resident and the worst, between the traumatologist and the skeletal radiologist. For the second viewing, the best paired comparison was also between the traumatologist and the second-year resident and the worst, between the skeletal radiologist and the fifth-year resident. The over-all kappa value for all paired comparisons was 0.50, which represents moderate interobserver reliability. None of the paired comparisons achieved almost-perfect interobserver reliability (κ ≥ 0.81).
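The count of ten paired analyses follows from choosing two observers out of five. A minimal sketch of that enumeration is given here; the observer labels are shorthand for the five participants described in the text, and the variable names are hypothetical.

```python
from itertools import combinations

observers = [
    "shoulder specialist",
    "traumatologist",
    "skeletal radiologist",
    "fifth-year resident",
    "second-year resident",
]

# Every unordered pair of distinct observers: 5 choose 2.
observer_pairs = list(combinations(observers, 2))
pair_count = len(observer_pairs)

# Mean of the two over-all values reported for the first and second viewings.
mean_overall_kappa = round((0.48 + 0.52) / 2, 2)
```

Each pair would receive its own kappa coefficient for each viewing, which is how a range such as 0.43 to 0.58 arises from a single set of classifications.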

The effect of the level of expertise of the observers was also assessed. The three attending physicians (the shoulder specialist, the traumatologist, and the skeletal radiologist) achieved reliability coefficients of 0.47 for the first viewing and 0.56 for the second viewing (mean, 0.52). The reliability coefficients for the orthopaedic residents were 0.44 for the first viewing and 0.51 for the second viewing (mean, 0.48). This represents moderate reliability for both the attending physicians and the orthopaedic residents.

FIG. 2 and FIG. 3

Figs. 2 and 3: Radiographs showing fractures for which there was poor interobserver agreement.

Fig. 2: This fracture was classified as an articular surface fracture, a two-part surgical-neck fracture, a two-part fracture of the lesser tuberosity, and a three-part fracture of the lesser tuberosity.

Fig. 3: This fracture was classified as a two-part fracture of the greater tuberosity, a two-part surgical-neck fracture, a three-part fracture of the greater tuberosity, and a four-part fracture.

Intraobserver Reproducibility (Table I)

For all five observers, the reliability coefficient was 0.66, which represents a substantial level of reproducibility. The only observer to achieve an almost-perfect level was the shoulder specialist (κ = 0.83); all others were in the moderate to substantial range (κ = 0.50 to 0.68). There was no significant difference in reproducibility when the values obtained by the attending physicians were compared with those obtained by the orthopaedic residents.

TABLE I

INTRAOBSERVER REPRODUCIBILITY AFTER TWO REVIEWS OF THE FIFTY FRACTURES, SIX MONTHS APART

                                           Reproducibility
Reviewer                              Non-Adjusted*    Adjusted†
Shoulder specialist                       0.86            0.83
Orthopaedic traumatologist                0.70            0.64
Skeletal radiologist                      0.62            0.50
Fifth-year orthopaedic resident           0.74            0.68
Second-year orthopaedic resident          0.70            0.63
Mean                                      0.72            0.66

*The proportion of cases that was classified the same on both viewings.

†Reliability coefficient (kappa value).

Effect of Simplification of the Classification System

The Neer classification system includes sixteen possible categories for any fracture. We believed that the relatively low interobserver reliability and intraobserver reproducibility might have been caused by the complexity of the system¹⁷. Therefore, we simplified the sixteen possible choices by dividing them into six categories: one-part fractures (type 1), two-part fractures (types 2 through 5), three-part fractures (types 8 and 9), four-part fractures (type 12), fracture-dislocations (types 6, 7, 10, 11, 13, and 14), and fractures of the articular surface (types 15 and 16). The original data were analyzed again on the basis of this simplified system; the observers were not re-tested but, rather, their original classifications were regrouped.

There was no improvement for either interobserver reliability or intraobserver reproducibility with use of this simplified system. Rather, the interobserver reliability coefficients (kappa) for the first and second viewings decreased slightly: to 0.42 for the first viewing and to 0.48 for the second viewing (compared with 0.48 and 0.52 for the first and second viewings, respectively, with use of the sixteen-category classification system). The value for reproducibility remained the same (0.66) with use of both systems. (All of the values are adjusted kappa reliability coefficients.)
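The regrouping of the sixteen types into six categories amounts to a lookup table applied to the original classifications. The following is a minimal sketch of that step; the type numbers are those listed in the text, while the variable and function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Six simplified categories and the original type numbers assigned to each.
SIMPLIFIED_GROUPS: Dict[str, List[int]] = {
    "one-part": [1],
    "two-part": [2, 3, 4, 5],
    "three-part": [8, 9],
    "four-part": [12],
    "fracture-dislocation": [6, 7, 10, 11, 13, 14],
    "articular surface": [15, 16],
}

# Invert the table so each of the sixteen types maps to its category.
TYPE_TO_GROUP: Dict[int, str] = {
    neer_type: group
    for group, neer_types in SIMPLIFIED_GROUPS.items()
    for neer_type in neer_types
}


def regroup(classifications: Sequence[int]) -> List[str]:
    """Convert sixteen-category choices to the six-category scheme."""
    return [TYPE_TO_GROUP[neer_type] for neer_type in classifications]


# Two observers who disagree on type (2 versus 5) agree after regrouping,
# whereas a disagreement between types 8 and 12 persists.
resolved = regroup([2, 5])
unresolved = regroup([8, 12])
```

Regrouping can only merge categories, so the observed proportion of agreement cannot fall; the chance-expected agreement also rises when there are fewer categories, which is one way a kappa coefficient can stay the same or decrease after simplification.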

Discussion

Systems for the classification of fractures occupy a central role in the practice of orthopaedic surgery. They constitute a means for the description of fractures and fracture-dislocations, and they provide important guidelines for treatment. Such systems have been used frequently in the orthopaedic literature to describe the results of specific treatments. Our ability to compare the results of various treatments depends in large part on the assumption that injuries of comparable severity are being treated. Currently, fracture-classification systems are the most common mechanism for assessment of the comparability of different series.

Therefore, it is important that fracture-classification systems be both reliable and reproducible. However, the commonly used systems have infrequently been evaluated for interobserver reliability and intraobserver reproducibility, and the results of these few evaluations have been disappointing. Nielsen et al. found a low interobserver reliability and intraobserver reproducibility for the Lauge-Hansen classification of fractures of the ankle and concluded that the system was difficult to apply in a reproducible manner⁹. Thomsen et al. evaluated the Lauge-Hansen and Weber classifications of ankle fractures and reported similar results. Frandsen et al. assessed Garden’s classification scheme for fractures of the femoral neck and found the interobserver reliability to be poor. Andersen et al. reported similar results in their evaluation of Evans’ classification for intertrochanteric fractures, although the interobserver reliability was somewhat better than that for Garden’s system.

The Neer classification is the most widely used scheme for proximal humeral fractures. It has gained wide clinical acceptance by orthopaedic surgeons and radiologists and is considered to have important implications for both treatment options and outcomes562’4’6. However, to our knowledge, only one report in the orthopaedic literature has dealt with its reliability. Kristiansen et al. reported a low level of interobserver reliability among four observers of varying expertise who evaluated a series of 100 proximal humeral fractures⁹. These authors found the level of expertise to be an important factor in the prediction of interobserver reliability. Their study, however, had a few important limitations. First, they condensed the classification into five groups (one-part, two-part, three-part, and four-part fractures, and all other fractures and fracture-dislocations), which were somewhat disparate in terms of fracture type, treatment options, and prognosis. Second, they did not use a complete trauma series but relied only on anteroposterior and lateral radiographs. Finally, reproducibility was not assessed⁹.

We used a complete trauma series to evaluate fifty proximal humeral fractures that were classified on the basis of the detailed (sixteen-category) system of Neer. We found a moderate level of interobserver reliability among the five observers. When we analyzed the results using pairs of observers, the reliability coefficient ranged from 0.37 to 0.62. An almost-perfect level of reliability (κ ≥ 0.81) was not obtained for any paired evaluation. It is interesting that the reliability coefficients did not improve when we employed a simplified (six-category) version of the classification system.

Although we used a complete trauma series, which we thought would provide the maximum amount of information that could be obtained from plain radiographs, the reliability did not achieve the high levels that would be expected for such a widely used and accepted classification scheme. We attribute this to several factors. First, proximal humeral fractures are inherently complex injuries with multiple fracture lines, making it difficult to assess the displacement of one segment in relation to another. Second, although we used only radiographs of good quality, the overlapping of osseous densities in this anatomical region increases the difficulty of interpretation. Any compromise of radiographic technique can be expected to exacerbate this problem. Additional radiographic studies might have provided information that would have increased the level of reliability. Computerized tomographic scans have been recommended for evaluation of the degree of displacement of the tuberosities as well as for assessment of head-splitting fractures, articular impression fractures, and chronic fracture-dislocations237’5. However, their use is not currently considered part of the standard evaluation of these fractures. Finally, it is possible that the criteria for displacement (more than one centimeter of displacement or 45 degrees of angulation) are too difficult to measure accurately on radiographs. None of our observers used a goniometer or a metric ruler to measure displacement, although all were given the option to do so. The level of expertise and experience also can affect interobserver reliability⁹,¹⁷, although this did not appear to be a significant factor when we compared the values obtained by the attending physicians with those obtained by the orthopaedic residents.

Intraobserver reproducibility was found to be higher than interobserver reliability in the current study, and this is consistent with other reports¹,⁴,¹⁵,¹⁷. The level of expertise and experience was a significant factor. Of the five observers, the shoulder specialist obtained the highest correlation coefficient (κ = 0.83). This was also the only value in the entire study that was at the almost-perfect level (κ ≥ 0.81) of reliability. Intraobserver reproducibility usually exceeds interobserver reliability because it reflects reproducibility independent of agreement. Therefore, incorrect responses that are repeated may show good intraobserver reproducibility but poor interobserver reliability.

Important to any fracture classification system is its relationship to the choice of treatment. In this respect, interobserver reliability and intraobserver reproducibility are major considerations. Differences in classification between observers that do not result in different recommendations for treatment for a particular fracture are considerably less important. Regardless of whether a fracture is classified as two-part by one observer or three-part by another, a procedure to reduce and fix the fracture will generally be recommended. However, a fracture classified as minimally displaced by one observer and as three-part by another may be treated differently by the two observers. Thus, not all differences in classification are equal with respect to the implications for treatment and outcome.

References

1. Andersen, E.; Jorgensen, L. G.; and Hededam, L. T.: Evans’ classification of trochanteric fractures: an assessment of the interobserver and intraobserver reliability. Injury, 21: 377-378, 1990.

2. Bigliani, L. U.: Fractures of the shoulder. Part I: fractures of the proximal humerus. In Fractures in Adults, edited by C. A. Rockwood, Jr., D. P. Green, and R. W. Bucholz. Ed. 3, vol. 1, pp. 881-882. Philadelphia, J. B. Lippincott, 1991.

3. Castagno, A. A.; Shuman, W. P.; Kilcoyne, R. F.; Haynor, D. R.; Morris, M. E.; and Matsen, F. A.: Complex fractures of the proximal humerus: role of CT in treatment. Radiology, 165: 759-762, 1987.

4. Dias, J. J.; Taylor, M.; Thompson, J.; Brenkel, I. J.; and Gregg, P. J.: Radiographic signs of union of scaphoid fractures. An analysis of inter-observer agreement and reproducibility. J. Bone and Joint Surg., 70-B(2): 299-301, 1988.

5. Fleiss, J. L.: Statistical Methods for Rates and Proportions. Ed. 2, p. 217. New York, John Wiley and Sons, 1981.

6. Frandsen, P. A.; Andersen, E.; Madsen, F.; and Skjødt, T.: Garden’s classification of femoral neck fractures. An assessment of inter-observer variation. J. Bone and Joint Surg., 70-B(4): 588-590, 1988.

7. Kilcoyne, R. F.; Shuman, W. P.; Matsen, F. A., III; Morris, M.; and Rockwood, C. A.: The Neer classification of displaced proximal humeral fractures: spectrum of findings on plain radiographs and CT scans. AJR: Am. J. Roentgenol., 154: 1029-1033, 1990.

8. Kristiansen, B., and Christensen, S. W.: Proximal humeral fractures. Late results in relation to classification and treatment. Acta Orthop. Scandinavica, 58: 124-127, 1987.

9. Kristiansen, B.; Andersen, U. L.; Olsen, C. A.; and Varmarken, J. E.: The Neer classification of fractures of the proximal humerus. An assessment of interobserver variation. Skel. Radiol., 17: 420-422, 1988.

10. Kuhlman, J. E.; Fishman, E. K.; Ney, D. R.; and Magid, D.: Complex shoulder trauma: three-dimensional CT imaging. Orthopedics, 11: 1561-1563, 1988.

11. Landis, J. R., and Koch, G. G.: The measurement of observer agreement for categorical data. Biometrics, 33: 159-174, 1977.

12. Mills, H. J., and Horne, G.: Fractures of the proximal humerus in adults. J. Trauma, 25: 801-805, 1985.

13. Neer, C. S., II: Displaced proximal humeral fractures. Part I. Classification and evaluation. J. Bone and Joint Surg., 52-A: 1077-1089, Sept. 1970.

14. Neer, C. S., II: Displaced proximal humeral fractures. Part II. Treatment of three-part and four-part displacement. J. Bone and Joint Surg., 52-A: 1090-1103, Sept. 1970.

15. Nielsen, J. O.; Dons-Jensen, H.; and Sorensen, H. T.: Lauge-Hansen classification of malleolar fractures. An assessment of the reproducibility in 118 cases. Acta Orthop. Scandinavica, 61: 385-387, 1990.

16. Seemann, W.-R.; Siebler, G.; and Rupp, H.-G.: A new classification of proximal humeral fractures. European J. Radiol., 6: 163-167, 1986.

17. Thomsen, N. O. B.; Overgaard, S.; Olsen, L. H.; Hansen, H.; and Nielsen, S. T.: Observer variation in the radiographic classification of ankle fractures. J. Bone and Joint Surg., 73-B(4): 676-678, 1991.
