The Neer classification system for proximal humeral fractures
ML Sidor, JD Zuckerman, T Lyon, K Koval, F Cuomo and N Schoenberg. The Neer classification system for proximal humeral fractures. An assessment of interobserver reliability and intraobserver reproducibility. J Bone Joint Surg Am. 1993;75:1745-1750.
Copyright 1993 by The Journal of Bone and Joint Surgery, Incorporated
VOL. 75-A, NO. 12, DECEMBER 1993 1745
The Neer Classification System for Proximal Humeral Fractures
AN ASSESSMENT OF INTEROBSERVER RELIABILITY AND INTRAOBSERVER REPRODUCIBILITY*
BY MICHAEL L. SIDOR, M.D.†, JOSEPH D. ZUCKERMAN, M.D.†, TOM LYON, B.S.†, KENNETH KOVAL, M.D.†, FRANCES CUOMO, M.D.†, AND NORMAN SCHOENBERG, M.D.†, NEW YORK, N.Y.
Investigation performed at the Shoulder Service, Hospital for Joint Diseases Orthopaedic Institute, New York City
ABSTRACT: The radiographs of fifty fractures of the proximal part of the humerus were used to assess the interobserver reliability and intraobserver reproducibility of the Neer classification system. A trauma series consisting of scapular anteroposterior, scapular lateral, and axillary radiographs was available for each fracture. The radiographs were reviewed by an orthopaedic shoulder specialist, an orthopaedic traumatologist, a skeletal radiologist, and two orthopaedic residents, in their fifth and second years of postgraduate training. The radiographs were reviewed on two different occasions, six months apart.
Interobserver reliability was assessed by comparison of the fracture classifications determined by the five observers. Intraobserver reproducibility was evaluated by comparison of the classifications determined by each observer on the first and second viewings. Kappa (κ) reliability coefficients were used.
All five observers agreed on the final classification for 32 and 30 per cent of the fractures on the first and second viewings, respectively. Paired comparisons between the five observers showed a mean reliability coefficient of 0.48 (range, 0.43 to 0.58) for the first viewing and 0.52 (range, 0.37 to 0.62) for the second viewing. The attending physicians obtained a slightly higher kappa value than the orthopaedic residents (0.52 compared with 0.48). Reproducibility ranged from 0.83 (the shoulder specialist) to 0.50 (the skeletal radiologist), with a mean of 0.66. Simplification of the Neer classification system, from sixteen categories to six more general categories based on fracture type, did not significantly improve either interobserver reliability or intraobserver reproducibility.
Fractures of the proximal part of the humerus are most commonly classified with use of the system introduced by Neer in 1970¹³,¹⁴. This system is based on the presence of displacement of at least one of the four anatomical parts of the proximal part of the humerus. Decisions regarding treatment are determined mainly by the type of fracture that is present.
*No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article. No funds were received in support of this study.
†Shoulder Service, Hospital for Joint Diseases Orthopaedic Institute, 301 East 17th Street, New York, N.Y. 10003.
For any system for the classification of fractures,
excellent reliability and reproducibility among all re-
viewers in the interpretation of the radiographs and the
classification of the injuries are desirable features. Interobserver reliability refers to the level of agreement between different observers for the classification of a specific fracture. Reproducibility, or intraobserver reliability, indicates the level of agreement for one observer for the classification of a specific fracture on separate occasions. Interobserver reliability and intraobserver reproducibility of even the most widely accepted classification systems have been assessed only infrequently.
Recently, classification systems for fractures of the proximal part of the femur¹,⁶, the ankle¹⁵,¹⁷, and the carpal scaphoid⁴ have been reviewed, and, in general, the degree of interobserver reliability has been disappointing.
Kristiansen et al. assessed the Neer classification system in a similar manner and found the results to be highly dependent on the level of experience of the observer⁹. However, they used only anteroposterior and lateral radiographs rather than the standard trauma series, and they classified the fractures according to a simplified system. They did not assess reproducibility.
The purpose of the current study was to assess the degree of interobserver reliability and intraobserver reproducibility of the Neer classification system for proximal humeral fractures with use of the standard trauma series of radiographs.
Materials and Methods
The radiographs of fifty proximal humeral fractures or fracture-dislocations in adults who had been seen in the emergency room or offices at our institution were chosen for inclusion in the study. A standard trauma series - scapular anteroposterior, scapular lateral, and axillary radiographs - of good quality was available for each patient. The radiographs were made with use of a standardized technique. In nearly all patients, the proximal humeral fracture was an isolated injury. Therefore, the scapular anteroposterior and scapular lateral radiographs were made with the patient standing, with the arm on the chest (usually in a sling or shoulder-immobilizer). This allowed the patient to be positioned oblique to the x-ray beam. The axillary radiograph was made with the patient supine, with the arm abducted approximately 60 to 70 degrees in the plane of the scapula but remaining in the same position of rotation as for
the scapular anteroposterior and scapular lateral radio-
graphs. In most situations, because of the acute nature
of the injury, positioning of the patient for the axillary
radiograph was performed by an orthopaedic surgeon.
If the patient was unable to stand, all radiographs were
made with the patient supine.
The acceptability of the quality of the radiographs
(for both projection and clarity) was determined by two
orthopaedic surgeons who did not serve as observers for
this study. They agreed that the fifty fractures included
most of the different patterns of proximal humeral frac-
tures. All identifying data (other than labels indicating
the right or left side) were obscured on the radiographs.
The series was arranged in random order and numbered
as Cases 1 through 50.
The radiographs were reviewed by five observers: an
orthopaedic surgeon subspecializing in problems of the
shoulder (a shoulder specialist), an orthopaedic surgeon
subspecializing in traumatology (a traumatologist), a ra-
diologist specializing in imaging of the musculoskeletal
system (a skeletal radiologist), and two orthopaedic res-
idents (one in the fifth year of postgraduate training and
one in the second year). None of the observers were
informed of the other participants in the study. This was
done to avoid any discussion of the radiographs after the
testing sequence.
Each observer was familiar with the Neer classifica-
tion system and had used it clinically. However, to stand-
ardize the information available to the observers at the
time of testing, we provided each one with a typed sum-
mary of the classification system. The Neer scheme for
the classification of proximal humeral fractures comprises four segments: the articular segment, the greater tuberosity, the lesser tuberosity, and the humeral shaft. For a segment to be considered displaced, it must be displaced by more than 1.0 centimeter or angulated
more than 45 degrees. Displaced fractures are classified
as two, three, or four-part fractures on the basis of the
number of displaced segments. There are separate cate-
gories for fracture-dislocations (anterior or posterior)
and for fractures of the articular surface (so-called
impression fracture or head-splitting fracture). During
testing, each observer was given a diagram of the frac-
ture classification system that has appeared often in the
literature3 (Fig. 1). Each observer indicated a classifica-
tion for each fracture on the diagram by choosing one
of the sixteen different possibilities.
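As an illustration only, the displacement criterion quoted above can be written as a short check. This is a minimal sketch and not part of the study protocol: the function name, the constant names, and the example measurements are all hypothetical, and only the two thresholds (more than 1.0 centimeter, more than 45 degrees) come from the text.

```python
# Thresholds quoted in the typed summary given to the observers.
DISPLACEMENT_THRESHOLD_CM = 1.0
ANGULATION_THRESHOLD_DEG = 45.0


def is_displaced(displacement_cm: float, angulation_deg: float) -> bool:
    """Return True when a segment meets either displacement criterion.

    Both comparisons are strict ("more than"), so a segment measured at
    exactly 1.0 cm and exactly 45 degrees is not counted as displaced.
    """
    return (displacement_cm > DISPLACEMENT_THRESHOLD_CM
            or angulation_deg > ANGULATION_THRESHOLD_DEG)


# Hypothetical measurements (not study data) for the four segments of one
# fracture: (displacement in cm, angulation in degrees).
example_segments = {
    "articular segment": (0.4, 50.0),
    "greater tuberosity": (1.2, 10.0),
    "lesser tuberosity": (0.2, 5.0),
    "humeral shaft": (1.0, 45.0),
}

displaced = [name for name, (cm, deg) in example_segments.items()
             if is_displaced(cm, deg)]
print(displaced)
```

The strict inequalities mean that borderline measurements fall on the undisplaced side, which is one place where small measurement differences between observers could change the category chosen.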
The radiographs were reviewed by each observer on
two separate occasions, six months apart. The observers
were not provided with any feedback after the first test-
ing. The radiographs were not available to any of the
observers between the first and second viewings. In ad-
dition, the observers’ classification choices made at the
first testing were not available during the second testing.
During the first review, the observers did not know that
they would be retested.
During the first testing, the observers were given the
typed summary of the Neer classification system and
were allotted a maximum of five minutes to read and
review it. They were not allowed to ask questions con-
cerning the information contained in the summary. They
were then given a data form, which included the dia-
gram of the classification system mentioned earlier. A
metric ruler and a goniometer were available for use
during testing. The trauma series (scapular anteroposterior, scapular lateral, and axillary radiographs) was
reviewed for each fracture. Decisions about classifica-
tion were made on the basis of the entire trauma series,
with the observers indicating their choices on the dia-
gram. The observers were given an unlimited amount
of time to make their decisions. After a decision had
been made, the radiographs for the next fracture were
presented. The observers were not permitted to ask
questions of the proctor during or after review of the radiographs.
The second testing was performed in an identical manner, except that the series of radiographs was shown
in reverse order to inhibit the observers’ recall of the
decisions made during the first testing.
The interobserver reliability was assessed by com-
parison of the classifications decided on by the five
different observers for each of the fifty fractures. The
intraobserver reproducibility was determined by comparison of the classifications decided on by each individual observer for the first and second testing sessions.
Statistical Analysis
Computer-generated kappa statistics (PC-Agree,
version 2.5; McMaster University, Hamilton, Ontario,
Canada) were used to analyze interobserver reliability
and intraobserver reproducibility. This analysis involves
adjustment of the observed proportion of agreement
between or among observers by correction for the pro-
portion of agreement that could have occurred by
chance. Hence, the adjusted values are almost always
lower than the observed values for the proportion of
agreement. The kappa coefficients range from +1.0 (complete agreement) through 0 (chance agreement) to less than 0 (less agreement than expected by chance). The lower boundary of kappa in this study, with use of five observers, was -0.25⁵.
We used the guidelines proposed by Landis and
Koch for interpretation of these values to categorize the
kappa coefficients. Values of less than 0.00 indicated
poor reliability; 0.00 to 0.20, slight reliability; 0.21 to
0.40, fair reliability; 0.41 to 0.60, moderate reliability; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00,
excellent or almost perfect agreement. The kappa coef-
ficients for agreement among the two orthopaedic res-
idents were compared with those among the three
attending physicians with use of a Student t test that
incorporated the standard errors of kappa for these two
groups.
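The chance correction described above can be sketched for the simplest case of two observers. The example below is an assumption-laden illustration, not the PC-Agree software used in the study: it implements the standard two-rater (Cohen) form of kappa, whereas the over-all coefficient for five observers would need a multi-rater form. The function names and the ratings are hypothetical; the interpretation bands are those listed in the text.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)
    # Observed proportion of cases on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


def landis_koch_label(kappa):
    """Map a kappa value to the interpretation bands listed in the text."""
    if kappa < 0.0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical classifications (not study data) of ten fractures.
rater_a = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
rater_b = [1, 1, 2, 3, 3, 3, 1, 2, 2, 1]
kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 2), landis_koch_label(kappa))
```

In this made-up example the raw agreement is 0.80 but the chance-expected agreement is 0.34, so the adjusted value is lower than the observed proportion, as the text notes is almost always the case.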
FIG. 1
Diagram showing the Neer classification system for proximal humeral fractures. (Modified from Neer, C. S., II: Displaced proximal humeral fractures. Part I. Classification and evaluation. J. Bone and Joint Surg., 52-A: 1079, Sept. 1970.)
We did not attempt to assess accuracy - that is, how
close an experimental observation lies to a true value -
because that would have required a known correct classification for each fracture that was assessed. These data
were not available because the classification of each
fracture is a matter of interpretation by observers.
Rather, we assessed the level of agreement among different observers. It is important to note that agreement (reliability and reproducibility) does not necessarily reflect accuracy, for the reasons just stated.
Results
Interobserver Reliability
All five observers agreed on the classification for 32
per cent of the fractures during the first testing and for
30 per cent during the second testing. At least four ob-
servers agreed on the classification for 54 per cent of the
fractures during the first testing and for 62 per cent
during the second testing. When we decreased the extent of agreement to at least three of the five observers, there was 88 per cent agreement for both the first and second testings. (Examples of fractures for which there was poor agreement are shown in Figures 2 and 3.)
We analyzed the results on the basis of a pairwise
comparison among the five observers, which produced
ten paired analyses, and according to the over-all agree-
ment among the group. For the first viewing, the in-
terobserver reliability coefficients (kappa) ranged from
0.43 to 0.58, with an over-all value of 0.48. For the second
viewing, the kappa values ranged from 0.37 to 0.62, with
an over-all value of 0.52. There were no significant differences between the results of the first and second viewings. For the first viewing, the best paired comparison was between the traumatologist and the second-year orthopaedic resident and the worst, between the traumatologist and the skeletal radiologist. For the second viewing, the best paired comparison was also between the traumatologist and the second-year resident and the worst, between the skeletal radiologist and the fifth-year resident. The over-all kappa value for all paired comparisons was 0.50, which represents moderate interobserver reliability. None of the paired comparisons achieved almost-perfect interobserver reliability (κ ≥ 0.81).
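Five observers give ten unordered pairs, which is where the ten paired analyses come from. The sketch below enumerates the pairs and scores each one; it is illustrative only. The ratings are hypothetical, and raw observed agreement is used as a stand-in statistic to keep the example short, whereas the study reports kappa for each pair.

```python
from itertools import combinations


def observed_agreement(ratings_a, ratings_b):
    """Proportion of cases classified identically by two observers."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def pairwise_scores(ratings, statistic):
    """Apply `statistic` to every unordered pair of observers."""
    return {
        (first, second): statistic(ratings[first], ratings[second])
        for first, second in combinations(ratings, 2)
    }


# Hypothetical classifications (not the study data) for four fractures.
example_ratings = {
    "shoulder specialist": [3, 3, 2, 1],
    "traumatologist": [3, 3, 2, 8],
    "skeletal radiologist": [3, 8, 9, 12],
    "fifth-year resident": [3, 3, 8, 3],
    "second-year resident": [3, 3, 2, 8],
}

scores = pairwise_scores(example_ratings, observed_agreement)
best_pair = max(scores, key=scores.get)
print(len(scores), best_pair, scores[best_pair])
```

Passing a kappa function in place of `observed_agreement` would give chance-corrected values for the same ten pairs.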
The effect of the level of expertise of the observers was also assessed. The three attending physicians (the shoulder specialist, the traumatologist, and the skeletal radiologist) achieved reliability coefficients of 0.47 for the first viewing and 0.56 for the second viewing (mean, 0.52). The reliability coefficients for the orthopaedic residents were 0.44 for the first viewing and 0.51 for the second viewing (mean, 0.48). This represents moderate reliability for both the attending physicians and the orthopaedic residents.
FIG. 2
Figs. 2 and 3: Radiographs showing fractures for which there was poor interobserver agreement.
Fig. 2: This fracture was classified as an articular surface fracture, a two-part surgical-neck fracture, a two-part fracture of the lesser tuberosity, and a three-part fracture of the lesser tuberosity.
FIG. 3
This fracture was classified as a two-part fracture of the greater tuberosity, a two-part surgical-neck fracture, a three-part fracture of the greater tuberosity, and a four-part fracture.
Intraobserver Reproducibility (Table I)
For all five observers, the reliability coefficient was 0.66, which represents a substantial level of reproducibility. The only observer to achieve an almost-perfect level was the shoulder specialist (κ = 0.83); all others were in the moderate to substantial range (κ = 0.50 to 0.68). There was no significant difference in reproducibility when the values obtained by the attending physicians were compared with those obtained by the orthopaedic residents.
TABLE I
INTRAOBSERVER REPRODUCIBILITY AFTER TWO REVIEWS OF THE FIFTY FRACTURES, SIX MONTHS APART
                                         Reproducibility
Reviewer                            Non-Adjusted*   Adjusted†
Shoulder specialist                     0.86          0.83
Orthopaedic traumatologist              0.70          0.64
Skeletal radiologist                    0.62          0.50
Fifth-year orthopaedic resident         0.74          0.68
Second-year orthopaedic resident        0.70          0.63
Mean                                    0.72          0.66
*The proportion of cases that was classified the same on both viewings.
†Reliability coefficient (kappa value).
Effect of Simplification of the Classification System
The Neer classification system includes sixteen possible categories for any fracture. We believed that the relatively low interobserver reliability and intraobserver reproducibility might have been caused by the complexity of the system¹⁷. Therefore, we simplified the sixteen possible choices by dividing them into six categories: one-part fractures (type 1), two-part fractures (types 2 through 5), three-part fractures (types 8 and 9), four-part fractures (type 12), fracture-dislocations (types 6, 7, 10, 11, 13, and 14), and fractures of the articular surface (types 15 and 16). The original data were analyzed again on the basis of this simplified system; the observers were not re-tested but, rather, their original classifications were regrouped.
There was no improvement for either interobserver reliability or intraobserver reproducibility with use of this simplified system. Rather, the interobserver reliability coefficients (kappa) for the first and second viewings decreased slightly: to 0.42 for the first viewing and to 0.48 for the second viewing (compared with 0.48 and 0.52 for the first and second viewings, respectively, with use of the sixteen-category classification system). The value for reproducibility remained the same (0.66) with use of both systems. (All of the values are adjusted kappa reliability coefficients.)
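The regrouping of the sixteen types into six categories can be expressed as a lookup table. The sketch below follows the type numbers given in the text for the simplified system; the variable names, the function name, and the two example observers are hypothetical.

```python
# Six simplified categories and the original type numbers assigned to each,
# as listed in the text.
SIMPLIFIED_GROUPS = {
    "one-part fracture": [1],
    "two-part fracture": [2, 3, 4, 5],
    "three-part fracture": [8, 9],
    "four-part fracture": [12],
    "fracture-dislocation": [6, 7, 10, 11, 13, 14],
    "articular surface fracture": [15, 16],
}

# Invert the table so each original type points to its simplified category.
TYPE_TO_GROUP = {
    fracture_type: group
    for group, types in SIMPLIFIED_GROUPS.items()
    for fracture_type in types
}


def simplify(classifications):
    """Regroup a list of sixteen-category choices into the six categories."""
    return [TYPE_TO_GROUP[t] for t in classifications]


# Hypothetical choices (not study data) by two observers for three fractures.
observer_a = [3, 8, 6]
observer_b = [4, 9, 11]
print(simplify(observer_a))
print(simplify(observer_a) == simplify(observer_b))
```

In this made-up example the two observers disagree on every fracture under the sixteen-type scheme yet agree completely after regrouping. Regrouping also raises the agreement expected by chance, so a chance-corrected coefficient such as kappa need not improve when categories are merged.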
Discussion
Systems for the classification of fractures occupy a
central role in the practice of orthopaedic surgery. They
constitute a means for the description of fractures and
fracture-dislocations, and they provide important guide-
lines for treatment. Such systems have been used fre-
quently in the orthopaedic literature to describe the
results of specific treatments. Our ability to compare the
results of various treatments depends in large part on
the assumption that injuries of comparable severity are
being treated. Currently, fracture-classification systems
are the most common mechanism for assessment of the
comparability of different series.
Therefore, it is important that fracture-classification
systems be both reliable and reproducible. However, the
commonly used systems have infrequently been evalu-
ated for interobserver reliability and intraobserver re-
producibility, and the results of these few evaluations
have been disappointing. Nielsen et al. found a low interobserver reliability and intraobserver reproducibility
for the Lauge-Hansen classification of fractures of the
ankle and concluded that the system was difficult to
apply in a reproducible manner¹⁵. Thomsen et al. evaluated the Lauge-Hansen and Weber classifications of
ankle fractures and reported similar results. Frandsen et
al. assessed Garden’s classification scheme for fractures
of the femoral neck and found the interobserver re-
liability to be poor. Andersen et al. reported similar
results in their evaluation of Evans’ classification for
intertrochanteric fractures, although the interobserver
reliability was somewhat better than that for Garden’s
system.
The Neer classification is the most widely used
scheme for proximal humeral fractures. It has gained
wide clinical acceptance by orthopaedic surgeons and
radiologists and is considered to have important impli-
cations for both treatment options and outcomes562’4’6.
However, to our knowledge, only one report in the orthopaedic literature has dealt with its reliability. Kristiansen et al. reported a low level of interobserver reliability among four observers of varying expertise who evaluated a series of 100 proximal humeral fractures⁹.
These authors found the level of expertise to be an
important factor in the prediction of interobserver re-
liability. Their study, however, had a few important
limitations. First, they condensed the classification into
five groups (one-part, two-part, three-part, and four-
part fractures, and all other fractures and fracture-
dislocations), which were somewhat disparate in terms
of fracture type, treatment options, and prognosis. Second, they did not use a complete trauma series but relied only on anteroposterior and lateral radiographs. Finally, reproducibility was not assessed⁹.
We used a complete trauma series to evaluate fifty
proximal humeral fractures that were classified on
the basis of the detailed (sixteen-category) system of
Neer. We found a moderate level of interobserver re-
liability among the five observers. When we analyzed
the results using pairs of observers, the reliability co-
efficient ranged from 0.37 to 0.62. An almost-perfect
level of reliability (κ ≥ 0.81) was not obtained for
any paired evaluation. It is interesting that the relia-
bility coefficients did not improve when we employed
a simplified (six-category) version of the classification
system.
Although we used a complete trauma series, which we thought would provide the maximum amount of information that could be obtained from plain radiographs, the reliability did not achieve the high levels that would be expected for such a widely used and accepted
classification scheme. We attribute this to several fac-
tors. First, proximal humeral fractures are inherently
complex injuries with multiple fracture lines, making it
difficult to assess the displacement of one segment in
relation to another. Second, although we used only ra-
diographs of good quality, the overlapping of osseous
densities in this anatomical region increases the diffi-
culty of interpretation. Any compromise of radiographic
technique can be expected to exacerbate this problem.
Additional radiographic studies might have provided
information that would have increased the level of reli-
ability. Computerized tomographic scans have been rec-
ommended for evaluation of the degree of displacement
of the tuberosities as well as for assessment of head-
splitting fractures, articular impression fractures, and
chronic fracture-dislocations237’5. However, their use is
not currently considered part of the standard evaluation
of these fractures. Finally, it is possible that the criteria
for displacement (more than one centimeter of displace-
ment or 45 degrees of angulation) are too difficult to
measure accurately on radiographs. None of our ob-
servers used a goniometer or a metric ruler to measure
displacement, although all were given the option to do
so. The level of expertise and experience also can affect
interobserver reliability⁹,¹⁷, although this did not appear to be a significant factor when we compared the values obtained by the attending physicians with those obtained by the orthopaedic residents.
Intraobserver reproducibility was found to be higher
than interobserver reliability in the current study, and
this is consistent with other reports’4’5’7. The level of
expertise and experience was a significant factor. Of
the five observers, the shoulder specialist obtained the
highest correlation coefficient (κ = 0.83). This was also the only value in the entire study that was at the almost-perfect level (κ ≥ 0.81) of reliability. Intraobserver reproducibility usually exceeds interobserver reliability
because it reflects reproducibility independent of agree-
ment. Therefore, incorrect responses that are repeated
may show good intraobserver reproducibility but poor
interobserver reliability.
Important to any fracture classification system is its
relationship to the choice of treatment. In this respect,
interobserver reliability and intraobserver reproducibil-
ity are major considerations. Differences in classifica-
tion between observers that do not result in different
recommendations for treatment for a particular fracture
are considerably less important. Regardless of whether
a fracture is classified as two-part by one observer or
three-part by another, a procedure to reduce and fix the
fracture will generally be recommended. However, a
fracture classified as minimally displaced by one ob-
server and as three-part by another may be treated dif-
ferently by the two observers. Thus, not all differences
in classification are equal with respect to the implica-
tions for treatment and outcome.
References
1. Andersen, E.; Jorgensen, L. G.; and Hededam, L. T.: Evans’ classification of trochanteric fractures: an assessment of the interobserver and intraobserver reliability. Injury, 21: 377-378, 1990.
2. Bigliani, L. U.: Fractures of the shoulder. Part I: fractures of the proximal humerus. In Fractures in Adults, edited by C. A. Rockwood, Jr., D. P. Green, and R. W. Bucholz. Ed. 3, vol. 1, pp. 881-882. Philadelphia, J. B. Lippincott, 1991.
3. Castagno, A. A.; Shuman, W. P.; Kilcoyne, R. F.; Haynor, D. R.; Morris, M. E.; and Matsen, F. A.: Complex fractures of the proximal humerus: role of CT in treatment. Radiology, 165: 759-762, 1987.
4. Dias, J. J.; Taylor, M.; Thompson, J.; Brenkel, I. J.; and Gregg, P. J.: Radiographic signs of union of scaphoid fractures. An analysis of inter-observer agreement and reproducibility. J. Bone and Joint Surg., 70-B(2): 299-301, 1988.
5. Fleiss, J. L.: Statistical Methods for Rates and Proportions. Ed. 2, p. 217. New York, John Wiley and Sons, 1981.
6. Frandsen, P. A.; Andersen, E.; Madsen, F.; and Skjødt, T.: Garden’s classification of femoral neck fractures. An assessment of inter-observer variation. J. Bone and Joint Surg., 70-B(4): 588-590, 1988.
7. Kilcoyne, R. F.; Shuman, W. P.; Matsen, F. A., III; Morris, M.; and Rockwood, C. A.: The Neer classification of displaced proximal humeral fractures: spectrum of findings on plain radiographs and CT scans. AJR: Am. J. Roentgenol., 154: 1029-1033, 1990.
8. Kristiansen, B., and Christensen, S. W.: Proximal humeral fractures. Late results in relation to classification and treatment. Acta Orthop. Scandinavica, 58: 124-127, 1987.
9. Kristiansen, B.; Andersen, U. L.; Olsen, C. A.; and Varmarken, J. E.: The Neer classification of fractures of the proximal humerus. An assessment of interobserver variation. Skel. Radiol., 17: 420-422, 1988.
10. Kuhlman, J. E.; Fishman, E. K.; Ney, D. R.; and Magid, D.: Complex shoulder trauma: three-dimensional CT imaging. Orthopedics, 11: 1561-1563, 1988.
11. Landis, J. R., and Koch, G. G.: The measurement of observer agreement for categorical data. Biometrics, 33: 159-174, 1977.
12. Mills, H. J., and Horne, G.: Fractures of the proximal humerus in adults. J. Trauma, 25: 801-805, 1985.
13. Neer, C. S., II: Displaced proximal humeral fractures. Part I. Classification and evaluation. J. Bone and Joint Surg., 52-A: 1077-1089, Sept. 1970.
14. Neer, C. S., II: Displaced proximal humeral fractures. Part II. Treatment of three-part and four-part displacement. J. Bone and Joint Surg., 52-A: 1090-1103, Sept. 1970.
15. Nielsen, J. O.; Dons-Jensen, H.; and Sorensen, H. T.: Lauge-Hansen classification of malleolar fractures. An assessment of the reproducibility in 118 cases. Acta Orthop. Scandinavica, 61: 385-387, 1990.
16. Seemann, W.-R.; Siebler, G.; and Rupp, H.-G.: A new classification of proximal humeral fractures. European J. Radiol., 6: 163-167, 1986.
17. Thomsen, N. O. B.; Overgaard, S.; Olsen, L. H.; Hansen, H.; and Nielsen, S. T.: Observer variation in the radiographic classification of ankle fractures. J. Bone and Joint Surg., 73-B(4): 676-678, 1991.