interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., cva, ms,...

21
Full Terms & Conditions of access and use can be found at https://www.tandfonline.com/action/journalInformation?journalCode=iptp20 Physiotherapy Theory and Practice An International Journal of Physical Therapy ISSN: 0959-3985 (Print) 1532-5040 (Online) Journal homepage: https://www.tandfonline.com/loi/iptp20 Interrater agreement and reliability of clinical tests for assessment of patients with shoulder pain in primary care Adri T Apeldoorn, Marjolein C Den Arend, Ruud Schuitemaker, Dick Egmond, Karin Hekman, Tjeerd Van Der Ploeg, Steven J Kamper, Maurits W Van Tulder & Raymond W Ostelo To cite this article: Adri T Apeldoorn, Marjolein C Den Arend, Ruud Schuitemaker, Dick Egmond, Karin Hekman, Tjeerd Van Der Ploeg, Steven J Kamper, Maurits W Van Tulder & Raymond W Ostelo (2019): Interrater agreement and reliability of clinical tests for assessment of patients with shoulder pain in primary care, Physiotherapy Theory and Practice, DOI: 10.1080/09593985.2019.1587801 To link to this article: https://doi.org/10.1080/09593985.2019.1587801 Published online: 22 Mar 2019. Submit your article to this journal Article views: 208 View Crossmark data

Upload: others

Post on 30-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

Full Terms & Conditions of access and use can be found athttps://www.tandfonline.com/action/journalInformation?journalCode=iptp20

Physiotherapy Theory and PracticeAn International Journal of Physical Therapy

ISSN: 0959-3985 (Print) 1532-5040 (Online) Journal homepage: https://www.tandfonline.com/loi/iptp20

Interrater agreement and reliability of clinicaltests for assessment of patients with shoulderpain in primary care

Adri T Apeldoorn, Marjolein C Den Arend, Ruud Schuitemaker, Dick Egmond,Karin Hekman, Tjeerd Van Der Ploeg, Steven J Kamper, Maurits W Van Tulder& Raymond W Ostelo

To cite this article: Adri T Apeldoorn, Marjolein C Den Arend, Ruud Schuitemaker, DickEgmond, Karin Hekman, Tjeerd Van Der Ploeg, Steven J Kamper, Maurits W Van Tulder &Raymond W Ostelo (2019): Interrater agreement and reliability of clinical tests for assessmentof patients with shoulder pain in primary care, Physiotherapy Theory and Practice, DOI:10.1080/09593985.2019.1587801

To link to this article: https://doi.org/10.1080/09593985.2019.1587801

Published online: 22 Mar 2019.

Submit your article to this journal

Article views: 208

View Crossmark data

Page 2: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

Interrater agreement and reliability of clinical tests for assessment of patientswith shoulder pain in primary careAdri T Apeldoorna,b, Marjolein C Den Arendc, Ruud Schuitemakerd, Dick Egmonde, Karin Hekmanf,Tjeerd Van Der Ploegg, Steven J Kamperh, Maurits W Van Tulderi, and Raymond W Osteloi,j

aRehabilitation Department, Noordwest Ziekenhuisgroep, Alkmaar, Netherlands; bBreederode Hogeschool, Rotterdam, Netherlands; cArcusGroup Physiotherapy, Hoeven, Netherlands; dSchuitemaker and van Schaik Physiotherapy and Manual Therapy, Amsterdam, Netherlands;eInstitute for Applied Manual Therapy, Wolfsburg, Germany; fIBC Amstelland, Amstelveen, Netherlands; gHogeschool Inholland, Alkmaar,Netherlands; hSchool of Public Health, University of Sydney, Sydney, Australia; iDepartment of Health Sciences, Faculty of Science, VrijeUniverteit Amsterdam, Amsterdam, Netherlands; jDepartment of Epidemiology and Biostatistics (Amsterdam UMC), location VUmc &Amsterdam Movement Sciences

ABSTRACTBackground: There is limited information about the agreement and reliability of clinical shouldertests.Objectives: To assess the interrater agreement and reliability of clinical shoulder tests in patientswith shoulder pain treated in primary care.Methods: Patients with a primary report of shoulder painunderwent a set of 21 clinical shoulder tests twice on the same day, by pairs of independent physicaltherapists. The outcome parameters were observed and specific interrater agreement for positive andnegative scores, and interrater reliability (Cohen’s kappa (κ)). Positive and negative interrater agree-ment values of ≥0.75 were regarded as sufficient for clinical use. For Cohen’s κ, the followingclassification was used: <0.20 poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, 0.81–1.00very good reliability. Participating clinics were randomized in two groups; with or without a briefpractical session on how to conduct the tests. Results: A total of 113 patients were assessed in 12physical therapy practices by 36 physical therapists. Positive and negative interrater agreement valueswere both sufficient for 1 test (the Full Can Test), neither sufficient for 5 tests, and only sufficient foreither positive or negative agreement for 15 tests. Interrater reliability was fair for 11 tests, moderatefor 9 tests, and good for 1 test (the Full Can Test). An additional brief practical session did not result inbetter agreement or reliability. Conclusion: Clinicians should be aware that interrater agreement andreliability for most shoulder tests is questionable and their value in clinical practice limited.

ARTICLE HISTORYReceived 9 June 2018Revised 3 December 2018Accepted 18 January 2019

KEYWORDSPhysical therapy; diagnostictests; accuracy

Introduction

Shoulder pain is one of the three most common muscu-loskeletal disorders (Picavet and Schouten, 2003; Pope,Croft, Pritchard, and Silman, 1997), and in the top threehealth problems of patients treated by Dutch physicaltherapists (Barten and Koppes, 2016). The incidence ofshoulder pain in the Netherlands has been estimated to be19 per 1000 registered patients per year (Greving et al.,2012). About 13% of persons with shoulder pain who visita general practitioner are referred for physical therapy(Kooijman et al., 2013). Physical therapists also treatpatients without a referral (direct access). In 2015, 51%of patients visited a physical therapist without a referral(Barten and Koppes, 2016). Shoulder pain is often recur-rent and frequently persists long term. Karel et al. (2016)showed a recovery rate of 60% for the total population

and 65% for the working population treated with physicaltherapy after 26 weeks.

The assessment of the shoulder can be challengingas sensitivity and specificity of symptom provocationtests are often insufficient to rule-in, or rule-outresponsibility of the structure for patient symptoms.The relation between signs and symptoms, andimpaired motor control or structural failure observedon imaging or intraoperatively is poor. For example,Tempelhof, Rupp, and Seil (1999) found a high rate ofrotator cuff tears with increasing age in patients withasymptomatic shoulders. Due to these limitations, sev-eral researchers advise to use a comprehensive clinicalexamination including history and a combination ofshoulder tests (Cools, Cambier, and Witvrouw, 2008;Hegedus et al., 2015; Michener, Walsworth, Doukas,and Murphy, 2009). Clusters of tests and classification

CONTACT Adri T Apeldoorn [email protected] Rehabilitation Department, Noordwest Ziekenhuisgroep, Wilhelminalaan 12, Alkmaar 1815 JD,NetherlandsColor versions of one or more of the figures in the article can be found online at www.tandfonline.com/iptp.This article has been republished with minor changes. These changes do not impact the academic content of the article.

PHYSIOTHERAPY THEORY AND PRACTICEhttps://doi.org/10.1080/09593985.2019.1587801

© 2019 Taylor & Francis Group, LLC

Page 3: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

algorithms may be more beneficial for diagnosis inclinical practice compared to the use of single tests(Hegedus et al., 2015).

One of the classification systems that have been pro-posed is the classification algorithm of Cools, Cambier,and Witvrouw (2008). This algorithm uses not onlygenerally accepted symptom-provoking shoulder testslike the Neer Test, Hawkins-Kennedy Test and theEmpty Can Test, but also relatively new symptom alter-ing tests based on movement dysfunction like the mod-ified Scapular Assistance Test (mSAT) and the ScapularRetraction Test (SRT). While several authors have pro-vided valuable analyses of diagnostic accuracy and valid-ity of clinical shoulder tests, agreement and reliability ofthese single tests are also of concern. A recent systematicreview of the intra and interrater reliability of clinicalshoulder tests concluded that only a few tests haveacceptable levels of reliability (Lange et al., 2017). Theauthors stressed that their findings may be inaccurateand need to be interpreted with caution due to theheterogeneity among studies, a lack of high quality stu-dies, and the use of variable estimates of reliability. Theyconcluded that agreement and reliability studies usingappropriate methodology and statistical analysis areneeded (Lange et al., 2017).

It can be argued that for the clinicians the interrateragreement and reliability is not very important, as theygenerally establish diagnosis and treatment strategy bycombining findings of the patient history and clinicalassessment procedures. However, sufficient interrateragreement and reliability are essential requirements aspoor interrater agreement and reliability leads to dif-ferent results on the same tests, different clusters, anddifferent treatment plans. In the absence of adequatereliability there is little value in including a test ina clinical assessment schedule.

Several designs are available for the assessment ofinterrater agreement and reliability (e.g., independentand consecutive assessments, simultaneous assessmentprocedures with one rater and one observer, video-taped recordings, and vignette studies). Each designassesses a different aspect of agreement and reliability.For example, in vignette studies, ratings are based onthe same signs and symptoms but do not account forpatients that present differently in separate assess-ments and testing procedures. Independent and con-secutive assessments include variation in patientpresentations and testing procedures, but the down-side is that the first assessment might have an influ-ence on the second assessment. Nevertheless, wecontend that this latter method is an appropriatedesign for the assessment of interrater agreementand reliability, because in clinical practice the label

a patient receives is based on these tests and shouldnot depend on what the therapist he/she sees or doesnot see. So in order to test if two therapists wouldarrive at the same conclusion, this design is the mostappropriate. The aim of this study was to assess inter-rater agreement and reliability of physical shouldertests by independent and consecutive assessmentstwice on the same day, by pairs of independent phy-sical therapists. The secondary aim was to evaluatewhether agreement and reliability differs between phy-sical therapists who did or did not receive an addi-tional brief practical training session.

Methods

The study was approved by the EMGO+ ScienceCommittee on July 1, 2014, number WC2014-043 andthe Medical Ethical Committee of the VU UniversityMedical Center in Amsterdam approved the study onOctober 19, 2015, registration number 2014.482(NL47668.029.14) The trial was registered in theDutch trial register, number NTR5905. The presenta-tion of the results follows the Guidelines for ReportingReliability and Agreement Studies (GRRAS) (Kottneret al., 2011). This study design is a single-group test–retest design.

Subjects

Patients with a primary report of shoulder pain wererecruited by participating physical therapy clinics.Inclusion criteria were: shoulder (girdle) pain, with orwithout radiation into the arm, age over 17, and ade-quate command of the Dutch language. Exclusion cri-teria were: recent (< 3 months) surgery or shoulderfracture, shoulder pain as a result of possible cervicalnerve root entrapment (e.g., positive Spurling and/ortraction test) (Bertilson, Grunnesjo, and Strender, 2003;Tong, Haig, and Yamakawa, 2002), possible total cufftear (i.e., positive lag signs like the external rotation lagsign and the Hornblower’s sign) (Jain et al., 2017),serious diseases (e.g., malignancies), rheumatic and/orneurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain,dementia, and/or psychological disorders. Patients withpartial cuff ruptures or calcifications as established byMRI or ultrasound were not excluded.

Design

Participating physical therapists informed all eligiblepatients who attended their primary care clinic fortheir shoulder pain about the study. Eligible patients

2 A. T. APELDOORN ET AL.

Page 4: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

were further instructed about the study througha participant information letter. Patients were given atleast 2 days’ time to consider whether to participate instudy or not. The first rater collected informed consentof the patient, and checked the inclusion and exclusioncriteria. Information about the patient was collectedregarding demographic characteristics (i.e., age andgender) and duration of symptoms, previous historyof shoulder pain, pain intensity, functional status, edu-cation, employment and psychological status by meansof a questionnaire. Disability was measured with theDutch version of the Shoulder Pain and DisabilityIndex (SPADI-D) (Roach, Budiman-Mak, Songsiridej,and Lertratanakul, 1991), which is a reliable and validmeasure of shoulder disability in primary care(Thoomes-de Graaf et al., 2015).

Patients were assigned to two physical therapistssuccessively. Both physical therapists performed eachindividual test according to the same procedure, andthey followed a particular order that was establisheda priori. The second physical therapist was blinded tothe results of the first physical therapist and per-formed the tests within 30 minutes of the first assess-ment. The patient was asked not to communicate anydetails or outcome of the first assessment withthe second rater.

Patients could have been referred by their generalpractitioner or medical specialist or visited the clinicwithout referral (direct access). Patients could be

assessed at the first visit at the clinic, or ata subsequent visit. In case of the first visit, both ratershad no or minor information from the referring generalpractitioner about the clinical diagnosis or patient his-tory. Examples of minor information are “shoulderproblems,” “it might be impingement,” “patient doesnot want an injection.” In case the patient was includedafter the first visit, the rater who had seen the patientbefore had information about the patient history andclinical diagnosis. All tests were scored positive, nega-tive or not applicable on a standardized data form.Tests were regarded not applicable if the test couldnot be performed (i.e., the patient could not accomplishthe shoulder test position due to a restricted range ofmotion or pain), or the test was not informative (e.g.,the mSAT is not applicable if the patient does notreport symptoms with active elevation).

Provocation tests were applied with a minimalamount of stress. When a patient reported (increaseof) pain the test was ended immediately. The writtenevaluations were placed in envelopes, and the ratersremained blind to the other’s classification decisionduring the study. Figure 1 shows a detailed descriptionof the study design.

Before each examination, current shoulder pain wasassessed on a numerical rating scale (NRS: 0–10) tocheck stability of the patient’s pain between examina-tions. Unstable patients were a priori defined as thosehaving > 2 points change.

Patients who report primary shoulder pain at participating physical therapy clinics are informed about the study and

receive written information. Patients are given at least 2 days’ time to consider whether to

participate in this study or not.

Patients who are interested in participating are scheduled for examinations by two different independent physical therapists on

the same day. Patients are assigned to physical therapists based on scheduling

availability. Patient is not interested

Stop Patient signs informed consent, in- and exclusion criteria are checked by the physical therapist who assesses the patient first

and the patient fills in questionnaires. Physical therapist 1: Tests patient according to protocol. Physical therapist 2: Tests patient according to protocol

(within 30 minutes of the 1st physical therapist assessment).

Patient is not interested.

Stop

The physical therapy clinic informs the general practitioner of the patient about participation in this study. Completed data forms

are sent to the principal investigator.

Patient is not interested and/or does not meet the inclusion criteria

Stop

Figure 1. Study design.

PHYSIOTHERAPY THEORY AND PRACTICE 3

Page 5: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

Therapists and clinics

Private physical therapy clinics of the ShoulderNetwork Amsterdam and of the personal network ofthe principal investigator (AA) were invited to partici-pate. Physical therapy clinics could participate if at leasttwo physical therapists of that clinic were interested inparticipating in the study.

Prior to recruitment of study patients, three patientswere pilot-tested by participating physical therapists tostandardize techniques and interpretations. Due to lim-ited resources for this study physical therapists werenot asked to record the number of patients who refusedto participate or reasons for not participating.

Physical therapy clinics received instructions by theprincipal investigator consisting of leaflets and access toa video demonstrating how all tests are performed. Theleaflet and the video detailed the performance andscoring of the tests (positive or negative). Half of theclinics received an additional brief face-to-face practicalsession of approximately 45 minutes, delivered by theprincipal investigator in order to assess if this briefpractical session had an influence on results.

Randomization

Participating physical therapy clinics were randomized intwo groups regarding the level of instructions theyreceived prior to the study. Clinics were randomizedusing a computer-generated randomization list. To con-ceal the allocation, randomization was conducted by anindependent person unaware of the purpose of the study.

Tests

The selection of shoulder tests was mainly based on theclassification algorithm of Cools, Cambier, andWitvrouw (2008). This algorithm uses not only gener-ally accepted symptom-provoking shoulder tests likethe Neer Test, Hawkins-Kennedy Test, and the EmptyCan Test, but also relatively new symptom altering testsbased on movement dysfunction like the mSAT and theSRT. There is very little or no information about theagreement or reliability of these latter tests available(Lewis, 2009).

Prior to this study, the principal investigator dis-cussed the design of the study and the operationaldefinitions of the clinical tests in detail with two coremembers of the Shoulder Network Amsterdam (coau-thors RS and KH) and coauthor DE. After consultationthe Shoulder Network Amsterdam (a part of the Dutchshoulder Network), four extra tests were added; theScapula Position (visual observation of the scapula in

neutral standing), Internal Rotation Resistance StrengthTest, Impingement Relief Test (IRT), and theCombined Reduction Test (CRT). The CRT combineselements of the mSAT, SRT, and the IRT. The ShoulderNetwork Amsterdam promotes the use of symptomreduction tests. If reduction of symptoms is foundwith a reduction test, the same technique can be usedto treat symptoms.

The test procedures recommended in a Dutch stan-dard textbook for the clinical examination and treat-ment of extremities (Egmond and Schuitemaker, 2014)slightly differ from the instructions of Cools, Cambier,and Witvrouw (2008). Therefore, RS and DE producedan instruction video of the performance of all 21shoulder tests exclusively for the present study accord-ing to the Dutch textbook. Operational definitions ofthe selected tests of the present study are provided inAppendix 1.

Table 1 presents a summary of the interrater agree-ment and reliability findings of the 21 selected clinicalshoulder tests. The results are based on an extensivesearch in Medline, from inception till 1 March 2018,and performed by the principal investigator (AA).Studies were included if they selected subjects withshoulder pain or a mix of subjects with and withoutshoulder pain. Studies that included asymptomatic sub-jects or subjects with shoulder pain due to neurologicalproblems or neck or upper limb symptoms wereexcluded. Studies that based their results on the assess-ment on the affected and the nonaffected limb were alsoexcluded. The search strategy was restricted to English,French, Dutch, and German-language papers. The refer-ence list of identified articles was checked for additionalstudies. Also, three relevant systematic reviews on mea-surement properties of shoulder tests were checked(D’Hondt et al., 2017; Lange et al., 2017; May et al., 2010).

Sample-size estimation

Sample size estimations for reliability parameters arenot a matter of statistical significance (de Vet, Terwee,Mokkink, and Knol, 2011; Tooth and Ottenbacher,2004). Sample size however does affect the precisionof the reliability estimates, and therefore de Vet,Terwee, Mokkink, and Knol (2011) advise to includeat least 50 patients to fill a 2 × 2 table. Because physicaltherapists were randomized in two groups, the plan wasto recruit at least 100 patients in total.

Data analyses

Descriptive statistics and frequency distributions of allbaseline variables were assessed. As recommended by

4 A. T. APELDOORN ET AL.

Page 6: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

Table 1. Overview of results of the literature search of interrater agreement and interrater reliability of clinical shoulder tests used inthe present study.

Test StudySubjects,

nObserved

agreement, %

Positivespecific

agreement

Negativespecific

agreement Kappa coefficient (95%CI)

Scapula PositionExternal RotationResistance Test

Nørregaard, Krogsgaard,Lorenzen, and Jensen, 2002

68 86 0.67

Hayes and Petersen, 2003 18 61 0.37 (0.01-0.74)(grading included 4 classifications)

Ostor et al, 2004 136 NR 0.45; 0.18; 0.38(three pairs of examiners ab; bc; ac)

Nanda et al, 2008 63 80 0.44Michener, Walsworth, Doukas,and Murphy, 2009

55 87 0.67 (0.40-0.94)

Empty Can Test (JobeTest)

Ostor et al, 2004 136 NR 0.49; 0.44; 0.46(three pairs of examiners ab; bc; ac)

Holtby and Razmjou, 2004 152 NR 0.43Nanda et al, 2008 63 77 0.44Johansson and Ivarson, 2009 33 NR 0.94Michener, Walsworth, Doukas,and Murphy, 2009

55 76 0.47 (0.22-0.72)

Vind et al, 2011 44 95 0.90 (0.76-1.00)Burns, Cleland, Carpenter, andMintken, 2016

21 91 (pain)67 (weakness)

0.69 (0.28-1) (pain)0.35 (0-0.74) (weakness)

Full Can Test Burns, Cleland, Carpenter, andMintken, 2016

21 81 0.23 (0-0.65) (pain)0.38 (0.01-0.76) (weakness)

Active CompressionTest (O’Brien’sTest)

Walsworth et al, 2008 55 64 0.24 (-0.02-0.50)Cadogan et al, 2011 40 88

70

0.22 (-0.24-0.68) (pain on ‘top’ of theschoulder)0.38 (0.1-0.65) (pain ‘inside’ the shoulder)

Burns, Cleland, Carpenter, andMintken, 2016

21 81 0.57 (0.19-0.95)

Neer Test Razmjou, Holtby, and Myhr,2004

149 77 0.67 0.82 0.51 (0.36-0.65)

Nanda et al, 2008 63 75 0.10Johansson and Ivarson, 2009 33 NR 1.0Michener, Walsworth, Doukas,and Murphy, 2009

55 71 0.40 (0.13-0.67)

Vind et al, 2011 44 58 0.95 (0.86-1.00)Burns, Cleland, Carpenter, andMintken, 2016

21 75 0.51 (0.13-0.88)

Hawkins-KennedyTest

Nørregaard, Krogsgaard,Lorenzen, and Jensen, 2002

68 NR 0.07

Razmjou, Holtby, and Myhr,2004

150 60 0.57 0.63 0.29 (0.15-0.43)

Ostor et al, 2004 136 NR 0.29; 0.18; 0.43(three pairs of examiners ab; bc; ac)

Nanda et al, 2008 63 95 0.55Johansson and Ivarson, 2009 33 NR 0.91Michener, Walsworth, Doukas,and Murphy, 2009

55 69 0.39 (0.12-0.65)

Cadogan et al, 2011 40 68 0.38 (0.10-0.63)Vind et al, 2011 44 82 0.60 (0.34-0.85)Burns, Cleland, Carpenter, andMintken, 2016

21 86 0.71 (0.41-1)

Kim Test Kim, Park, Jeong, and Shin,2005

172 NR 0.91 (unclear if this is the kappa coefficientor the percentage of agreement)

Cadogan et al, 2011 40 85 -0.04 (-0.12-0.03)Biceps Load II Test Kim et al, 2001 127 NR 0.82Internal RotationResistanceStrength Test

Load and Shift Test Tzannes, Paxinos, Callanan,and Murrell, 2004

13 NR 0.42-0.72* (four examiners)

Burns, Cleland, Carpenter, andMintken, 2016

21 95-100 0.64 (0-1) right shoulder1.0 (1.0-1.0) left shoulder

Eshoj et al, 2018 40 95 0.50 0.97 0.48 (0.00-1.00)AcromioclavicularJoint Stress Test

Modified ScapularAssistance Test

Rabin, Irrgang, Fitzgerald, andEubanks, 2006

46 77-91 0.53-0.62

Kopkow, Lange, Schmitt, andKasten, 2015

110 89 0.75 0.93 0.68 (0.50-0.85)

Scapular RetractionTest

(Continued )

PHYSIOTHERAPY THEORY AND PRACTICE 5

Page 7: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

the GRRAS guidelines (Kottner et al., 2011) proportionsof observed and specific interrater agreement and kappastatistics were used to determine interrater agreement andreliability. Observed agreement is the portion of cases forwhich the raters agree. Specific agreement quantifies thedegree of agreement for positive and negative scoresseparately. Positive agreement (PA) is calculated ina 2 × 2 table by 2a/2a+b + c. Negative agreement (NA)by 2d/2d+b + c. Cell “a” contains the number of scores forwhich raters agree on positive scores, cells “b” and “c”contain the number of scores for which they disagree, andcell “d” the number for which raters agree on negativescores (Table 2). Specific agreement is a useful measurebecause it highlights where the difficulties lie (deVet et al.,2013; Hripcsak and Heitjan, 2002). Interrater reliability isthe observed proportion of agreement corrected forchance. A standard measure of interrater reliability isknown as Cohen’s κ.

The degree to which interrater agreement and reliabil-ity values are sufficient for clinical decision-makingdepends on the purpose and consequences of test results(Kottner et al., 2011). We decided that a positive andnegative agreement value of ≥ 0.75 was relevant to con-sider in this setting. In other words, we assume that theinterrater agreement is clinically relevant and acceptable ifthe probability that a physical therapist will agree with the

opinion of a colleague physical therapist is ≥ 0.75.However, we realize that this cut-off value is based onarguments and lacks clear foundation. For Cohen’s κ, thefollowing classification was used: < 0.20 poor, 0.21–0.40fair, 0.41–0.60 moderate, 0.61–0.80 good, 0.81–1.00 verygood reliability (Altman, 1997).

Interrater agreement and reliability values of shouldertests were calculated separately for the two groups ofphysical therapists with different levels of instruction,that is, with or without a brief practical session on howto conduct the tests. Differences between characteristics ofphysical therapists who received a brief practical session ornot were analyzed with appropriate methods (i.e., chi-square tests, Mann-Whitney tests, and unpaired t-tests).Also, differences between baseline characteristics ofpatients assessed by physical therapists who receiveda brief practical session or not were analyzed. Ina further analysis, the interrater agreement and reliabilityof shoulder tests were calculated excluding patients who

Table 1. (Continued).

Test StudySubjects,

nObserved

agreement, %

Positivespecific

agreement

Negativespecific

agreement Kappa coefficient (95%CI)

Impingement ReliefTest

Nørregaard, Krogsgaard,Lorenzen, and Jensen, 2002

68 NR 0.34

Sulcus Sign Test Nørregaard, Krogsgaard,Lorenzen, and Jensen, 2002

68 NR 0.13

Tzannes, Paxinos, Callanan,and Murrell, 2004

13 NR 0.60* (four examiners)

Eshoj et al, 2018 40 75 0.58 0.82 0.43 (0.17-0.72)Apprehension Test Vind et al, 2011 44 86 0.71 (0.59-0.98)

Tzannes, Paxinos, Callanan,and Murrell, 2004

13 NR 0.31* (pain)0.47* (apprehension)0.44* (pain and/or apprehension)(four examiners)

Eshoj et al, 2018 40 83 0.80 0.84 0.65 (0.38-0.85)Relocation Test Tzannes, Paxinos, Callanan,

and Murrell, 200413 NR 0.31* (pain)

0.71* (apprehension)0.44* (pain and/or apprehension)(four examiners)

Eshoj et al, 2018 40 75 0.55 0.83 0.39 (0.07-0.68)Release Test Tzannes, Paxinos, Callanan,

and Murrell, 200413 NR 0.31* (pain)

0.63* (apprehension)0.45* (pain and/or apprehension)(four examiners)

Eshoj et al, 2018 40 65 0.80 0.84 0.65 (0.38-0.85)Combined ReductionTest

GlenohumeralInternal RotationDeficit Test

Abbreviations: NR: not reported*Intra Class Correlation

Table 2. Two-by-two table to calculate specific agreement.Rater 2

Rater 1 Positive Negative Total rater 2Positive a b a + bNegative c d c + dTotal rater 1 a + c b + d a + b + c + d

6 A. T. APELDOORN ET AL.

Page 8: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

were ‘unstable’ (> 2 points of change on an 11-point NRS)between examinations.

A P value of less than 0.05 (two-tailed) was consideredas statistically significant. The data were analyzed usingIBM SPSS Statistics Version 20.0 (IBM Corporation,Armonk, NY, USA), VasserStats (http://vassarstats.net/).

Results

From July 2016 until December 2016, 113 patientswere assessed in 12 physical therapy practices by 36physical therapists. Physical therapists’ mean age was38.3 years (standard deviation [SD] 11.1), and theirmean duration of experience in physical therapy was14.4 years (SD 14.4). Seventeen participating physicaltherapists (47.2%) had formal postgraduate manualtherapy training. The median number of patientsassessed by each practice was 10 (inter quartile range[IQR] 6–12). Patients were assessed during their first(74.8%), second (16.8%), third (2.8%), fifth (3.7%),15th (0.9%), or 25th (0.9%) visit at their physical ther-apy clinic for their shoulder pain.

Of the 113 participating patients, there were nowithdrawals during the assessment procedure. In allpatients, the time between the two assessment proce-dures was not more than 30 minutes. The majority ofpatients were female (61.2%), had chronic shoulderpain (>12 weeks, 72.6%), and the mean SPADI-D was45.2 (SD 21.4). Table 3 presents the clinical and demo-graphic characteristics of the patients.

The proportion of missing values and not applicablescores for all clinical shoulder tests was 8.7% (411/4746). Most not applicable scores were found for theRelocation Test (46.0%, 104/226), Release Test (46.9%,106/226), and the CRT (49.6%, 112/226).

For seven shoulder tests the positive agreement wasregarded as sufficient (≥ 0.75); Empty Can Test, FullCan Test, Active Compression Test, Neer Test, mSAT,Relocation Test, and the Release Test. For 10 shouldertests the negative agreement was regarded as sufficient(≥ 0.75): External Rotation Resistance Test, Full CanTest, Kim Test, Biceps Load II Test, Internal RotationResistance Strength Test, Load and Shift Test,Acromioclavicular Joint Stress Test, Sulcus Sign Test,CRT, and the Glenohumeral Internal Rotation DeficitTest. Only the Full Can Test scored sufficient on bothpositive and negative agreement (Table 4). A total of 11tests scored fair, 9 moderate, and 1 good (Full CanTest) on interrater reliability (Table 4).

Randomization of the clinics resulted in 7 out the 12practices receiving an additional brief practical session of45 minutes. In the group without this additional training,16 physical therapists assessed 73 patients, and in thegroup with additional training 20 physical therapistsassessed 40 patients. Patients assessed by physical thera-pists in the group without additional training had a lowereducation level compared to the high-level instructiongroup (i.e., patients with a low, middle or high-level ofeducation in percentages: 25, 60, 15 vs. 15, 35, 50,P < 0.001). The two groups of patients were not differenton the other evaluated characteristics. Physical therapists

Table 3. Characteristics of patients (n = 113).Characteristics

Age, years, mean (SD) (range) 50.7 (14.4) (18–84)Sex, n male (%) 45 (39.8)Regard themselves as being Dutch, n (%) 110 (97.3)Shoulder affected, n right (%) 53 (46.9)Arm affected, n dominant (%) 63 (55.8)Duration of current shoulder pain (weeks), median (IQR) 36.0 (12.0–136.0)Acute (0–6 weeks), n (%) 16 (14.2)Sub-acute (7–12 weeks), n (%) 15 (13.3)Chronic (> 12 weeks), n (%) 82 (72.6)

Previous shoulder surgery, n (%) 8 (7.1)Previous shoulder fracture, n (%) 1 (0.9)Medical images (X-ray, CT, MRI, ultrasound), n (%) 38 (33.6)Pain past week, Numerical Rating Scale, (0–10), mean (SD) (range) 5.4 (2.1) (1–9)SPADI-D (0–100), mean (SD) (range) 45.2 (21.4) (4–94)Currently taking pain medication, n (%) 11 (9.7)Marital statusWith a partner, n (%) 80 (70.8)Single, n (%) 33 (29.2)

Educational levelLow, n (%) 24 (21.2)Middle, n (%) 58 (51.3)High, n (%) 31 (27.4)

Short-Form 36Physical Component Summary (0–100), mean (SD) (range) 43.6 (8.7) (18.8–64.7)Mental Component Summary (0–100), mean (SD) (range) 52.0 (10.5) (14.8–67.6)

Abbreviations: SD: standard deviation; IQR: interquartile range; CT: computed tomography; MR: magnetic resonance imaging; SPADI-D:Shoulder Pain and Disability Index (Dutch version)

PHYSIOTHERAPY THEORY AND PRACTICE 7

Page 9: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

in the group without additional training had less experi-ence in clinical practice compared to the group withadditional training (median 8.0 years [IQR 4.0–12.0] vs.16.0 years [IQR 10.3–28.8], P = .01), and were younger(mean 32.7 year [SD 7.9] vs. 42.5 year [SD 11.5], P = .01).The proportion of manual therapists was similar betweenthe two groups. In general, the group that received anadditional brief practical session did not score better oninterrater agreement and reliability values than the groupwithout an additional brief practical session (Table 5). Ina second additional analysis, 21 patients that reporteda change of pain of > 2 points between the two examina-tions were excluded. Of these 21 patients, 10 patientsreported a reduction and 11 patients an increase ofpain. In general, excluding these 21 patients did notresult in higher agreement values. Positive agreementwas regarded as sufficient for 7 tests, and negative agree-ment for 10 tests. Cohen’s κ ranged from 0.24 to 0.66(Table 6).

Discussion

The present study investigated the interrater agreementand reliability of 21 clinical shoulder tests commonlyused in the assessment of patients with shoulder pain inprimary care. The observed interrater agreement valuesranged from 63% to 85% for all shoulder tests. Thepositive interrater agreement was sufficient for 7shoulder tests, and the negative interrater agreementsufficient for 10 shoulder tests. Only one test (Full

Can Test) scored sufficient positive and negative agree-ment values for clinical use. The Full Can Test was alsothe only test with a good interrater reliability. For theother 20 tests, the interrater reliability was fair (11 tests)or moderate (9 tests).

Previous interrater agreement and reliability studiesof shoulder tests reported observed agreement valuesbetween 58 and 100% (Table 1). Although our agree-ment values were somewhat lower for the Kim Test,mSAT, and the Apprehension Test, in general, theywere similar to previously reported values. It should benoted that comparison with other studies is hampereddue to several reasons including no or a small number ofstudies available for each test, the variety of agreementand reliability parameters reported, and due differencesof the populations studied. For example, Eshoj et al.(2018) reported sufficient interrater agreement and relia-bility values for the Apprehension Test and the ReleaseTest, however their population sample was small(n = 40), and most of the subjects were normal shoulderindividuals (68%). In the present study only patientswith a primary report of shoulder pain were included.Besides, the values in Table 1 are not necessarily validdue to poor methodological quality of the majority of theprevious studies (Lange et al., 2017). Improved compar-ability of studies in the future will be facilitated byconcordance with minimal standards as outlined in theGRRAS (Kottner et al., 2011).

In general, agreement and reliability studies reportobserved agreement values and Cohen’s κ values.

Table 4. Interrater agreement and interrater reliability (n = 113).

TestObserved agreement, %

(95%CI)Positive specificagreement

Negative specificagreement

Cohen’s kappa (95%CI)

Scapula Position 71.8 (62.3–79.8) 0.71 0.73 0.44 (0.27–0.60)External Rotation Resistance Test 75.2 (66.1–82.7) 0.72 0.78* 0.50 (0.34–0.66)Empty Can Test(Jobe test)

76.1 (67.0–83.4) 0.79* 0.72 0.51 (0.34–0.66)

Full Can Test 83.0 (74.5–89.2) 0.75* 0.87* 0.62 (0.47–0.78)Active Compression Test(O’Brien’s test)

74.1 (64.8–81.7) 0.79* 0.67 0.46 (0.29–0.63)

Neer Test 76.2 (66.9–83.6) 0.83* 0.59 0.43 (0.23–0.62)Hawkins-Kennedy Test 67.9 (58.3–76.2) 0.74 0.59 0.33 (0.15–0.51)Kim Test 70.6 (61.0–78.8) 0.56 0.78* 0.34 (0.14–0.53)Biceps Load II Test 83.2 (74.5–89.5) 0.40 0.90* 0.31 (0.01–0.60)Internal Rotation Resistance Strength Test(Zaslav Test)

76.8 (67.7–84.0) 0.64 0.83* 0.47 (0.30–0.65)

Load and Shift Test 85.0 (76.7–90.7) 0.48 0.91* 0.40 (0.13–0.66)Acromioclavicular Joint Stress Test 76.1 (67.0–83.4) 0.64 0.82* 0.47 (0.29–0.64)Modified Scapular Assistance Test 71.4 (61.3–79.9) 0.77* 0.62 0.39 (0.20–0.58)Scapular Retraction Test 62.9 (52.4–72.3) 0.56 0.68 0.25 (0.06–0.45)Impingement Relief Test 67.7 (57.3–76.7) 0.62 0.72 0.35 (0.16–0.54)Sulcus Sign Test 84.8 (76.5–90.7) 0.45 0.91* 0.36 (0.09–0.64)Apprehension Test 66.7 (57.0–75.2) 0.70 0.62 0.32 (0.14–0.50)Relocation Test 67.4 (51.3–80.5) 0.76* 0.50 0.27 (0.00–0.58)Release Test 75.6 (59.4–87.1) 0.81* 0.64 0.46 (0.17–0.75)Combined Reduction Test 69.0 (52.8–81.9) 0.48 0.78* 0.26 (0.00–0.60)Glenohumeral Internal Rotation Deficit Test 80.9 (72.1–87.5) 0.68 0.86* 0.54 (0.37–0.72)

Abbreviations: CI: confidence interval; *Sufficient agreement for positive agreement or negative agreement (≥ 0.75)

8 A. T. APELDOORN ET AL.

Page 10: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

Table5.

Interrater

agreem

entandinterrater

reliabilityforph

ysicaltherapists

that

received

high

-leveland

low-levelinstructio

n.High-levelinstructio

n(20ph

ysical

therapists,4

0patients)

Low-levelinstructio

n(16ph

ysical

therapists,7

3patients)

Test

Observedagreem

ent%

(95%

CI)

Positive

specific

agreem

ent

Negative

specific

agreem

ent

Cohen’skapp

a(95%

CI)

Observedagreem

ent

%(95%

CI)

Positive

specific

agreem

ent

Negative

specific

agreem

ent

Cohen’skapp

a(95%

CI)

ScapulaPositio

n62.2

(44.8–77.1)

0.56

0.67

0.23

(−0.09–0.55)

76.7

(65.1–85.5)

0.77*

0.76*

0.54

(0.31–0.76)

ExternalRotatio

nResistance

Test

77.5

(61.1–88.6)

0.71

0.82*

0.53

(0.22–0.84)

74.0

(62.2–83.2)

0.72

0.75*

0.48

(0.25–0.71)

EmptyCanTest(Job

etest)

72.5

(55.9–84.9)

0.65

0.78*

0.42

(0.11–0.73)

78.1

(66.6–86.6)

0.84*

0.65

0.49

(0.26–0.72)

FullCanTest

77.5

(61.1–88.6)

0.61

0.84*

0.45

(0.15–0.76)

86.1

(75.5–92.8)

0.81*

0.89*

0.70

(0.47–0.93)

ActiveCo

mpression

Test

(O’Brien’stest)

74.4

(57.6–86.4)

0.67

0.79*

0.46

(0.15–0.77)

74.0

(62.2–83.2)

0.82*

0.54

0.36

(0.13–0.59)

NeerTest

87.5

(72.4–95.3)

0.91*

0.78*

0.70

(0.45–0.95)

70.0

(57.2–79.8)

0.78*

0.49

0.27

(0.04–0.51)

Haw

kins-Kennedy

Test

60.0

(43.4–74.7)

0.67

0.50

0.17

(−0.14–0.48)

72.2

(60.2–81.8)

0.77*

0.64

0.42

(0.18–0.65)

Kim

Test

65.0

(48.3–78.9)

0.56

0.71

0.28

(−0.02–0.58)

73.9

(61.7–83.4)

0.55

0.82*

0.37

(0.13–0.60)

Biceps

Load

IITest

81.1

(64.3–91.4)

0.22

0.89*

0.13

(−0.17–0.43)

84.3

(73.2–91.5)

0.48

0.91*

0.39

(0.16–0.62)

InternalRotatio

nResistance

Streng

thTest

(ZaslavTest)

72.5

(55.9–84.9)

0.48

0.81*

0.29

(−0.02–0.60)

79.2

(67.7–87.5)

0.71

0.84*

0.55

(0.33–0.77)

Load

andShift

Test

72.5

(55.9–84.9)

0.27

0.83*

0.11

(−0.19–0.41)

91.8

(82.4–96.6)

0.67

0.95*

0.62

(0.40–0.84)

Acromioclavicular

JointStress

Test

82.5

(66.6–92.1)

0.59

0.89*

0.48

(0.18–0.78)

72.6

(60.7–82.1)

0.66

0.77*

0.43

(0.21–0.66)

Mod

ified

Scapular

Assistance

Test

45.2

(27.8–63.7)

0.51

0.37

−0.10

(−0.45–0.24)

83.6

(72.1–91.1)

0.87*

0.77*

0.64

(0.40–0.88)

Scapular

Retractio

nTest

61.3

(42.3–77.6)

0.55

0.67

0.23

(−0.09–0.56)

63.6

(50.8–74.9)

0.57

0.68

0.26

(0.02–0.50)

Imping

ementReliefTest

70.0

(50.4–84.6)

0.67

0.73

0.39

(0.04–0.75)

66.7

(53.9–77.5)

0.59

0.72

0.33

(0.11–0.56)

Sulcus

Sign

Test

74.4

(57.6–86.4)

0.50

0.83*

0.33

(0.01–0.64)

90.4

(80.7–95.7)

0.36

0.95*

0.31

(0.08–0.54)

Apprehension

Test

60.0

(43.4–74.7)

0.67

0.50

0.20

(−0.08–0.48)

70.4

(58.3–80.4)

0.73

0.68

0.41

(0.19–0.64)

Relocatio

nTest

66.7

(38.7–87.0)

0.78*

0.29

0.12

(−0.31–0.55)

67.9

(47.6–83.4)

0.74

0.57

0.32

(−0.05–0.68)

ReleaseTest

64.3

(35.6–86.0)

0.78*

<0.01

−0.21

(−0.72–0.30)

81.5

(61.3–93.0)

0.84*

0.78*

0.63

(0.26–0.99)

CombinedRedu

ctionTest

71.4

(51.1–86.1)

0.60

0.78*

0.38

(0.02–0.75)

64.3

(35.6–86.0)

<0.01

0.78*

−0.21

(−0.72–0.30)

Gleno

humeralInternalRotatio

nDeficitTest

67.5

(50.8–80.9)

0.70

0.65

0.36

(0.06–0.66)

88.6

(78.2–94.6)

0.64

0.93*

0.57

(0.34–0.80)

Abbreviatio

ns:C

I:confidence

interval;*Sufficientagreem

entforpo

sitiveagreem

entor

negativeagreem

ent(≥

0.75)

PHYSIOTHERAPY THEORY AND PRACTICE 9

Page 11: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

However, the GRASS also recommend to calculate specificagreement (Kottner et al., 2011). Despite these recommen-dations specific agreement values are rarely reported. In ourliterature review only one study reported specific agree-ment values (Kopkow, Lange, Schmitt, and Kasten, 2015),and for two studies we were able to calculate specific agree-ment values (Eshoj et al., 2018; Razmjou, Holtby, andMyhr, 2004) (Table 1).

The present study showed that an additional, briefface-to-face practical session did not improve interrateragreement and reliability. That is, agreement and reliabil-ity between pairs of raters who received this brief practicalsession on how to conduct clinical tests was not betterthan the agreement and reliability of two raters who onlyreceived basic instruction (i.e., leaflet and a video of theperformance of all clinical shoulder tests). Although thegroups of patients and physical therapists were differentwith respect to some characteristics, we think that theexplanation for not finding a difference between raterswho received a brief practical session and those whodidn’t, is that most of the participating physical therapistswere experienced. Besides, almost 50% of the participat-ing physical therapists had post-graduate qualifications. Itis reasonable to believe that an additional brief practicalsession will not add much to the proficiency of experi-enced clinicians.

Limitations

The findings of this investigation need to be interpretedin the light of certain limitations. Patients were sub-jected to a large number of tests, on two separate

sessions with a maximum time interval of 30 minutes.We chose independent examinations in order not toeliminate the variability in patient status and therapistmethod. An alternate method involves making videosof patients with one physical therapist that performsthe clinical shoulder tests (Lewis et al., 2016). Thesevideos can be assessed by several examiners. Thesevideos, however, do not mimic clinical practice. Werealized that an extensive test procedure performed bytwo different physical therapists can cause exacerbationof symptoms which can lead to poor agreement andreliability values. Therefore, we expected that a smallsubgroup of patients would report more pain after thefirst examination, although the physical therapists wereinstructed to avoid unnecessary pain with the shouldertesting procedure. In the present study, 21 patientsreported a change of > 2 points on an 11-point NRSafter the first examination. To our surprise, 10 of these21 patients reported a reduction of pain rather than anincrease. The test procedure involved provocative testsand symptom reduction tests. Although it is specula-tion, it might be that some patients had benefit of thereduction tests, and reported less pain after the firstexamination. An additional analysis in which patientswith a clinically relevant change in pain between thetwo examinations (> 2 points of change on an 11-pointNRS) were excluded did not lead to other agreement orreliability values. Therefore, we do not think that theextensive test procedure had a major effect on ourresults.

Patients were included with acute, subacute andchronic shoulder pain. The usefulness of clinical

Table 6. Interrater agreement and interrater reliability for symptomatically stable patients between examinations (n = 92).

TestObserved agreement % (95%

CI)Positive specificagreement

Negative specificagreement

Cohen’s kappa(95%CI)

Scapula Position 73.0 (62.4–81.6) 0.72 0.74 0.47 (0.26–0.67)External Rotation Resistance Test 77.2 (67.0–85.0) 0.73 0.80* 0.53 (0.33–0.74)Empty Can Test (Jobe test) 72.8 (62.4–81.3) 0.75* 0.70 0.45 (0.25–0.66)Full Can Test 84.6 (75.2–91.0) 0.78* 0.88* 0.66 (0.46–0.87)Active Compression Test (O’Brien’s test) 71.4 (60.9–80.2) 0.75* 0.66 0.41 (0.21–0.62)Neer Test 76.1 (65.7–84.3) 0.83* 0.60 0.44 (0.23–0.64)Hawkins-Kennedy Test 68.1 (57.4–77.3) 0.74 0.59 0.33 (0.13–0.54)Kim Test 68.5 (57.7–77.7) 0.53 0.76* 0.30 (0.09–0.50)Biceps Load II Test 81.6 (71.6–88.8) 0.27 0.89* 0.18 (−0.02–0.38)Internal Rotation Resistance Strength Test (ZaslavTest)

75.8 (65.5–83.9) 0.61 0.82* 0.44 (0.24–0.64)

Load and Shift Test 84.8 (75.4–91.1) 0.46 0.91* 0.37 (0.17–0.58)Acromioclavicular Joint Stress Test 76.1 (65.9–84.1) 0.61 0.83* 0.44 (0.24–0.64)Modified Scapular Assistance Test 71.6 (60.3–80.8) 0.78* 0.61 0.39 (0.17–0.60)Scapular Retraction Test 63.8 (52.2–74.0) 0.57 0.69 0.28 (0.08–0.48)Impingement Relief Test 63.8 (52.2–74.0) 0.60 0.67 0.28 (0.07–0.49)Sulcus Sign Test 82.4 (72.7–89.3) 0.38 0.90* 0.28 (0.08–0.49)Apprehension Test 65.6 (54.7–75.1) 0.68 0.63 0.31 (0.10–0.51)Relocation Test 68.8 (49.9–83.3) 0.78* 0.44 0.24 (−0.09–0.57)Release Test 77.4 (58.5–89.7) 0.84* 0.63 0.47 (0.13–0.82)Combined Reduction Test 68.6 (50.6–82.6) 0.52 0.77* 0.29 (−0.03–0.62)Glenohumeral Internal Rotation Deficit Test 81.3 (71.5–88.4) 0.71 0.86* 0.57 (0.37–0.78)

Abbreviations: CI: confidence interval; *Sufficient agreement for positive agreement or negative agreement (≥ 0.75)

10 A. T. APELDOORN ET AL.

Page 12: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

shoulder tests in patients with chronic pain can bequestioned. Contrary to acute pain in which nociceptionis presumed to be the primary source of pain, chronicpain is a more complex condition. For example, severalstudies have provided evidence for altered central noci-ceptive processing in subgroups of patients with persist-ing pain which is suggestive of central sensitization (Nijset al., 2014). Due to this process of generalized hyper-sensitivity of the somatosensory system, patients canreact with an increased response on clinical shouldertests. There is reason to believe, that in these patients,clinical shoulder tests are less valid because of a highpotential for false positive testing. However, as the aimof the present study was related to agreement and relia-bility, and not validity, we do not think that the possi-bility of dominant central sensitization pain in ourpatients has had a major influence on our results.

Pairing of raters and the order of testing were notrandomized due to organizational constraints.Although we do not think that this has substantiallyinfluenced our results, the possibility of bias cannot beexcluded.

The proportion of missing values and not applicablescores was low (8.7%). Several of the not applicable scoreswere logical (e.g., the Relocation Test is not applicable whenthe Apprehension Test is negative). Although the impact ofmissing values on our results is uncertain, the not applicablescores did not influence our interrater agreement and relia-bility values, because in these cases, calculations could notbe performed.

The majority of the patients (75%) were assessed andreassessed at the first visit at the clinic where physicaltherapists had no or minor information about the patient.In the remaining cases (25%), the physical therapist whotreated the patient was informed about the patient historyand clinical diagnosis. It is uncertain if this has influencedour results.

A last point of concern is that participating physicaltherapists did not collect characteristics of patients thatrefused to participate. Although potential selection biascannot be excluded, we consider the risk to be low.

Clinical implications

It is important to realize that for the performance of mostclinical shoulder tests several modifications have been pro-posed. We advise clinicians to discuss advantages and dis-advantages of the original performance and theirmodifications with colleagues, to assess patients togetheroccasionally and to discuss disagreements. Tests that can-not be performedwith acceptable reliability offer little or novalue in the clinical assessment of patients.

Conclusion

The present study is in line with the GRRAS guidelines andprovides data for shoulder tests that has never been pub-lished before in patients with shoulder pain. The findings ofthe present study show that most clinical shoulder tests donot show sufficient interrater agreement and reliabilityvalues for clinical use.

Acknowledgments

This work was supported by the Dutch Scientific College ofPhysiotherapy (WCF) of the Royal Dutch Society for PhysicalTherapy (KNGF).

Funding

This work was supported by the Dutch Scientific College ofPhysiotherapy (WCF) of the Royal Dutch Society for PhysicalTherapy (KNGF) (no grant number).

Disclosure Statement

The authors declare that they have no conflict of interest.

References

Altman DG 1997 Practical Statistics for Medical Research.London: Chapman and Hall.

Barten D, Koppes L 2016 Zorg door de fysiotherapeut.Jaarcijfers 2015 en trendcijfers 2011–2015 (Care deliveredby the physical therapist. In: The year 2015 and trendbetween 2011–2015). Utrecht: Nivel.

Bertilson BC, Grunnesjo M, Strender LE 2003 Reliability ofclinical tests in the assessment of patients with neck/shoulder problems-impact of history. Spine 28: 2222–2231.

Boon AJ,Smith J 2000 Manual scapular stabilization: Its effecton shoulder rotational range of motion. Archives ofPhysical Medicine and Rehabilitation 81: 978–83.

Burns SA, Cleland JA, Carpenter K, Mintken PE 2016Interrater reliability of the cervicothoracic and shoulderphysical examination in patients with a primary complaintof shoulder pain. Physical Therapy in Sport : OfficialJournal of the Association of Chartered Physiotherapistsin Sports Medicine 18: 46–55.

Cadogan A, Laslett M, Hing W, McNair P, Williams M 2011Interexaminer reliability of orthopaedic special tests used in theassessment of shoulder pain. Manual Therapy 16: 131–135.

Cools AM, Cambier D, Witvrouw EE 2008 Screening theathlete’s shoulder for impingement symptoms: A clinicalreasoning algorithm for early detection of shoulder pathol-ogy. British Journal of Sports Medicine 42: 628–635.

Corso G 1995 Impingement relief test: an adjunctive proce-dure to traditional assessment of shoulder impingementsyndrome. Journal of Orthopaedic and Sports PhysicalTherapy 22: 183–92.

Cyriax J 1982 Textbook of Orthopaedic Medicine Volume I.Diagnosis of soft Tissue Lesions (8th ed). London, BallièreTindall.

PHYSIOTHERAPY THEORY AND PRACTICE 11

Page 13: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

D’Hondt NE, Kiers H, Pool JJ, Hacquebord ST, Terwee CB,Veeger DH 2017 Reliability of performance-based clinicalmeasurements to assess shoulder girdle kinematics and posi-tioning: Systematic review. Physical Therapy 97: 124–144.

de Vet HC, Mokkink LB, Terwee CB, Hoekstra OS, Knol DL2013 Clinicians are right not to like Cohen’s kappa. BritishMedical Journal 346: f2125.

de VetHC, Terwee CB,Mokkink LB, Knol DL 2011Measurementin Medicine. New York: Cambridge University Press.

Egmond DL, Schuitemaker R 2014 Extremiteiten ManueleTherapie in Enge en Ruime Zin (Extremities ManualTherapy Including both Broad and Narrow Definition)(11th ed), Houten: Bohn Stafleu van Loghum.

Eshoj H, Ingwersen KG, Larsen CM, Kjaer BH, Juul-Kristensen B2018 Intertester reliability of clinical shoulder instability andlaxity tests in subjects with and without self-reported shoulderproblems. British Medical Journal Open 8: e018472.

Greving K, Dorrestijn O, Winters JC, Groenhof F, van derMeer K, Stevens M, Diercks RL 2012 Incidence, preva-lence, and consultation rates of shoulder complaints ingeneral practice. Scandinavian Journal of Rheumatology41: 150–155.

Hawkins RJ, Kennedy JC 1980 Impingement syndrome inathletes. American Journal of Sports Medicine 8: 151–8.

Hayes KW, Petersen CM 2003 Reliability of classificationsderived from Cyriax’s resisted testing in subjects withpainful shoulders and knees. Journal of Orthopaedic andSports Physical Therapy 33: 235–246.

Hegedus EJ, Cook C, Lewis J, Wright A, Park JY 2015Combining orthopedic special tests to improve diagnosisof shoulder pathology. Physical Therapy in Sport : OfficialJournal of the Association of Chartered Physiotherapists inSports Medicine 16: 87–92.

Holtby R, Razmjou H 2004 Validity of the supraspinatus testas a single clinical test in diagnosing patients with rotatorcuff pathology. Journal of Orthopaedic and Sports PhysicalTherapy 34: 194–200.

Hripcsak G, Heitjan DF 2002 Measuring agreement in med-ical informatics reliability studies. Journal of BiomedicalInformatics 35: 99–110.

Jain NB, Luz J, Higgins LD, Dong Y, Warner JJ, Matzkin E,Katz JN 2017 The diagnostic accuracy of special tests forrotator cuff tear: The ROW cohort study. American Journalof Physical Medicine and Rehabilitation 96: 176–183.

Jobe FW, Jobe CM 1983 Painful athletic injuries of the shoulder.Clinical orthopaedics and related research: 117–24.

Johansson K, Ivarson S 2009 Intra- and interexaminer relia-bility of four manual shoulder maneuvers used to identifysubacromial pain. Manual Therapy 14: 231–239.

Karel YH, VerhagenAP, Thoomes-de GraafM, Duijn E, vanDenBorne MP, Beumer A, Ottenheijm RP, Dinant GJ, Koes BW,Scholten-Peeters GG 2016Development of a prognosticmodelfor patients with shoulder complaints in physical therapistpractice. Physical Therapy 97: 72–80.

Kelly BT, Kadrmas WR, Speer KP 1996 The manual muscleexamination for rotator cuff strength. An electromyo-graphic investigation. American Journal of SportsMedicine 24: 581–8.

Kibler WB, Sciascia A, Dome D 2006 Evaluation of apparentand absolute supraspinatus strength in patients withshoulder injury using the scapular retraction test.American Journal of Sports Medicine 34: 1643-7.

Kibler WB, Uhl TL, Maddux JW, Brooks PV, Zeller B,McMullen J 2002 Qualitative clinical evaluation of scapulardysfunction: a reliability study. Journal of Shoulder andElbow Surgery 11: 550–6.

Kim SH, Ha KI, Ahn JH, Kim SH, Choi HJ 2001 Biceps loadtest II: A clinical test for SLAP lesions of the shoulder.Arthroscopy 17: 160–164.

Kim SH, Park JS, Jeong WK, Shin SK 2005 The Kim test:A novel test for posteroinferior labral lesion of theshoulder–A comparison to the jerk test. AmericanJournal of Sports Medicine 33: 1188–1192.

Kooijman M, Swinkels I, van Dijk C, de Bakker D, Veenhof C2013 Patients with shoulder syndromes in general andphysiotherapy practice: An observational study. BMCMusculoskeletal Disorders 14: 128.

Kopkow C, Lange T, Schmitt J, Kasten P 2015 Interraterreliability of the modified scapular assistance test withand without handheld weights. Manual Therapy 20:868–874.

Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ,Hróbjartsson A, Roberts C, Shoukri M, Streiner DL 2011Guidelines for Reporting Reliability and AgreementStudies (GRRAS) were proposed. Journal of ClinicalEpidemiology 64: 96–106.

Lange T, Matthijs O, Jain NB, Schmitt J, Lutzner J, Kopkow C2017 Reliability of specific physical examination tests forthe diagnosis of shoulder pathologies: A systematic reviewand meta-analysis. British Journal of Sports Medicine 51:511–518.

Lewis JS 2009 Rotator cuff tendinopathy/subacromial impinge-ment syndrome: Is it time for a new method of assessment?British Journal of Sports Medicine 43: 259–264.

Lewis JS, McCreesh K, Barratt E, Hegedus EJ, Sim J 2016 Inter-rater reliability of the shoulder symptom modification pro-cedure in people with shoulder pain. British Medical JournalOpen Sport and Exercise Medicine 2: e000181.

May S, Chance-Larsen K, Littlewood C, Lomas D, Saad M2010 Reliability of physical examination tests used in theassessment of patients with shoulder problems:A systematic review. Physiotherapy 96: 179–190.

Michener LA, Walsworth MK, Doukas WC, Murphy KP2009 Reliability and diagnostic accuracy of 5 physicalexamination tests and combination of tests for subacromialimpingement. Archives of Physical Medicine andRehabilitation 90: 1898–1903.

Nanda R, Gupta S, Kanapathipillai P, Liow RYL, Rangan A2008 An assessment of the inter examiner reliability ofclinical tests for subacromial impingement and rotatorcuff integrity. European Journal of Orthopaedic Surgeryand Traumatology 18: 6.

Neer CS 2nd 1983 Impingement lesions. ClinicalOrthopaedics and Related Research: 70–77.

Nijs J, Torres-Cueco R, van Wilgen CP, Girbes EL, Struyf F,Roussel N, van Oosterwijck J, Daenen L, Kuppens K,Vanwerweeen L et al. 2014 Applying modern pain neu-roscience in clinical practice: Criteria for the classificationof central sensitization pain. Pain Physician 17: 447–457.

Nørregaard J, Krogsgaard MR, Lorenzen T, Jensen EM 2002Diagnosing patients with longstanding shoulder joint pain.Annals of the Rheumatic Diseases 61: 646–649.

O'Brien SJ, Pagnani MJ, Fealy S, McGlynn SR, Wilson JB1998 The active compression test: A new and effective test

12 A. T. APELDOORN ET AL.

Page 14: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

for diagnosing labral tears and acromioclavicular jointabnormality. American Journal of Sports Medicine 26:610-3.

Ostor AJ, Richards CA, Prevost AT, Hazleman BL, Speed CA2004 Interrater reproducibility of clinical tests for rotator cufflesions. Annals of the Rheumatic Diseases 63: 1288–1292.

Picavet HS, Schouten JS 2003 Musculoskeletal pain in theNetherlands: Prevalences, consequences and risk groups,the DMC(3)-study. Pain 102: 167–178.

Pope DP, Croft PR, Pritchard CM, Silman AJ 1997Prevalence of shoulder pain in the community: The influ-ence of case definition. Annals of the Rheumatic Diseases56: 308–312.

Rabin A, Irrgang JJ, Fitzgerald GK, Eubanks A 2006 Theintertester reliability of the scapular assistance test. Journalof Orthopaedic and Sports Physical Therapy 36: 653–660.

Razmjou H, Holtby R, Myhr T 2004 Pain provocativeshoulder tests: Reliability and validity of the impingementtests. Physiotherapy Canada 56: 229–236.

Roach KE, Budiman-Mak E, Songsiridej N, Lertratanakul Y1991 Development of a shoulder pain and disabilityindex. Arthritis Care and Research: The OfficialJournal of the Arthritis Health Professions Association4: 143–149.

Silliman JF, Hawkins RJ 1993 Classification and physicaldiagnosis of instability of the shoulder. ClinicalOrthopaedics and Related Research: 7–19.

Tempelhof S, Rupp S, Seil R 1999 Age-related prevalence ofrotator cuff tears in asymptomatic shoulders. Journal ofShoulder and Elbow Surgery 8: 296–299.

Thoomes-de Graaf M, Scholten-Peeters GG, Duijn E,Karel Y, Koes BW, Verhagen AP 2015 The DutchShoulder Pain and Disability Index (SPADI): A reliabilityand validation study. Quality of Life Research: AnInternational Journal of Quality of Life Aspects ofTreatment, Care and Rehabilitation 24: 1515–1519.

Tong HC, Haig AJ, Yamakawa K 2002 The Spurling test andcervical radiculopathy. Spine 27: 156–159.

Tooth LR, Ottenbacher KJ 2004 The kappa statistic in reha-bilitation research: An examination. Archives of PhysicalMedicine and Rehabilitation 85: 1371–1376.

Tzannes A, Paxinos A, Callanan M, Murrell GA 2004 Anassessment of the interexaminer reliability of tests forshoulder instability. Journal of Shoulder and ElbowSurgery 13: 18–23.

Vind M, Bogh SB, Larsen CM, Knudsen HK, Sogaard K, Juul-Kristensen B 2011 Inter-examiner reproducibility of clinicaltests and criteria used to identify subacromial impingementsyndrome. British Medical Journal Open 1: e000042.

Walsworth MK, Doukas WC, Murphy KP, Mielcarek BJ,Michener LA 2008 Reliability and diagnostic accuracy of his-tory and physical examination for diagnosing glenoid labraltears. American Journal of Sports Medicine 36: 162–168.

Worcester JN, Green DP 1968 Osteoarthritis of the acromio-clavicular joint. Clinical Orthopaedics and RelatedResearch 58: 69-73.

Zaslav KR 2001 Internal rotation resistance strength test: Anew diagnostic test to differentiate intra-articular pathol-ogy from outlet (Neer) impingement syndrome in theshoulder. Journal of Shoulder and Elbow Surgery 10: 23–7.

PHYSIOTHERAPY THEORY AND PRACTICE 13

Page 15: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

Appendix 1. Test procedures

Test Performance Criteria Photo

Patient in standardized standing position: Arms at the sides of the body, elbows straight and shoulders in neutral position. Feet shoulder width apart. Lightcontraction of m. transversus abdominis.Scapula PositionKibler et al, 2002

Examiner is standing behind thepatient and observes the position ofthe scapula of the affected side.

Types 1, 2, 3, or normal.1: Tipping: increased anteriortilt, inferior angle of thescapula is prominent.

2: Winging or scapula alatae:increased internal rotation(protraction shoulder girdle),medial border of the scapulais prominent.

3: Upward rotation of thesuperomedialborder (cavitas glenoidalisscapulae is pointed caudal),superior angle of the scapulais prominent.

Patient in standardized seated position: Extended lumbar spine. Both feet flat on the ground. Hips and knees in 90° flexion.External Rotation ResistanceTestCyriax, 1982

Patient’s elbow is at the side of thebody, elbow 90° flexed, forearm inneutral position (thumbs up).Examiner is standing at the same sideof the patient with the hip against thetested shoulder. One hand fixates thetrunk by pulling the patient’s shoulderagainst his side. The other hand isplaced on the lateral surface of theforearm proximal to the patient’swrist, and applies pressure. Patient isasked to resist the slowly increasingpressure.

Test is considered positive ifpain is reported.

Empty Can Test (Jobe Test)Jobe and Jobe, 1983

Both arms in 90° abduction in thescapular plane.Elbows slightly flexed, thumbspointing downwards (internal rotationshoulder and forearm).Examiner is standing in front ofpatient and asks the patient to holdthis position against slowly increasingdownward pressure applied to elbow(into adduction/internal rotation).

Test is considered positive ifpain or weakness is reported.The test is also positive if theposition itself produces pain.

(Continued )

14 A. T. APELDOORN ET AL.

Page 16: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

(Continued).

Test Performance Criteria Photo

Full Can TestKelly, Kadrmas, and Speer,1996

Both arms in 90° abduction in thescapular plane.Elbows extended, forearm supinatedand thumbs pointing upwards.Examiner is standing in front ofpatient and asks the patient to remainthis position against slowly increasingdownward pressure applied to theelbows (into adduction).

Test is considered positive ifpain or weakness is reported.The test is also positive if theposition itself produces pain.

Active Compression Test(O’Brien’s Test)O’Brien et al, 1998

Patient’s arm in 90° flexion, 10–15°horizontal adduction. Examiner isstanding behind or at the same sideof the patient and provides pressuredownwards with one hand on thedistal part of the humerus.

Test is considered positive ifpain is reported in the firstposition (internal rotation) andis significant less or disappearsin the second position(external rotation).

First, the patient is asked to maximallyinternally rotate the arm (thumbdown) and resist slowly increasingpressure which is applied in adownward direction. The patient isthen asked to maximally externallyrotate the arm (thumb up) and isasked again to resist the downwardpressure.

Neer TestNeer, 1983

Examiner is standing behind thepatient. One hand palpates andcontrols the scapula (no force isapplied). The other hand holdspatients extended elbow and forcesthe patient’s arm into maximalelevation.

Test is considered positive ifpain is produced.

Hawkins-Kennedy TestHawkins and Kennedy,1980

Patient’s arm in 90° flexion, elbow in90° flexion. Forearm is internallyrotated and positioned in thetransversal plane.Examiner is standing behind thepatient and supports this positionwith both arms in order to relax thepatient’s arm. The shoulder is thenpassively gently rotated internally bylowering the forearm whilesupporting the elbow.

Test is considered positive ifpain is produced duringinternal rotation. The test isalso positive if the positionitself produces pain.

(Continued )

PHYSIOTHERAPY THEORY AND PRACTICE 15

Page 17: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

(Continued).

Test Performance Criteria Photo

Kim TestKim, Park, Jeong, and Shin,2005

Patient’s arm in 90° abduction, elbowin 90° flexion. Forearm is internallyrotated and positioned in transversalplane. Examiner is standing next tothe patient on the same side as thetested shoulder and supports thisposition with one hand at thepatient’s elbow, and the other handon the proximal arm. The examinerapplies an strong axial loading forcealong the line of the proximal armfrom the elbow toward the glenoid.While a downward and backwardforce is applied to the proximal arm,the arm is elevated 45°diagonallyupward (to 135° flexion-elevation).During the test, it is important toapply a firm axial compression forceto the glenoid surface by the humeralhead.

Test is considered positive if asudden onset of posteriorshoulder pain is producedduring the test, regardless ofaccompanying posterior clunkof the humeral head.

Biceps Load II TestKim et al., 2001

Patient’s arm in 120° abduction andmaximal external rotation. Elbow in90° flexion.Examiner is standing behind thepatient and applies an axial forcealong the line of the humerus fromthe elbow to glenoid with one handwhile supporting the external rotationof the forearm. The patient is asked toflex the elbow against resistance.

Test is considered positive ifpain is reported.

Internal Rotation ResistanceStrength TestZaslav, 2001

Patient’s arm in 90° abduction, elbowin 90° flexion and 80° (submaximal)external rotation.Examiner is standing at the side of thepatient’s tested shoulder. One hand isplaced at medial border of the elbow,the other hand at the distal/dorsalpart of the forearm.First, gradual force in the direction ofinternal rotation is applied. Thepatient responds with an isometricexternal rotation force. Then theexaminer switches hands and appliesa gradually force in the direction ofexternal rotation. The patientresponds with isometric internalrotation force.

Test is considered positive ifthe patient has good strengthin external rotation butapparent weakness in internalrotation.

(Continued )

16 A. T. APELDOORN ET AL.

Page 18: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

(Continued).

Test Performance Criteria Photo

Load and Shift TestSilliman and Hawkins, 1993

Patient’s arm in loose packed position.Examiner is standing on the same sideas the tested shoulder and grasps thehumerus as proximal as possible. Thepatient’s forearm is rested on theexaminer’s forearm. The other hand ofthe examiner fixates the clavicle,acromion and coracoid process. Thepatient is asked to active ‘load’ theglenohumeral joint with an isometriccontraction of the shoulder muscles.After relaxation the examiner shiftsthe humeral head in the anterior andposterior directions with a maximumof grade 1 translation.This is a slightly modified version ofthe load and shift test described bySilliman and Hawkins (1993) in whichone hand grasp the humeral head,and load it into the glenoid.

Test is considered positive atgrade 1.Grade 0: Barely no translation(<25% diameter humeralhead).Grade 1: Translation within therim of the glenoid (25–50%diameter humeral head).Note:Grade 2: Subluxation over therim of the glenoid (>50%diameter humeral head) butspontaneous reduction (doesnot lock).Grade 3: Dislocation over therim of the glenoid (>50%diameter humeral head) withno spontaneous reduction(locks out over the rim).

Acromioclavicular Joint StressTestWorcester and Green, 1968

Patient’s arm is relaxed at the side ofthe body.Examiner is standing behind thepatient and gradually appliesdownward pressure with fore fingerand middle finger directly over theacromioclavicular joint.

Test is considered positive ifpain is reported on top of theacromioclavicular joint.

Patient in standardized standing position: Arms at the sides of the body, elbows straight and shoulders in neutral position. Feet shoulder width apart. Lightcontraction of m. transversus abdominis.

ModifiedScapularAssistanceTestRabin, Irrgang,Fitzgerald, andEubanks, 2006

The patient is asked to actively elevatethe arm in a provocative direction (e.g.,sagittal (anteflexion), frontal (abduction),or scapular (scaption) plane). Examiner isstanding diagonally behind the patientand places one hand over the superiorpart of the scapula. The other hand isplaced over the inferior part of thescapula. During the movement theexaminer facilitates upward rotation andposterior tilting by pushing the lower partof the scapula against the thorax.

Test is considered positive if a significantreduction of pain, symptoms or effortoccurs during assisted elevation,compared to elevation withoutassistance.Test is considered not applicable if thepatient does not report symptoms withactive elevation.

ScapularRetraction TestKibler,Sciascia, andDome, 2006

A positive clinical provocation test (forexample the Active Compression Test) isrepeated.Examiner is standing behind the patientand places one hand on the uppertrapezius and acromion of the testedshoulder and pulls the shoulder girdle inretraction. The scapula is lightly held inretraction by forearm pressureon the medial scapular border. With theother hand the clinical provocation test isrepeated.

The test is positive when the initial pain,present in the clinical provocation testreduces significantly or disappears duringthe Scapular Retraction Test.Test is considered not applicable if thepatient does not report symptoms withclinical provocation tests like the ActiveCompression Test, the Full Can Test andthe Empty Can Test.

(Continued )

PHYSIOTHERAPY THEORY AND PRACTICE 17

Page 19: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

(Continued).

ImpingementRelief TestCorso, 1995

The patient is asked to actively elevatethe arm in a provocative direction (e.g.,sagittal [anteflexion], frontal [abduction],or scapular [scaption] plane). Examiner isstanding diagonally behind the patientand applies a gentle inferior glide to thehumeral head just prior to the onset ofthe painful arc or “impingement sign”.

Test is considered positive if a reductionof pain, symptoms or effort occurs duringassisted elevation, compared to elevationwithout inferior glide pressure.Test is considered not applicable if thepatient does not report symptoms withactive elevation.

Patient in standing position next to examiner’s table: Trunk 45° flexed. One foot in front of the other. Closest hand on the table. The tested shoulder ishanging relaxed in 20° flexion.

Sulcus Sign TestSilliman andHawkins, 1993

Patient’s arm is relaxed in loose packedposition.The examiner is standing diagonallybehind the patient and grasps theacromion, clavicle, coracoid process andspinae scapula with one hand. This handpalpates the size of the gap betweenacromion and caput humeri (sulcus). Theother hand holds the distal part of thehumerus just above the lateral epicondyland applies a downward force.

Test is considered positive at grade1 or 2:Grade 0: No sulcus sign (no enlarged gap)Grade 1: Sulcus sign, enlarged gap of onecentimeterGrade 2: Sulcus sign, enlarged gap of twocentimeter

Patient in supine position at the side of the table but with the scapula fully supported by the table.

ApprehensionTestTzannes,Paxinos,Callanan, andMurrell, 2004

Examiner is standing next to the side ofthe patient.Patient’s arm is in 90° abduction andmaximal external rotated (submaximalclosed-packed position), elbow in 90°flexion.The examiner applies a horizontalabduction- external rotation to the endrange of motion (or until the patient’srequest for the examinerto stop).

Test is considered positive if pain orapprehension is reported.

Relocation TestTzannes,Paxinos,Callanan, andMurrell, 2004

Examiner is standing next to the side ofthe patient.Patient’s arm is in 90° abduction andmaximal external rotated (submaximalclosed-packed position), elbow in 90°flexion.The examiner applies a horizontalabduction- external rotation to the endrange of motion (or until the patient’srequest for the examinerto stop). The examiner’s other handapplies a posterior translation(downward) force on the head of thehumerus.

Test is considered positive if pain orapprehension is reduced with the appliedposterior translation force.Test is not applicable if the ApprehensionTest is negative.

Release TestTzannes,Paxinos,Callanan, andMurrell, 2004

Examiner is standing next to the side ofthe patient.Patient’s arm is in 90° abduction andmaximal external rotated (submaximalclosed-packed position), elbow in 90°flexion.The examiner applies a horizontalabduction-external rotation to the endrange of motion (or until the patient’srequest for the examinerto stop). The examiner’s other handapplies a posterior translation(downward) force on the head of thehumerus. The examiner suddenly releasethe posterior/downward pressure of thehumeral head whileholding the patient’s arm in the positionof apprehension.

Test is considered positive if pain orapprehension is returned after removal ofthe applied posterior translation force.Test is not applicable if the ApprehensionTest and Relocation Test are negative.

(Continued )

18 A. T. APELDOORN ET AL.

Page 20: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

REFERENCES

Boon AJ, Smith J 2000 Manual scapular stabilization: itseffect on shoulder rotational range of motion. Archivesof Physical Medicine and Rehabilitation 81: 978–983.

Corso G 1995 Impingement relief test: an adjunctive proce-dure to traditional assessment of shoulder impingementsyndrome. Journal of Orthopaedic and Sports PhysicalTherapy 22: 183–192.

Cyriax J 1982 Textbook of Orthopaedic Medicine VolumeI. Diagnosis of Soft Tissue Lesions (8th ed). London,Ballière Tindall.

Egmond DL, Schuitemaker R 2014 Extremiteiten manueletherapie in enge en ruime zin (Extremities manual therapyincluding both broad and narrow definition) (11th ed).Houten: Bohn Stafleu van Loghum.

Hawkins RJ, Kennedy JC 1980 Impingement syndrome inathletes. American Journal of Sports Medicine 8: 151–158.

Jobe FW, Jobe CM 1983 Painful athletic injuries of the shoulder.Clinical Orthopaedics and Related Research: 117–124.

Kelly BT, Kadrmas WR, Speer KP 1996 The manual muscleexamination for rotator cuff strength. An electromyographicinvestigation. American Journal of Sports Medicine 24:581–588.

(Continued).

CombinedReduction TestEgmond andSchuitemaker,2014

The examiner is standing at the side ofthe patient between the trunk and thearm of the patient. Patient places the armat the side of the examiner’s shoulder. Tocenter the caput humerus into the cavitasglenoidalis the patient is asked to makea fist in order to establish an isometriccontraction of the shoulder muscles. Thisarm is guided actively to 60° flexion bythe examiner.The arm of the examiner which supportsthe patient’s arm is placed on the lateralpart of the m. deltoideus. The other armof the examiner applies light pressurewith the forearm on the sternum of thepatient (to stabilize the scapula on thetable) and this hand is placed on top ofthe other hand. Both hands of theexaminer (hand-over-hand technique) areapplying facilitating, proximal resistanceon the m. deltoideus region during thecomplete movement from 60° to 90°flexion. This pressure will result in a safefeeling of stability. Rotation is neutral.From this point (90° flexion) the examinermoves his body to cranial to continue theactively guided final goal of 180° flexion.The patient flex the arm active and theexaminer partly resist this movement.This should be pain free. After thisprocedure, the patient moves the armback to neutral position againstresistance.The test is repeated three to five times.The technique is an attempt to combinethe several reduction techniques, i.e.,Modified ScapularAssistance Test, Scapular Retraction Testand Impingement Relief Test.

Examiner asks and observes the patient if‘centering’ is successful. The patientdetermines the maximum pain free rangeof motion.Test is considered positive if pain freerange of motion increases with at least20° or a significant reduction ofsymptoms occurs after 3–5 repetitions.

GlenohumeralInternalRotationDeficit TestBoon andSmith, 2000

Patient’s arm is in 90° abduction. Elbow in90° flexion. Forearm is neutral. A towel isplaced under the upper arm of thepatient to prevent horizontal abduction.Examiner is standing next to the head ofthe patient and fixates the scapula on theexaminer’s table by pressing the frontalside of the anterior chest walldownwards. The other hand performsinternal rotation of the shoulder (in 90°abduction) until resistance or pain occurs.The examiner measures the ROM witha goniometer or by visual inspection.

Test is considered positive if the passiveROM is 20° or more restricted comparedto the other side.

PHYSIOTHERAPY THEORY AND PRACTICE 19

Page 21: Interrater agreement and reliability of clinical tests for ...neurological disorders (e.g., CVA, MS, Parkinson’s dis-ease), organ pathologies which affect the shoulder pain, dementia,

Kibler WB, Sciascia A, Dome D 2006 Evaluation of apparentand absolute supraspinatus strength in patients withshoulder injury using the scapular retraction test.American Journal of Sports Medicine 34: 1643–1647.

Kibler WB, Uhl TL, Maddux JW, Brooks PV, Zeller B,McMullen J 2002 Qualitative clinical evaluation of scapulardysfunction: a reliability study. Journal of Shoulder andElbow Surgery 11: 550–556.

Kim SH, Ha KI, Ahn JH, Kim SH, Choi HJ 2001 Biceps loadtest II: A clinical test for SLAP lesions of the shoulder.Arthroscopy 17: 160–164.

Kim SH, Park JS, Jeong WK, Shin SK 2005 The Kim test:A novel test for posteroinferior labral lesion of theshoulder–a comparison to the jerk test. American Journalof Sports Medicine 33: 1188–1192.

Neer CS 2nd 1983 Impingement lesions. ClinicalOrthopaedics and Related Research: 70–77.

O’Brien SJ, Pagnani MJ, Fealy S, McGlynn SR, Wilson JB1998 The active compression test: a new and effective testfor diagnosing labral tears and acromioclavicular joint

abnormality. American Journal of Sports Medicine 26:610–613.

Rabin A, Irrgang JJ, Fitzgerald GK, Eubanks A 2006 Theintertester reliability of the Scapular Assistance Test.Journal of Orthopaedic and Sports Physical Therapy 36:653–660.

Silliman JF, Hawkins RJ 1993 Classification and physicaldiagnosis of instability of the shoulder. ClinicalOrthopaedics and Related Research: 7–19.

Tzannes A, Paxinos A, Callanan M, Murrell GA 2004 Anassessment of the interexaminer reliability of tests forshoulder instability. Journal of Shoulder and ElbowSurgery 13: 18–23.

Worcester JN Jr, Green DP 1968 Osteoarthritis of the acro-mioclavicular joint. Clinical Orthopaedics and RelatedResearch 58: 69–73.

Zaslav KR 2001 Internal rotation resistance strength test: A newdiagnostic test to differentiate intra-articular pathology fromoutlet (Neer) impingement syndrome in the shoulder.Journal of Shoulder and Elbow Surgery 10: 23–27.

20 A. T. APELDOORN ET AL.