Differences in USMLE Step 3 Performance by Setting and Specialty
D Swanson, R Feinberg, K Swygert, G Dillon,
K Holtzman, M Raymond, and S Haist
National Board of Medical Examiners, Philadelphia, PA
Purpose of Study
• To explore differences in Step 3 MCQ performance by clinical setting (Ambulatory vs ED/Hospital) in relation to:
  – Type of medical school (LCME vs International)
  – Specialty and amount of graduate training
• To interpret results from the perspective of the validity of Step 3 score interpretations
• To inform decisions about Step 3 design
Context for the Study: USMLE Step 3
Exam Purpose: Step 3 assesses whether examinees can apply the medical knowledge and understanding of biomedical and clinical science essential for the unsupervised practice of medicine, with emphasis on patient management in ambulatory settings. It is designed to test examinees’ readiness for independent (general) practice.
• Two-day computer-based test
• Passing scores on the other Steps are required to take Step 3
• Test design is based on the practice of the General Undifferentiated Medical Practitioner (“GUMP”)
• Usually taken in PGY1 or PGY2
  – Some do not sit until later
  – Some IMGs sit before GME
• Often taken in order to moonlight during residency
Method: Test Material and Subjects
Test Material
• 480 MCQs and 9 CCS cases per form (includes pretest items)
• MCQs given in seven 60-min and four 45-min blocks
  – Blocks organized by setting
  – Ambulatory: 66% of items
  – ED/Hosp: 34% of items
• 4000+ unique MCQs scored across all test forms
• Almost all items are patient vignettes
Subjects
• 28935 examinees taking Step 3 in 2010-11
  – 16487 LCME first-takers
  – 1012 LCME repeat takers
  – 8812 IMG first-takers
  – 2624 IMG repeat takers
• 14935 LCME first-takers
  – 17 selected specialties
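The subgroup counts can be checked arithmetically. The short Python sketch below uses only the numbers listed above: it confirms that the four groups sum to the 28935 total and shows that the 17-specialty subset is roughly 91% of LCME first-takers.

```python
# Examinee counts for Step 3 in 2010-11, as listed above.
groups = {
    "LCME first-takers": 16487,
    "LCME repeat takers": 1012,
    "IMG first-takers": 8812,
    "IMG repeat takers": 2624,
}

total = sum(groups.values())
print(total)  # 28935

# Analysis subset: LCME first-takers in the 17 selected specialties.
subset = 14935
share = subset / groups["LCME first-takers"]
print(f"{share:.1%}")  # 90.6%
```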
Sample Step 3 MCQ – ED/Hospital
An 87-year-old woman is brought to the emergency department by ambulance. Her friend found her lying in bed in her home about one-half hour ago. She had been incontinent of urine and had also vomited. The patient has a history of degenerative joint disease, hypertension, and chronic obstructive pulmonary disease. The paramedics brought in her medications, which include felodipine, naproxen, albuterol inhaler, ipratropium inhaler, prednisone, theophylline, and ciprofloxacin. On questioning, she says she has a headache and nausea, but she is not able to give a more coherent history. She appears restless, tremulous, and agitated. Vital signs are temperature 37.0°C (98.6°F), pulse 120/min, respirations 26/min, and blood pressure 110/65 mm Hg. Physical exam is normal except for mild expiratory wheezing. Chest x-ray is normal. Which of the following is the most likely cause of her symptoms?
(A) Exacerbation of chronic obstructive pulmonary disease
(B) Gastroenteritis
(C) Migraine
(D) Stroke
(E) Theophylline toxicity
Sample Step 3 MCQ – Ambulatory
A 50-year-old African-American woman returns to the office for follow-up of diabetes mellitus, which has been treated with diet; extended-release glipizide, 10 mg daily; and metformin, 500 mg twice a day. She says, "I do the best I can adhering to my diet." She tests her blood glucose concentration daily. For the past month her fasting blood glucose concentrations have averaged 170 mg/dL. Hemoglobin A1c 1 week ago was 8.4%. The patient is 167.5 cm (5 ft 6 in) tall and weighs 86 kg (190 lb); BMI is 31 kg/m². Which of the following is the most appropriate change in therapy?
(A) Add chlorpropamide
(B) Add insulin
(C) Increase the metformin dosage
(D) Stop the glipizide and metformin and start insulin
(E) No change is indicated
Method: Scaling of Subscores
The following procedure was applied to each test form, separately for the ED/Hospital and Ambulatory (MCQ) subscores:
1. Calculate the mean and SD of percent-correct scores for first-time examinees graduating from LCME schools.
2. Use the values from step 1 to determine the linear transformation that standardizes scores for first-time examinees from LCME schools to a mean of 0 and an SD of 1 (z-scores).
3. Apply the linear transformation from step 2 to all examinee scores.
   Because examinees are randomly assigned to forms, this (linear equating) procedure places scores from different test forms on roughly comparable (z-score) scales.
4. Subtract the Ambulatory z-score from the ED/Hospital z-score to produce an index of relative performance (ED/Hosp – Ambulatory).
5. Analyze relative performance by specialty and year in training for the selected specialties.
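The scaling steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the NBME implementation: the record fields (`id`, `form`, `school`, `first_taker`, `pct_ed_hosp`, `pct_amb`) are hypothetical, the sample SD (`statistics.stdev`) is an assumption because the poster does not say whether a sample or population SD was used, and the toy scores are invented.

```python
from collections import defaultdict
from statistics import mean, stdev


def scale_subscore(records, key):
    """Steps 1-3: per-form linear z-scaling anchored on LCME first-takers.

    `records` is a list of dicts with hypothetical fields; `key` names the
    percent-correct subscore to scale.
    """
    by_form = defaultdict(list)
    for r in records:
        by_form[r["form"]].append(r)

    z = {}
    for rows in by_form.values():
        # Step 1: reference group = LCME first-takers on this form.
        ref = [r[key] for r in rows if r["school"] == "LCME" and r["first_taker"]]
        m, s = mean(ref), stdev(ref)  # assumption: sample SD
        # Steps 2-3: apply the same linear transform to every examinee on the form.
        for r in rows:
            z[r["id"]] = (r[key] - m) / s
    return z


def relative_performance(records):
    """Step 4: ED/Hospital z-score minus Ambulatory z-score."""
    z_ed = scale_subscore(records, "pct_ed_hosp")
    z_amb = scale_subscore(records, "pct_amb")
    return {rid: z_ed[rid] - z_amb[rid] for rid in z_ed}


# Invented toy data: two forms, with two LCME first-takers per form.
records = [
    {"id": 1, "form": "A", "school": "LCME", "first_taker": True, "pct_ed_hosp": 70.0, "pct_amb": 60.0},
    {"id": 2, "form": "A", "school": "LCME", "first_taker": True, "pct_ed_hosp": 80.0, "pct_amb": 80.0},
    {"id": 3, "form": "A", "school": "IMG", "first_taker": True, "pct_ed_hosp": 75.0, "pct_amb": 60.0},
    {"id": 4, "form": "B", "school": "LCME", "first_taker": True, "pct_ed_hosp": 60.0, "pct_amb": 65.0},
    {"id": 5, "form": "B", "school": "LCME", "first_taker": True, "pct_ed_hosp": 70.0, "pct_amb": 75.0},
]

rel = relative_performance(records)
print({rid: round(d, 3) for rid, d in rel.items()})
```

In this toy example, examinee 3 scores at the form A reference mean on ED/Hospital items but below it on Ambulatory items, so the index is positive (about 0.71). Scaling within each form before differencing is what lets indices from different forms be pooled.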
Reliabilities for and Correlations between Subscores (LCME First-Takers)
Setting/Subscore                  ED/Hosp  Ambulatory  Hospital  ED
Reliability (coefficient alpha)   0.74     0.82        0.61      0.56

Observed Correlations
              ED/Hosp  Ambulatory  Hospital  ED
ED/Hosp       ---      0.71        0.87      0.90
Ambulatory    0.71     ---         0.63      0.63
Hospital      0.87     0.63        ---       0.56
ED            0.90     0.63        0.56      ---
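Coefficient alpha, the reliability index reported above, is computed from an examinee-by-item score matrix as alpha = k/(k-1) · (1 - sum of item variances / variance of total scores). The Python sketch below applies that standard formula to a tiny invented 0/1 matrix. It illustrates the statistic only and is not the computation behind the reported values.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Cronbach's coefficient alpha; rows are examinees, columns are items."""
    k = len(scores[0])  # number of items
    item_vars = sum(pvariance(col) for col in zip(*scores))
    total_var = pvariance([sum(row) for row in scores])  # variance of total scores
    return k / (k - 1) * (1 - item_vars / total_var)


# Invented responses: 4 examinees x 3 items (1 = correct).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(coefficient_alpha(scores))  # 0.75
```

Population variances are used here; the ratio is unchanged if sample variances are used consistently throughout.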
True Correlations
Ed/Hosp --- 0.91 Ambulatory --- Hospital 0.94 --- ED 0.89 0.96 ---
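True correlations of this kind are typically observed correlations corrected for unreliability. Assuming the classical Spearman correction for attenuation, r_true = r_xy / sqrt(r_xx · r_yy), with the coefficient alphas above as reliabilities (the poster does not name the correction used), the Python sketch below reproduces two of the reported values: 0.91 for ED/Hosp vs Ambulatory and 0.96 for Hospital vs ED.

```python
from math import sqrt

# Coefficient alpha for each subscore (LCME first-takers), from the reliability table.
alpha = {"ED/Hosp": 0.74, "Ambulatory": 0.82, "Hospital": 0.61, "ED": 0.56}


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman correction: observed r divided by sqrt of the product of reliabilities."""
    return r_xy / sqrt(rel_x * rel_y)


# Observed correlations for two pairs, from the observed-correlation table.
pairs = {("ED/Hosp", "Ambulatory"): 0.71, ("Hospital", "ED"): 0.56}
for (x, y), r_obs in pairs.items():
    print(x, y, round(disattenuate(r_obs, alpha[x], alpha[y]), 2))
```

Pairs that combine ED/Hosp with one of its own components (Hospital or ED) share items, so their errors are not independent and this correction is not meaningful for them.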
ED/Hosp – Ambulatory by Specialty
Results from this slide forward are based on LCME first-takers in the 17 selected specialties rather than the total group.
ANOVA Results for ED/Hosp – Ambulatory by Specialty and Year in Training
Source                   Sum of Squares  df     Mean Square  F      Sig   Partial Eta²
Specialty                753.59          16     47.10        93.12  .000  .091
Year in Training (YIT)   21.33           2      10.67        21.09  .000  .003
Specialty x YIT          48.16           32     1.50         2.98   .000  .006
Error                    7528.01         14884  0.51
R² = 0.10
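The derived columns of the ANOVA table follow from the sums of squares and degrees of freedom: MS = SS/df, F = MS_effect/MS_error, and partial eta² = SS_effect/(SS_effect + SS_error). The Python sketch below recomputes them from the reported SS and df as an arithmetic check (it does not refit the model) and recovers the number of examinees from the total df.

```python
# Sums of squares and degrees of freedom as reported in the ANOVA table.
effects = {
    "Specialty": (753.59, 16),
    "Year in Training (YIT)": (21.33, 2),
    "Specialty x YIT": (48.16, 32),
}
ss_error, df_error = 7528.01, 14884


def anova_rows(effects, ss_error, df_error):
    """Derive MS, F, and partial eta squared for each effect."""
    ms_error = ss_error / df_error
    rows = {}
    for source, (ss, df) in effects.items():
        ms = ss / df
        rows[source] = {
            "MS": ms,
            "F": ms / ms_error,
            "partial_eta2": ss / (ss + ss_error),
        }
    return rows, ms_error


rows, ms_error = anova_rows(effects, ss_error, df_error)
for source, r in rows.items():
    print(f"{source}: MS={r['MS']:.3f} F={r['F']:.2f} partial eta2={r['partial_eta2']:.3f}")
print(f"Error MS={ms_error:.2f}")

# Total df + 1 recovers the number of examinees in the analysis.
n = df_error + sum(df for _, df in effects.values()) + 1
print(n)  # 14935
```

The recovered N of 14935 matches the number of LCME first-takers in the 17 selected specialties.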
Summary of Results and Conclusions
• For some examinees, sizable performance differences were observed between ED/Hospital and Ambulatory subscores
  – Some passing examinees did much better on ED/Hospital material; others did much better on Ambulatory material
• Performance differences by setting appeared to reflect:
  – Specialty-related differences in training
  – Variation in when examinees sit for Step 3
  – The interaction of the above
• The study provides some validity evidence for Step 3 scores but may raise concern about a small percentage of residents who could be moonlighting in the “wrong” specialty