Comparison of Performance in Biostatistics and Epidemiology across USMLE Steps 1, 2, and 3
D Swanson, W Ouyang, K Holtzman, M Johnson, and S Haist
National Board of Medical Examiners
Philadelphia, PA
Purpose of Study
• Several years ago, the Committee to Evaluate the USMLE Program (CEUP) recommended introduction of “a testing format designed to assess an examinee’s ability … to engage in evidence-based decision making.”
• This study was conducted to obtain baseline information about examinees’ performance in biostatistics and epidemiology as they progress through USMLE.
Method: Test Material and Subjects
Test Material
• 139 (unscored) biostatistics and epidemiology items
  – 77 used on Steps 1, 2 CK, and 3
  – 62 used on Steps 2 CK and 3
Procedure
• Small number of study items randomly selected for each examinee
Subjects
• First-time takers from LCME schools taking Step exams in 2010-11
• Median number of examinees/item:
  – Step 1: 225
  – Step 2 CK: 557
  – Step 3: 395
Number Needed to Treat
A trial of a new lipid-lowering drug is conducted in a population of patients aged 60 years and older. The drug has been designed to decrease the risk for myocardial infarction (MI). The study participants have multiple risk factors for cardiovascular disease. After 3 years of treatment, the incidence of MI in the intervention group is 2% compared with 10% in the control group (p<0.05). Based on these data, which of the following best represents the number of patients who must be treated with this drug for 3 years to prevent one MI?
(A) 6.25 (B) 12.5 (C) 18.75 (D) 25 (E) 50
Exam        p-value (% correct)   biserial
Step 1      84                    0.49
Step 2 CK   67                    0.18
Step 3      65                    0.26
SCBQ0836
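The arithmetic behind this item is the standard number-needed-to-treat formula: NNT = 1 / ARR, where the absolute risk reduction ARR = 10% - 2% = 8%, so NNT = 1 / 0.08 = 12.5 (option B). A minimal Python sketch of the calculation follows; the function name `number_needed_to_treat` is hypothetical, and it assumes both incidences are proportions over the same 3-year follow-up.

```python
def number_needed_to_treat(control_risk: float, treated_risk: float) -> float:
    """NNT = 1 / absolute risk reduction (ARR).

    Hypothetical helper: both risks are assumed to be proportions
    measured over the same follow-up period (3 years in the item).
    """
    arr = control_risk - treated_risk  # absolute risk reduction
    if arr <= 0:
        raise ValueError("treatment does not reduce risk; NNT is undefined")
    return 1 / arr


# Incidence of MI from the item stem: 10% in controls, 2% with the new drug.
nnt = number_needed_to_treat(control_risk=0.10, treated_risk=0.02)
print(f"ARR = {0.10 - 0.02:.2f}, NNT = {nnt:.1f}")
```

Note that NNT is defined on the absolute, not the relative, risk reduction; using the relative reduction (80%) is a common route to a wrong answer.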
Flaws in Study Design
Investigators are conducting a study of a community-level intervention to reduce body mass index (BMI) among school-age children. Forty-eight communities are considered for the study, but only 24 of these communities meet the criteria that the investigators have chosen, including similar population sizes, median incomes, and proportions of total area that are devoted to recreation. These 24 communities are then randomly assigned to intervention status or control status. Changes in the BMIs of 100 randomly selected children in each town are measured by trained research assistants who are blinded to community assignment. The results favor the intervention by a statistically and clinically significant margin. The conclusion by the investigators that the intervention should be implemented in communities across the United States is most likely to be criticized on which of the following grounds?
(A) Ascertainment bias (B) Generalizability (C) Inappropriate control group (D) Recall bias (E) Selection bias
Exam        p-value (% correct)   biserial
Step 2 CK   77                    0.36
Step 3      66                    0.07
SCBq0896
Item Difficulty and Discrimination by Step
Exam        Number of Items   Median Item Difficulty (p-value)   Median Item Discrimination (biserial)
Step 1      77                0.697                              0.267
Step 2 CK   77                0.680                              0.211
Step 3      77                0.636                              0.216
Step 2 CK   139               0.653                              0.199
Step 3      139               0.636                              0.200
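The p-value (item difficulty) and biserial (item discrimination) in these tables are classical test-theory indices. The sketch below computes both for a single item using the textbook biserial formula, which assumes a normally distributed latent trait underlies the right/wrong response. The poster does not say which variant or software was used, so this is an illustration only; the response data and the names `item_difficulty` and `biserial` are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def item_difficulty(responses):
    """Classical item 'p-value': proportion answering correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)


def biserial(responses, totals):
    """Biserial correlation between one 0/1 item and examinees' total scores.

    Textbook formula (hypothetical helper):
        r_bis = (M1 - M0) / s * (p * q / y)
    M1, M0: mean total score of examinees answering correctly / incorrectly;
    s: population SD of total scores; y: standard normal density at the
    cut point that leaves proportion p above it.
    """
    p = item_difficulty(responses)
    q = 1 - p
    if p in (0, 1):
        raise ValueError("undefined when all or no examinees answer correctly")
    m1 = mean(t for r, t in zip(responses, totals) if r == 1)
    m0 = mean(t for r, t in zip(responses, totals) if r == 0)
    s = pstdev(totals)
    cut = NormalDist().inv_cdf(q)  # z with proportion q below it (p above it)
    y = NormalDist().pdf(cut)
    return (m1 - m0) / s * (p * q / y)


# Hypothetical data: one item's 0/1 responses and each examinee's total score.
item = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0]
total = [31, 27, 24, 29, 26, 22, 19, 33, 25, 23]
print(f"p-value = {item_difficulty(item):.2f}, biserial = {biserial(item, total):.3f}")
```

The biserial is always larger in magnitude than the point-biserial computed from the same data, which is worth keeping in mind when comparing discrimination values across reports.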
[Figures: Item Difficulties (p-values): Step 2 CK vs Step 1; Step 3 vs Step 1; Step 3 vs Step 2 CK. Each panel marks the regions showing better performance on either Step.]
Results of Matched-Pairs t-tests on Logit-Transformed Item Difficulties
Exams Compared         # of Items   Correlation   t-test   Signif
Step 1 and Step 2 CK   77           0.88          1.25     NS
Step 1 and Step 3      77           0.85          2.59     p < 0.012
Step 2 CK and Step 3   139          0.92          3.10     p < 0.002
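These comparisons pair each item's p-value on one exam with its p-value on the other, after a logit transform. A minimal sketch of that statistic follows, assuming a standard paired t on the per-item logit differences (the poster gives no further detail). The p-values and the names `logit` and `matched_pairs_t` are hypothetical; in practice `scipy.stats.ttest_rel` applied to the two lists of logits returns both t and its p-value.

```python
import math
from statistics import mean, stdev


def logit(p):
    """Log-odds of a proportion correct; stretches values near 0 and 1."""
    return math.log(p / (1 - p))


def matched_pairs_t(p_first, p_second):
    """Paired t statistic on logit-transformed item p-values (same items on two exams).

    Positive t means items were easier (higher p-value) on the first exam.
    Returns (t, degrees of freedom).
    """
    if len(p_first) != len(p_second):
        raise ValueError("each item needs a p-value on both exams")
    diffs = [logit(a) - logit(b) for a, b in zip(p_first, p_second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical p-values for five common items on two Steps (not study data).
step1 = [0.84, 0.72, 0.65, 0.90, 0.58]
step3 = [0.65, 0.70, 0.60, 0.86, 0.55]
t, df = matched_pairs_t(step1, step3)
print(f"t = {t:.2f} on {df} df")
```

The logit transform keeps a 5-point drop near p = 0.95 from counting the same as a 5-point drop near p = 0.50, which matters when item difficulties span a wide range.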
Summary of Results and Areas of Declining Performance
• Overall performance declined
  – 1.5% (non-significant) from Step 1 to Step 2 CK
  – 1.5% further decline from Step 2 CK to Step 3
• Review of item content indicated performance declines were largest for items on:
  – Sensitivity, specificity, and predictive value
  – Health impact (risk/benefit)
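Because the largest declines were on sensitivity, specificity, and predictive value, here is a short sketch of how those quantities follow from a diagnostic 2x2 table, plus Bayes' rule for positive predictive value at a different prevalence. The counts and the names `TwoByTwo` and `ppv_from_prevalence` are hypothetical; the point illustrated is that predictive values depend on prevalence while sensitivity and specificity do not.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a diagnostic-test 2x2 table (hypothetical helper)."""
    tp: int  # diseased, test positive
    fp: int  # not diseased, test positive
    fn: int  # diseased, test negative
    tn: int  # not diseased, test negative

    @property
    def sensitivity(self):
        return self.tp / (self.tp + self.fn)  # P(test+ | disease)

    @property
    def specificity(self):
        return self.tn / (self.tn + self.fp)  # P(test- | no disease)

    @property
    def ppv(self):
        return self.tp / (self.tp + self.fp)  # P(disease | test+)

    @property
    def npv(self):
        return self.tn / (self.tn + self.fn)  # P(no disease | test-)


def ppv_from_prevalence(sens, spec, prevalence):
    """Bayes' rule: PPV for a test with given sensitivity/specificity at a given prevalence."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical screening of 1,000 people with 10% prevalence.
table = TwoByTwo(tp=90, fp=45, fn=10, tn=855)
print(f"sens={table.sensitivity:.2f} spec={table.specificity:.2f} "
      f"PPV={table.ppv:.2f} NPV={table.npv:.3f}")
print(f"PPV at 1% prevalence: {ppv_from_prevalence(0.90, 0.95, 0.01):.2f}")
```

With the same test (sensitivity 0.90, specificity 0.95), dropping prevalence from 10% to 1% cuts the PPV from about 0.67 to about 0.15.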
Basic Science Retention: NBME Research from 1975 to 2010
[Figure: performance change (y-axis from -35 to 10) for Total, Biochem, Micro, Anat, Phys, Pharm, Behav Sci, and Path across studies from 1975-77, 1981-82, 1988-89, 1991-92, 2004-05, 2008-09 (Step 2), and 2009-10 (Step 3).]
Discussion
• Glass half-empty or half-full? Observed performance declines in biostatistics and epidemiology were small relative to most other basic science areas
• Reasons for the decline are unclear
  – Biostatistics/epidemiology may be learned primarily during undergraduate medical education
  – Topics may receive little (or variable) emphasis during graduate education
Limitations
• Extensive preparation for the high-stakes Step 1 may inflate baseline performance, creating a false impression of decline
• Relatively small samples of biostatistics and epidemiology items were used in some content areas
• Use of a cross-sectional design in the study weakens the inferences that can be drawn
• Reliance on an MCQ format limits how understanding of biostatistics and epidemiology content can be assessed
• No analyses have yet been done to identify school characteristics mediating the magnitude of performance shifts
Drug Ad Format
1. Which of the following statements most accurately summarizes the effect of Estabile on weight loss? (A) Weight loss is more likely in patients taking Estabile than in patients taking a placebo (B) Weight loss is less likely in patients taking Estabile than in patients taking a placebo (C) Expected weight loss in patients taking Estabile is greater than in patients taking a placebo (D) Expected weight loss in patients taking Estabile is less than in patients taking a placebo (E) Weight loss in patients taking Estabile is no different than in patients taking a placebo
2. Which of the following prescriptions for Estabile would be considered an off-label use of the medication? (A) For an adult with type 1 diabetes mellitus (B) For a child with type 2 diabetes mellitus (C) Administration by the subcutaneous route (D) Administration twice daily
3. In a randomized controlled clinical trial performed to gain FDA approval for Estabile for the treatment of hyperglycemia in patients with type 1 diabetes mellitus, which of the following is the most appropriate treatment for control participants? (A) Placebo (B) Diet and exercise (C) Oral anti-diabetic medication (D) Regular insulin
Abstract Format
Please answer the questions below based on the abstract shown (Kernan et al., “Phenylpropanolamine and the risk of hemorrhagic stroke,” NEJM 2000; 343:1826-32).
Background: Phenylpropanolamine (PPA) is commonly found in appetite suppressants and cough or cold remedies. Case reports have linked the use of products containing PPA to hemorrhagic stroke, often after the first use of these products. To study the association, we designed a case-control study.
Methods: Men and women 18 to 49 years of age were recruited from 43 U.S. hospitals. Eligibility criteria included the occurrence of a subarachnoid or intracerebral hemorrhage within 30 days before enrollment and the absence of a previously diagnosed brain lesion. Random-digit dialing identified two matched control subjects per patient.
Results: There were 702 patients and 1376 control subjects. For women, the adjusted odds ratio was 16.58 (95 percent confidence interval, 1.51 to 182.21; P=0.02) for the association between the use of appetite suppressants containing PPA and the risk of hemorrhagic stroke and 3.13 (95 percent confidence interval, 0.86 to 11.46; P=0.08) for the association with the first use of a product containing PPA. All first uses of PPA involved cough or cold remedies. For men and women combined, the adjusted odds ratio was 1.49 (95 percent confidence interval, 0.84 to 2.64; P=0.17) for the association between the use of a product containing PPA and the risk of hemorrhagic stroke, 1.23 (95 percent confidence interval, 0.68 to 2.24; P=0.49) for the association with the use of cough or cold remedies that contained PPA, and 15.92 (95 percent confidence interval, 1.38 to 183.13; P=0.03) for the association with the use of appetite suppressants that contained PPA. An analysis in men showed no increased risk of hemorrhagic stroke in association with the use of cough or cold remedies containing PPA. No men reported the use of appetite suppressants.
Conclusions: The results suggest that PPA in appetite suppressants, and possibly in cough and cold remedies, is an independent risk factor for hemorrhagic stroke in women.
1. Which of the following is of most concern regarding the validity of the conclusions reported in this study? (A) Children were excluded from the study (B) Hemorrhagic stroke preceded exposure to PPA (C) No men reported the use of appetite suppressants (D) There are more controls than cases in the study (E) There was inaccurate reporting of PPA intake by study participants
2. The reported 95% confidence interval for the adjusted odds ratio for the association between PPA use as an appetite suppressant and hemorrhagic stroke in women is most consistent with which of the following? (A) Large standard deviation for the odds ratio estimate (B) Statistically nonsignificant result (C) True odds ratio of 1.27 for the described association (D) Decreased risk for hemorrhagic stroke in women using PPA as an appetite suppressant (E) Study underpowered to detect an effect of PPA on hemorrhagic stroke
3. Which of the following can be calculated based on the data presented in this study abstract? (A) The number of women exposed to PPA for one additional hemorrhagic stroke to occur (B) The number of first uses of PPA that involved appetite suppressants (C) The relative risk increase for hemorrhagic stroke associated with PPA use in women (D) The proportion of hemorrhagic strokes observed in men in the study (E) The proportion of PPA use that involved cough and cold remedies
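Question 2 turns on how to read the women's interval of 1.51 to 182.21. Assuming the reported intervals are Wald-type intervals built on the log-odds scale (the abstract does not say, and the adjusted estimates come from a multivariable model), the point estimate is roughly the geometric mean of the limits and the standard error of ln(OR) can be backed out from the interval width. The sketch below does this for two intervals quoted in the abstract; `summarize_or_ci` is a hypothetical helper, not a method from the paper.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval


def summarize_or_ci(lower, upper):
    """Back-calculate from a reported 95% odds ratio interval (hypothetical helper).

    Assumes the interval is exp(ln OR +/- 1.96 * SE), i.e. symmetric on the log scale.
    Returns (point estimate, SE of ln OR, two-sided p-value against OR = 1).
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    log_or = (log_lo + log_hi) / 2  # log-scale midpoint = geometric mean of the limits
    se = (log_hi - log_lo) / (2 * Z95)
    z = log_or / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return math.exp(log_or), se, p


# 95% CIs quoted in the abstract
intervals = {
    "women, appetite suppressants": (1.51, 182.21),
    "men and women, any PPA product": (0.84, 2.64),
}
for label, (lo, hi) in intervals.items():
    or_hat, se, p = summarize_or_ci(lo, hi)
    print(f"{label}: OR ~ {or_hat:.2f}, SE(ln OR) ~ {se:.2f}, p ~ {p:.2f}")
```

Under that assumption the women's interval implies an SE of ln(OR) of about 1.2, an estimate that excludes 1 but is very imprecise, and the back-calculated p-values land close to the abstract's P=0.02 and P=0.17.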