Comparison of Performance in Biostatistics and Epidemiology across

USMLE Steps 1, 2, and 3

D Swanson, W Ouyang, K Holtzman, M Johnson, and S Haist

National Board of Medical Examiners

Philadelphia, PA

Purpose of Study

• Several years ago, the Committee to Evaluate the USMLE Program (CEUP) recommended introduction of “a testing format designed to assess an examinee’s ability … to engage in evidence-based decision making.”

• This study was conducted to obtain baseline information about examinees’ performance in biostatistics and epidemiology as they progress through USMLE.

Method: Test Material and Subjects

Test Material

• 139 (unscored) biostatistics and epidemiology items
– 77 used on Steps 1, 2 CK, 3
– 62 used on Steps 2 CK, 3

Procedure

• Small number of study items randomly selected for each examinee

Subjects

• First-time takers from LCME schools taking Step exams in 2010-11

• Median number of examinees/item:
– Step 1: 225
– Step 2 CK: 557
– Step 3: 395

Number Needed to Treat A trial of a new lipid-lowering drug is conducted in a population of patients aged 60 years and older. The drug has been designed to decrease the risk for myocardial infarction (MI). The study participants have multiple risk factors for cardiovascular disease. After 3 years of treatment, the incidence of MI in the intervention group is 2% compared with 10% in the control group (p<0.05). Based on these data, which of the following best represents the number of patients who must be treated with this drug for 3 years to prevent one MI?

(A) 6.25 (B) 12.5 (C) 18.75 (D) 25 (E) 50

Exam        p-value   biserial
Step 1      84        0.49
Step 2 CK   67        0.18
Step 3      65        0.26

SCBQ0836
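The arithmetic behind the Number Needed to Treat item can be sketched as follows. This is a minimal illustration, not part of the original poster: the function name `number_needed_to_treat` is hypothetical, and the poster itself does not state the answer key. It uses the standard definition NNT = 1 / (control risk − treated risk) with the 10% and 2% three-year MI incidences given in the item.

```python
def number_needed_to_treat(control_risk: float, treated_risk: float) -> float:
    """Return NNT = 1 / absolute risk reduction (ARR).

    Hypothetical helper for illustration; risks are proportions in [0, 1].
    """
    arr = control_risk - treated_risk  # absolute risk reduction
    if arr <= 0:
        raise ValueError("no risk reduction, so NNT is undefined")
    return 1.0 / arr


# Incidence of MI after 3 years, as stated in the item stem.
nnt = number_needed_to_treat(control_risk=0.10, treated_risk=0.02)
print(round(nnt, 2))
```

With an absolute risk reduction of 8 percentage points, the result is 1 / 0.08 = 12.5 patients treated for 3 years per MI prevented.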

Flaws in Study Design Investigators are conducting a study of a community-level intervention to reduce body mass index (BMI) among school-age children. Forty-eight communities are considered for the study, but only 24 of these communities meet the criteria that the investigators have chosen, including similar population sizes, median incomes, and proportions of total area that are devoted to recreation. These 24 communities are then randomly assigned to intervention status or control status. Changes in the BMIs of 100 randomly selected children in each town are measured by trained research assistants who are blinded to community assignment. The results favor the intervention by a statistically and clinically significant margin. The conclusion by the investigators that the intervention should be implemented in communities across the United States is most likely to be criticized on which of the following grounds?

(A) Ascertainment bias (B) Generalizability (C) Inappropriate control group (D) Recall bias (E) Selection bias

Exam        p-value   biserial
Step 2 CK   77        0.36
Step 3      66        0.07

SCBq0896

Item Difficulty and Discrimination by Step

Exam        Number of Items   Median Item Difficulty (p-value)   Median Item Discrimination (biserial)
Step 1      77                0.697                              0.267
Step 2 CK   77                0.680                              0.211
Step 3      77                0.636                              0.216
Step 2 CK   139               0.653                              0.199
Step 3      139               0.636                              0.200

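Item Difficulties (p-values): Step 2 CK vs Step 1

[Figure not reproduced; plot regions labeled “Better performance on Step 2 CK” and “Better performance on Step 1”]

Item Difficulties (p-values): Step 3 vs Step 1

[Figure not reproduced]

Item Difficulties (p-values): Step 3 vs Step 2 CK

[Figure not reproduced; plot region labeled “Better performance on Step 2 CK”]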

Results of Matched-Pairs t-tests on Logit-Transformed Item Difficulties

Exams Compared         # of Items   Correlation   t-test   Signif
Step 1 and Step 2 CK   77           0.88          1.25     NS
Step 1 and Step 3      77           0.85          2.59     p < 0.012
Step 2 CK and Step 3   139          0.92          3.10     p < 0.002
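A matched-pairs t-test on logit-transformed item difficulties can be sketched as below. This is an illustration under stated assumptions, not the authors' actual analysis: the poster does not describe the software or exact computation, the helper names `logit` and `paired_t` are hypothetical, and the p-values in the example lists are made up for demonstration. Each item contributes one pair (its p-value on two exams), and the test is run on the per-item differences in log-odds.

```python
import math
from statistics import mean, stdev


def logit(p: float) -> float:
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def paired_t(p_first, p_second):
    """Paired t statistic on logit-transformed p-values.

    Returns (t, degrees_of_freedom). A positive t means items were
    easier (higher p-value) on the first exam than on the second.
    """
    diffs = [logit(a) - logit(b) for a, b in zip(p_first, p_second)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample SD, n - 1 denominator
    return mean(diffs) / standard_error, n - 1


# Hypothetical p-values for five items seen on two exams (not study data).
step1 = [0.84, 0.70, 0.55, 0.62, 0.91]
step3 = [0.65, 0.66, 0.50, 0.60, 0.85]

t_stat, df = paired_t(step1, step3)
print(f"t = {t_stat:.2f}, df = {df}")
```

The logit transform stretches proportions near 0 and 1, so a drop from 0.95 to 0.90 is not treated as equivalent to a drop from 0.55 to 0.50. For the 77-item comparisons in the table, this test would have 76 degrees of freedom; for the 139-item comparison, 138.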

Summary of Results and Areas of Declining Performance

• Overall performance declined
– 1.5% (non-significant) from Step 1 to Step 2 CK
– 1.5% further decline from Step 2 CK to Step 3

• Review of item content indicated performance declines were largest for items on:
– Sensitivity, specificity, and predictive value
– Health impact (risk/benefit)

Basic Science Retention: NBME Research from 1975 to 2010

[Figure not reproduced. Vertical axis runs from -35 to 10. Study periods: 1975-77, 1981-82, 1988-89, 1991-92, 2004-05, 2008-09 (Step 2), 2009-10 (Step 3). Series: Total, Biochem, Micro, Anat, Phys, Pharm, Behav Sci, Path.]

Discussion

• Glass half-empty or half-full? Observed performance declines in biostatistics and epidemiology were small relative to most other basic science areas

• Reasons for the decline are unclear
– Biostatistics/epidemiology may be learned primarily during undergraduate medical education
– Topics may receive little (or variable) emphasis during graduate education

Limitations

• Extensive preparation for the high-stakes Step 1 may inflate baseline performance, creating a false impression of decline

• Relatively small samples of biostatistics and epidemiology items were used in some content areas

• Use of a cross-sectional design in the study weakens the inferences that can be drawn

• Reliance on an MCQ format limits how understanding of biostatistics and epidemiology content can be assessed

• No analyses have yet been done to identify school characteristics mediating the magnitude of performance shifts

Drug Ad Format

1. Which of the following statements most accurately summarizes the effect of Estabile on weight loss?
(A) Weight loss is more likely in patients taking Estabile than in patients taking a placebo
(B) Weight loss is less likely in patients taking Estabile than in patients taking a placebo
(C) Expected weight loss in patients taking Estabile is greater than in patients taking a placebo
(D) Expected weight loss in patients taking Estabile is less than in patients taking a placebo
(E) Weight loss in patients taking Estabile is no different than in patients taking placebo

2. Which of the following prescriptions for Estabile would be considered an off-label use of the medication?
(A) For an adult with type 1 diabetes mellitus
(B) For a child with type 2 diabetes mellitus
(C) Administration by the subcutaneous route
(D) Administration twice daily

3. In a randomized controlled clinical trial performed to gain FDA approval for Estabile for the treatment of hyperglycemia in patients with type 1 diabetes mellitus, which of the following is the most appropriate treatment for control participants?
(A) Placebo
(B) Diet and exercise
(C) Oral anti-diabetic medication
(D) Regular insulin

Abstract Format

Please answer the questions below based on the abstract shown (Kernan et al, “Phenylpropanolamine and the risk of hemorrhagic stroke”, NEJM 2000; 343:1826-32).

Background: Phenylpropanolamine (PPA) is commonly found in appetite suppressants and cough or cold remedies. Case reports have linked the use of products containing PPA to hemorrhagic stroke, often after the first use of these products. To study the association, we designed a case-control study.

Methods: Men and women 18 to 49 years of age were recruited from 43 U.S. hospitals. Eligibility criteria included the occurrence of a subarachnoid or intracerebral hemorrhage within 30 days before enrollment and the absence of a previously diagnosed brain lesion. Random-digit dialing identified two matched control subjects per patient.

Results: There were 702 patients and 1376 control subjects. For women, the adjusted odds ratio was 16.58 (95 percent confidence interval, 1.51 to 182.21; P=0.02) for the association between the use of appetite suppressants containing PPA and the risk of hemorrhagic stroke and 3.13 (95 percent confidence interval, 0.86 to 11.46; P=0.08) for the association with the first use of a product containing PPA. All first uses of PPA involved cough or cold remedies. For men and women combined, the adjusted odds ratio was 1.49 (95 percent confidence interval, 0.84 to 2.64; P=0.17) for the association between the use of a product containing PPA and the risk of hemorrhagic stroke, 1.23 (95 percent confidence interval, 0.68 to 2.24; P=0.49) for the association with the use of cough or cold remedies that contained PPA, and 15.92 (95 percent confidence interval, 1.38 to 183.13; P=0.03) for the association with the use of appetite suppressants that contained PPA. An analysis in men showed no increased risk of hemorrhagic stroke in association with the use of cough or cold remedies containing PPA. No men reported the use of appetite suppressants.

Conclusions: The results suggest that PPA in appetite suppressants, and possibly in cough and cold remedies, is an independent risk factor for hemorrhagic stroke in women.

1. Which of the following is of most concern regarding the validity of the conclusions reported in this study?
(A) Children were excluded from the study
(B) Hemorrhagic stroke preceded exposure to PPA
(C) No men reported the use of appetite suppressants
(D) There are more controls than cases in the study
(E) There was inaccurate reporting of PPA intake by study participants

2. The reported 95% confidence interval for the adjusted odds ratio for the association between PPA use as an appetite suppressant and hemorrhagic stroke in women is most consistent with which of the following?
(A) Large standard deviation for the odds ratio estimate
(B) Statistically nonsignificant result
(C) True odds ratio of 1.27 for the described association
(D) Decreased risk for hemorrhagic stroke in women using PPA as an appetite suppressant
(E) Study underpowered to detect an effect of PPA on hemorrhagic stroke

3. Which of the following can be calculated based on the data presented in this study abstract?
(A) The number of women exposed to PPA for one additional hemorrhagic stroke to occur
(B) The number of first uses of PPA that involved appetite suppressants
(C) The relative risk increase for hemorrhagic stroke associated with PPA use in women
(D) The proportion of hemorrhagic strokes observed in men in the study
(E) The proportion of PPA use that involved cough and cold remedies
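Question 2 asks the reader to interpret the interval 1.51 to 182.21 reported for women using PPA appetite suppressants. As a rough aid to that reading, the sketch below backs out the spread that such an interval implies. It rests on an assumption not stated in the abstract: that the interval is a conventional Wald interval, symmetric on the log-odds scale with a multiplier of 1.96. The function names are hypothetical, and the poster does not give an answer key.

```python
import math


def log_or_standard_error(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(OR) implied by a CI symmetric on the log scale.

    Assumes a Wald-type interval: log(OR) +/- z * SE.
    """
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def implied_point_estimate(lower: float, upper: float) -> float:
    """Geometric mean of the bounds, i.e. the midpoint on the log scale."""
    return math.sqrt(lower * upper)


# Bounds reported in the abstract for women using PPA appetite suppressants.
lower, upper = 1.51, 182.21

se = log_or_standard_error(lower, upper)
point = implied_point_estimate(lower, upper)
print(f"SE(log OR) = {se:.2f}, implied OR = {point:.2f}, "
      f"upper/lower = {upper / lower:.0f}")
```

Under this assumption the log-scale midpoint lands very close to the reported odds ratio of 16.58, which suggests the assumption is reasonable here. The upper bound is more than 100 times the lower bound, so the estimate is very imprecise even though the interval excludes 1.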
