# what students learn (and don’t learn) about inferential reasoning in introductory statistics...

TRANSCRIPT

What Students Learn (and Don’t Learn) about Inferential Reasoning in Introductory Statistics Courses

2014 Joint Statistical Meetings (JSM), Boston, MA

Sharon Lane-Getaz, St. Olaf College, Northfield, MN [email protected]

Objective

What does statistics education research report about the correct conceptions, difficulties, and misconceptions people have with inferential reasoning? How might this help statistical consultants dealing with clients?

• Background: To assess the impact of methods for teaching inference, developed an instrument to assess 14 known misconceptions and difficulties, and added items to assess correct conceptions.

• Measurement: Reasoning about P-values and Statistical Significance (RPASS) scale reliability in this study is Cronbach’s alpha = .76 (37 items).

• Study: Compare Pretest and Posttest proportions of students answering each item correctly on a scatterplot (canoe plot).

• Discussion: Emphasize what students generally learn and what problems tend to persist.
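The reliability figure in the Measurement bullet is Cronbach's alpha. As a minimal sketch of how that coefficient is computed from item-level 0/1 scores (the tiny response matrix is made up for illustration and is not RPASS data; the function name is hypothetical):

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one per student, one 0/1 entry per item)."""
    k = len(scores[0])  # number of items
    # Variance of each item across students.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the students' total scores.
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 4 students, 3 items.
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
print(f"alpha = {alpha:.2f}")
```

Using population or sample variances gives the same alpha, as long as the same kind is used for the items and for the totals.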


Subjects and Setting

• Subjects (N = 138) from two introductory-level statistics courses aimed at the social sciences (n1 = 78) and natural sciences (n2 = 60).

• 138 out of 167 enrolled students completed the Pre- and Posttest, and consented to participate (83% response)

• (94) females, (43) males, (1) no response

• (34) first years, (56) sophomores, (30) juniors, (18) seniors.

• Setting: Small liberal arts college (3000 students) in the upper Midwest US, a small town of “cows, colleges and contentment”

• Time: Spring semester 2011.


Broad range of results with two courses combined: RPASS-9 Pretest and Posttest Totals


Pre- and Posttest Total Gains by Course

Aggregate Results for Both Courses (N = 138)

70% of 37 RPASS-9 Posttest items correct, on average.

Five more Posttest items correct, on average:

• RPASS-9 Posttest (Mean = 26.1, SD = 5.1)

• RPASS-9 Pretest (Mean = 21.0, SD = 4.2)

• What did students learn, by item, … and what did they not learn?
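The headline figures follow directly from the reported means. A quick arithmetic check, using only the numbers on this slide (variable names are illustrative):

```python
N_ITEMS = 37        # items on the RPASS-9
post_mean = 26.1    # mean Posttest total
pre_mean = 21.0     # mean Pretest total

post_share = post_mean / N_ITEMS   # share of items correct on the Posttest
pre_share = pre_mean / N_ITEMS     # share of items correct on the Pretest
gain_items = post_mean - pre_mean  # average gain, in items

print(f"Posttest: {post_share:.1%} of items correct")
print(f"Pretest:  {pre_share:.1%} of items correct")
print(f"Gain:     {gain_items:.1f} items")
```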


Item-Level Analysis (Canoe Plot)

Canoe Plot of item-level changes in proportion correct

Scatterplot of Pretest to Posttest proportions by item

A 95% confidence band along the line p_posttest = p_pretest separates items with a significant difference in the proportion answering correctly (Posttest – Pretest) from items with a nonsignificant difference.

Wilson adjusted margins of error: maintain a 95% nominal rate (Agresti & Caffo, 2000).

No family-wise correction, intended for descriptive purposes.

$\tilde{p}_i = (X_i + 1) / (n_i + 2)$
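A minimal sketch of how an item could be placed relative to the band, using the adjusted estimate (X + 1)/(n + 2) and the Agresti & Caffo (2000) interval for a difference of two proportions. It treats the Pretest and Posttest as two independent samples, which is an assumption of this sketch; the slides do not say how the pairing of students was handled. The counts in the example are hypothetical, as are the function names.

```python
from math import sqrt

Z_95 = 1.96  # approximate standard normal quantile for a two-sided 95% interval


def adjusted_proportion(x, n):
    """Adjusted estimate: add one success and one failure to the sample."""
    return (x + 1) / (n + 2)


def adjusted_diff_interval(x_pre, n_pre, x_post, n_post, z=Z_95):
    """Adjusted 95% interval for p_posttest - p_pretest."""
    p1 = adjusted_proportion(x_pre, n_pre)
    p2 = adjusted_proportion(x_post, n_post)
    se = sqrt(p1 * (1 - p1) / (n_pre + 2) + p2 * (1 - p2) / (n_post + 2))
    diff = p2 - p1
    return diff - z * se, diff + z * se


def band_position(x_pre, n_pre, x_post, n_post):
    """Classify an item as above, within, or below the confidence band."""
    lower, upper = adjusted_diff_interval(x_pre, n_pre, x_post, n_post)
    if lower > 0:
        return "above"
    if upper < 0:
        return "below"
    return "within"


# Hypothetical item: 60 of 138 correct on the Pretest, 100 of 138 on the Posttest.
print(band_position(60, 138, 100, 138))
```

An item lands "above" only when the whole interval for the difference is positive, which matches reading a point as lying above the band on the canoe plot.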


Proportion of Correct Responses by RPASS-9 Item: Pretest on x, Posttest on y (37 items, N = 138)

23 items above the 95% confidence band, 13 within, and 1 below

Improved 14 Correct Conceptions of the 23 Items “Above the Band”

• Improved Statistical Literacy:
  • Recognize textbook definitions of p-value (1-1, 6-1)
  • Link p-value to sampling variation (2-1)
  • Understand p-value as a rareness measure (3a-2)

• Improved Inferential Reasoning:
  • Assess significance graphically (3b-1)
  • Reason about variation (3c-2)
  • Assess impact of alternative hypothesis on p-value (1-3, 4b-1)
  • Differentiate small p-values, Type I and II errors (6-2, 6-7)
  • Reason about sample size impact on p-value (6-4)
  • Reason about strength of evidence vs. p-value (2-2, 4a-1, 6-3)

(5) Green items indicate p_c < .50 on Pretest

Improved (Suppressed) 9 Misconceptions of the 23 Items “Above the Band”

• State conclusions within confines of scope of inference:
  • Need random sample to generalize sample to population (5-4)
  • Need random assignment to draw causal conclusion (4a-3)

• Interpret what a P-value is NOT:
  • Always small or always desired to be low value (3a-3, 3b-3)
  • Probability the Null Hypothesis is false or true (5-1, 5-2)
  • Alpha or significance level (4a-1)

• Interpret that a small P-value does NOT mean:
  • Chance caused results observed (2-4)
  • Provides definitive, contrapositive proof (3a-1)

(3) Red items indicate p_c < .50 on Pretest

No Improvement: “Within the Band” Correct Conceptions (C)

• Reason about variation in boxplot depiction (3c-1) C

• Making correct rejection decision (4b-3) C

• Recognize an informal definition of p-value (1-2) C

• Recognize p-value as a conditional probability (2-3) C

• Use Confidence Intervals for statistical significance (2-5) C

• Differentiate p-values from effects (4a-2) C

• Interpret large p-value (4b-2) C

• Consider impact of sample size on p-values (4b-4, 6-4) C

Green indicates p_c < .50 on Pretest

No Improvement: “Within the Band” Misconceptions (M) or Multiple Choice Items

• Belief increased replications = increased sample size (4b-6) M

• Belief p-values always low or desired to be low (3b-2) M

• Differentiate statistical vs. practical significance (4b-5, 6-5) C/M

• Check conditions before making an inference (6-6) C/M

Red indicates p_c < .50 on Pretest

The One Item “Below the Band”: Unlearning, Guessing, Confusion?

Responses for one item suggest better reasoning on the Pretest than on the Posttest (just below the 95% confidence band):

When asked to choose the correct direction to shade the p-value in the sampling distribution of means (3b-4), students tend to select “to the right,” even though the alternative hypothesis indicates that one should shade the larger left tail.


Remind clients of caveats and limitations of the statistical inference process.

• P-value is an integrated part of the larger statistical process.
  • Logic of inference (how we interpret results) depends on sample size, relates to effect size and importance, and depends on whether conditions were met.
  • Scope of inference (what we can conclude) depends on randomness in the study design, that is, how the data were gathered.

• Confidence interval (CI) estimates population parameters or true effects, given the sample we observed, and
  • Provides information complementary to what p-values give alone (bounds for the effect).
  • Can assess statistical significance. For example, point out whether the null hypothesis value is in the interval or not. Is zero in the interval? Is the interval all positive or all negative?
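The interval check described in the last bullet can be sketched as a small helper. The function name and the example bounds are hypothetical; the logic is only the "is the null value inside the interval?" question posed above.

```python
def interpret_interval(lower, upper, null_value=0.0):
    """Read statistical significance off a confidence interval for an effect."""
    if lower > null_value:
        return "significant: interval lies entirely above the null value"
    if upper < null_value:
        return "significant: interval lies entirely below the null value"
    return "not significant: interval contains the null value"


# Hypothetical 95% intervals for a difference (null value 0).
print(interpret_interval(0.18, 0.40))
print(interpret_interval(-0.10, 0.13))
```

Unlike a bare p-value, the same two numbers also give a client bounds on how large the effect plausibly is.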


A Surprise Aside

Students in a randomization-based curriculum learn more on average but, ironically, show no improvement on 5 items associated with the randomization distribution:

• How one- or two-tailed test relates to p-value (4b-2) M

• Correct rejection decision (4b-3) C

• Impact of sample size on significance (4b-4) M

• Significance vs. practical importance (4b-5)

• Impact of increasing sample size vs. replications (4b-6) M

References

Agresti, A., & Caffo, B. (2000). Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54(4), 280–288.

Chance, B. L., & Rossman, A. J. (2006). Investigating Statistical Concepts, Applications, and Methods. Belmont, CA: Brooks/Cole – Thomson Learning.

Cobb, G. (2007). The Introductory Statistics Course: A Ptolemaic Curriculum? Technology Innovations in Statistics Education, 1(1). http://repositories.cdlib.org/uclastat/cts/tise/

Cumming, G., & Finch, S. (2005). Inference by eye: Confidence intervals and how to read pictures of data. American Psychologist, 60(2), 170–180.

delMas, R. C., Garfield, J. B., Ooms, A., & Chance, B. (2007). Assessing Students’ Conceptual Understanding after a First Course in Statistics. Statistics Education Research Journal [online], 6(2), 28–58. http://www.stat.auckland.ac.nz/serj

Lane-Getaz, S. J. (2013). Development of a Reliable Measure of Students’ Inferential Reasoning Ability. Statistics Education Research Journal (SERJ), 12(1), 20–47. http://iase-web.org/documents/SERJ/SERJ12(1)_LaneGetaz.pdf

Lane-Getaz, S. J. (2007). Toward the Development and Validation of the Reasoning about P-values and Statistical Significance Scale. In B. Phillips & L. Weldon (Eds.), Proceedings of the ISI / IASE Satellite Conference on Assessing Student Learning in Statistics. Voorburg, The Netherlands: ISI. http://www.stat.auckland.ac.nz/~iase/publications/sat07/Lane-Getaz.pdf

Utts, J. (2003). What Educated Citizens Should Know about Statistics and Probability. The American Statistician, 57(2), 74–79.


Contact Information & Slides

Sharon Lane-Getaz, [email protected]

On sabbatical this coming year and would love to collaborate with YOU to administer the RPASS at your institution! Let’s talk!

These JSM-2014 presentation slides will be available from: http://sharonlanegetaz.efoliomn.com/JSM2014

The differences in proportions by item appear in the Appendix of this presentation. Please see the proceedings for more!

| RPASS-9 item | Concept or difficulty assessed | p2-p1 |
|---|---|---|
| 6-1 | Selects a textbook definition of a p-value given multiple choices. | .41 |
| 3b-1 | Uses a density curve and an observed value to estimate if the observed value (or more extreme) is statistically significant. | .36 |
| 5-3 | Reasons smaller p-value, stronger the evidence of a difference or effect. | .36 |
| 4a-1 | Confuses p-value with significance level α. | .35 |
| 2-1 | Recognizes p-value in terms of variation in a sampling distribution. | .33 |
| 1-3 | Understands magnitude of p-value depends on whether the test is one- or two-sided. | .30 |
| 4a-2 | Reasons greater evidence of a difference or effect, smaller the p-value. | .27 |
| 2-2 | Understands stronger evidence of difference or effect, smaller p-value. | .23 |
| 3c-2 | Employs graphical reasoning about variation. | .23 |
| 6-2 | Understands a small p-value suggests results are statistically significant. | .23 |
| 2-4 | Believes the p-value is the probability observed results are due to chance or caused by chance, if the null is true. | .22 |
| 3a-1 | Believes statistics provide definitive proof; misuses the deterministic Boolean logic of contrapositive proof. | .19 |

Table 1. Proportion Correct on RPASS-9 Posttest Item Exceeds Pretest Proportion Correct (12 of 23 items)

Note. a: Items associated with sampling or randomization distribution. b: Requests explanation of reasoning.


| RPASS-9 item | Concept or difficulty assessed | p2-p1 |
|---|---|---|
| 4b-1 | Interprets a p-value for a one-tailed hypothesis. | .18 |
| 5-1 | Misinterprets a p-value as the probability the null hypothesis is false. | .17 |
| 5-2 | Believes p-value is the probability that the alternative hypothesis is true. | .17 |
| 6-3 | Understands stronger evidence of difference or effect, smaller p-value. | .17 |
| 6-4 | Reasons about impact of a small sample size on statistical significance. | .16 |
| 3a-2 | Understands the p-value as a rareness measure. | .14 |
| 4a-3 | Believes causal conclusion can be drawn from small p-values regardless of study design. | .14 |
| 1-1 | Recognizes a formal textbook definition of the p-value without context. | .13 |
| 3b-3 | Believes p-value is always a low number (or always desired to be low). | .13 |
| 3a-3 | Belief p-values are always a low value or are always desired to be a low value. | .12 |
| 6-7 | Differentiates between concepts of Type I and Type II error. | .12 |

Table 1 contd. Proportion Correct on RPASS-9 Posttest Item Exceeds Pretest Proportion Correct (11 of 23 items)

Note. a: Items associated with sampling or randomization distribution. b: Requests explanation of reasoning.


| RPASS-9 item | Concept or difficulty assessed | p2-p1 |
|---|---|---|
| 6-5 | Understands small p-value does not mean practical importance. | .08 |
| 3b-2 | Believes p-value is always a low number (or desired to be low). | .07 |
| 4b-4 | Relationship between sample size and p-value. | .07 |
| 2-3 | Understands p-value is conditioned on the null hypothesis being true. | .06 |
| 2-5 | Confidence intervals can assess statistical significance, much like p-values are used when hypothesis testing. | .06 |
| 4b-5 | Differentiates statistical and practical significance. | .03 |
| 4b-2 | Difficulty with one- versus two-tailed p-value. | .01 |
| 3c-1 | Employs graphical reasoning about variation. | 0 |
| 4b-3 | Understands the rejection decision. | 0 |
| 5-4 | Confuses if statistical significance refers to a sample or a population. | -.05 |
| 4b-6 | Understands impact of increasing number of replications in a simulation versus the impact of increasing the sample size. | -.06 |
| 6-6 | Understands to conduct a significance test, conditions must be met. | -.06 |
| 1-2 | Recognizes an informal description of the p-value embedded in context. | -.07 |

Table 2. Equal Proportion of Students Answer RPASS-9 Item Correctly on Posttest and Pretest (13 items)

Note. a: Items associated with sampling or randomization distribution. b: Requests explanation of reasoning.
