fundamentals of statistical reasoning in education

7/17/2019 Fundamentals of Statistical Reasoning in Education

http://slidepdf.com/reader/full/fundamentals-of-statistical-reasoning-in-education 1/3

Reading the Research: Independent-Samples t Test

Santa and Hoien (1999, p. 65) examined the effects of an early-intervention program on a sample of students at risk for reading failure:

A t-test analysis showed that the post-intervention spelling performance in the experimental group (M = 59.6, SD = 5.95) was statistically significantly higher than in the control group (M = 53.7, SD = 12.4), t(47) = 2.067, p < .05.

Notice that an exact p value is not reported; rather, probability is reported relative to the significance level of .05. The result of this independent-samples t test is therefore deemed significant at the .05 level.

Source: Santa, C. M., & Hoien, T. (1999). An assessment of early steps: A program for early intervention of reading problems. Reading Research Quarterly, 34(1), 54–79.
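Because the excerpt reports only p < .05, it can help to see roughly where the exact probability falls. The following is a minimal sketch, not the authors' procedure: it computes the Student t tail area for t(47) = 2.067 by integrating the t density with Simpson's rule, using only the Python standard library. The function names `t_pdf` and `t_tail` are hypothetical, and since the excerpt does not say whether the test was one- or two-tailed, both areas are printed.

```python
import math

def t_pdf(u: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

def t_tail(t: float, df: int, steps: int = 2000) -> float:
    """One-tailed area P(T >= |t|), via Simpson's rule on [0, |t|].

    steps must be even. The t density is symmetric about 0, so the
    area from 0 to infinity is exactly 0.5.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even ones
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Reported result from Santa and Hoien (1999): t(47) = 2.067
one_tailed = t_tail(2.067, 47)
two_tailed = 2 * one_tailed
print(f"one-tailed p = {one_tailed:.3f}, two-tailed p = {two_tailed:.3f}")
```

Either way the exact value falls below .05, which is consistent with the reported p < .05. In practice a statistics library (for example, SciPy's `scipy.stats.t.sf`) would replace the hand-rolled integration.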

Case Study: Doing Our Homework

This case study demonstrates the application of the independent-samples t test. We compared the academic achievement of students who, on average, spend two hours a day on homework with that of students who spend about half that amount of time on homework. Does that extra hour of homework (in this case, double the time) translate into a corresponding difference in achievement?

The sample of nearly 500 students was randomly selected from a population of seniors enrolled in public schools located in the northeastern United States. (The data are courtesy of the National Center for Education Statistics' National Education Longitudinal Study of 1988.) We compared two groups of students: those reporting 4–6 hours of homework per week (Group 1) and those reporting 10–12 hours per week (Group 2). The criterion measures were reading achievement, mathematics achievement, and grade-point average.

One could reasonably expect that students who did more homework would score higher on measures of academic performance. We therefore chose the directional alternative hypothesis, $H_1\colon \mu_1 - \mu_2 < 0$, for each of the three t tests below. (The "less than" symbol simply reflects the fact that we are subtracting the hypothetically larger mean from the smaller mean.) For all three tests, the null hypothesis stated no difference, $H_0\colon \mu_1 - \mu_2 = 0$. The level of significance was set at .05.
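For reference, the pooled-variance t test behind these hypotheses can be summarized as below. This is the standard textbook form written with the chapter's symbols; we assume it matches the definitions given earlier in the chapter, which this excerpt does not show. Because $H_1$ is directional in the negative direction, the entire critical region lies in the lower tail, and the effect size $d$ is the mean difference expressed in pooled standard deviation units.

$$
s^2_{\text{pooled}} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},
\qquad
s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s^2_{\text{pooled}}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
$$

$$
t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}},
\qquad df = n_1 + n_2 - 2,
\qquad \text{reject } H_0 \text{ if } t \le -t_{.05}(df)
$$

$$
d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s^2_{\text{pooled}}}}
$$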

of significant differences between means considerably easier than when the groups are already formed on the basis of some characteristic of the participants (e.g., sex, ethnicity).

The assumption of random sampling underlies nearly all the statistical inference techniques used by educational researchers, including the t test and other procedures described in this book. Inferences to populations from which the samples have been randomly selected are directly backed by the laws of probability and statistics and are known as statistical inferences; inferences or generalizations to all other groups are nonstatistical in nature and involve judgment and interpretation.

294 Chapter 14 Comparing the Means of Two Populations: Independent Samples



Our first test examined the mean difference between the two groups in reading performance. Scores on the reading exam are represented by T scores, which, you may recall from Chapter 6, have a mean of 50 and a standard deviation of 10. (Remember not to confuse T scores, which are standard scores, with t ratios, which make up the t distribution and are used for significance testing.) The mean scores are shown in Table 14.3. As expected, the mean reading achievement of Group 2 ($\bar{X}_2 = 54.34$) exceeded that of Group 1 ($\bar{X}_1 = 52.41$). An independent-samples t test revealed that this mean difference was statistically significant at the .05 level (see Table 14.4). Because large sample sizes can produce statistical significance for small (and possibly trivial) differences, we also determined the effect size in order to capture the magnitude of this mean difference. From Table 14.4, we see that the raw mean difference of $-1.93$ points corresponds to an effect size of $-.21$. Remember, we are subtracting $\bar{X}_2$ from $\bar{X}_1$ (hence the negative signs). This effect size indicates that the mean reading achievement of Group 1 students was roughly one-fifth of a standard deviation below that of Group 2 students, a rather small effect.
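The reading and mathematics entries in Table 14.4 can be approximately reproduced from the summary statistics in Table 14.3 with the pooled-variance formulas. This is a minimal sketch that assumes d is computed with the pooled standard deviation (an assumption, but one that agrees with the reported values); the helper name `pooled_t_and_d` is hypothetical. Because the means and standard deviations in Table 14.3 are rounded, results may differ from Table 14.4 in the second decimal place.

```python
import math

def pooled_t_and_d(m1, s1, n1, m2, s2, n2):
    """Hypothetical helper: pooled-variance t and effect size d for X̄1 - X̄2."""
    df = n1 + n2 - 2
    # Pooled variance estimate: df-weighted average of the two sample variances
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    # Standard error of the difference between means
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = m1 - m2
    # Return raw difference, t ratio, degrees of freedom, and effect size
    return diff, diff / se_diff, df, diff / math.sqrt(pooled_var)

# Summary statistics from Table 14.3 (Group 1 first, then Group 2)
read = pooled_t_and_d(52.41, 9.17, 332, 54.34, 9.08, 163)
math_ = pooled_t_and_d(52.44, 9.57, 332, 55.34, 8.81, 163)

for label, (diff, t, df, d) in [("READ", read), ("MATH", math_)]:
    print(f"{label}: diff = {diff:.2f}, t({df}) = {t:.2f}, d = {d:.2f}")
```

The mathematics t comes out at about $-3.25$ rather than the reported $-3.24$, a rounding effect. GPA is left out because its means, rounded to two decimals, are too coarse to recover the reported t.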

We obtained similar results on the mathematics measure. The difference again was statistically significant, in this case satisfying the more stringent .001 significance level. The effect size, $d = -.31$, suggests that the difference between the two groups in mathematics performance is roughly one-third of a standard deviation. (It is tempting to conclude that the mathematics difference is larger than the reading difference, but this would require an additional analysis: testing the statistical significance of the difference between two differences. We have not done that here.)

Table 14.3 Statistics for Reading, Mathematics, and GPA

Measure  Group      n    Mean X̄   SD s   SE s_X̄
READ     Group 1  332    52.41    9.17    .500
         Group 2  163    54.34    9.08    .710
MATH     Group 1  332    52.44    9.57    .530
         Group 2  163    55.34    8.81    .690
GPA      Group 1  336     2.46     .58    .030
         Group 2  166     2.54     .58    .050

Table 14.4 Independent-Samples t Tests and Effect Sizes

Measure  X̄1 - X̄2      t     df   p (one-tailed)     d
READ       -1.93    -2.21   493       .014         -.21
MATH       -2.90    -3.24   493       .001         -.31
GPA         -.08    -1.56   500       .059         -.14

Finally, the mean difference in GPA was $\bar{X}_1 - \bar{X}_2 = -.08$, with a corresponding effect size of $-.14$. This difference was not statistically significant ($p = .059$). Even if it were, its magnitude is rather small ($d = -.14$) and arguably of little practical significance. Nevertheless, the obtained p value of .059 raises an important point. Although, strictly speaking, this p value failed to meet the .05 criterion, it is important to remember that ".05" (or any other value) is entirely arbitrary. Should this result, $p = .059$, be declared "statistically significant"? Absolutely not. But nor should it be dismissed entirely. When a p value is tantalizingly close to $\alpha$ but nonetheless fails to meet this criterion, researchers sometimes use the term marginally significant. Although no convention exists (that we know of) for deciding between a "marginally significant" result and one that is patently nonsignificant, we believe that it is important not to categorically dismiss results that, though exceeding the announced level of significance, nonetheless are highly improbable. (In the present case, for example, the decision to retain the null hypothesis rests on the difference in probability between 50/1000 and 59/1000.) This also is a good reason for reporting exact p values in one's research: It allows readers to make their own judgments regarding statistical significance. By considering the exact probability in conjunction with effect size, readers can draw a more informed conclusion about the importance of the reported result.
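To see how the one-tailed p values in Table 14.4 follow from the t ratios, the minimal sketch below converts each reported t and df into a lower-tail probability and compares it with α = .05. It is a self-contained illustration, not the software the authors used: the tail area comes from Simpson's-rule integration of the t density, and the names `lower_tail_p`, `rows`, and `ALPHA` are hypothetical. Because the t ratios in the table are rounded to two decimals, the computed p values can differ from the printed ones in the third decimal place.

```python
import math

ALPHA = 0.05  # announced level of significance

def t_pdf(u: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

def lower_tail_p(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t): the one-tailed p for H1: mu1 - mu2 < 0 (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|) by Simpson's rule
    return 0.5 - area if t <= 0 else 0.5 + area

# (t, df) pairs from Table 14.4; signs follow X̄1 - X̄2
rows = {"READ": (-2.21, 493), "MATH": (-3.24, 493), "GPA": (-1.56, 500)}
p_values = {}
for label, (t, df) in rows.items():
    p = lower_tail_p(t, df)
    p_values[label] = p
    verdict = "reject H0" if p < ALPHA else "retain H0"
    print(f"{label}: t({df}) = {t:.2f}, one-tailed p = {p:.4f} -> {verdict}")
```

Printing four decimals makes the point about exact p values concrete: the GPA result lands just above .05, while the reading and mathematics results fall clearly below it.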

Suggested Computer Exercises

1. Access the students data set, which contains grade-point averages (GPA) and television viewing information (TVHRSWK) for a random sample of 75 tenth-grade students. Test whether there is a statistically significant difference in GPA between students who watch less than two hours of television per weekday and those who watch two or more hours of television. In doing so,

(a) set up the appropriate statistical hypotheses,

(b) perform the test ($\alpha = .05$), and

(c) draw final conclusions.

2. Repeat the process above, but instead of GPA as the dependent variable, use performance on the reading and mathematics exams.

Exercises

Identify, Define, or Explain

Terms and Concepts

independent samples
dependent samples
sampling distribution of differences between means
standard error of the difference between means
population variance
assumption of homogeneity of variance
variance estimate
pooled variance estimate
assumption of population normality
