Fundamentals of Statistical Reasoning in Education


Post on 09-Jan-2016






Fundamentals of statistics


<ul><li><p>Reading the Research: Independent-Samples t Test</p>
<p>Santa and Hoien (1999, p. 65) examined the effects of an early-intervention program on a sample of students at risk for reading failure:</p>
<p>A t-test analysis showed that the post-intervention spelling performance in the experimental group (M = 59.6, SD = 5.95) was statistically significantly higher than in the control group (M = 53.7, SD = 12.4), t(47) = 2.067, p &lt; .05.</p>
<p>Notice that an exact p value is not reported; rather, probability is reported relative to the significance level of .05. The result of this independent-samples t test is therefore deemed significant at the .05 level.</p>
<p>Source: Santa, C. M. &amp; Hoien, T. (1999). An assessment of early steps: A program for early intervention of reading problems. Reading Research Quarterly, 34(1), 54–79.</p>
<p>Case Study: Doing Our Homework</p>
<p>This case study demonstrates the application of the independent-samples t test. We compared the academic achievement of students who, on average, spend two hours a day on homework to students who spend about half that amount of time on homework. Does that extra hour of homework (in this case, double the time) translate into a corresponding difference in achievement?</p>
<p>The sample of nearly 500 students was randomly selected from a population of seniors enrolled in public schools located in the northeastern United States. (The data are courtesy of the National Center for Education Statistics' National Education Longitudinal Study of 1988.) We compared two groups of students: those reporting 4–6 hours of homework per week (Group 1) and those reporting 10–12 hours per week (Group 2). The criterion measures were reading achievement, mathematics achievement, and grade-point average.</p>
<p>One could reasonably expect that students who did more homework would score higher on measures of academic performance. 
We therefore chose the directional alternative hypothesis, H<sub>1</sub>: μ<sub>1</sub> − μ<sub>2</sub> &lt; 0, for each of the three t tests below. (The "less than" symbol simply reflects the fact that we are subtracting the hypothetically larger mean from the smaller mean.) For all three tests, the null hypothesis stated no difference, H<sub>0</sub>: μ<sub>1</sub> − μ<sub>2</sub> = 0. The level of significance was set at .05.</p>
<p>of significant differences between means considerably easier than when the groups are already formed on the basis of some characteristic of the participants (e.g., sex, ethnicity).</p>
<p>The assumption of random sampling underlies nearly all the statistical inference techniques used by educational researchers, including the t test and other procedures described in this book. Inferences to populations from which the samples have been randomly selected are directly backed by the laws of probability and statistics and are known as statistical inferences; inferences or generalizations to all other groups are nonstatistical in nature and involve judgment and interpretation.</p>
<p>Chapter 14 Comparing the Means of Two Populations: Independent Samples</p></li><li><p>Our first test examined the mean difference between the two groups in reading performance. Scores on the reading exam are represented by T scores, which, you may recall from Chapter 6, have a mean of 50 and a standard deviation of 10. (Remember not to confuse T scores, which are standard scores, with t ratios, which make up the t distribution and are used for significance testing.) The mean scores are shown in Table 14.3. As expected, the mean reading achievement of Group 2 (X̄<sub>2</sub> = 54.34) exceeded that of Group 1 (X̄<sub>1</sub> = 52.41). An independent-samples t test revealed that this mean difference was statistically significant at the .05 level (see Table 14.4). Because large sample sizes can produce statistical significance for small (and possibly trivial) differences, we also determined the effect size in order to capture the magnitude of this mean difference. 
From Table 14.4, we see that the raw mean difference of −1.93 points corresponds to an effect size of −.21. Remember, we are subtracting X̄<sub>2</sub> from X̄<sub>1</sub> (hence the negative signs). This effect size indicates that the mean reading achievement of Group 1 students was roughly one-fifth of a standard deviation below that of Group 2 students, a rather small effect.</p>
<p>We obtained similar results on the mathematics measure. The difference again was statistically significant, in this case satisfying the more stringent .001 significance level. The effect size, d = −.31, suggests that the difference between the two groups in mathematics performance is roughly one-third of a standard deviation. (It is tempting to conclude that the mathematics difference is larger than the reading difference, but this would require an additional analysis: testing the statistical significance of the difference between two differences. We have not done that here.)</p>
<p>Table 14.3 Statistics for Reading, Mathematics, and GPA</p>
<table>
<tr><th>Measure</th><th>Group</th><th>n</th><th>X̄</th><th>s</th><th>s<sub>X̄</sub></th></tr>
<tr><td>READ</td><td>Group 1</td><td>332</td><td>52.41</td><td>9.17</td><td>.50</td></tr>
<tr><td>READ</td><td>Group 2</td><td>163</td><td>54.34</td><td>9.08</td><td>.71</td></tr>
<tr><td>MATH</td><td>Group 1</td><td>332</td><td>52.44</td><td>9.57</td><td>.53</td></tr>
<tr><td>MATH</td><td>Group 2</td><td>163</td><td>55.34</td><td>8.81</td><td>.69</td></tr>
<tr><td>GPA</td><td>Group 1</td><td>336</td><td>2.46</td><td>.58</td><td>.03</td></tr>
<tr><td>GPA</td><td>Group 2</td><td>166</td><td>2.54</td><td>.58</td><td>.05</td></tr>
</table>
<p>Table 14.4 Independent-Samples t Tests and Effect Sizes</p>
<table>
<tr><th>Measure</th><th>X̄<sub>1</sub> − X̄<sub>2</sub></th><th>t</th><th>df</th><th>p (one-tailed)</th><th>d</th></tr>
<tr><td>READ</td><td>−1.93</td><td>−2.21</td><td>493</td><td>.014</td><td>−.21</td></tr>
<tr><td>MATH</td><td>−2.90</td><td>−3.24</td><td>493</td><td>.001</td><td>−.31</td></tr>
<tr><td>GPA</td><td>−.08</td><td>−1.56</td><td>500</td><td>.059</td><td>−.14</td></tr>
</table></li><li><p>Finally, the mean difference in GPA was X̄<sub>1</sub> − X̄<sub>2</sub> = −.08, with a corresponding effect size of −.14. This difference was not statistically significant (p = .059). Even if it were, its magnitude is rather small (d = −.14) and arguably of little practical significance. Nevertheless, the obtained p value of .059 raises an important point. Although, strictly speaking, this p value failed to meet the .05 criterion, it is important to remember that ".05" (or any other value) is entirely arbitrary. Should this result, p = .059, be declared "statistically significant"? Absolutely not. 
But nor should it be dismissed entirely. When a p value is tantalizingly close to α but nonetheless fails to meet this criterion, researchers sometimes use the term marginally significant. Although no convention exists (that we know of) for deciding between a "marginally significant" result and one that is patently nonsignificant, we believe that it is important not to categorically dismiss results that, though exceeding the announced level of significance, nonetheless are highly improbable. (In the present case, for example, the decision to retain the null hypothesis rests on the difference in probability between 50/1000 and 59/1000.) This also is a good reason for reporting exact p values in one's research: It allows readers to make their own judgments regarding statistical significance. By considering the exact probability in conjunction with effect size, readers draw a more informed conclusion about the importance of the reported result.</p>
<p>Suggested Computer Exercises</p>
<p>Exercises</p>
<p>1. Access the students data set, which contains grade-point averages (GPA) and television viewing information (TVHRSWK) for a random sample of 75 tenth-grade students. Test whether there is a statistically significant difference in GPA between students who watch less than two hours of television per weekday and those who watch two or more hours of television. In doing so,</p>
<p>(a) set up the appropriate statistical hypotheses,</p>
<p>(b) perform the test (α = .05), and</p>
<p>(c) draw final conclusions.</p>
<p>2. 
Repeat the process above, but instead of GPA as the dependent variable, use performance on the reading and mathematics exams.</p>
<p>Identify, Define, or Explain</p>
<p>Terms and Concepts</p>
<ul>
<li>independent samples</li>
<li>dependent samples</li>
<li>sampling distribution of differences between means</li>
<li>standard error of the difference between means</li>
<li>population variance</li>
<li>assumption of homogeneity of variance</li>
<li>variance estimate</li>
<li>pooled variance estimate</li>
<li>assumption of population normality</li>
</ul></li></ul>
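<p>The case-study figures can be roughly reproduced from the summary statistics in Table 14.3. The sketch below is not the authors' procedure: it assumes the pooled-variance form of the independent-samples t test and an effect size defined as the mean difference divided by the pooled standard deviation. The function name <code>pooled_t_test</code> and the dictionary layout are hypothetical. Because the standard library has no t distribution, the one-tailed p value uses a normal approximation, which is close only because df is near 500.</p>

```python
import math
from statistics import NormalDist


def pooled_t_test(n1, mean1, s1, n2, mean2, s2):
    """Independent-samples t test from summary statistics (pooled variance)."""
    df = n1 + n2 - 2
    # Pooled variance estimate: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    # Standard error of the difference between means.
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    t = diff / se_diff
    # Effect size (assumed definition): mean difference in pooled-SD units.
    d = diff / math.sqrt(pooled_var)
    # Lower-tail p for H1: mu1 - mu2 < 0, using a normal approximation
    # to the t distribution (an assumption; reasonable only for large df).
    p_approx = NormalDist().cdf(t)
    return {"diff": diff, "t": t, "df": df, "p_approx": p_approx, "d": d}


# Summary statistics copied from Table 14.3: (n1, mean1, s1, n2, mean2, s2).
table_14_3 = {
    "READ": (332, 52.41, 9.17, 163, 54.34, 9.08),
    "MATH": (332, 52.44, 9.57, 163, 55.34, 8.81),
    "GPA": (336, 2.46, 0.58, 166, 2.54, 0.58),
}

results = {name: pooled_t_test(*stats) for name, stats in table_14_3.items()}

for name, r in results.items():
    print(
        f"{name}: diff={r['diff']:.2f}  t={r['t']:.2f}  df={r['df']}  "
        f"p~{r['p_approx']:.3f}  d={r['d']:.2f}"
    )
```

<p>Working from rounded means and standard deviations, the recomputed values only approximate Table 14.4. The reading and mathematics results land close to the reported t and d, while the GPA t ratio can differ more noticeably because the two-decimal means carry little precision relative to a difference of .08.</p>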