Post on 28-Oct-2016




0 download

Embed Size (px)


  • www.elsevier.com/stueduc

    Studies in Educational Evaluation 32 (2006) 317344


    Petra Lietz

    International University Bremen, Germany1


    This study conducts a meta-analysis using hierarchical linear modeling (HLM) toaddress the questions of the extent of gender differences in reading, whether or notthese differences apply similarly across English and non-English speaking countriesand decrease with age. Female secondary students performed 0.19 standard deviationunits above their male peers, regardless of age, language of instruction, and whethereffect sizes were based on mean differences or correlation coefficients. However,differences emerged between assessment programs, with recent Australian Studies,the National Assessment of Educational Progress Studies (NAEP) and the Programmefor International Student Assessment (PISA) reporting greater gender differences.


    The research reported in this article extends across two major areas: one content-related, namely gender differences in reading, and one method-related, namely meta-analysis. Each of these areas is discussed below.

    Gender Differences in Reading Achievement

    The view prevails that boys perform better than girls in mathematics and the naturalsciences whereas the reverse holds in languages, social studies and reading. A closer

    0191-491X/04/$ see front matter # 2006 Published by Elsevier Ltd.doi:10.1016/j.stueduc.2006.10.002

  • P. Lietz / Studies in Educational Evaluation 32 (2006) 317344318

    examination of the research on reading, however, reveals that this is not as clear-cut as itappears and that results can be grouped into two main categories: One showing evidence ofgirl's superiority over boys in reading achievement, and one providing no evidence ofgender differences.

    In a large-scale study of students in Grades 2, 4, and 6 in which sampling wasconfined to small geographical areas in four English-speaking countries, namely Canada,England, Nigeria and the U.S., Johnson (1973-1974) found evidence to support the viewput forward by Preston (1962) that gender differences in reading were culturally rather thanphysiologically determined because gender differences varied between countries: InEngland and Nigeria, boys' mean performance exceeded that of girls whereas the reverseapplied in Canada and the United States.

    In 1990/91, the Reading Literacy Study was undertaken by the InternationalAssociation for the Evaluation of Educational Achievement (IEA). Out of a total of 31education systems that participated at the 14-year-old level, Wagemaker, Taube, Munck,Kontogiannopoulou-Polydorides and Martin (1996) reported significant gender differenceson the total reading score for 12 systems with 10 favouring girls and 2 favouring boys.

    In 1995, a reading study of Grade 6 students was conducted by the Southern AfricaConsortium for Monitoring Educational Quality using the instruments designed by the IEAReading Literacy Study. Saito (1998) calculated differences in mean percentages of correctanswers for boys and girls on the total reading score. While girls performed better thanboys in Mauritius and Zimbabwe, boys achieved higher scores in Namibia, Zambia andZanzibar. These differences, however, were not significant at the 95% confidence level asnone of the differences was greater than or equal to two sampling errors. Likewise, in anapplication of the IEA Reading Literacy Study in Latvia, Dedze (1995) reported highermean achievement of female students (485) than male students (476). Again, however,these differences turned out not to be significant when the standard errors (6.2 for boys and6.4 for girls) were taken into account.

    In Western Australia, the program "Monitoring Standards in Education" includedassessments of reading of students in Years 7 and 10 in 1992, 1995, 1997, and 2001(Department of Education and Training, 2003). Results revealed a decline in averagereading performance over that 10-year-period at the Year 10 level whereas averageperformance remained stable at the Year 7 level. The gender differences in readingachievement were similar at Year 7 and Year 10 levels, except in the 1997 assessmentwhere the gender difference for Year 7 students was far smaller than the one for Year 10students. Girls outperformed boys across all Year levels and on all occasions.

    The reading assessment component of the 2003 National Assessment of EducationalProgress (NAEP) conducted in the United States was the sixth assessment in a series thathad started in 1992. The assessment of the 8th grade students comprised three main typesof exercises, namely reading for literary experience, reading to gain information, andreading to perform a task. Results of the 2003 assessment were averaged across the threetypes of exercises and revealed that scores for female students were 11 points higher onaverage than scores for male 8th graders. This difference was the same as it had been inprevious NAEP assessment cycles (Plisko, 2003).

    There is, however, some supportive evidence for the claim that boys and girlsperform equally well in reading.

  • P. Lietz / Studies in Educational Evaluation 32 (2006) 317344 319

    The first major international comparative study into reading, the ReadingComprehension Study, was conducted by IEA as part of the Six Subject Survey in 1970/71.Thorndike (1973), who reported the findings of the study, found median correlationsbetween selected background variables and reading comprehension scores across all 15participating education systems, namely Belgium (Flemish), Belgium (French), Chile,England, Finland, Hungary, India, Iran, Israel, Italy, the Netherlands, New Zealand,Scotland, Sweden, and the United States of America. While the coefficients at the 14-year-old level were positive, indicating a higher performance of girls, the gender differenceswere not significant in spite of the large and nationally representative samples employed.Indeed the median correlations for sex of students were the smallest when compared toother student background factors, and far behind reading resources and father's occupation.

    Hogrebe, Nist and Newman (1985) undertook a review of research on genderdifferences in reading research at both the primary and secondary school levels. Accordingto their review, no evidence for significant gender differences in reading had emerged fromstudies conducted in the 1930s and 1940s. Their analysis of the 1980 data set of the HighSchool and Beyond data base showed that gender accounted for less than one per cent ofthe variance in reading achievement test scores. This was the case not only in the overallsample but also when gender differences were analyzed for different subgroups for whichsuch differences could be assumed to emerge more readily, e.g., low achieving students.

    Likewise, in a systematic meta-analysis on the topic of gender differences in readingachievement, Knickerbocker (1989) concluded that, in general, studies have not providedany evidence for the superiority of girls over boys.

    In conclusion, it can be argued that while the research discussed above providesmore support for the existence of a gender gap in reading performance in favour of femalestudents, some studies dispute this finding. Moreover, studies provide inconclusiveevidence as regards the extent of gender differences in reading at the secondary schoollevel. Hence, a more systematic approach to integrating research findings, namelystatistical meta-analysis (Glass, McGaw, & Smith 1981; Hunter & Schmidt, 2004) issuggested and discussed in greater detail below.


    The term "meta-analysis" was first introduced by Glass (1976, 1977) to denote asystematic integration of research findings on a specific topic, and has been developedfurther as an analytical technique (Rosenthal, 1984). The need for a more systematic way ofintegrating prior research than by means of narrative research reviews was identified as areaction to criticisms aimed at the social sciences by both funding agencies and the publicas to whether or not any progress was being made in terms of deriving firm knowledgefrom the seemingly abounding and contradictory evidence generated from many researchabundant in the social sciences (Light & Smith, 1971).

    As Hunter and Schmidt (2004, p. 16) emphasize: "In many areas of research, theneed today is not for additional empirical data but for some means of making sense of thevast amounts of data that have been accumulated." Moreover, they point out that thenarrative integration of research findings has serious shortcomings in that this strategy ofintegrating research results often leads to different conclusions if done by different people.

  • P. Lietz / Studies in Educational Evaluation 32 (2006) 317344320

    Statistical meta-analysis, in contrast, as a quantitative way of integrating research findingsshould lead to the same conclusion, regardless of the person applying the procedure.

    Thus, the challenge in the social sciences in general and in educational research inparticular, is to integrate systematically and quantitatively findings from the large numberof research studies that have been undertaken in order to contribute empirically verifiedfacts to the cumulative body of knowledge.

    Various researchers have taken up the challenge and undertaken stringent statisticalmeta-analyses to summarize research efforts in education on a number of different topics.Kulik, Bangert, and Williams (1983), for example, investigated the effects of computer-based teaching on secondary school students. In their meta-analysis of results of 51independent research studies on this issue, they found that students who experiencedcomputer-based instruction performed about 0.32 standard deviations better than studentstaught by traditional methods. The authors also investigated twelve study characteristics ranging from whether the computer was used for managing, tutoring, simulation,programming or drill to whether the instructional materials had been produced locally orcommercially and their potential impact on effect size. Only year of publication andduration of the study were found to have a significant impact on effect size suggesting thatmore recent and shorter studies had a more positive effect on performance than older andlonger studies.

    Bangert-Drowns, Hurley and Wilkinson (2004) reported results of a meta-analysis inwhich they summarized 48 independent studies evaluating school-based writing-to-learnprograms. The effect sizes that the authors used were calculated on the basis ofstandardized mean differences whereby the mean academic performance of theconventional learning group was subtracted from the writing-to-learn group and divided bythe standard deviation pooled across the two groups. In cases where primary study hadreported only t-tests or F-values, the authors used transformations suggested by Glass et al.(1981) to arrive at effect sizes. Bangert-Drowns et al. (2004) argued that assigning greaterweight to larger studies through weighing each effect size by the inverse of the effect size'ssampling error as suggested by Hedges (1994) might not be warranted. In order to shedlight on this issue, they performed the analyses with both weighted and unweighted meansof Cohen's d (Cohen, 1988). Results showed an average unweighted Cohen's d of 0.26standard deviations, with a 95% confidence interval from 0.15 to 0.38, compared with anaverage weighted Cohen's d of 0.17 with a 95% confidence interval from 0.11 to 0.22. Asthe unweighted as well as the weighted average effect size differed significantly from zero,the authors proceeded to examine whether or not a number of variables such as treatmentcontext, treatment intensity or features of the writing task had an impact on effect size.Analyses with both weighted and unweighted effect sizes led to the same conclusion,namely that minutes per in-class writing task, metacognitive stimulation and students in themiddle grades when compared with elementary, high school and college students weresignificantly related to effect size.

    Cohen, Kulik and Kulik (1982) undertook a meta-analysis of 65 independent studiesthat investigated the impact of tutoring on student performance. Like Kulik Bangert andWilliams (1983), the authors ensured that each primary study was only included once in themeta-analysis. To this end, where multiple reports described findings from the same study,the most complete report was used, and where multiple measures of outcomes or measures

  • P. Lietz / Studies in Educational Evaluation 32 (2006) 317344 321

    for different school subjects were reported, these were combined into one compositemeasure. In addition to the effect sizes, the authors examined 13 study features, includingwhether or not the tutors had received training, whether or not students were assignedrandomly to study groups, whether the performance assessed by way of a commercial or ateacher-designed test and the class level of tutors. Six study features were shown to have animpact on the effect size between tutoring and performance of the tutored student, wherebyeffect sizes from studies were higher when: (a) performance was measured by an instructor-developed test; (b) lower-order tasks were assessed; (c) tutoring was structured; (d)duration of tutoring was shorter (i.e., up to 18 weeks); (e) primary study results wereunpublished; and (f) mathematics rather than reading was the subject matter.

    Other meta-analyses have focused on a number of different aspects of readingachievement. Thus, Neuman (1986) undertook a meta-analysis of eight state-wideassessment and data from the 1984 NAEP study in order to examine the link betweenreading achievement, TV viewing and related reading activities. Results of the meta-analysis showed that TV viewing contributed not much to the explanation of differences inreading and that TV viewing did not replace time spent on leisure time reading. In the areaof primary reading instruction, Stahl and Miller (1988) conducted a meta-analysis of 34studies to compare the effectiveness of basal reader approaches and the languageexperience approach. No significant differences in effect sizes emerged depending on thetype of approach employed which suggested similar effectiveness of both approaches.

    However, none of these meta-analyses have focused specifically on genderdifferences in reading. In addition, advances in hierarchical linear modeling (HLM) haveoccurred which allow for the clustered nature of meta-analytic analyses to be taken intoaccount more appropriately. Thus, Raudenbush and Bryk (2002) have argued that the mainpurpose of a meta-analysis is to examine the extent to which effects reported in the resultsof primary study are consistent and to disentangle what part of the variance in study resultsis due to sampling error and which component is due to actual treatment implementation.As a consequence, Raudenbush and Bryk (2002) have proposed an empirical B...


View more >