
Systematic Literature Reviews

Systematic reviews of the effects of preparatory courses on university entrance examinations in high school-age students

Montgomery P, Lilly J. Systematic reviews of the effects of preparatory courses on university entrance examinations in high school-age students

This systematic review examines the effects of coaching on Scholastic Assessment Test (SAT) scores and, ultimately, university entrance. We show a significant effect in score improvement for coached students over their uncoached peers on both the Math and Verbal subtests of the examination. This review’s findings indicate treatment group gains over control groups of 23.5 points on the Verbal subtest and 32.7 points on the Math subtest, a combined gain nearly triple that previously assumed. As long as coaching remains inaccessible to some students, we urge universities to reconsider the weight given to SAT scores in the undergraduate admissions process. We challenge the designers of the SAT to redesign the examination to eliminate the possibility of score gains from coaching. Finally, we call for researchers to produce more high-quality data in this field so that accurate estimates of coaching’s effects are available to all.

Paul Montgomery, Jane Lilly
Centre for Evidence-Based Intervention, Oxford University, UK

Key words: prep courses, SAT, university entrance, examination tests, systematic review, United States of America

Paul Montgomery, Centre for Evidence-Based Intervention, Oxford University, Barnett House, 32 Wellington Square, Oxford, OX1 2ER, UK
E-mail: [email protected]

Accepted for publication April 9, 2011

Background

University entrance examination scores play an important and often understated role in the college admissions process, regularly influencing decisions about university admission and scholarship offers (Arenson, 2006; Beller, 2001; Löfgren, 2005; Morgan & Michaelides, 2005). With the significant increase in applicants in recent decades, universities have come to depend increasingly on standardized reasoning-ability examination scores, particularly when awarding scholarships, often setting flat cut-off scores to determine who is eligible for scholarship funding (Arenson, 2006; Morgan & Michaelides, 2005).

Unlike content-specific examinations (such as A-level, Advanced Placement, or International Baccalaureate examinations), on which student performance can, and should, be improved through revision of examination content, reasoning-ability examinations are designed to measure general aptitude and skill, qualities that are allegedly uncoachable. Such examinations were included in the admissions process in order to reduce inequalities in admissions (Beller, 2001). Coaching for these examinations, however, may threaten that outcome. The implications for social welfare concern universities in many countries that are grappling with equal opportunities issues.

Coaching, defined as “instructions given in preparation for taking a test that are designed to elicit maximum performance by the coached examinee” (Cole, 1982: 391) and “systematic test preparation for a group of students that involves . . . content review, item drill and practice, and an emphasis on specific test-taking and general test-wiseness” (Briggs, 2004: 7), emerged almost simultaneously with the introduction of university entrance examinations in the USA in the late 1940s and flourished from the start. Coaching has been delivered in many forms, including book-based, computer-based, and classroom-based courses and one-on-one tutorials, and each method has gained its own champions and critics. Many coaching enterprises are confident enough to guarantee significant score improvements on the Scholastic Assessment Test (SAT) of up to more than 130 points, or 8.3 percent, from pre-test to post-test (Powers, 1993).

DOI: 10.1111/j.1468-2397.2011.00812.x
International Journal of Social Welfare, ISSN 1369-6866. Int J Soc Welfare 2012: 21: 3–12
© 2011 The Author(s). International Journal of Social Welfare © 2011 Blackwell Publishing Ltd and the International Journal of Social Welfare. Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.

However, critics, particularly those employed by the College Board (the nonprofit organization that develops the SAT) and other test-designing agencies, remain skeptical. They claim that the effects of even the most successful coaching companies are marginal at best (Allalouf & Ben-Shakhar, 1998; Löfgren, 2005; Powers, 1993; Powers & Rock, 1999).

Most experts agree that score gains may reflect test practice (i.e., simply having taken the test before), growth in the abilities measured by the test, and measurement error (Powers, 1993), but whether coaching makes a significant additional contribution is hotly debated. The test producers (the National Institute of Testing and Evaluation [NITE] in Israel, the Swedish National Agency for Higher Education in Sweden, the College Board in the USA, etc.) defend the relative “un-coachability” of their examinations. While they concede that longer coaching programs produce greater score improvements (Powers, 1993), test makers firmly maintain that the examinations measure verbal reasoning and mathematical skills that develop gradually over many years through both school and nonschool experiences (Messick & Jungeblut, 1981), and are therefore impossible to coach effectively over a short period of time. The coaching industry acknowledges that some students will benefit more from coaching than others, depending on their initial test scores and general ability, but maintains that many students can achieve notable score improvements. Influential parties in the industry assert that test scores can be raised by test familiarization, practice, instruction in test-taking strategy, and highly focused content teaching (Kulik, Bangert-Drowns, & Kulik, 1984), and these elements make up the core of most coaching courses.

The five existing meta-analyses on this subject, all produced between 1980 and 1990 (Becker, 1990; DerSimonian & Laird, 1983; Kulik et al., 1984; Messick & Jungeblut, 1981; Slack & Porter, 1980), have provided an important foundation for the study of the effects of coaching on university entrance examination scores. Unfortunately, the inclusion in every meta-analysis of uncontrolled studies, poorly designed studies, poor-quality outcome measures, questionable comparison groups, and weak statistical comparisons has only further clouded the body of research in the field.

Objective

This systematic review builds on and clarifies existing research to provide a more reliable estimate of the true effects of coaching on score improvement and university entrance.

Selection criteria

Study types. Only randomized controlled trials and quasi-randomized controlled trials were eligible for the review.

Population. Participants must have been secondary school-aged students (ages 13–19).

Intervention. The coaching must have been delivered in a commercial classroom-style setting, a school-based program, online, in one-to-one tutorials, or in another appropriate setting.

Outcome. The study must have reported intervention effects in quantitative terms and must have included end-point (post-intervention) test scores for both the experimental and control groups. Secondary outcomes of interest from the students’ perspective, including cost and consumer satisfaction, were considered where reported.
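The criteria above can be read as a screening predicate. The sketch below is a minimal illustration that assumes a hypothetical `StudyRecord` structure with invented field names; it is not the reviewers’ actual screening form. The intervention criterion admits essentially any delivery setting, so it is not coded as a filter.

```python
from dataclasses import dataclass

# Hypothetical screening record; field names are illustrative only.
@dataclass
class StudyRecord:
    design: str               # e.g. "RCT", "quasi-RCT", "cohort"
    min_age: int              # youngest participant's age
    max_age: int              # oldest participant's age
    posttest_both_arms: bool  # end-point scores reported for both groups

ELIGIBLE_DESIGNS = {"RCT", "quasi-RCT"}

def is_eligible(study: StudyRecord) -> bool:
    """Apply the study-type, population, and outcome criteria."""
    design_ok = study.design in ELIGIBLE_DESIGNS
    population_ok = 13 <= study.min_age and study.max_age <= 19
    outcome_ok = study.posttest_both_arms
    return design_ok and population_ok and outcome_ok

print(is_eligible(StudyRecord("RCT", 15, 17, True)))         # True
print(is_eligible(StudyRecord("cohort", 15, 17, True)))      # False: design
print(is_eligible(StudyRecord("quasi-RCT", 15, 17, False)))  # False: no end-point data
```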

Data collection and analysis

The second author ran all electronic searches initially in May 2007 and directly contacted all of the leading scholars in the field who could be located. The first author updated the electronic searches in October 2010. The electronic literature searches yielded 2,219 titles, from which 311 abstracts were obtained. The full texts of 14 studies were then assessed, and ten studies qualified for the review.

All of the data used in the meta-analysis measured score gain on the same numerical scale (SAT points) and did not require standardization, so the weighted mean difference was used as the effect measure. A random effects model was also employed, as we assumed that, because of the considerable variation in the methodological characteristics of the coaching courses, the effects being estimated in the different studies were not identical but followed some distribution (Green & Higgins, 2011). This model is well suited to analyses in which the heterogeneity between studies cannot be readily explained, because it incorporates that unexplained between-study variation into the pooled estimate rather than assuming it away (Green & Higgins, 2011).
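Inverse-variance pooling of raw mean differences under a random effects model can be sketched in a few lines. The review does not state which between-study variance estimator it used; the sketch below assumes the DerSimonian-Laird moment estimator, a common default, and the study numbers in it are invented for illustration, not taken from the review.

```python
import math
from typing import NamedTuple

class Arm(NamedTuple):
    mean: float  # post-test mean, SAT points
    sd: float    # standard deviation, SAT points
    n: int       # participants analysed

def mean_difference(treat: Arm, ctrl: Arm) -> tuple[float, float]:
    """Raw mean difference (treatment minus control) and its sampling variance.

    No standardization is applied because every study reports SAT points."""
    md = treat.mean - ctrl.mean
    var = treat.sd ** 2 / treat.n + ctrl.sd ** 2 / ctrl.n
    return md, var

def random_effects_wmd(studies: list[tuple[Arm, Arm]]) -> tuple[float, float, float]:
    """DerSimonian-Laird random effects pooling; returns (estimate, SE, tau^2)."""
    if len(studies) < 2:
        raise ValueError("need at least two studies")
    effects, variances = zip(*(mean_difference(t, c) for t, c in studies))
    w = [1 / v for v in variances]                                 # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    scale = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / scale)              # between-study variance
    w_re = [1 / (v + tau2) for v in variances]                     # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2

# Invented numbers for illustration only; not data from the review.
example = [
    (Arm(520, 100, 40), Arm(500, 100, 40)),
    (Arm(480, 90, 60), Arm(470, 95, 55)),
    (Arm(510, 110, 30), Arm(470, 105, 30)),
]
pooled, se, tau2 = random_effects_wmd(example)
print(f"WMD = {pooled:.1f} points, 95% CI +/- {1.96 * se:.1f}, tau^2 = {tau2:.1f}")
```

When the studies agree closely, tau^2 is truncated at zero and the result coincides with the fixed-effect estimate; as tau^2 grows, the weights become more equal across studies and the standard error widens.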

Search strategy for identification of studies

A thorough search for all published and unpublished empirical studies that met the eligibility criteria for inclusion in this review was performed in May 2007 and updated in October 2010. The electronic databases searched were Educational Resources Information Center, PsycINFO, Comprehensive Dissertation Abstracts, Sociological Abstracts, Social Services Abstracts, ASSIA, and the Cochrane Collaboration trials registers; selected education journals were also hand-searched. The reviewers also made personal contact with leaders in the field.

The search strategy used three classes of search terms derived from the controlled vocabulary used to index articles in the databases: intervention, population, and study design. The main search terms related to coaching, examinations, the SAT, students, randomized controlled trials, and comparative studies. Wildcard characters were used to identify variants of words (e.g., adolescen$ to find adolescence, adolescent, and adolescents).
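The three-class strategy corresponds to a common Boolean pattern: OR together the terms within a class, then AND the classes. The sketch below uses hypothetical term lists (the exact search strings are not reported here) and emulates `$` right-truncation with a regular expression; real database platforms differ in their truncation symbols and phrase syntax.

```python
import re

def build_query(term_classes: dict[str, list[str]]) -> str:
    """Combine terms with OR inside each class and AND across classes."""
    groups = []
    for terms in term_classes.values():
        # Quote multi-word phrases so they are searched as phrases.
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)

def truncation_matches(pattern: str, words: list[str]) -> list[str]:
    """Emulate right-truncation: 'adolescen$' matches any word starting with 'adolescen'."""
    stem = re.escape(pattern.rstrip("$"))
    regex = re.compile(stem + r"\w*", re.IGNORECASE)
    return [w for w in words if regex.fullmatch(w)]

# Hypothetical term lists, one per class named in the text.
terms = {
    "intervention": ["coach$", "test preparation"],
    "population": ["adolescen$", "student$"],
    "study design": ["randomized controlled trial", "comparative study"],
}
print(build_query(terms))
print(truncation_matches("adolescen$", ["adolescence", "adolescent", "adolescents", "adult"]))
```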

Much of the literature in this field is “grey literature” that is often difficult to find through traditional database searches. To address this, the reviewers contacted leading experts in the field directly to obtain any other relevant information. The reviewers also contacted the standardized test providers and commercial test preparation institutions to request any studies of which they were aware. Additionally, a search was performed for eligible studies published in languages other than English that had not been located in the general database searches.

After identifying a number of studies of which the reviewers were already aware through their initial search, they performed secondary searches in each database for other studies with similar subject classifications. The bibliographies of previous meta-analyses, literature reviews, and articles located through these database searches provided a second source of studies for the review. Follow-up searches on the first and second authors of all eligible studies, and cited reference searches of eligible articles, were also conducted.

Methods of the review

The electronic literature searches yielded a total of 2,219 titles for consideration, of which 1,908 were immediately disqualified as irrelevant. From the remaining 311 studies, the authors identified 14 full-text studies for further review. Both reviewers made the final decision on eligibility by reading the full text of each document, resolving any disagreements through discussion and a search for further information. Ultimately, ten studies were included in the review. Seven of the studies reported the means, standard deviations, and number of participants for each study group, and could therefore be included in the meta-analysis. Two of the studies (Hopmeier, 1984; Zuman, 1988) were missing standard deviations. Despite multiple attempts to obtain this information from the authors or their affiliated institutions, the relevant data were not available, and these studies were therefore excluded from the meta-analysis. The final excluded study (McClain, 1999) was missing both means and standard deviations, which neither the author nor his affiliated institution was able to supply. The results of these three studies were still considered, but only in the narrative analysis and discussion of the findings.
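The screening flow and the data-sufficiency rule can be checked with simple arithmetic. The sketch below uses only the counts and missing-data details stated in this section; representing missing statistics as sets is an illustrative assumption, and it simplifies reporting to the study level rather than per group.

```python
# Counts reported in the text.
titles_screened = 2219
excluded_on_title = 1908
abstracts_obtained = titles_screened - excluded_on_title
full_texts_assessed = 14
included = 10
in_meta_analysis = 7

# Statistics a study must report to enter the meta-analysis.
REQUIRED = {"mean", "sd", "n"}

# Missing statistics for the three narrative-only studies, as described in the text.
missing = {
    "Hopmeier (1984)": {"sd"},
    "Zuman (1988)": {"sd"},
    "McClain (1999)": {"mean", "sd"},
}

def meta_analysable(missing_fields: set[str]) -> bool:
    """A study can be pooled only if none of the required statistics is missing."""
    return not (REQUIRED & missing_fields)

narrative_only = [name for name, gaps in missing.items() if not meta_analysable(gaps)]
print(abstracts_obtained, full_texts_assessed, included, in_meta_analysis)  # 311 14 10 7
print(narrative_only)
```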

Description of studies

The individual descriptions of the ten identified studies, along with their major limitations, are given below to facilitate interpretation of the review findings. All of the studies assess the effectiveness of coaching on SAT performance. As specified in the inclusion criteria, all ten studies were randomized or quasi-randomized trials. Eight of the ten trials were set in high schools; the remaining two were set in a test preparation clinic and an at-risk program. Four of the trials included between three and 14 high schools. With the exception of two California-based trials, all of the trials were set in the Eastern United States, primarily along the East Coast. One of the trials looked solely at Math training and outcomes; two other trials focused on Verbal coaching and performance. Seven of the studies examined classroom-based coaching (one of which included some computer practice), and three evaluated computer-based coaching only. The duration of the coaching interventions ranged from 4 to 45 hours. Finally, six of the ten included trials reported at least one significant effect of coaching on SAT performance; however, these studies need to be interpreted carefully, as substantial limitations regarding participants, attrition, outcome measures, analysis, and reporting were prevalent across all included studies.

Roberts and Oppenheim (1966)

This study, the earliest randomized study of coaching effects on SAT scores, focused on students from 14 high schools in urban and rural Tennessee. Each school is treated as a separate subgroup, labeled A–N in Figure 1. Schools were randomly selected to coach for either the SAT-Verbal (schools A–F) or the SAT-Math (schools G–N) subtest. Students within the schools were then randomized into experimental groups, which received a classroom-based coaching intervention (n = 188, SAT-Math; n = 154, SAT-Verbal), and control groups (n = 122, SAT-Math; n = 111, SAT-Verbal), which were put on a wait-list for the intervention. The coaching program consisted of 15 sessions totaling 7.5 hours. The results showed no effect of the intervention. This was one of the stronger studies in the cohort, but it still has some limitations, including: (i) the use of only the “Practice SAT” for pre- and post-intervention skill measures, which is a shorter examination and therefore a less accurate measure; and (ii) the small size of some of the experimental and control groups at individual schools, which could have countered the bias-reducing effects of randomization.

Evans and Pike (1973)

This study recruited 12 public high schools in New Jersey, Ohio, and Pennsylvania to implement a school-based coaching program for the Math subtest of the SAT. Eleventh-grade student volunteers were randomized within their schools either to special training on one of three types of Math question common to the SAT (n = 337) or to the course wait-list (n = 165). The three question types are treated as separate subgroups in Figure 1: DS (data sufficiency), QC (quantitative comparisons), and RM (regular mathematics). The course consisted of seven 3-hour Saturday sessions; students were tested before and after the intervention. The authors concluded that, of the average 55-point improvement by all participating pupils, 25 points were due to coaching. The study has a number of limitations, including: (i) volunteers were screened for Mathematical ability, and high and low performers were automatically excluded from the study, which affects the generalizability of the results; and (ii) pupils were tested only on the specific type of test item for which they were trained, which could significantly affect the validity of the results and their relevance to projected score change on the Math subtest as a whole.

Alderman and Powers (1980)

This study included eight high schools, both public and private, located in seven different New England states. Each school is treated as a separate subgroup, labeled A–H in Figure 1. Each of these high schools already had an SAT-Verbal coaching program in place and was able to randomize students into those courses, with a wait-list control group at each school. The courses varied significantly from school to school (program duration ranged from 5 to 45 hours), and so did their success. The authors concluded that the programs had little impact on total test scores; regression estimates of the effect of the courses on the SAT-Verbal subtest ranged from −3 to +28 points, with a weighted average of +8 points. The primary limitation of this study is that course duration and design varied considerably between schools and are known to contribute significantly to the effect of a program (Kulik et al., 1984), yet they were not treated as subgroups in the analysis.

Figure 1. Effect of commercial preparatory courses on university admissions exams for high school-age students.

Hopmeier (1984)

Hopmeier’s population consisted of 93 students in grades 9, 10, and 11 from a high school in Santa Rosa County, Florida. There were two treatment groups: one in which students used the computer program individually (n = 31), and one in which students used the program in groups of three or four (n = 30). The computer program, Computer SAT (Harcourt Brace Jovanovich SAT preparation program), consisted of Verbal and Mathematics sections designed to improve SAT scores (Hopmeier, 1984). Treatment group participants completed the program once during a 2-week period. Control group students (n = 32) received no intervention. The results indicated that both treatment groups performed significantly better than the control group on the post-intervention SAT-Math subtest and that the group-study treatment group performed significantly better than the control group on the post-intervention SAT-Verbal subtest; computer coaching improved the total SAT score by 13.8 percent when students worked in small groups (Hopmeier, 1984). The primary limitation of this study is its population, which was drawn from the geometry classes of one high school. Because of the timing of the study, only A-grade students were able to participate, limiting the generalizability of the findings. The results of this study could not be included in the meta-analysis because of missing information that could not be imputed from the available data or obtained from the author or his affiliated institution.

Johnson (1984)

This study is an evaluation of the National Association for the Advancement of Colored People’s SAT test preparation clinic. Students in the San Francisco population (the only site where students were randomized) were randomly assigned to treatment (n = 35) and wait-list (n = 35) groups. The coaching program consisted of biweekly sessions over a 6-week period, for a total of 30 hours of coaching. The results indicated that the coaching program was effective. Unfortunately, these results are undermined by a number of major limitations. The overarching problem is the size of the treatment and control groups: while each started with 35 pupils, only 23 experimental group members and 12 control group members completed all of the requirements for the study and were included in the analysis. The attrition itself raises a number of concerns about the desirability and practicality of the course but, more importantly, it threatens the validity of the results altogether, because the equalizing effects of randomization were most likely lost when half of the study population withdrew from the experiment.
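The overall attrition figure understates the problem, because the two arms lost participants at very different rates. A small calculation with the counts reported for Johnson (1984) makes this differential attrition explicit; the `attrition` helper is a hypothetical name introduced for illustration.

```python
def attrition(randomized: int, analyzed: int) -> float:
    """Proportion of randomized participants missing from the analysis."""
    return 1 - analyzed / randomized

# Johnson (1984), San Francisco site, as reported in the text.
treatment = attrition(35, 23)
control = attrition(35, 12)
overall = attrition(70, 35)
differential = control - treatment

print(f"treatment {treatment:.1%}, control {control:.1%}, "
      f"overall {overall:.1%}, differential {differential:.1%}")
# treatment 34.3%, control 65.7%, overall 50.0%, differential 31.4%
```

A gap of this size means the analyzed control group may differ systematically from the analyzed treatment group, which is why randomization can no longer be relied on.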

Laschewer (1986)

Laschewer’s study population was drawn from one private Catholic high school in upstate New York. Students randomized into the experimental group (n = 27) were offered up to 22 hours of coaching, averaging 8.9 hours of exposure to an SAT computer-assisted instruction program designed by the author. Students in the control group (n = 29) were wait-listed for the same treatment. The results of the study showed no effect of the program. The study suffered from high rates of attrition (a loss of roughly 30% of the population), a particularly serious issue given that each of the treatment groups started with only 20 participants. This attrition, which left all groups with 15 or fewer students at the end of the experiment, endangers the validity of the results, as the effects of randomization are likely to have been nullified (similar to the problems in Johnson [1984] above).

Zuman (1988)

The randomized population in Zuman’s study consisted of 48 low-income students from Brooklyn and Manhattan who had been participating in a program designed to help at-risk students stay in school. The students were randomized into treatment (n = 16) and wait-list (n = 17) groups for a nine-session, 27-hour SAT coaching course based on Educational Testing Service (ETS) examination preparation materials. Results showed no significant effect of coaching on Verbal scores and a significant 57.0-point effect estimate of coaching on Math scores (p < 0.001) (Zuman, 1988). This study has a number of major limitations, most notably the attrition of the population from a pretest n = 48 to a posttest n = 33. Additionally, the author noted that attendance among those who completed the experiment was poor and that teachers were not adequately prepared to work with students who had limited Verbal and advanced Math skills and little or no familiarity with SAT concepts. These limitations complicate the interpretation of the results and the ability to attribute them to the program itself. We were not able to include the data from this study in the meta-analysis because of missing information that could not be imputed from the given data or recovered from the author.

Shaw (1992)

The pupils in this study were 11th- and 12th-grade students from three high schools in the Norwalk-La Mirada school district in California. Pupils receiving the intervention (n = 61) attended a 1-day, 8-hour, school-based SAT coaching workshop that addressed both specific content and general test-taking skills. Participants randomized into the control group (n = 61) received no treatment. The results indicated no significant difference in scores between students who received coaching and those who did not. The limitations of this study include: (i) students who indicated that they had previously participated in an SAT preparation workshop or taken the SAT were deleted from the study, limiting the generalizability of the results; and (ii) the workshop was designed by administrators from one participating high school and had not been externally validated.

Holmes and Keffer (1995)

This study tested the effect of computer-based drilling of Latin and Greek root words on Verbal SAT scores. The students, recruited from a high school in northeast Georgia, were drawn from all four grades and had an average age of 15.5. The computer program contained a total of 101 Latin and Greek roots, which students were required to define correctly at each sitting, as well as 800 English derivatives of the root words. Students in the treatment group (n = 34) could use the computer program up to twice a week over a 6-week period, averaging a total of 8 hours with the program. Students in the control group (n = 36) received no treatment. The results showed a significant main effect of the treatment (p < 0.03), with the treatment group mean roughly 40 points higher than the control group mean. This study suffered from high attrition rates in both the treatment and control groups, losing almost 40 percent of the study population before the conclusion of the study. This degree of attrition severely limits the validity of the results.

McClain (1999)

McClain’s study focused on high school students from a single school in Maryland and examined the effects of two different computer-based coaching courses on SAT scores. Students randomized into treatment groups received 26 hours of exposure to either the Stanford or the Davidson computer program over a 9-week period. The control group received no intervention. The results showed that both the Stanford and Davidson software produced statistically significantly higher mean SAT scores in the treatment groups than in the control group. Two major weaknesses of this study are: (i) only one school was recruited for the study, despite the author’s many attempts to recruit others, which limited the potential study population and its relevance to the wider population of SAT takers; and (ii) the study population was entirely African-American, which enables important study of the effects of computer-based coaching on that specific population but further limits the generalizability of the results. Additionally, the author failed to report important data, including group means and standard deviations, which we could recover from neither the author nor his university, preventing inclusion of this study in the meta-analysis.

Results

Overall effects

A meta-analysis of the seven studies with sufficient data confirmed what a number of previous meta-analyses have posited: coaching for the SAT significantly improves treatment group scores over control group scores (Becker, 1990; DerSimonian & Laird, 1983; Kulik et al., 1984; Messick & Jungeblut, 1981). One of these earlier meta-analyses concluded that average score gains of treatment groups over control groups were 10.1 and 9.8 points on the Verbal and Math subtests, respectively (DerSimonian & Laird, 1983), a combined gain of 19.9 points. The results of the current meta-analysis show treatment group gains over control of 23.5 points on the Verbal subtest and 32.7 points on the Math subtest, for a combined score gain of 56.2 points, nearly triple that previously assumed. While these effects may not seem as large as coaching companies often claim, they nonetheless show a significant improvement over uncoached pupils and could easily make the difference between acceptance and rejection for a college applicant or scholarship candidate (Arenson, 2006; Morgan & Michaelides, 2005). The three studies not included in the meta-analysis had similarly positive findings for coaching effects.
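The “nearly triple” comparison follows from summing the subtest gains. The sketch below reproduces that arithmetic from the figures quoted in this paragraph. It compares point estimates only; adding two subtest estimates does not by itself yield a confidence interval for the combined score.

```python
# Pooled gains over control, in SAT points.
current = {"Verbal": 23.5, "Math": 32.7}   # this review
previous = {"Verbal": 10.1, "Math": 9.8}   # DerSimonian & Laird (1983)

combined_current = sum(current.values())
combined_previous = sum(previous.values())
ratio = combined_current / combined_previous

print(f"{combined_current:.1f} vs {combined_previous:.1f} points, ratio {ratio:.2f}")
# 56.2 vs 19.9 points, ratio 2.82
```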

The test for heterogeneity indicated substantial variation between studies, and selection bias (publication bias) may also have been present in this meta-analysis. An investigation of the overall effects for the SAT-Verbal subtest showed three groups of studies: (i) studies with effects relatively similar to the overall average; (ii) studies with relatively lower effects; and (iii) studies with relatively higher effects. Three of the Alderman and Powers (1980) studies (C, D, and F) showed relatively lower effects, all with means favoring control over treatment. Johnson's (1984) results are skewed in the opposite direction, showing treatment effects far greater than the overall mean. These deviant studies represented 237 of the total 1,107 subjects in the analysis. The heterogeneity was more serious in the SAT-Math subtest analysis, as more participants were represented at the extremes on both sides of the overall average. While there were a few small studies favoring control over treatment, the primary cause of this heterogeneity appeared to be the DS-Math group of the Evans and Pike (1973) study, which consisted of a very large number of participants (335 of 1,355 total subjects) and posted significantly higher treatment effects than the other studies. The Cochrane Handbook notes that heterogeneity can also result from methodological differences between trials, such as the use of blinding and concealment of allocation, or from differences in the way outcomes are defined and measured (Green & Higgins, 2011). As this could be the case, given the varied nature of the interventions included in the meta-analysis, a random-effects model was used to account for the heterogeneity between studies.

Montgomery & Lilly

Int J Soc Welfare 2012: 21: 3–12

© 2011 The Author(s) International Journal of Social Welfare © 2011 Blackwell Publishing Ltd and the International Journal of Social Welfare
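To make the heterogeneity discussion concrete, the sketch below implements a DerSimonian-Laird random-effects meta-analysis, a widely used method-of-moments approach, together with Cochran's Q and I². The review does not state which random-effects estimator its software used, so treat this as an assumption. The `Study` records and their numbers are hypothetical illustrations, not the trials analysed here.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    name: str
    effect: float    # treatment minus control mean difference (SAT points)
    variance: float  # sampling variance of that difference


@dataclass
class PooledResult:
    fixed: float      # inverse-variance (fixed-effect) estimate
    random: float     # DerSimonian-Laird random-effects estimate
    se_fixed: float
    se_random: float
    q: float          # Cochran's Q
    df: int
    i2: float         # share of variation attributed to heterogeneity
    tau2: float       # estimated between-study variance


def dersimonian_laird(studies: list[Study]) -> PooledResult:
    """Pool study effects with a DerSimonian-Laird random-effects model."""
    if len(studies) < 2:
        raise ValueError("need at least two studies")
    if any(s.variance <= 0 for s in studies):
        raise ValueError("variances must be positive")
    w = [1.0 / s.variance for s in studies]
    sum_w = sum(w)
    fixed = sum(wi * s.effect for wi, s in zip(w, studies)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (s.effect - fixed) ** 2 for wi, s in zip(w, studies))
    df = len(studies) - 1

    # Method-of-moments estimate of tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random-effects weights add tau^2 to every study's own variance.
    w_re = [1.0 / (s.variance + tau2) for s in studies]
    random = sum(wi * s.effect for wi, s in zip(w_re, studies)) / sum(w_re)
    return PooledResult(
        fixed=fixed,
        random=random,
        se_fixed=math.sqrt(1.0 / sum_w),
        se_random=math.sqrt(1.0 / sum(w_re)),
        q=q,
        df=df,
        i2=i2,
        tau2=tau2,
    )


# Hypothetical numbers for illustration only; NOT the studies analysed in this review.
example = [
    Study("A", 10.0, 25.0),
    Study("B", 30.0, 16.0),
    Study("C", -5.0, 36.0),
    Study("D", 60.0, 9.0),
]
res = dersimonian_laird(example)
print(f"Q = {res.q:.1f} on {res.df} df, I^2 = {res.i2:.0%}, tau^2 = {res.tau2:.1f}")
print(f"fixed = {res.fixed:.1f}, random = {res.random:.1f} (SE {res.se_random:.1f})")
```

Adding tau² to each study's variance flattens the weights, so a very large study (such as the DS-Math group discussed above) carries less relative influence than it would under a fixed-effect model, and the pooled standard error widens to reflect between-study variation.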

Duration of coaching

Both short (8 hours or less) and long (more than 8 hours) coaching programs posted effects close to the mean of the overall coaching effects for the Verbal subtest. This result is in line with previous research that deemed the Verbal subtest less "coachable," with most coached students posting only marginal gains regardless of coaching duration. A comparison of the effects of long and short coaching programs for the Math subtest showed that pupils in long coaching programs posted significantly higher gains than their peers in short programs, averaging three times their point increase. Messick and Jungeblut (1981) suggested that improvement of SAT-Math scores was a function of the time and effort expended and that coaching had a greater relative impact on Math scores than on Verbal scores.

Coaching method

All three coaching methods (school-based, commercial classroom-based, and online) yielded significantly higher average scores for coached students than for their uncoached peers. A subgroup analysis of the three coaching methods used in the included studies did not reveal any significant differences between methods in their effect on test scores.
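The subgroup comparisons in this section (duration, method, SES, setting, affiliation) each ask whether pooled effects differ across groups by more than chance would allow. One common way to test this, sketched below, is a between-subgroup Q statistic referred to a chi-square distribution on G - 1 degrees of freedom, assuming each subgroup has already been pooled to an estimate and standard error. The review does not report which test it used, so `subgroup_difference` and `chi2_sf` are hypothetical helpers, and the input numbers are invented for illustration. The chi-square tail uses the standard series and continued-fraction forms of the regularized incomplete gamma function so the sketch needs only the standard library.

```python
import math


def _gamma_p_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) by power series (used when x < a + 1)."""
    term = 1.0 / a
    total = term
    ap = a
    for _ in range(1000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-14:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _gamma_q_cf(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) by Lentz's continued fraction (x >= a + 1)."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return math.exp(-x + a * math.log(x) - math.lgamma(a)) * h


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    if z < a + 1.0:
        return 1.0 - _gamma_p_series(a, z)
    return _gamma_q_cf(a, z)


def subgroup_difference(estimates: list[float], ses: list[float]) -> tuple[float, int, float]:
    """Between-subgroup heterogeneity test: returns (Q_between, df, p-value)."""
    if len(estimates) != len(ses) or len(estimates) < 2:
        raise ValueError("need matching estimates and standard errors for at least two subgroups")
    w = [1.0 / se ** 2 for se in ses]
    overall = sum(wi * t for wi, t in zip(w, estimates)) / sum(w)
    q_between = sum(wi * (t - overall) ** 2 for wi, t in zip(w, estimates))
    df = len(estimates) - 1
    return q_between, df, chi2_sf(q_between, df)


# Hypothetical subgroup estimates (SAT points) for illustration; NOT figures from this review.
q, df, p = subgroup_difference([12.0, 36.0], [6.0, 7.0])
print(f"Q_between = {q:.2f} on {df} df, p = {p:.3f}")
```

With only two subgroups, Q_between reduces to (theta1 - theta2)^2 / (se1^2 + se2^2), the square of a two-sample z statistic; with three methods it has 2 degrees of freedom.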

Socioeconomic status (SES)

Subgrouping by SES showed that all socioeconomic subgroups benefited from coaching. While the meta-analysis appears to show that low-SES students received particularly high benefits from coaching on the Verbal subtest of the SAT, the only study included in this subgroup of the meta-analysis was one that has been shown to have significant flaws (Johnson, 1984). Another study of the effects of coaching on low-income students found no significant improvement for coached students over control (Zuman, 1988). Zuman's study did, however, find a significant 57.0-point improvement in coached students' Math subtest scores over control, which is in line with Johnson's findings and may indicate that low-SES students could benefit more than other students from coaching for the Math subtest of the SAT, although higher-quality studies are needed to confirm this result. Overall, SES did not appear to have a significant impact on how much students benefited from coaching, as coached groups at all socioeconomic levels seemed to show significant gains over their uncoached peers. Because students from different socioeconomic backgrounds do not have equal access to coaching opportunities, this result highlights the inherent inequity of coaching, which favors students from higher-income families.

Testing setting

Only one study included in the meta-analysis used official administrations of the SAT for its pre- and post-test measurements, so an unbiased comparison between this study and those that employed artificial administrations could not be performed. While the comparison showed no difference, the concerns raised by other authors about the potentially confounding effects of an artificial test administration for data collection purposes should still be taken seriously.

College Board affiliation (vested interest)

Analysis showed that College Board affiliation had no significant impact on the coaching effects reported in the studies. Previous authors have questioned the accuracy of publications by authors with a vested interest (in this case, an interest in showing that coaching does not significantly affect test scores, thereby validating the examination) (Becker, 1990; Slack & Porter, 1980). This subgroup analysis did not indicate any such bias.

Discussion

During the last 30 years, coaching for standardized university entrance examinations, particularly the SAT in the United States, has become a multi-billion dollar industry. While Sesnowitz, Bernhardt, and Knain (1982) estimated that some 50,000 students in the United States spent approximately $10,000,000 on commercial coaching for standardized examinations in 1980, profits for commercial test preparation courses rose to a record high of $726,000,000 in 2005, up 25 percent from 2001 (Freedman, 2006). In addition to commercial coaching options, students can access a wide selection of do-it-yourself study guides ranging from practice examinations to computer-based courses. Many high schools have developed their own pro bono coaching courses for students who could benefit from coaching but might not have the funds or the opportunity to seek a course elsewhere. Clearly, coaching has become a central focus of SAT test preparation.

At the core of the coaching debate rests the issue of equality, and with it the associated social welfare implications: any situation in which some examinees can gain an advantage by attending expensive coaching programs runs counter to standardized testing's traditional goal of promoting opportunities for the most capable regardless of economic background (Powers, 1985). Unfortunately, new test preparation resources appear to be developed and marketed far faster than adequate information about their effectiveness is generated (Powers, 1993); therefore, the all-important questions regarding test equality and coaching effectiveness remain unanswered. This review should provide much-needed clarity on the issue of coaching effectiveness, but given the lack of data on the subject, its generalizability will, at least temporarily, remain limited.

The results of this review confirm the findings of previous meta-analyses, demonstrating that coaching for the SAT does lead to a significant improvement in scores over uncoached peers. Because this review includes only randomized controlled trials, the highest quality research design, our data provide the most reliable assessment of coaching effects to date. Surprisingly, our results indicate an even greater effect of coaching on SAT scores than previous research suggested. These results should lead university admissions committees, high school faculty, parents, and students to seriously question the use of the SAT as a measure of innate student ability. As long as coaching comes with a price tag attached and is unavailable to many students, SAT results should be seen as a biased measure to be considered only within the context of a student's personal background, specifically his or her family's SES.

The primary limitations of this review all stem from the state of research in this field. Only four of the studies eligible for this review were conducted in the last 20 years. Within that time frame, the SAT has been reorganized twice, including the addition of a third subtest to the examination. One of the primary goals of the most recent changes to the SAT was to eliminate test questions that might be easily coached, which resulted in the deletion of analogies, among other question formats, from the examination. These changes to the existing examination, and the addition of the writing subtest, may have reduced the examination's coachability. At the same time, technology has advanced significantly, particularly with regard to computers, bringing personal computers into many homes across the world and advancing computer-based coaching programs far beyond any of the programs included in this review. These newer programs, which can better identify an individual's weaknesses and adapt to the user's specific needs, are likely to post even greater score improvements than the computer programs evaluated in this review.

Reviewers’ conclusions

Recommendations to students, parents and high school faculty

Students need to be aware of the test score benefits that come from familiarization with the format and content of the SAT and other university entrance examinations. While students might not be financially able to enroll in a commercial coaching program or a program offered through school, they should know that, for both coached and uncoached students, the greatest score increase generally occurs between the first and second sitting of an examination (Beller, 2001; Slack & Porter, 1980). Students should be encouraged to take the examination more than once, regardless of their initial test results. A spectrum of coaching materials and programs, ranging in cost from a few dollars to over $2,000, is available, but students should know that computer-based coaching programs produce results as positive as those of commercial classroom-based courses. Online courses, which cost significantly less than classroom-based courses, could provide a cheaper and similarly effective coaching opportunity. Books and other materials that familiarize the pupil with the test format could also show positive effects at a fraction of the cost of commercial courses. Students are cautioned, though, to remember that standardized test scores are only one factor considered in the university admissions process and that it is important to maintain a balance between test preparation and other enriching high school activities.

Recommendations to university admissions officers

The admissions officers at universities, particularly those using score thresholds in granting admission or scholarships (as mentioned in Arenson, 2006), are urged to review and redesign these policies in light of this new evidence. Policies that penalize (albeit inadvertently) students who are not able to afford coaching for any high-stakes examination go against the core principles of equality in scholarship awards and university admission, and should be eliminated when evidence of this bias arises. While these test scores provide an important standardized measure by which all applicants may be compared, they should be considered within the context of the student's background and other personal factors that might contribute to a significant advantage or disadvantage on the examination.

Recommendations to ETS¹ officials

The test designers at ETS need to examine question format and content to determine whether or not certain items can be coached, and alter or eliminate those items that can be. If these changes have been made and ETS is confident that coaching cannot produce significantly positive effects, it should provide sufficient funding to researchers willing to conduct randomized controlled trials to test whether this is, in fact, the case. If the effects of coaching are still found to be significant, the test should either be relabeled as a content-specific examination, for which study and revision can improve scores, or not be used at all if one is seeking an unbiased measure of an individual's innate academic ability.

Implications for research

Recommendations to researchers

Research in this field urgently needs updating. Many well-designed and well-implemented randomized controlled trials using large study populations are still necessary before the true effects of coaching can be confirmed. Additionally, studies of coaching effects on other, non-SAT standardized university examinations that meet these same standards of quality should be pursued. The designers of these examinations, coaching companies, and universities all have a vested interest in uncovering the true effects of coaching on university entrance examination scores and could be approached as partners in producing more evidence on this subject. The need for such research cannot be overemphasized.

Additionally, researchers should expand their outcome criteria to include more variables than score change alone. Consumer satisfaction, university acceptance (to the school of choice or in general), and cost of coaching are all important factors in the consumer's decision to participate in a coaching program. An exploration of the long-term effects of coaching on a student's performance could also reveal whether coaching has any lasting effects. Examining how coached students perform compared with their uncoached peers once they have entered university (how well they perform in classes, whether they complete their degrees, what kinds of degrees they obtain) could reveal long-term benefits of coaching, an investment that has traditionally been considered of only short-term value.

Acknowledgement

We are most grateful to Annie Shrier for her considerable help with the many revisions of this article.

References

Alderman DL, Powers DE (1980). The Effects of Special Preparation on SAT-Verbal Scores. American Educational Research Journal 17: 239–253.

Allalouf A, Ben-Shakhar G (1998). The Effect of Coaching on the Predictive Validity of Scholastic Aptitude Tests. Journal of Educational Measurement 35: 31–47.

Arenson KW (2006). Colleges Say SAT Mistakes May Affect Scholarships. The New York Times, online version.

Becker BJ (1990). Coaching for the Scholastic Aptitude Test: Further Synthesis and Appraisal. Review of Educational Research 60: 373–417.

Beller M (2001). Admission to Higher Education in Israel and the Role of the Psychometric Entrance Test: Educational and Political Dilemmas. Assessment in Education 8: 315–337.

Briggs DC (2004). What can meta-analysis accomplish? A case study. University of Colorado, Boulder, pp. 1–47.

Cole N (1982). The Implications of Coaching for Ability Testing. In: Wigdor AK, Garner WR, eds. Ability Testing: Uses, Consequences and Controversies, pp. 389–414. Washington, DC, National Academy Press.

DerSimonian R, Laird NM (1983). Evaluating the Effect of Coaching on SAT Scores: A Meta-Analysis. Harvard Educational Review 53: 1–15.

Evans FR, Pike W (1973). The Effects of Instruction for Three Mathematics Item Formats. Journal of Educational Measurement 10: 257–272.

Freedman SG (2006). In College Entrance Frenzy, a Lesson Out of Left Field. The New York Times, p. 30.

Green S, Higgins J, eds. (2011). Cochrane Handbook for Systematic Reviews of Interventions, version 5.1.0, section 9.5.4. Available at http://www.cochrane-handbook.org [Last accessed 18-04-11].

Holmes CT, Keffer R (1995). A Computerized Method to Teach Latin and Greek Root Words: Effect on Verbal SAT Scores. Journal of Educational Research 89: 47–50.

Hopmeier GH (1984). The Effectiveness of Computerized Coaching for Scholastic Aptitude Test in Individual and Group Modes. Doctoral dissertation, Florida State University, pp. 1–69.

Johnson ST (1984). Preparing Black Students for the SAT – Does It Make a Difference? (An Evaluation Report of the NAACP Test Preparation Project). New York, National Association for the Advancement of Colored People.

Kulik JA, Bangert-Drowns RL, Kulik CC (1984). Effectiveness of Coaching for Aptitude Tests. Psychological Bulletin 95: 179–188.

¹ ETS is a private, nonprofit organization involved in the development of testing and assessment instruments in the field of education.


Laschewer A (1986). The Effect of Computer Assisted Instruction as a Coaching Technique for the Scholastic Aptitude Test Preparation of High School Juniors. Doctoral dissertation, Hofstra University.

Löfgren K (2005). Validation of the Swedish University Entrance System. Umeå University, pp. 1–30.

McClain B (1999). The impact of computer-assisted coaching on the elevation of twelfth-grade students' SAT scores. Doctoral dissertation, Morgan State University.

Messick S, Jungeblut A (1981). Time and Method in Coaching for the SAT. Psychological Bulletin 89: 191–216.

Morgan DL, Michaelides MP (2005). Setting Cut Scores for College Placement. College Board Research Report No. 2005-9: 12.

Powers DE (1985). Effect of Coaching on GRE Aptitude Test Scores. Journal of Educational Measurement 22: 121–136.

Powers DE (1993). Coaching for the SAT: A Summary of the Summaries and an Update. Journal of Educational Measurement 30: 24–30, 39.

Powers DE, Rock DA (1999). Effects of Coaching on SAT I: Reasoning Test Scores. Journal of Educational Measurement 36: 93–118.

Roberts SO, Oppenheim DB (1966). The Effect of Special Instruction Upon Test Performance of High School Students in Tennessee. Princeton, NJ, Educational Testing Service RB-66-36.

Sesnowitz M, Bernhardt KL, Knain DM (1982). An Analysis of the Impact of Commercial Test Preparation Courses on SAT Scores. American Educational Research Journal 19: 429–441.

Shaw E (1992). The Effects of Short-Term Coaching on the Scholastic Aptitude Test. Doctoral dissertation, University of La Verne.

Slack WV, Porter D (1980). The Scholastic Aptitude Test: A Critical Appraisal. Harvard Educational Review 50: 154–175.

Zuman JP (1988). The Effectiveness of Special Preparation for the SAT: An Evaluation of a Commercial Coaching School. Paper presented at the annual meeting of the American Educational Research Association, New Orleans.
