

Journal of Educational Measurement, Summer 2011, Vol. 48, No. 2, pp. 101–121

The Effect of High School Socioeconomic Status on the Predictive Validity of SAT Scores and High School Grade-Point Average

Rebecca Zwick, Educational Testing Service

Igor Himelfarb, University of California, Santa Barbara

Research has often found that, when high school grades and SAT scores are used to predict first-year college grade-point average (FGPA) via regression analysis, African-American and Latino students are, on average, predicted to earn higher FGPAs than they actually do. Under various plausible models, this phenomenon can be explained in terms of the unreliability of predictor variables. Attributing overprediction to measurement error, however, is not fully satisfactory: Might the measurement errors in the predictor variables be systematic in part, and could they be reduced? The research hypothesis in the current study was that the overprediction of Latino and African-American performance occurs, at least in part, because these students are more likely than White students to attend high schools with fewer resources. The study provided some support for this hypothesis and showed that the prediction of college grades can be improved using information about high school socioeconomic status. An interesting peripheral finding was that grades provided by students' high schools were stronger predictors of FGPA than were students' self-reported high school grades. Correlations between the two types of high school grades (computed for each of 18 colleges) ranged from .59 to .85.

The predictive value of the SAT at an undergraduate institution is typically assessed by performing a regression analysis in which SAT scores and high school grade-point average (HSGPA) are used to predict first-year college grade-point average (FGPA). Studies of this kind have two related purposes: First, they are used to provide validity evidence for the SAT and may become part of the test validity literature (e.g., Cleary, 1968). Second, they may be used at an institutional level to determine how heavily the SAT should be weighted in determining eligibility for admission (e.g., Geiser & Studley, 2004).

Discussions of the findings of such regression analyses tend to focus primarily on the correlations between each of the predictors and FGPA and on the multiple correlation (R) obtained by using both HSGPA and SAT scores simultaneously. When evaluated in this way, HSGPA is usually found to be a stronger predictor of FGPA than are SAT scores. This recurrent result has often been invoked as an argument for eliminating or deemphasizing the SAT. For example, in The Case Against the SAT, Crouse and Trusheim (1988) argued that the typical increment in R (or R²) achieved by adding SAT scores to the regression model is so small as to make the SAT useless.¹ Recently, Geiser and Santelices (2007) argued that because of the high correlation between HSGPA and college grades (among other reasons), standardized tests should be deemphasized in college admissions.

Copyright © 2011 by the National Council on Measurement in Education

The correlational findings, while indisputably important, tell only part of the story about the relative utility of HSGPA and SAT scores. Another perspective is obtained by examining the patterns of prediction errors for key student groups. For each student, the difference between the predicted FGPA (obtained from the regression) and the actual FGPA—the residual or prediction error—can be calculated. Although it is a property of ordinary least squares regression that these residuals must sum to zero across all individuals in the analysis, this need not be true within key student groups. In fact, it has long been known among psychometricians that, for African-American and Latino students, FGPA tends to be overpredicted by the SAT (e.g., Cleary, 1968; Linn, 1983). That is, these students are, on average, predicted to earn higher FGPAs than they actually do. Based on 11 studies in which SAT scores and HSGPA were used to predict FGPA, Young (2004, pp. 293–294) found that, on average, African-American students were overpredicted by .11 (on a 0–4 scale); based on eight studies, Latino students were overpredicted by an average of .08. Identical values were obtained by Mattern et al. (2008, p. 11) for a sample of more than 150,000 students who took the SAT in 2006.
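The zero-sum property described above, and its failure to hold within subgroups, can be illustrated with a small simulation. This is a sketch with invented numbers, not the study's data; residuals here are computed as actual minus predicted, so overprediction appears as a negative group mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)  # 0/1 group indicator; groups differ in mean ability

# Latent ability drives both the predictor and the outcome; the predictor
# is observed with error (analogous to an imperfectly reliable test score).
ability = rng.normal(loc=0.5 * group, scale=1.0)
predictor = ability + rng.normal(scale=0.8, size=n)
outcome = ability + rng.normal(scale=0.5, size=n)

# OLS with an intercept: residuals (actual - predicted) sum to zero overall.
X = np.column_stack([np.ones(n), predictor])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
residuals = outcome - X @ beta

print(residuals.mean())               # essentially zero, by construction
print(residuals[group == 0].mean())   # negative: lower-scoring group overpredicted
print(residuals[group == 1].mean())   # positive: higher-scoring group underpredicted
```

Because the single pooled regression line must balance the two groups, the subgroup means are nonzero even though the overall mean is exactly zero.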

An aspect of the ethnic-group overprediction phenomenon that is less well known is that overprediction tends to be more severe when HSGPA alone is used as a predictor than when HSGPA and SAT scores are used in conjunction. For example, Zwick and Schlemer (2004), who studied 1997 and 1998 freshman cohorts at UC Santa Barbara, found that in 1997 Latino students whose best language was not English were, on average, overpredicted by .23 (about a quarter of a grade-point) when HSGPA alone was used to predict FGPA. This was reduced to .00 when SAT scores were included as predictors along with HSGPA. In 1998, an overprediction of .20 for this group was reduced to .06. Mattern et al. (2008) found similar reductions for students who identified themselves as African-American or "Hispanic, Latino, or Latin American," but not for students whose best language was not English. (The study did not examine ethnicity and language simultaneously.)

There are a number of conjectures about the reasons for these recurrent overprediction findings for Black and Latino students (see Zwick, 2002, pp. 117–124). One theory is that, in college, minority students do not fulfill their academic potential (which is assumed to be accurately captured by the tests). This "underperformance" could occur because of outright racism or because of a campus environment that is inhospitable to people of color, or it could be related to a greater occurrence among minority students of life difficulties, including financial problems, that interfere with academic performance. It has also been hypothesized that anxieties, low aspirations, or negative attitudes may interfere with the academic success of minority students (e.g., Bowen & Bok, 1998, p. 84).

A more technical explanation is that overprediction occurs because both SAT scores and high school grades are imprecise measures of academic abilities. The effect is simplest to understand in the case of a single test score that is being used to predict subsequent grade-point average (GPA). Under typical classical test theory and regression assumptions, the unreliability of the score can be shown to produce a regression line that is less steep than the line that would theoretically be obtained with an error-free predictor. Specifically, if the slope of the regression of GPA on the true test score is β, the slope of the regression of GPA on the observed test score will be ρβ, where ρ is the test reliability (see Snedecor & Cochran, 1967, pp. 164–166). Therefore, to the degree that the test score is affected by measurement error, groups with lower test scores will tend to be overpredicted while those with higher scores will tend to be underpredicted. A somewhat different explanation of the way in which measurement error leads to overprediction, based on Kelley's paradox, is offered by Wainer and Brown (2007).²
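The attenuation result can be checked numerically. The sketch below uses invented values (β = .5, reliability ρ = .8) and simply verifies that the slope of GPA on the error-laden observed score comes out near ρβ:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
beta = 0.5                     # slope of GPA on the true score T
var_t, var_e = 1.0, 0.25       # true-score and measurement-error variance
rho = var_t / (var_t + var_e)  # reliability = 0.8

T = rng.normal(scale=np.sqrt(var_t), size=n)        # true score
X = T + rng.normal(scale=np.sqrt(var_e), size=n)    # observed (unreliable) score
gpa = beta * T + rng.normal(scale=0.3, size=n)

slope_obs = np.polyfit(X, gpa, 1)[0]  # OLS slope of GPA on the observed score
print(rho * beta)   # theoretical attenuated slope: 0.4
print(slope_obs)    # empirical slope, close to 0.4
```

Because X carries error variance that T does not, Cov(GPA, X)/Var(X) shrinks from β to ρβ, which is exactly the flattened regression line described in the text.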

Explaining overprediction simply by invoking measurement error, however, is not fully satisfactory: Might the measurement errors in the predictor variables be systematic in part, and could they be reduced? The research hypothesis in the current study was that the overprediction of Latino and African-American performance occurs, at least in part, because these students are more likely than White students to attend high schools with fewer financial resources and well-trained teachers. (The other side of the coin is that White students, who are more likely to attend better high schools, are underpredicted.) Consider a college admissions office that estimates a regression equation for predicting FGPA based on students who come from many different high schools. Because HSGPA distributions tend to be similar across high schools, regardless of school quality (see Zwick & Green, 2007), using HSGPA alone as a predictor is expected to yield a misleadingly high predicted FGPA for students attending low-quality high schools. This overprediction is likely to be mitigated (but not eliminated) by the inclusion of SAT scores in the prediction equation since SAT scores have essentially the same meaning across schools. Under this hypothesis, the typical pattern of ethnic-group prediction errors occurs in part because ethnicity is, in effect, serving as a rough proxy for high school quality. The main purpose of this study was to explore this hypothesis and to investigate ways of improving the prediction of college grades using information about high schools.

Previous efforts to incorporate high school quality in models for predicting college performance are described by Linn (1966), Pike and Saupe (2002), Rothstein (2005), and Young and Johnson (2004). This study differs from past work in that it focuses on the amelioration of prediction biases for ethnic groups and on the development of an index that would be suitable for use in the admissions process.

Method

To investigate these issues, this study used data provided by the College Board, which funded this research in response to a proposal. The subsequent sections describe the data set and analysis procedures.

Data Set

The file provided by the College Board had been assembled from three sources, described below.

(1) College Board Validity Studies database: As part of its ongoing research on SAT validity, the College Board collected data on students who enrolled in 110 undergraduate institutions in 2006. For our research, we requested data from colleges that had at least five feeder high schools with 20 students each. Data from 34 public undergraduate institutions, referred to for convenience as "colleges," proved to be suitable for our analysis. The data included FGPA, as well as high school attended. Limited information about the college, such as geographical area and institutional size, was also provided. Seven were from the Midwest region, five from the Middle States region, four from New England, six from the West, eight from the Southwest and four from the South. Two were medium-to-large (2,000–7,499 students), 13 were large (7,500–14,499 students) and 19 were very large (15,000 or more students). Two of the 34 colleges had admission rates under 50%, 21 had admission rates of 50–75%, and 11 had admission rates exceeding 75%. According to the Carnegie classifications (Carnegie Foundation for the Advancement of Teaching, 2008), 24 of the colleges were Research I universities, two were Research II universities, and eight were baccalaureate/arts and sciences colleges.

(2) SAT database: The College Board's database on SAT-takers was the source of information for SAT critical reading (formerly verbal), math, and writing scores; high school attended; and responses to the SAT Questionnaire, which SAT takers are asked to complete when they register. The questionnaire data included (student-reported) high school GPA (HSGPA-SQ), parent education, and family income. (Some colleges also included high school GPA [HSGPA-C] in their databases. For these schools, we compared the results obtained using the two different sources of high school GPA information, as described below.)

(3) Data on the characteristics of the high schools attended by the students, obtained from a College Board contractor: These data included school enrollment, district-level per-pupil expenditure, and ethnic composition. High school data were available only for public schools; therefore, students who attended private high schools were not included in our study.

The College Board conducted all matching of student records and then removed identifying information. High schools and colleges were labeled with numerical codes. The data set transmitted to us contained records for a total of 123,385 students at the 34 colleges. These students had attended a total of 5,789 high schools. Subsequent exclusion of cases with missing data on key variables reduced the overall sample size to 70,712 students who had attended a total of 5,702 high schools.

Table 1 compares the characteristics of our original data set and the analysis data set to those of the 2006 college-bound seniors, a data set maintained by the College Board (2006) that contains the records of all high school graduates in the year 2006 who took the SAT through March 2006. (If students took the SAT more than once, only their latest data are included.) Table 1 shows that the analysis data set closely resembles the original data set in terms of gender, ethnicity, high school grades, SAT scores, language background, and socioeconomic status (SES), suggesting that deleting cases with missing data did not substantially affect the distributions of these key variables. Students included in the study tend to have higher grades and test scores than the college-bound seniors, which is to be expected since those included in the study had, by definition, completed at least one year of college. Table 1 also


Table 1
Characteristics of Study Data and College-bound Seniors Data

                                    Analysis    Original    College-bound
                                    Data Set    Data Set    Seniors 2006 Data
Number of Students                    70,712     123,385        1,465,744

Percentage* of Students in Each Response Category
Female                                  52.8        51.5             54.0
Ethnicity
  Native American                         .5          .6               .7
  Asian-American                         9.8        10.5             10.4
  African-American                       5.9         6.4             11.3
  Hispanic/Latino                        7.2         7.5             11.4
  White                                 70.2        72.2             62.1
  Other                                  6.3         2.9              4.1
HSGPA-SQ
  A+                                    11.7        11.3              6.7
  A                                     27.6        27.1             18.3
  A–                                    24.7        24.4             18.4
  B                                     33.9        34.9             45.6
  C                                      2.1         2.3             10.7
  D, E, or F                              .1          .1               .3
Class Rank
  Top Tenth                             42.7        42.1             31.4
  2nd Tenth                             29.8        29.6             25.4
  2nd Fifth                             17.6        17.7             20.3
  Final Three Fifths                    10.0        11.3             23.0
First Language Spoken
  English only                          84.3        83.8             77.2
  English and another                   10.2        10.6             13.9
  Another language                       5.5         5.6              8.9
Family Income
  Less than $10,000                      1.7         1.8              4.2
  $10,000–$20,000                        3.7         3.9              7.0
  $20,000–$30,000                        5.4         5.3              8.1
  $30,000–$40,000                        7.1         7.3              9.5
  $40,000–$50,000                        7.1         7.0              8.2
  $50,000–$60,000                        8.3         8.3              8.8
  $60,000–$70,000                        8.3         8.1              8.2
  $70,000–$80,000                        9.6         9.5              8.6
  $80,000–$100,000                      17.4        16.9             13.5
  More than $100,000                    31.2        32.3             23.9
Highest Level of Parent Education
  No High School Diploma                 1.9         2.0              3.8
  High School Diploma                   22.4        21.7             27.6
  Associate Degree                       7.3         7.1              7.5
  Bachelor's Degree                     37.0        36.6             26.5
  Graduate Degree                       31.4        32.8             23.9

Mean Values for Key Variables
  SAT-CR                               550.6       550.9            503.0
  SAT-M                                574.5       573.1            518.0
  SAT-W                                542.2       543.0            497.0

*Table entries give the percentages of students for whom data were available.


Table 2
Sample Sizes by College and Ethnic Group

           Native    Asian-    African-
Colleges  American  American  American  Latino    White  Other   Total
    1          3         7        39       13       872     37     971
    2         10        45        51       23     1,246     85   1,460
    3         11       218       137       86     1,952    159   2,563
    4         23       264       130      100     3,522    204   4,243
    5          6        60        73       14       913     72   1,138
    6          1       331        68       65       800    100   1,365
    7          6        29        53       31       988     58   1,165
    8          7       181       287       74     1,220    134   1,903
    9         11       137        69       44     1,583    110   1,954
   10          8       105       134       25     1,703    120   2,095
   11          3        21        21       27       538     43     653
   12          5        26        54       26       910     77   1,098
   13          9       289       116       98     2,175    229   2,916
   14          6        49        92       82     1,536    113   1,878
   15         18        39        35      118     1,031     89   1,330
   16         12       101        19       33     1,089     81   1,335
   17         12        25        30       68       243     47     425
   18         19       195        42       82     1,592    153   2,083
   19         35     1,035        98      146     2,102    297   3,713
   20          5       607        14       28       209     67     930
   21         20       148        36       37     1,350    155   1,746
   22          4        40       178       25     1,616     87   1,950
   23          3       352       107       58     1,221    112   1,853
   24          7       248       278       81     2,576    201   3,391
   25         10       462       377      172     1,380    237   2,638
   26          7        31       364       31       808     63   1,304
   27         14        54        59       40     2,031    122   2,320
   28         30       268       174      640     3,718    288   5,118
   29         14       129       322      283     1,508    140   2,396
   30         12        33        96      419     1,379    111   2,050
   31         15        76        91      215     1,848    133   2,378
   32         26     1,021       265      975     2,652    350   5,289
   33         15       307       256      922     1,289    179   2,968
   34          0         6         3        6        71      7      93

Total        387     6,939     4,168    5,087    49,671  4,460  70,712

Note. "Other" column includes students who gave a response of "other" or gave no ethnic group information.

shows that, compared to the college-bound seniors, students included in the study tend to come from wealthier and more educated families and are less likely to be African-American or Latino. For students in the analysis data set, Table 2 provides sample sizes by ethnic group within each college.


Table 3
Definition of Key Variables

Variables                     Descriptions
FGPA                          First-year college GPA (ranging from 0 to 4.27)
SAT-CR                        SAT Critical Reading score (ranging from 200 to 800)
SAT-M                         SAT Math score (ranging from 200 to 800)
SAT-W                         SAT Writing score (ranging from 200 to 800)
HSGPA-SQ                      Self-reported high school GPA (11 categories ranging from 1 to 4.33)
HSGPA-C                       High school GPA as reported by the college (ranging from 1.30 to 4.96)
Aggregated Father Education   High school mean value of (student-reported) father education (9 categories)
Aggregated Mother Education   High school mean value of (student-reported) mother education (9 categories)
Poverty                       District poverty level (1–4 scale, with 1 indicating a high poverty percentage and 4 indicating a low percentage)
High School SES Index         Aggregated father education + aggregated mother education + poverty

Data Analyses

The next two subsections describe the derivation of our high school SES index and the regression analyses we conducted for predicting FGPA. Table 3 provides brief definitions of the key variables used in our analyses, which are described in further detail below. Table 4 gives the means and standard deviations of these variables for each ethnic group.

High school SES index. Our original aim was to derive a measure of high school quality that would reflect the availability and quality of instructional resources. However, because of limitations inherent in the data, our goals (and hence the meaning of the index) had to be modified somewhat. Some variables we hoped to include, such as teacher-student ratio and instructional expenditures, were simply not available. In other cases, preliminary analyses showed that seemingly promising variables were not useful for our purposes. The variables we initially considered for inclusion were district-level per-pupil expenditure, Title 1 funding (yes or no), local income (median income for neighborhood surrounding the school), district poverty level (1–4 scale, with 1 indicating a high poverty percentage, and 4 indicating a low poverty percentage), district enrollment, school enrollment, community type (rural, urban, suburban), and ethnic composition of the school (percentages of African-American, Asian-American, Latino, Native American, and White students).

Because of the unavailability of direct measures of school resources, we also evaluated the inclusion of several measures of the SES of the students attending the school, which we presumed to be related to the resources available to the school. Using the data from the SAT Questionnaire, we constructed the following high-school-level aggregates of student-level variables: aggregated father education (high school average of student-reported father education), aggregated mother education (high school average of student-reported mother education), and aggregated family income (high school average of student-reported family income). Students who were missing data on the parent education or income variables nevertheless had values for the high-school-level variable, provided they were not the only student in their high school. Limitations of these high-school-level variables are discussed in a later section.

Table 4
Means and Standard Deviations of Key Variables by Ethnic Group

                 Native American  Asian-American   African-American      Latino           White           Other
Variable           Mean     SD      Mean     SD      Mean     SD      Mean     SD      Mean     SD      Mean     SD
First Year GPA     2.77     .78     3.02     .69     2.56     .75     2.63     .83     2.97     .72     2.97     .74
HSGPA-SQ           3.54     .54     3.68     .46     3.44     .54     3.62     .51     3.63     .49     3.61     .49
HSGPA-C            3.39     .45     3.65     .40     3.38     .48     3.47     .46     3.55     .44     3.55     .44
SAT-CR            540.4    88.3    551.7    99.5    501.7    84.9    509.6    87.0    557.6    85.9    563.8    93.6
SAT-M             553.0    86.7    613.2    96.2    504.1    86.6    528.5    89.4    579.7    87.4    575.8    92.4
SAT-W             523.1    84.8    548.1    97.2    492.8    82.1    505.0    82.5    548.6    83.8    551.9    90.7
HS SES Index      15.19    2.22    15.64    2.24    14.24    2.65    13.48    3.21    15.69    2.00    15.65    2.21
Sample Size       387 (191)*      6,939 (3,793)*  4,168 (1,667)*   5,087 (1,120)*  49,671 (21,952)*  4,460 (2,161)*

*Sample sizes for HSGPA-C are shown in parentheses. See text for explanation.

We conducted several types of analyses to determine which variables appeared to be candidates for inclusion in the index. The analyses that proved most useful were conducted within each college, with the high school as the unit of analysis. We obtained, for each college, a correlation matrix that included the candidate variables, the average FGPA for the high school, and the average residual from the traditional model for predicting FGPA (i.e., a regression with high school GPA and SAT scores as predictors). Results were also examined for each ethnic group within each college. In addition, we conducted discriminant analyses, in which the high school was the unit of analysis, and the goal of the analysis was to distinguish high schools with high (positive) average residuals from those with low (negative) average residuals from the traditional regression model. Based on these analyses, we eliminated high school variables that had weak or inconsistent relationships with each other or with the average residuals. Ultimately, we formed the following index:

High School SES Index = aggregated father education + aggregated mother education + district poverty level.

Given the eventual composition of the index, we view it as a measure of the SES of the school, rather than as a broad index of school quality. School SES has been found to be strongly related to student achievement. For example, a comprehensive study of the impact of school resources on student achievement by the Public Policy Institute of California led the authors to conclude, "By far, the most important factor related to student achievement [on the school level] . . . is our measure of SES—the percentage of students receiving free or reduced-price lunches" (Betts, Rueben, & Danenberg, 2000, p. 207). These researchers also found that teacher experience and credential status were the resource variables that were most strongly associated with achievement, and that "variations in teacher characteristics [were] systematically related to differences in student economic status" (p. 205). (In our data, average student-reported family income, in contrast to average student-reported parental education, proved to have an inconsistent relationship with average academic achievement. This could be due in part to high school students' lack of awareness of their parents' income and to the absence of any correction for variations in cost of living.)
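In code, the index amounts to aggregating student questionnaire responses to the school level and adding the district poverty rating. The sketch below is our illustration of that computation, not the study's implementation; the schools, scale values, and variable names are invented.

```python
# hs_code -> list of (father_educ, mother_educ) student reports on a
# 9-category scale; None marks an item the student left blank.
student_reports = {
    101: [(5, 7), (7, 6), (None, 4)],
    202: [(3, 3), (4, None)],
}
district_poverty = {101: 4, 202: 2}  # 1-4 scale; 4 indicates low poverty

def school_mean(values):
    """Mean of the non-missing student reports for one school."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)

ses_index = {}
for hs, reports in student_reports.items():
    father = school_mean([f for f, _ in reports])
    mother = school_mean([m for _, m in reports])
    ses_index[hs] = father + mother + district_poverty[hs]

print(ses_index)
```

Note that a student with a missing parent-education report still receives a value for the school-level aggregate, mirroring the handling of missing questionnaire data described above.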

We also explored the inclusion of "output" variables in one version of our high school index. Such variables could serve as a rough measure of the academic achievement level of the high school. We considered aggregated SAT (high school average of total SAT score) and aggregated HSGPA-SQ (high school average of self-reported HSGPA). We determined that aggregated HSGPA-SQ was not useful for our purposes because of its inconsistent relationship with college achievement. (The superior predictive value of aggregated SAT, as compared to aggregated HSGPA, is consistent with the findings of Rothstein (2005) and Zwick and Green (2007).) We therefore constructed an index that was a weighted average of the main SES index and aggregated SAT score. The two high school indexes had a correlation of .98 (with high schools as the unit of analysis) and were virtually interchangeable in terms of their impact on regression results. Therefore, the second index is not discussed further in this article.

Regression analyses. Our data set had a complex structure that did not lend itself to the usual type of multi-level analysis. We were interested in the prediction of FGPA at the 34 colleges. However, the institutional variables that were of most interest to us were not, in fact, the college-level variables, but high school characteristics. That is, we were not primarily interested in the variation of the regression coefficients across colleges; instead, we were interested in the degree to which college-level prediction accuracy was affected by high-school characteristics. In our data set, a given high school could, in theory, be represented in as few as one and as many as 34 colleges. The number of high schools represented within a college ranged from 479 to 5,652, with a median of 2,243. Given the unusual structure of the data, we elected to analyze the data from each college separately and then summarize the results across colleges.

Within each undergraduate institution, three regression models for predicting FGPA were estimated, using the following predictors:

Model 1: HSGPA-SQ only;
Model 2: HSGPA-SQ, SAT Critical Reading (SAT-CR), SAT Math (SAT-M), and SAT Writing (SAT-W); and
Model 3: HSGPA-SQ, SAT-CR, SAT-M, SAT-W, and the High School SES Index.

Model 1 is typically proposed as the best approach for predicting FGPA by those who oppose admission tests. Model 2 is, of course, the approach used in most standard test validity and institutional studies; it includes the predictors ordinarily used in admissions decisions that involve the SAT. Model 3 includes the newly developed high school index. (As noted below, we also considered models that included two-way interactions of the predictors, but these interactions were not found to substantially improve prediction accuracy.)

For each model within each undergraduate institution, a single prediction equation was estimated based on all members of the freshman cohort who had complete data on the variables in all three models. Therefore, within a college, each model was estimated using the same set of students.

Within each college, residuals from the three regression models were obtained for all students. Students were classified according to their ethnicity and whether their high school was above or below the median value (15.7) of the high school SES index. The average residuals for each of the resulting categories were then tabulated to illustrate the effect of ethnicity and high school SES on the pattern of residuals for each model.
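The within-college computation just described (fit the three nested models on the same students, then average the residuals within groups) can be sketched as follows. The data are simulated and every coefficient is invented; only the mechanics correspond to the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
hsgpa = rng.normal(3.5, 0.5, n)    # HSGPA-SQ
sat = rng.normal(550, 90, (n, 3))  # SAT-CR, SAT-M, SAT-W columns
ses = rng.normal(15.0, 2.5, n)     # high school SES index
fgpa = (0.4 * hsgpa + 0.001 * sat.sum(axis=1) + 0.03 * ses
        + rng.normal(0, 0.5, n))

def residuals(*predictors):
    """OLS of FGPA on the given predictors; returns actual - predicted."""
    X = np.column_stack([np.ones(n), *predictors])
    beta, *_ = np.linalg.lstsq(X, fgpa, rcond=None)
    return fgpa - X @ beta

models = {
    "Model 1": residuals(hsgpa),
    "Model 2": residuals(hsgpa, sat),
    "Model 3": residuals(hsgpa, sat, ses),
}

# Average residuals for students below / above the median SES index value.
low = ses < np.median(ses)
for name, r in models.items():
    print(name, round(r[low].mean(), 3), round(r[~low].mean(), 3))
```

In this toy setup the SES effect is independent of the other predictors, so only Model 3 pulls the two group means toward zero; in the study, SAT scores also absorb part of the pattern because they are related to school quality.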

Eighteen of the 34 colleges included HSGPA data in their own files. These GPAs, labeled "HSGPA-C" from this point on, could differ from those obtained from the SAT Questionnaire (HSGPA-SQ) for reasons explained below. For each of these colleges, regression analyses were conducted using HSGPA-C. Results were compared to those obtained using HSGPA-SQ.

Results

Group Differences on Key Variables

Table 4 shows differences across ethnic groups in SAT scores and grades, with Asian-American and White students typically attaining higher values than Native American students, who, in turn, performed better than Latino and African-American students. As expected, the high school SES index varied systematically across ethnic groups. Average values for White and Asian-American students were roughly a standard deviation higher than those for Latino students; the averages for Native Americans and African-American students fell in between. This finding implies that, on average, White and Asian-American students attended higher-SES high schools than other groups. On average, Latino students attended lower-SES high schools than any other group.

Regressions for 34 Colleges Using HSGPA-SQ

Average residuals from the regression analyses, along with R2 values for the three models, were computed for each college. Analyses did not reveal any association between the patterns of residuals and college-level characteristics. The college-level results were therefore summarized by combining the residuals across colleges. This produced results identical to those obtained by computing weighted averages across the colleges, with weights determined by sample size. Only these summary results are discussed here. Table 5 gives the results obtained by combining the residuals from all 70,712 students in the regression analysis,3 along with the weighted average of the R2 values for the three models. Clearly, the only substantial increase in R2 values occurs when SAT scores are added (in Model 2) to a prediction equation that includes HSGPA-SQ only (Model 1). The average R2 increases from .15 to .21 at this step; it then increases to .23 when the high school SES index is added. The average residuals in Table 5 are discussed in detail below.
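The equivalence noted above — pooling residuals across colleges versus weighting per-college means by sample size — is a general property of means. A toy check, with made-up residual values rather than the study's data:

```python
import numpy as np

# Two hypothetical colleges' residual vectors (made-up numbers).
college_resids = [np.array([0.10, -0.20, 0.30]),
                  np.array([-0.10, 0.05, 0.00, 0.25])]

# Pooling all residuals and averaging...
pooled_mean = np.concatenate(college_resids).mean()

# ...equals a weighted average of the per-college means,
# with weights equal to the college sample sizes.
sizes = [len(r) for r in college_resids]
weighted_mean = np.average([r.mean() for r in college_resids], weights=sizes)

assert np.isclose(pooled_mean, weighted_mean)
```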

A different perspective is obtained by examining the standardized regression coefficients for Model 3, shown in Table 6. These coefficients indicate the strength of each predictor, given the particular set of predictors included in the model. The weighted average of these coefficients across the 34 colleges is .31 for HSGPA-SQ, .14 for SAT-W, .11 for the SES index, and .06 for both SAT-CR and SAT-M. For this set of predictors, then, the contribution of high school grades is largest, as is typical. The contribution of the high school SES index is larger than that of SAT-CR and SAT-M, but smaller than that of SAT-W.
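Standardized coefficients like those in Table 6 can be obtained by fitting the regression after converting every variable to z-scores (equivalently, multiplying each raw slope by the ratio of the predictor's SD to the outcome's SD). A sketch with synthetic data — the variables are placeholders, not the study's:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical outcome and three predictors (stand-ins only).
n = 1000
X = rng.normal(size=(n, 3))
y = X @ np.array([0.4, 0.2, 0.1]) + rng.normal(size=n)

def standardized_betas(y, X):
    """Fit OLS on z-scored variables: each coefficient is the SD change
    in the outcome per SD change in that predictor, holding the others fixed."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    beta, *_ = np.linalg.lstsq(
        np.column_stack([np.ones(len(yz)), Xz]), yz, rcond=None)
    return beta[1:]  # drop the (near-zero) intercept

print(standardized_betas(y, X))
```

Because the predictors and outcome are on a common scale, these coefficients can be compared with one another, which is what the weighted averages reported in the text do.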

The column means for Model 1 (Table 5a), which uses only HSGPA-SQ to predict FGPA, show that, on average, students tended to be overpredicted (an average residual of −.10) if they attended low-SES high schools and underpredicted (an average residual of +.10) if they attended high-SES high schools, as hypothesized. This pattern also occurs within each ethnic group except African-Americans, who are overpredicted in both cases, but much less so for high-index schools (−.13) than for low-index schools (−.30). For Model 2, the column means are −.06 and +.06 for the


Table 5
Average Residuals (Observed Minus Predicted) for 34 Colleges Combined, Using HSGPA-SQ

                      Low-SES High Schools   High-SES High Schools    Total Group
Ethnic Group             Mean        N          Mean        N         Mean        N

(a) Average Residuals from Model 1
Native American          −.21       229          .06       158        −.10       387
Asian-American           −.06     3,264          .11     3,675         .03     6,939
African-American         −.30     2,813         −.13     1,355        −.25     4,168
Latino                   −.29     3,655          .02     1,432        −.20     5,087
White                    −.05    23,447          .11    26,224         .03    49,671
Other                    −.06     2,072          .11     2,388         .03     4,460
Total Group              −.10    35,480          .10    35,232         .00    70,712

(b) Average Residuals from Model 2
Native American          −.16       229          .04       158        −.08       387
Asian-American            .00     3,264          .04     3,675         .02     6,939
African-American         −.15     2,813         −.06     1,355        −.12     4,168
Latino                   −.13     3,655          .02     1,432        −.09     5,087
White                    −.04    23,447          .07    26,224         .02    49,671
Other                    −.04     2,072          .06     2,388         .01     4,460
Total Group              −.06    35,480          .06    35,232         .00    70,712

(c) Average Residuals from Model 3
Native American          −.12       229         −.01       158        −.08       387
Asian-American            .04     3,264          .00     3,675         .02     6,939
African-American         −.08     2,813         −.12     1,355        −.09     4,168
Latino                   −.03     3,655         −.04     1,432        −.03     5,087
White                     .00    23,447          .02    26,224         .01    49,671
Other                     .00     2,072          .01     2,388         .01     4,460
Total Group              −.01    35,480          .01    35,232         .00    70,712

Note. Model 1: HSGPA-SQ only; average R2 = .150. Model 2: HSGPA-SQ, SAT-CR, SAT-M, SAT-W; average R2 = .212. Model 3: HSGPA-SQ, SAT-CR, SAT-M, SAT-W, and HS SES Index; average R2 = .226.

For Model 1, approximate standard errors (computed under homoscedasticity and simple random sampling assumptions) are .003 for the column means for low-SES and high-SES high schools. Approximate standard errors of ethnic group row means are .033 for Native Americans, .008 for Asian-Americans, .010 for African-Americans, .009 for Latinos, .003 for Whites, and .010 for "Other." Standard errors are slightly smaller for Models 2 and 3.

low- and high-SES students, respectively, and for Model 3, they are −.01 and +.01, indicating that, as expected, there is virtually no over- or underprediction for the high school SES categories once the high school SES index is taken into account.

By comparing the row averages of Table 5, we can see that the inclusion of SAT scores and the SES index in the regression models does have a notable impact on the average residuals for Latino and African-American students, for whom


Table 6
Standardized Regression Coefficients for Model 3

College            HSGPA-SQ   SAT-CR   SAT-M   SAT-W   HS SES Index

College 1             .43       .03      .07     .11        .01
College 2             .37       .06      .06     .12        .09
College 3             .36       .02      .04     .12        .11
College 4             .40      −.03      .01     .14        .11
College 5             .40      −.02      .03     .17        .20
College 6             .20       .08     −.01     .07        .10
College 7             .38       .02      .01     .19        .03
College 8             .32       .03      .03     .16        .08
College 9             .36       .08      .01     .15        .09
College 10            .35       .08      .02     .18        .10
College 11            .24       .11      .08     .09        .07
College 12            .23       .01     −.08     .14        .06
College 13            .27       .04      .08     .16        .09
College 14            .31       .13      .00     .16        .10
College 15            .39       .02      .04     .20        .09
College 16            .37       .07      .04     .15        .08
College 17            .39       .22     −.07     .14        .05
College 18            .34       .04      .07     .21        .07
College 19            .26       .14      .07     .16        .12
College 20            .31       .12      .08     .07       −.02
College 21            .29       .19      .10     .11        .11
College 22            .30       .07      .13     .13        .12
College 23            .33       .02      .15     .09        .11
College 24            .28       .09      .15     .15        .11
College 25            .33       .09      .02     .19        .06
College 26            .44      −.02      .05     .11        .17
College 27            .42       .07      .02     .09        .04
College 28            .25       .09      .09     .14        .14
College 29            .31       .03      .05     .09        .15
College 30            .27       .01      .02     .12        .19
College 31            .38       .01     −.01     .14        .11
College 32            .16       .06      .13     .22        .17
College 33            .31       .03      .11     .11        .20
College 34            .55       .00      .08     .06        .04

Weighted average      .31       .06      .06     .14        .11

Note. Weighted averages are computed using college sample sizes as weights.

prediction errors have typically been found to be most severe. Because these groups are disproportionately represented in the low-SES high schools, their performance is, on average, substantially overpredicted (indicated by the minus sign) by Model 1 and, to a lesser degree, by Model 2, as shown in the row means of Tables 5a and 5b. Table 5a shows that for African-American students, the average prediction error for Model 1 is a quarter of a grade point (−.25). The average prediction error for African-Americans is reduced from −.25 to −.12 when SAT scores are added to the prediction equation (Model 2) and is further reduced to −.09 when the SES index is


added (Model 3). The tables show that the results for Latino students have a similar pattern. The average prediction error is −.20 for Model 1, −.09 for Model 2, and −.03 for Model 3.

Inclusion of SAT scores and high school SES also serves to reduce the underprediction of students in high-SES schools. In particular, students in these schools who are Asian-American, White, or of "Other" ethnicity are underpredicted by .11 when HSGPA only is used as a predictor (Table 5a). The average prediction errors for these same groups range from .00 to .02 in Model 3 (Table 5c). (The "Other" ethnicity group includes those who gave a response of "other" and those who gave no response to the ethnicity question on the SAT Questionnaire.) The occurrence of substantial prediction errors for some student groups when only high school grades are used as a predictor (Table 5a) is rarely mentioned in discussions of the pros and cons of using standardized tests in college admissions. This pattern cannot be inferred by examining only the correlation coefficients between predictors and FGPA.

When results are averaged over all high schools, the magnitude of prediction errors for Asian-American test-takers tends to be much smaller than for African-American and Latino test-takers, decreasing from an average of .03 for Model 1 to an average of .02 for Models 2 and 3. Average prediction errors for White students tend to be similarly small, which is to be expected because of the high proportion of White students. In 29 of the 34 colleges (all but Colleges 6, 20, 28, 32, and 33), the proportion of White students is at least 50%. In 21 colleges, it exceeds 70%. Therefore, the common regression equation in most colleges is likely to be similar to the equation that would have been obtained by including only White students in the regression. The common equation was used as the reference equation in our analyses because that is what admissions offices would typically estimate. However, for present purposes, the use of the common equation does have a disadvantage: The magnitude of the average prediction error for each group depends in part on the ethnic composition of the college, making the college results less comparable to each other. This dependence is a result of the constraint that the sum of the residuals within a college must equal zero.
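The zero-sum constraint mentioned above holds for any OLS fit that includes an intercept, which is why subgroup mean residuals must offset one another in proportion to subgroup size: a large majority group pins the common equation, and hence its own mean residual, near zero. A minimal demonstration with made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up single-predictor example; any OLS fit with an intercept will do.
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

print(resid.sum())  # zero up to floating-point error

# Consequently, complementary subgroups' mean residuals offset exactly,
# weighted by subgroup size.
half = np.arange(n) < n // 2
offset = half.sum() * resid[half].mean() + (~half).sum() * resid[~half].mean()
print(offset)  # also zero up to floating-point error
```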

Some aspects of the regression results were not according to prediction. As noted above, results for African-Americans were less consistent with the research hypothesis than those for Latinos, in that African-Americans were overpredicted (averaging across colleges) in both low- and high-SES schools in all regression models. Also, Table 5c shows that, while including the high school SES index in the regression model reduces overprediction considerably for Latinos (a reduction of .10 grade points) and African-Americans (a reduction of .07 grade points) in the low-index high schools, it does not improve prediction for these groups in the high-index schools. That is, the overall reduction in prediction error for these ethnic groups results from the large improvement in the low-index schools. The high-index Latinos and African-Americans actually show a greater magnitude of overprediction than their low-index counterparts in Model 3. These puzzling results led us to investigate regression models that included interactions between high school SES and the other predictor variables. It seemed plausible that the effects of high school grades and SAT scores could vary with high school SES. However, these alternative models did not substantially alter the patterns of residuals.


One noteworthy point about the results in Table 5 is that an increase in R2 does not imply a reduction in average prediction errors for all subgroups. Analyses of prediction error by gender provide a clear illustration of this fact. Zwick and Himelfarb (2009) obtained the average residuals for women and men for Models 1–3 in the 34 colleges, and then obtained weighted averages of the results (as in Table 5). The prediction error for gender groups was smallest in magnitude (an average underprediction of .05 for women) in Model 1, which included only HSGPA-SQ. The size of the average error actually grew (to an underprediction of .07 for women) when the SAT was added in Model 2, despite the increase in average R2 from .15 to .21. (This is a characteristic pattern for gender results; see Mattern et al., 2008; Zwick, 2002, p. 148.) The average error then shrank trivially in magnitude when the high school SES index was added.

Under our research hypothesis, adding the high school index to the model was not expected to reduce average residuals for gender groups, since, in general, males and females do not attend different high schools. The observed patterns suggest that the recurrent prediction errors for gender groups do not have the same source as prediction errors for ethnic groups. Although results are mixed, some evidence supports the hypothesis that the pattern of prediction errors results from the tendency of women to select college majors that are less stringently graded than those chosen by men. Another conjecture is that women tend to be more diligent and studious in college than men with equivalent qualifications (see Young, 2004, and Zwick, 2002, pp. 146–151, for a summary of this literature). These differences may become more apparent in regression models that are more complete (Models 2–3) than in Model 1, which includes only high school grades as a predictor.

Regressions for 18 Colleges Using HSGPA-C

As noted earlier, 18 of the 34 colleges included HSGPA data in the files they provided to the College Board. These GPAs could differ from the HSGPA-SQ values for a variety of reasons: They come from high school data that were transmitted to colleges, rather than from student self-reports; they may have been collected at a different time from the self-reported data and thus include different sets of courses; and they may have been calculated using only selected high school courses (e.g., those considered to be "academic"), in keeping with rules imposed by the high schools or colleges. Finally, HSGPA-SQ was collected in categorical form and converted to numerical values using the standard procedures of the College Board.4
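The letter-to-number conversion given in Note 4 can be written as a small lookup table. Only the grade-point values themselves come from the article; the averaging helper below is a hypothetical sketch, not the College Board's actual HSGPA-SQ procedure, which is not detailed in the text:

```python
# Grade-point values from Note 4 (ASCII hyphens stand in for minus grades).
GRADE_POINTS = {
    "A+": 4.33, "A": 4.00, "A-": 3.67,
    "B+": 3.33, "B": 3.00, "B-": 2.67,
    "C+": 2.33, "C": 2.00, "C-": 1.67,
    "D+": 1.33, "D": 1.00, "E": 0.00, "F": 0.00,
}

def mean_grade_points(letter_grades):
    """Unweighted mean of converted grades -- a hypothetical helper,
    not the College Board's actual computation."""
    return sum(GRADE_POINTS[g] for g in letter_grades) / len(letter_grades)

print(round(mean_grade_points(["A", "B+", "A-"]), 2))  # → 3.67
```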

For students who had data for both HSGPA-SQ and HSGPA-C, Table 7 gives the correlation between the two, as well as the means and standard deviations, within each college. The correlations between the two kinds of GPA data ranged from .59 to .85 across the 18 colleges, with a median of .75. In 15 of 18 colleges, HSGPA-SQ had a slightly higher mean than HSGPA-C; in 17 colleges, it had a higher standard deviation.

Regressions and residual analyses were conducted for these 18 colleges using HSGPA-C in place of HSGPA-SQ. The same students were used to estimate all three regression models. Results are summarized in Tables 8a–c. Only these combined results are discussed here. Except in one school, the sets of students for whom data


Table 7
Descriptive Statistics for HSGPA-SQ and HSGPA-C

            HSGPA-SQ       HSGPA-C
College     Mean (SD)      Mean (SD)        N      Correlation

 1          3.15 (.64)     3.08 (.57)       971        .85
 2          3.73 (.43)     3.73 (.39)     1,460        .66
 7          3.42 (.47)     3.35 (.42)     1,165        .74
 8          3.46 (.46)     3.28 (.42)     1,903        .71
12          3.09 (.50)     3.04 (.51)     1,098        .76
13          3.39 (.42)     3.49 (.40)     2,916        .68
15          3.41 (.56)     3.46 (.47)     1,330        .78
16          3.58 (.49)     3.50 (.36)     1,335        .83
17          3.52 (.57)     3.50 (.45)       425        .79
18          3.55 (.48)     3.54 (.36)     2,083        .79
19          3.79 (.36)     3.69 (.25)     3,713        .75
20          3.48 (.49)     3.47 (.36)       930        .73
21          3.57 (.41)     3.52 (.30)     1,746        .80
23          3.83 (.38)     3.72 (.26)     1,853        .73
24          3.87 (.33)     3.72 (.28)     3,391        .72
25          3.73 (.41)     3.96 (.43)     2,638        .59
27          3.34 (.58)     3.25 (.48)     2,320        .77
34          3.66 (.41)     3.55 (.34)        93        .80

Note. Means and standard deviations are based on the students who had data for both HSGPA variables.

were available were not identical for the two GPA variables. Therefore, except in this school, sample sizes are not the same as those reported in the 34-school analysis. Also, because students were divided into above- and below-median high schools using the SES index median for all students (not merely the ones in the 18-college analysis), the numbers of students above and below the median differ substantially.

Results for the 18-college HSGPA-C analyses revealed patterns similar to those obtained in the 34-college HSGPA-SQ analysis (Table 5), but the average R2 values were considerably larger, ranging from .23 to .29 across the models, compared to a range of .15 to .23 for the HSGPA-SQ analysis. Correspondingly, the average prediction errors in the 18-college analysis were not as large as those in the 34-college analysis. This finding is difficult to interpret since neither the set of colleges nor the set of students within the colleges was the same as in the earlier analysis. However, for College 1, exactly the same sample (N = 971) was available for the two analyses, allowing a meaningful comparison. Whereas the HSGPA-SQ analysis yielded R2 values of .28, .30, and .30 for Models 1, 2, and 3, respectively, the HSGPA-C analysis produced R2 values of .39, .40, and .40. These results suggest that HSGPA-C may better capture students' academic capabilities, a finding that is consistent with the differing sources of information for these two GPA measures.

The column means of the residuals for Model 1 (Table 8a), which uses only high school grades to predict FGPA, showed that, on average, students tended to be overpredicted if they attended low-SES high schools and underpredicted if they attended high-SES high schools. This was also true within each ethnic group except


Table 8
Average Residuals (Observed Minus Predicted) for 18 Colleges Combined, Using HSGPA-C

                      Low-SES High Schools   High-SES High Schools    Total Group
Ethnic Group             Mean        N          Mean        N         Mean        N

(a) Average Residuals from Model 1
Native American          −.09       121          .12        70        −.01       191
Asian-American           −.09     1,951          .00     1,842        −.05     3,793
African-American         −.20     1,010         −.11       657        −.16     1,667
Latino                   −.19       631         −.01       489        −.11     1,120
White                    −.05     9,985          .08    11,967         .02    21,952
Other                     .00       949          .07     1,212         .04     2,161
Total group              −.07    14,647          .06    16,237         .00    30,884

(b) Average Residuals from Model 2
Native American          −.05       121          .13        70         .01       191
Asian-American           −.03     1,951         −.01     1,842        −.02     3,793
African-American         −.11     1,010         −.06       657        −.09     1,667
Latino                   −.11       631          .00       489        −.06     1,120
White                    −.04     9,985          .05    11,967         .01    21,952
Other                     .01       949          .04     1,212         .03     2,161
Total group              −.04    14,647          .04    16,237         .00    30,884

(c) Average Residuals from Model 3
Native American          −.01       121          .09        70         .03       191
Asian-American            .01     1,951         −.05     1,842        −.02     3,793
African-American         −.05     1,010         −.09       657        −.06     1,667
Latino                   −.04       631         −.03       489        −.04     1,120
White                     .00     9,985          .02    11,967         .01    21,952
Other                     .05       949          .00     1,212         .02     2,161
Total group               .00    14,647          .00    16,237         .00    30,884

Note. Model 1: HSGPA-C only; average R2 = .229. Model 2: HSGPA-C, SAT-CR, SAT-M, SAT-W; average R2 = .281. Model 3: HSGPA-C, SAT-CR, SAT-M, SAT-W, and HS SES Index; average R2 = .290.

For Model 1, approximate standard errors (computed under homoscedasticity and simple random sampling assumptions) are .005 and .004 for the column means for low-SES and high-SES high schools, respectively. Approximate standard errors of ethnic group row means are .040 for Native Americans, .009 for Asian-Americans, .014 for African-Americans, .017 for Latinos, .004 for Whites, and .012 for "Others." Standard errors are slightly smaller for Models 2 and 3.

African-Americans, who were overpredicted in both cases, but less so for the high-index schools, as in the analysis of Table 5a. Model 2 results (Table 8b) had a similar pattern. Because Latinos and African-Americans were disproportionately represented in the low-SES high schools, their performance was, on average, overpredicted by Models 1 and 2 (see the row means of Tables 8a and 8b), although less so than in the 34-college analysis. For Model 3 (Table 8c), the column means are close to zero, indicating that, as expected, there was virtually no over- or underprediction for the high school SES categories once the index was taken into account.


As in the previous analysis, the inclusion of both SAT scores and the index in the regression models had an impact on the average residuals for Latino and African-American students. The row margins of Table 8a show that for African-American students, the average prediction error for Model 1 is −.16. The average error is reduced from −.16 to −.09 when SAT scores are added to the prediction equation (Model 2) and is further reduced to −.06 when the high school SES index is added (Model 3). Results for Latino students have a similar pattern. The average prediction error is −.11 for Model 1, −.06 for Model 2, and −.04 for Model 3. The magnitude of the prediction errors for all other ethnic groups tends to be small.

Some results were not according to prediction, as in the analysis of Table 5. In particular, while including the high school SES index in the regression model clearly reduces overprediction for African-Americans in the low-index high schools, it does not improve prediction for this group in the high-index schools. In fact, the high-index African-Americans have a greater magnitude of overprediction than their low-index counterparts in Model 3, as in the 34-college analysis.

Summary and Discussion

We developed an index of high-school-level SES by combining the high school average of student-reported parental education with a measure of district-level poverty. As expected, this index varied systematically by ethnicity. Average values for White and Asian-American students were roughly a standard deviation higher than those for Latino students; the values for Native Americans and African-American students fell in between.

Results obtained by averaging over all colleges showed that incorporating the high school SES index in the regression model used to predict first-year college GPA helped to reduce average prediction errors for ethnic groups. Models that did not include the index generally produced overpredictions of students in low-SES schools and underpredictions of those in high-SES schools. Because African-American and Latino students are disproportionately represented in low-SES schools, they tend to be overpredicted when the regression models traditionally used in admissions are applied. This has been a recurrent finding in admissions research.

Some aspects of the regression results were not consistent with predictions. Results combined across colleges show that FGPAs for African-American students tended to be overpredicted in both low- and high-SES schools, although the degree of overprediction was much less in the high-SES schools. Also, although including the high school SES index in the regression model reduced overprediction considerably for Latinos and African-Americans overall, it did not improve prediction for these groups in the high-index schools. For unknown reasons, the high-index Latinos and African-Americans displayed slightly more overprediction than their low-index counterparts in the models that included the high school SES index.

Limitations in the data that were available to construct our index presented a substantial challenge in this study. We did not have access to such instructionally relevant variables as teacher-student ratio or instructional expenditures. Also, as noted earlier, private high schools could not be included in our study because of incomplete data. Pike and Saupe (2002) found that the public-private distinction was useful in predicting first-year college GPA. Other variables were less than ideal for our purposes.


Both per-pupil expenditure data and poverty data, for example, were available at the district level, but not the school level.

Interestingly, two of the most useful high-school-level variables were aggregated father education and aggregated mother education, despite the fact that these were derived from test-takers' responses to the SAT Questionnaire and therefore subject to possible errors in students' reports of their parents' education. In addition, the high-school-level averages are based on only the students included in our data set, rather than on all students attending the high school. A related limitation is that the number of students for whom we have data is very small in some high schools. In the extreme case in which there is only one student, which occurs in several instances, the "aggregated" father and mother education for that student are actually based on only that student's responses. In these cases, the "aggregated" values are clearly an inadequate measure of high school characteristics. We investigated the effect of eliminating high schools that were poorly represented in our data set, but found that this led to substantial reductions in sample size. We therefore elected to include these schools.

From the perspective of interpretation, a possible objection to the aggregated parental education measures in the present context is that they represent home environment characteristics rather than school factors. However (except in schools with only one student in the data set), it is not the education of the individual student's parents, but the high school average of parental education, that is included in the index. In the absence of more direct measures of school resources, this high school average is intended to serve as a rough measure of school SES.

We undertook a variety of approaches to improve the fit of our models and to enhance the interpretability of our results. In particular, we sought additional high school data that would enhance the validity of the index as a measure of high school resources. Unfortunately, we did not have access to additional data. We also investigated regression models that included interactions between high school SES and the other predictor variables, but these alternative models did not substantially alter the patterns of residuals. Finally, we conducted an analysis to determine whether college-level factors, such as size and admission rate, were predictive of the patterns of residuals that were obtained. We did not find this to be the case.

The degree to which our findings are generalizable is unknown. Our 34 colleges were not randomly selected, nor were students within institutions. Additional work is needed to investigate the utility of high school SES indexes in data sets involving a broader set of colleges and high schools, including private high schools. We hope that our work can serve as a springboard for future research.

It is encouraging that, despite its limitations, our high school index did serve to reduce systematic errors in the prediction of first-year college grade-point averages. In some cases, the reduction was substantial. Future work in this area should focus on the development of an improved high school index, ideally including such variables as teacher-student ratio, teacher experience, percentage of fully credentialed teachers, Advanced Placement and International Baccalaureate offerings, per-pupil instructional expenditure, and percentage of students participating in the free or reduced-price lunch program. To increase the stability of such high school indexes, it may be advantageous to derive them using multiple years of data. The inclusion of a


high school quality index in the standard prediction equations used by undergraduate institutions could improve the accuracy of predictions of college performance.

Acknowledgments

We are grateful to the College Board for sponsoring this research, which was undertaken when the first author was at the University of California, Santa Barbara. In particular, we thank Wayne Camara and Andrew Wiley for providing the data needed for the project. We also appreciate the comments of Brent Bridgeman and John Young. An earlier version of this article was presented at the annual meeting of the American Educational Research Association (Zwick & Himelfarb, 2009).

Notes

1. See Bridgeman, Burton, and Pollack (2008) for a counterargument.

2. One major finding argues against measurement error as an all-purpose explanation for overprediction: On standardized admissions tests, women often score lower than men, yet their later grades tend to be underpredicted. This is contrary to the prediction that would be obtained from a simple measurement error model. This issue is further discussed in a later section.

3. In splitting the students into below-median and above-median groups, students with tied index values were kept together. The pattern of ties (resulting from the fact that students within a high school shared a common index value) produced a median split in which the two groups differed in size by 248 students.

4. A+ = 4.33, A = 4, A− = 3.67, B+ = 3.33, B = 3, B− = 2.67, C+ = 2.33, C = 2, C− = 1.67, D+ = 1.33, D = 1, E or F = 0.

References

Betts, J. R., Rueben, K. S., & Danenberg, A. (2000). Equal resources, equal outcomes? The distribution of school resources and student achievement in California. San Francisco: Public Policy Institute of California.

Bowen, W. G., & Bok, D. (1998). The shape of the river: Long-term consequences of considering race in college and university admissions. Princeton, NJ: Princeton University Press.

Bridgeman, B., Burton, N., & Pollack, J. (2008). Predicting grades in college courses: A comparison of multiple regression and percent succeeding approaches. The Journal of College Admission, 199, 19–25.

Carnegie Foundation for the Advancement of Teaching (2008). Carnegie Classifications Data File. Retrieved March 18, 2009 from http://www.carnegiefoundation.org/classifications/index.asp?key=809

Cleary, T. A. (1968). Test bias: Prediction of grades of Negro and White students in integrated colleges. Journal of Educational Measurement, 5, 115–124.

College Entrance Examination Board (2006). College-bound seniors. Retrieved February 2, 2010 from http://professionals.collegeboard.com/data-reports-research/sat/archived/2006.

Crouse, J., & Trusheim, D. (1988). The case against the SAT. Chicago, IL: University of Chicago Press.

Geiser, S., & Santelices, M. V. (2007). Validity of high-school grades in predicting student success beyond the freshman year (Research & Occasional Paper Series: CSHE.6.07). Berkeley: University of California Center for Studies in Higher Education.


Geiser, S., & Studley, R. (2004). UC and the SAT: Predictive validity and differential impact of the SAT and SAT II at the University of California. In R. Zwick (Ed.), Rethinking the SAT: The future of standardized testing in university admissions (pp. 125–153). New York, NY: RoutledgeFalmer.

Linn, R. L. (1966). Grade adjustments for prediction of academic performance: A review. Journal of Educational Measurement, 3, 313–329.

Linn, R. L. (1983). Predictive bias as an artifact of selection procedures. In H. Wainer & S. Messick (Eds.), Principles of modern psychological measurement: A Festschrift for Frederic M. Lord (pp. 27–40). Hillsdale, NJ: Erlbaum.

Mattern, K. D., Patterson, B. F., Shaw, E. J., Kobrin, J. L., & Barbuti, S. M. (2008). Differential validity and prediction of the SAT (College Board Research Report No. 2008-4). New York, NY: The College Board.

Pike, G. R., & Saupe, J. L. (2002). Does high school matter? An analysis of three methods of predicting first-year grades. Research in Higher Education, 43, 187–207.

Rothstein, J. M. (2005, April). SAT scores, high schools, and collegiate performance predictions. Presented at the meeting of the National Council on Measurement in Education, Montreal.

Snedecor, G. W., & Cochran, W. G. (1967). Statistical methods (6th ed.). Ames: The Iowa State University Press.

Wainer, H., & Brown, L. M. (2007). Three statistical paradoxes in the interpretation of group differences: Illustrated with medical school admission and licensing data. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics Vol. 26: Psychometrics (pp. 893–918). Amsterdam, The Netherlands: North-Holland.

Young, J. W. (2004). Differential validity and prediction: Race and sex differences in college admissions testing. In R. Zwick (Ed.), Rethinking the SAT: The future of standardized testing in university admissions (pp. 289–301). New York, NY: RoutledgeFalmer.

Young, J. W., & Johnson, P. M. (2004). The impact of an SES-based model on a college's undergraduate admissions outcomes. Research in Higher Education, 45, 777–797.

Zwick, R. (2002). Fair game? The use of standardized admissions tests in higher education. New York, NY: RoutledgeFalmer.

Zwick, R., & Green, J. G. (2007). New perspectives on the correlation of SAT scores, high school grades, and socioeconomic factors. Journal of Educational Measurement, 44, 23–45.

Zwick, R., & Himelfarb, I. (2009, April). The effect of high school quality on the predictive validity of SAT scores and high school grade-point average. Presented by I. Himelfarb at the Annual Meeting of the American Educational Research Association, San Diego.

Zwick, R., & Schlemer, L. (2004). SAT validity for linguistic minorities at the University of California, Santa Barbara. Educational Measurement: Issues and Practice, 25(2), 6–16.

Authors

REBECCA ZWICK is a Distinguished Presidential Appointee at Educational Testing Service, MS 12T, Rosedale Road, Princeton, NJ 08541; [email protected]. Her areas of expertise are applied statistics and psychometrics, test validity and fairness, and higher education admissions testing.

IGOR HIMELFARB is a Graduate Student in the Department of Education and the Department of Statistics and Applied Probability at the University of California, Santa Barbara, Santa Barbara, CA 93106; [email protected]. His areas of expertise are applied statistics and psychometrics.
