

Grades or Scores: Predicting Future College Mathematics Performance

Cathy Kessel, Marcia C. Linn
University of California, Berkeley

Do men and women with similar admissions test scores earn similar grades in mathematics? What factors may explain underprediction for women? Do men and women differ in persistence in mathematics study?

What are the best indicators of success in mathematics, and how are recent reforms in mathematics courses changing the relationships between predictors and success? Whereas grades and entrance examination scores each predict success in college mathematics within gender groups, scores underpredict female success. This has a variety of consequences, not only in undergraduate admissions but in other situations where entrance examination scores are used. Speed in selecting answers to short problems is an important factor in entrance examinations. In contrast, recent reforms in high school and college mathematics emphasize communication, both oral and written, different pedagogical styles, and more sustained problem solving. We review student performance on entrance examinations and in high school and college courses and examine the validity of entrance examinations. In addition, we analyze mathematicians’ views of examinations and their relationship with calculus reform.

Predictive Validity of Grades and Scores

Multiple-choice college entrance examinations are a pervasive part of American life. Based on the notion that college admissions and scholarships should reflect merit, rather than privilege, standardized entrance examinations have taken on roles undreamed of by their creators. In particular, the SAT-M now figures in college admissions, scholarship and fellowship decisions, and research on mathematical ability.

Over the years, scores from the voluntary sample of college-bound high school students taking the SAT-M or ACT-M have shown a consistent gap between male and female performance of about 0.4 standard deviation units. In the 1980s, this was explained as reflecting differences in high school course experience (National Assessment of Educational Progress [NAEP], 1988). Data from the Educational Testing Service indicate that, for students with the same course experience, the performance gap on the SAT remains about 0.4 standard deviation units (Linn & Kessel, 1996). Recent data from the National Science Foundation show that differences in course experience have decreased. College-bound females average 3.7 years, and males average 3.8 years, of high school mathematics. Females comprise 43% of the students taking the Advanced Placement Calculus examination (Linn & Kessel, 1996). Though the gap in high school course experience has narrowed, the score difference has hardly changed (Linn & Kessel, 1996).

In contrast, females earn higher grade point averages than males, both in high school and in college. A synthesis of studies of over 100,000 students at colleges and universities across the United States shows that this trend continues in college (Linn & Kessel, 1996). Moreover, there is evidence that women complete their undergraduate studies more quickly than men. Bank (1995), in a study of 424 students at a major state university, found that not only did women receive significantly higher grades but they also made significantly faster progress completing undergraduate degrees.

Women also tend to earn higher grades in college mathematics. A diverse set of studies of over 39,000 students shows that females perform as well as or better than males in required and advanced undergraduate mathematics courses. Studies of mathematics majors, and students in the top 10% of the university, report that women earn equal or higher grades than men in science and mathematics (Leonard & Jiang, 1995; Linn & Kessel, 1996). A study of a representative sample of bachelor’s degree recipients from 81 institutions in the U.S. and Puerto Rico found that 31.5% of female mathematics majors earned GPAs over 3.5. The corresponding figure for males was 21.1% (National Science Foundation, 1994).

Cathy Kessel is Assistant Editor of Key Curriculum Press, Visiting Fellow at the University of Melbourne, and Visiting Scholar at the University of California at Berkeley, Graduate School of Education, 4611 Tolman Hall, No. 1670, Berkeley, CA 94720-1670. Her specializations are research in mathematics teaching and learning, and mathematics and gender.

Marcia C. Linn is a Professor at the Graduate School of Education, University of California at Berkeley, 4611 Tolman Hall, No. 1670, Berkeley, CA 94720-1670. Her specializations are instructional technology, science education, and gender.

Educational Measurement: Issues and Practice, Winter 1996

Scores tend to underpredict the grades of females relative to those of males in mathematics courses. For example, Gross (Linn & Kessel, 1996) studied more than 4,000 Maryland high school students. Girls took the same advanced math courses as boys, in the same classrooms and with the same teachers. Though girls earned higher grades, their SAT scores were lower by about 0.4 standard deviation units. Another study of students selected as gifted because of their high scores on SAT tests taken in Grade 7 found that girls’ average SAT-M scores in 7th and 12th grade were lower than those for boys but girls received higher high school mathematics grades (Benbow, 1992). In college, the females from the same population continued to earn higher grades and a slightly greater percentage of undergraduate degrees in mathematics. Moreover, a larger percentage were college valedictorians (Benbow, 1992).

A similar pattern of grades and scores occurs for college mathematics courses. Wainer and Steinberg (Linn & Kessel, 1996) analyzed a data set consisting of the SAT-M scores and first-year college mathematics grades of almost 47,000 students attending 51 colleges and universities. They found that females who received the same grades in the same courses averaged 33 fewer points on the SAT-M. This was true for courses in remedial mathematics, regular mathematics, precalculus, calculus, and courses beyond calculus.

Scores also tend to underpredict women’s college grade-point averages relative to those of men. In a comprehensive study at the University of California at Berkeley, Leonard and Jiang (1995) analyzed grade point averages of all 10,000 students who entered between 1986 and 1988 and were admitted solely on the basis of high school grades and SAT scores. Because of the use of SATs in the selection formula, males were selected whose undergraduate grades were lower than those predicted by the study for the rejected females. When field of study was controlled, the formula underpredicted women’s cumulative college GPA by a small but significant amount. (Interestingly, in the case of humanities, SAT-M was negatively related to college GPA for both men and women.) Had women with lower SATs been selected instead of the men at the cutoff point, the women would have earned higher GPAs than the men who attended the university. Needless to say, faculty consider grades more valid than scores and would prefer to admit students who will succeed in their courses.
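To make the notion of underprediction concrete, the following is a minimal sketch, not the method of any study cited here. It assumes a single least-squares line is fitted to the pooled scores and grades of both groups, and it measures underprediction as the average of actual grade minus predicted grade within each group. The data points and the function names `fit_line` and `mean_residual` are hypothetical and chosen only for illustration.

```python
# Hypothetical (SAT-M score, course grade) pairs; NOT data from any cited study.
# The two groups earn the same grades, but one group's scores are 40 points lower.
women = [(500, 3.0), (550, 3.2), (600, 3.4), (650, 3.6)]
men = [(540, 3.0), (590, 3.2), (640, 3.4), (690, 3.6)]


def fit_line(points):
    """Ordinary least-squares fit of grade on score; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_residual(points, slope, intercept):
    """Average of (actual grade - predicted grade) for one group."""
    return sum(y - (slope * x + intercept) for x, y in points) / len(points)


# One common prediction line for everyone, as a score-based formula would use.
slope, intercept = fit_line(women + men)
women_gap = mean_residual(women, slope, intercept)
men_gap = mean_residual(men, slope, intercept)
print(f"women: {women_gap:+.3f}  men: {men_gap:+.3f}")
```

In this toy data set the common line predicts lower grades for the women than they actually earn (a positive average residual) and higher grades for the men than they actually earn (a negative one), which is the pattern the term underprediction describes.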

Other universities, among them MIT, Rutgers, and Princeton, have reported similar underprediction of women’s grades (Linn & Kessel, 1996). Some have modified their admissions policies in response to such findings. For instance, MIT no longer restricts admissions to students who score over 750 on the SAT-M. In 1986, 60% of MIT’s first-year class scored below 750, and 8% scored 500 or below (Behnke, as cited in Rosser, 1989). MIT’s change in admissions policy has eliminated the gap between women’s and men’s grades while increasing the overall quality of the class (Linn & Kessel, 1996).

Over 200 colleges and universities, among them Bates, Bowdoin, and Hampshire, and many colleges in the California State University system, have made submission of SAT and ACT scores optional for admission (FairTest, 1995). A study conducted at Bates revealed that successful applicants who did not submit SAT scores averaged 80 points lower than successful applicants who did. However, they did not differ in first-year GPA and academic standing (Linn & Kessel, 1996).

Though the SAT and ACT were designed for use in college admissions, they have been used for other purposes, for instance, as a measure of high school achievement in awarding scholarships. The underprediction of female grades by SAT scores argues against their use in awarding scholarships (Connor & Vargyas, 1992). Legal cases disputing the fairness of standardized tests have had some success (for a review, see Connor & Vargyas, 1992). In particular, the practice of awarding a scholarship on the basis of high school achievement was found to be unfair when high school achievement was measured solely by entrance examinations.

The same argument applies to use of the SAT and ACT to determine entrance to special programs in mathematics, such as summer programs for high school students or research opportunities for undergraduates. Because the tests underpredict grades and have not been validated for these programs, this use may exclude talented women (Connor & Vargyas, 1992).

The SAT-M is sometimes used as a placement examination for college mathematics courses. Bridgeman and Wendler (1989, p. 23) studied 3,499 first-year mathematics students’ grades and scores at 10 colleges and found that “the SAT-M score by itself is a relatively poor predictor of success in college mathematics courses when compared to tests specifically designed for placement purposes.” Studies at Arizona State University and Embry-Riddle Aeronautical University found that correlations between the ACT-M or SAT-M and course grades ranged between .30 and .50 or lower (Hassett, Downs, & Jenkins, 1994). Changes in placement methods (which included dropping the use of SAT-M and ACT-M scores) decreased student enrollment in remedial courses, increased enrollment in calculus, and (counter to many intuitions) increased student success rates.

Cognitive Analysis of Learning and Performance on Standardized Tests

Why do women earn better grades in both high school and college, yet earn worse scores on entrance examinations? One explanation may be that females learn more than males in mathematics courses.

Learning Practices

A synthesis of studies (Linn & Kessel, 1996) suggests that females tend to have study procedures which are more likely to yield a comprehensive and robust understanding of mathematics than those of males. Females report spending more time reflecting on similarities among problems, organizing and linking their ideas, and reviewing material. The experiences of a female engineering student at Princeton illustrate these findings.

I did notice, however, that the women tended to work together on homework. This seemed to be because we worked in the same style. The two or three women I worked with the most usually started the problem sets early, then compared them with each other. . . . Lots of the men I know started the problem sets the night before they were due and just did their best and handed them in. We (my women friends and I) almost never turned a problem set in that wasn’t perfect. I don’t know whether these are good generalizations or if I simply attracted people like that around me because I am that way. (Linn & Kessel, 1996, p. 115)

A cognitive analysis of group differences in classroom learning supports this view. Gallagher (in press) studied the solution methods used by students with high SAT-M scores and found that more females than males use classroom procedures in solving SAT items. As Gallagher points out, learning the methods advocated in the class is appropriate but may mean that the best learners are not the speediest SAT-takers. Gallagher conjectured that using classroom procedures but not shortcuts might account for some of the gender differences in college entrance examinations. A student comment supports this view:

With hindsight, I can understand that my lack of confidence and eventual fear of things mathematical were due in great part to a lack of experience and never having had an opportunity to develop shortcuts for calculations. I tended to try to do mentally what I’d been taught to do on paper and with pencil - borrow from the ten, carry the one, etc. - never rounding off or estimating generally. (Linn & Kessel, 1996, p. 114)

Some mathematics teachers discourage students’ use of non-textbook algorithms and shortcuts (Corwin, 1989). This supports Gallagher’s contention that the best learners may not be the best SAT-takers.

Test-Taking Practices

Success on the SAT-M and ACT-M requires students to solve problems efficiently. A recent version of the SAT-M had sections of 25 and 35 problems, each administered in 30 minutes. The gap in scores corresponds to females, on average, getting about three more items wrong than males.

The performance of males and females on non-high-stakes examinations such as general tests of mathematical ability does not show a gender gap (Linn & Kessel, 1996). This suggests that the context of taking an examination which many high school students consider an important determinant of the future and a test of ability may be a factor in explaining performance differences between males and females on entrance examinations. In a series of psychological experiments, Spencer and Steele (in press) asked undergraduates who had received good grades in calculus courses to take examinations consisting of mathematics questions from the Graduate Record Examination. The experimenters simulated the presence or absence of expectations by telling or not telling the students that they were expecting gender differences in performance. When students took easy examinations, this produced no effect. However, for difficult examinations, there was a gap in average scores for males and females when the experimenters announced that they expected gender differences.

Coaching reflects the view that entrance exams have little to do with classroom mathematics and much to do with efficiency. Coaching schools inform students that questions at the beginning of each section are easy and questions at the end are hard and teach students to allocate their time accordingly and not to skip easy questions. Students are also taught strategies involving use of item answers and to classify test items according to the strategies that can be used to solve them. For instance, on the following SAT question,

20. If 0 < x < 1, which of the following is the greatest?

(A) x²  (B) x  (C) √x  (D) 1/x  (E) 1/x²

a coaching school student would be taught to use a plug-in strategy, that is, to pick a number for x and to use that to select the correct answer (Testtakers, 1993). (This is different from a similarly named strategy described by Gallagher, though it is one of the strategies she describes.) A student might also use the coaching school observation that answer choices are almost always linearly ordered and simply choose E. These strategies are appropriate for a forced choice question, but they may be of little or no use in course examinations. If, as Gallagher’s work suggests, more males than females use such strategies in entrance examinations, then this might explain why scores overpredict the grades of males relative to females. If this view holds, then, on average, entrance examinations test females’ speed in using classroom algorithms, while males are tested on their use of shortcuts.
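The plug-in strategy can be sketched in a few lines of Python. The sketch is illustrative only: the five answer choices used here (x², x, √x, 1/x, 1/x²) are an assumption about the item, and the names `CHOICES` and `plug_in` are hypothetical. The point is that trying one convenient value of x identifies an answer without any reasoning about why it is correct.

```python
import math

# ASSUMED answer choices for the item "If 0 < x < 1, which is the greatest?"
CHOICES = {
    "A": lambda x: x ** 2,
    "B": lambda x: x,
    "C": lambda x: math.sqrt(x),
    "D": lambda x: 1 / x,
    "E": lambda x: 1 / x ** 2,
}


def plug_in(x):
    """Evaluate every choice at one trial value and return the largest one's label."""
    if not 0 < x < 1:
        raise ValueError("the item only concerns values with 0 < x < 1")
    return max(CHOICES, key=lambda label: CHOICES[label](x))


# A test taker would try a single easy number, such as one quarter.
answer = plug_in(0.25)
print(answer)
```

Under the assumed choices, the trial value picks out choice E. A single trial value does not show that the answer holds for every x between 0 and 1, which is why the strategy suits a forced-choice item but not a course examination that asks for justification.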

Patterns of Persistence

On the basis of their study habits, high school mathematics preparation, and superior grades, one might predict that women would predominate in mathematics. However, just the opposite is true. Overall, 43% of BAs in mathematics granted by U.S. institutions go to females. Women receive 48% of the BAs awarded by 4-year colleges. In contrast, women receive 35% of the BAs awarded by PhD-granting institutions. Of the students entering graduate school in mathematics, 35% are women, yet only about 20% of PhDs in mathematics go to females. Only 3% of the faculty at the “top ten” mathematics departments are women (Linn & Kessel, 1996).

Many students who begin college intending a mathematics or science major graduate without degrees in those fields. About two thirds of the students who enter college interested in mathematics switch to other areas. In a national sample of over 800,000 students entering college in 1987, 72% of females and 62% of males who declared mathematics or statistics majors later changed fields (Linn & Kessel, 1996). In contrast, students who declared majors outside mathematics rarely switched into mathematics.

One might argue that the students who switch out of mathematics are those who “should,” that they are “people whose mathematical past has caught up with them and should have flunked out earlier” (Linn & Kessel, 1996, p. 114). However, many studies indicate that switching and persisting in science or mathematics at a particular college or university are not predicted by high school GPA, college GPA, or SAT-M (Linn & Kessel, 1996). Neither grades nor scores explain why students switch out of mathematics and science. Why do they switch?

Seymour and Hewitt’s ethnographic study of 335 undergraduates on seven campuses offers some insights (Linn & Kessel, 1996). These students had declared majors in mathematics or science and, because their SAT-M scores exceeded 650, might have been expected to receive degrees in mathematics or science. Many did not.

Switchers and persisters had a lot in common. Large percentages of both groups were concerned with poor teaching, inadequate advising, and the competitive culture of mathematics and science. Between 20% and 35% of both switchers and persisters in Seymour and Hewitt’s study reported that (a) the curriculum is fast-paced, (b) they prefer the teaching in nonmathematics and nonscience courses, and (c) they have encountered conceptual difficulties.

Three factors distinguished switchers from persisters. First, over 80% of the switchers and 60% of the persisters mentioned poor teaching in mathematics and science as a concern. Second, more switchers than persisters mentioned inadequate advising and better education in majors outside of math and science. Third, the competitive culture influenced switchers more than persisters.

Although both persisters and switchers are dissatisfied with teaching and advising and discouraged by the competitive culture of mathematics, more women than men switch because of these conditions. One factor may be women’s tendency to be better learners: they may learn not only the subject matter but lessons about the nature of mathematical ability.

In entrance examinations, with very few exceptions (Young, 1980), there is always one right answer, and questions should be answered quickly. Entrance examinations differ from high school and lower division college examinations because questions are nonroutine in the sense that they aren’t clones of textbook exercises and draw on different mathematical domains. This seems to be part of the basis for the claim by some researchers (e.g., Benbow, 1992) that entrance examinations, rather than grades, measure mathematical ability. Using entrance examinations as a measure of mathematical ability is consistent with the view that mathematical activity is not the creation or discovery of new mathematics but clever and speedy manipulation of that which is already known.

Mathematicians’ Views of Examinations and Mathematical Ability

In contrast, professional mathematicians value the solution of difficult problems. Often such problems have remained unsolved for a long time (one example is Fermat’s last theorem, which was stated, but not proved, in the seventeenth century) and require far more than one page and 5 minutes for their solution. Such results are sometimes obtained by making connections between different representations of the same object or generating a new representation of an object.

In this view of mathematics, success in test taking is valued, but failure by those considered talented is overlooked. Even sophisticated examinations are not considered a particularly good measure of mathematical ability. The Hungarian Eötvös competitions for high school students consist of three nonroutine problems requiring only high school level mathematics that are to be solved within 4 hours. Some winners of this competition later became famous. However, other well-known and extremely successful Hungarian mathematicians were not successful exam takers. Paul Erdős (known for his peripatetic lifestyle, many collaborations, and impressive work in number theory, set theory, and finite combinatorics) told Hersh and John-Steiner (1993), “I did not do terribly well at these [Eötvös] competitions.” George Pólya, renowned for his mathematics as well as his work on problem solving, did not hand his Eötvös competition paper in (Hersh & John-Steiner, 1993).

In the United States, mathematicians have similar views about the outcomes of mathematics competitions. The Putnam exam, aimed mainly at undergraduates, is similar in nature to the Eötvös competition. Bruce Reznick, who has been a Putnam competitor, coach, grader, and question writer, says, “There are also many excellent, influential, and successful mathematicians who also did badly on the Putnam,” and “there is little hard evidence that doing extremely well is significant” (1994, p. 23). Loren Larson, who has served as the Mathematical Association of America’s liaison to the Putnam Questions Committee, says the Putnam is unrepresentative of mathematics, that it “does not purport to measure deep theoretical understanding . . . or the qualities of mind that are important in mathematical research: perseverance, initiative, powers of concentration over extended periods of time, the ability to ask worthwhile questions that open up new ideas and new vistas, or the ability to frame truly worthwhile abstractions and generalizations” (1994, pp. 32-33).

In short, students often view mathematics exams as work samples of professionals, whereas, for many mathematicians, the attitude toward examinations, even sophisticated examinations, is “This is a test; this is only a test” (Reznick, 1994). Though exams such as the Putnam and the Eötvös competition are an enjoyable experience for some and help to generate interest in mathematics, they are but one means to that end. They do not measure all the abilities important in mathematical research and are not considered important predictors of future success in mathematics. Furthermore, though their problems are sophisticated and require far more than one minute to solve, the examinations are not considered representative of the discipline of mathematics.

Reform College Calculus Courses

Reform of college calculus courses reflects concerns of faculty both in mathematics and client departments such as engineering that students lack understanding and express boredom. Traditional calculus syllabi are packed with numerous differentiation and integration techniques, all of which receive fleeting coverage. In contrast, reformed calculus is “lean and lively,” that is, it covers fewer topics in greater depth and focuses less on techniques and algorithms and more on conceptual understanding. Several different reformed calculus textbooks and courses are now in use (Linn & Kessel, 1996). Some large universities such as Duke and the University of Michigan now offer only reformed calculus courses; others offer both.

To make courses deeper, most ask students to work in groups, write reports, and carry out projects. Some courses feature extensive use of technological tools such as Maple, Mathematica, MathCAD, and ISETL. Many of the activities in the new courses ask for more prolonged attention to one problem than is the case in traditional college or high school mathematics courses. An example of such a task from the calculus project at New Mexico State University is given below.

Your parents are going to knock out the bottom of the entire length of the south wall of their house and turn it into a greenhouse by replacing some bottom portion of the wall by a huge sloped piece of glass (which is expensive). They have already decided they are going to spend a certain fixed amount. The triangular ends of the greenhouse will be made of various materials they already have lying around.

The floor space in the greenhouse is only considered usable if they can both stand up in it, so part of it will be unusable, but they don’t know how much. Of course this depends on how they configure the greenhouse. They want to choose the dimensions of the greenhouse to get the most usable floor space in it, but they are at a real loss to know what the dimensions should be and how much usable space they will get. Fortunately they know you are taking calculus. Amaze them! (Gillman, 1994, p. 124)

Gillman (1994) says this problem “is intended to require reasoning over a period of days and so reward good thinkers rather than good test takers” (p. 124). Calculus students at New Mexico State are also asked to show that the series 1 - 1/2 + 1/3 - 1/4 + . . . can be rearranged to converge to any number. Students’ reaction is that it is impossible. Faculty report that deriving the result “revolutionizes [students’] view of mathematics” (Gillman, 1994, p. 124).
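The rearrangement result can be explored numerically. The sketch below is a standard greedy construction and not necessarily the approach taken in the New Mexico State course: while the running total is at or below the target it takes the next unused positive term (1, 1/3, 1/5, . . .), and otherwise it takes the next unused negative term (-1/2, -1/4, . . .). The function name `rearranged_sum` and the number of terms are choices made for this illustration.

```python
def rearranged_sum(target, n_terms=100_000):
    """Partial sum of a rearrangement of 1 - 1/2 + 1/3 - 1/4 + ... aimed at target."""
    next_odd = 1    # denominator of the next unused positive term
    next_even = 2   # denominator of the next unused negative term
    total = 0.0
    for _ in range(n_terms):
        if total <= target:
            total += 1.0 / next_odd   # at or below target: add a positive term
            next_odd += 2
        else:
            total -= 1.0 / next_even  # above target: subtract a negative term
            next_even += 2
    return total


for goal in (1.5, 0.0):
    print(goal, rearranged_sum(goal))
```

Every term of the original series is used exactly once, only the order changes, and each time the running total crosses the target it overshoots by no more than the size of the term just used. Because the terms shrink toward zero, the partial sums close in on whatever target was chosen.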

Assessment

New forms of high school assessment in the United States, the Netherlands, England, Australia, and other countries make use of portfolios, projects, and examinations requiring solutions to several long problems. Many instantiations of these forms of assessment do not differentially advantage males (Linn & Kessel, 1996). Moreover, these forms of assessment reflect the view that sustained problem solving is an important aspect of mathematical activity, conveying a more accurate picture of what mathematics is and the kinds of performances valued in reformed mathematics courses.

If these assessments are better approximations of what is required in college mathematics, they may show a smaller gender differential when used as predictors of college performance. Better learners rather than better test-takers may be successful on these assessments. Finally, reformed assessments and courses can convey the excitement of mathematics previously reserved for those whose confidence and interest in mathematics could survive traditional courses and examinations.

Note

This material is based on research supported by the National Science Foundation under Grant Numbers MDR-8954753, MDR-9155744, and MDR-9453861. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation. This material was partially prepared while one author (Linn) was a Fellow at the Center for Advanced Study in the Behavioral Sciences, with support provided by The Spencer Foundation. Thanks to Dawn Davidson for assistance with the production of this manuscript.

References

Bank, B. J. (1995). Gendered accounts: Undergraduates explain why they seek their bachelor’s degree. Sex Roles, 32(7/8), 527-544.

Benbow, C. P. (1992). Academic achievement in mathematics and science of students between ages 13 and 23: Are there differences among students in the top one percent of mathematical ability? Journal of Educational Psychology, 84(1), 51-60.

Bridgeman, B., & Wendler, C. (1989). Prediction of grades in college mathematics courses as a component of the placement validity of SAT Mathematics scores (College Board Report No. 89-9). New York: College Entrance Examination Board.

Connor, K., & Vargyas, E. (1992). The legal implications of gender bias in standardized testing. Berkeley Women’s Law Journal, 7.

Corwin, R. B. (1989). Multiplication as original sin. Journal of Mathematical Behavior, 8, 223-225.

FairTest. (Ed.). (1995). 235 schools where SAT and ACT scores are optional for admission into bachelor degree programs. Cambridge, MA: Author.

Gallagher, A. M. (in press). Sex differences in problem-solving strategies used by high scoring examinees on the SAT-M. New York: College Entrance Examination Board.

Gillman, L. (1994). Teaching programs that work. In B. A. Case (Ed.), You’re the professor, what next? (MAA Notes No. 35, pp. 121-127). Washington, DC: Mathematical Association of America.

Hassett, M., Downs, E., & Jenkins, J. (1994). Mathematics education: A case for placement testing. In B. A. Case (Ed.), You’re the professor, what next? (MAA Notes No. 35, pp. 121-127). Washington, DC: Mathematical Association of America.

Hersh, R., & John-Steiner, V. (1993). A visit to Hungarian mathematics. The Mathematical Intelligencer, 15(2), 13-26.

Larson, L. (1994). Comments on Reznick. In A. Schoenfeld (Ed.), Mathematical thinking and problem solving (pp. 30-38). Hillsdale, NJ: Erlbaum.

Leonard, D., & Jiang, J. (1995, April). Gender bias in the college predictions of the SAT. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco.

Linn, M. C., & Kessel, C. (1996). Success in mathematics: Increasing talent and gender diversity among college majors. In J. Kaput, A. Schoenfeld, & E. Dubinsky (Eds.), Research in collegiate mathematics education (Vol. 2, pp. 101-144). Providence, RI: American Mathematical Society.

National Assessment of Educational Progress. (1988). The mathematics report card: Are we measuring up? Trends and achievement based on the 1986 national assessment (No. 17-M-01). Princeton, NJ: Educational Testing Service.

National Science Foundation. (1994). Women, minorities, and persons with disabilities in science and engineering: 1994 (No. NSF 94-333). Arlington, VA: Author.

Reznick, B. (1994). Some thoughts on writing for the Putnam. In A. Schoenfeld (Ed.), Mathematical thinking and problem solving (pp. 19-29). Hillsdale, NJ: Erlbaum.

14 Educational Measurement: Issues and Practice


Rosser, P. (1989). The SAT gender gap: Identifying the causes. Washington, DC: Center for Women Policy Studies.

Spencer, S. J., & Steele, C. M. (in press). Under suspicion of inability: Stereotype [...] and women’s math performance. American Psychologist.

Testtakers. (1993). Testtakers, student manual. San Francisco: Author.

Young, C. (1980). The representation of geometrical [...]. Journal of Mathematical Behavior, 3(1), 123-144.