Detecting Construct-Irrelevant Variance in an Open-Ended, Computerized Mathematics Task
Ann Gallagher Randy Elliot Bennett
Cara Cahalan
GRE Board Report No. 9513P
October 2000
This report presents the findings of a research project funded by and carried
out under the auspices of the Graduate Record Examinations Board
Educational Testing Service, Princeton, NJ 08541
Researchers are encouraged to express freely their professional judgment. Therefore, points of view or opinions stated in Graduate
Record Examinations Board Reports do not necessarily represent official Graduate Record Examinations Board position or policy.
********************
The Graduate Record Examinations Board and Educational Testing Service are dedicated to the principle of equal opportunity, and their programs,
services, and employment policies are guided by that principle.
EDUCATIONAL TESTING SERVICE, ETS, the ETS logo, the modernized ETS logo, GRADUATE RECORD EXAMINATIONS, and GRE are
registered trademarks of Educational Testing Service.
Copyright © 2000 by Educational Testing Service. All rights reserved.
Abstract
The purpose of this study was to evaluate whether variance due to computer-based presentation
was associated with performance on a new constructed-response type -- Mathematical Expression -- that
requires examinees to build mathematical expressions using a mouse and an on-screen tool palette.
Participants took parallel computer-based and paper-based tests consisting of Mathematical Expression
items, plus a test of their skill in entering and editing data using the computer interface. Comparisons of
mean performance, reliability, speededness, and relations with external indicators were conducted across
the paper-based and computer-based tests; also, computer-based math score was regressed on edit/entry
score after controlling for paper-and-pencil math score and background information. Although no
statistical evidence of construct-irrelevant variance was detected, some examinees reported mechanical
difficulties in responding and indicated a preference for the paper-and-pencil test.
Keywords: Computer-based testing, Item sets, Mathematics, Speededness
Table of Contents
Introduction
Method
    Participants
    Instruments
    Procedure
    Data Analysis
Results
Conclusion
Tables and Figures
References
Author Note
Appendix
List of Tables
Table 1. A Mathematical Expression Key and Example Responses
Table 2. Means, Standard Deviations, and Coefficient Alpha Reliabilities for Mathematical Expression and Edit/Entry Tests
Table 3. Correlations Between the Mathematical Expression Test, Edit/Entry Test, and Other Variables
Table 4. Hierarchical Multiple Regression of Computer-Based Mathematical Expression Scores on Paper-and-Pencil Scores, Background Variables, and Edit/Entry Test
Table 5. An Example Paper-and-Pencil Response That Would Not Have Fit in the Computer-Based Mathematical Expression Answer Box
List of Figures
Figure 1. The Mathematical Expression interface with an example item and a correct response.
Introduction
One of the promises of computer-based testing is the ability to present examinees with open-
ended tasks that are more like the ones they encounter in academic and work settings (Bennett, 1993).
Mathematical Expression (ME) is one such response type. ME was created as part of an experimental test
for admission to quantitatively oriented graduate programs. This response type can be used with any
question for which the answer is a rational expression, including questions that ask the examinee to
mathematically model a problem situation. ME is particularly exciting because it permits the developers
of computer-based mathematics tests to use automatically scorable, open-ended items, the correct
answers to which may take many different surface forms (see Table 1 for an example key and a few
equivalent responses). Because these responses can be scored in real time using symbol manipulation
techniques, ME items can be included in computer-adaptive tests.
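As a rough illustration of how responses with different surface forms can match one key, the sketch below checks two expressions for equivalence by evaluating both at random rational points. This is a numerical stand-in chosen for brevity, not the symbol-manipulation scoring the report refers to. The function name `probably_equivalent`, the representation of responses as Python callables, and the deliberately wrong response are assumptions made for the example; the key and the two correct responses follow Table 1.

```python
import random
from fractions import Fraction

def probably_equivalent(f, g, variables, trials=20, seed=0):
    """Return True if f and g agree at `trials` random rational points.

    Exact rational arithmetic avoids floating-point noise. Agreement at
    random points is strong evidence of equivalence, not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        point = {v: Fraction(rng.randint(-50, 50), rng.randint(1, 9))
                 for v in variables}
        if f(**point) != g(**point):
            return False
    return True

# Key and two correct responses from Table 1, written as callables.
key = lambda m, n, p: (m - 2 * p) * (n - 2 * p) / 4
factored = lambda m, n, p: Fraction(1, 4) * (-2 * p + m) * (-2 * p + n)
expanded = lambda m, n, p: p**2 - p * n / 2 - p * m / 2 + m * n / 4
# Hypothetical incorrect response, included only for contrast.
wrong = lambda m, n, p: (m - p) * (n - p) / 4

print(probably_equivalent(key, factored, "mnp"))
print(probably_equivalent(key, expanded, "mnp"))
print(probably_equivalent(key, wrong, "mnp"))
```

A production scorer would more plausibly reduce both expressions to a canonical symbolic form; the random-evaluation check is shown only because it needs nothing beyond the standard library.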
In delivering a test on computer, one key concern is finding a way for examinees to respond that
is insensitive to individual differences in computer familiarity. For open-ended items, the challenge is
particularly complex. By definition, these items require examinees to enter more information and, thus,
could potentially require greater computer skill.
In developing the ME interface, considerable care was taken to keep computer-skill requirements
to a minimum. For example, the interface is completely mouse driven: Examinees build their expressions
by clicking symbols in an on-screen palette (see Figure 1). This strategy circumvents the need for
keyboard facility as well as the problem that some mathematical symbols have no keyboard equivalents.
On the palette, digits and arithmetic operators appear in the standard calculator configuration, which
makes them easy to find.
In addition, the interface provides for exponent and subscript modes, so that users do not have to
enter syntactic markers, such as carets, to denote these positions. The user simply clicks on the Exponent
or Subscript button to make the next number he or she selects appear in the intended position. The
interface also provides graphical displays of complex expressions involving division that use a horizontal
division bar rather than the less visually meaningful slash. The natural, graphical representation of
exponents, subscripts, and division makes it easier for users to parse expressions they have just entered
and minimizes the chances of a mismatch between the system’s interpretation of an expression and the
user’s intention.
To limit construct-irrelevant errors (such as typos), and to facilitate interpretation and scoring, the
ME interface imposes certain minimal constraints on the entry of expressions. For example, the interface
disables certain buttons on the tool palette based on the entry mode selected. If the user has selected
exponent-mode, for instance, the interface disables the entry of certain mathematical operators, like
multiplication and division, as well as alphabetic characters. Also, when users submit their final answers,
the interface checks these expressions for syntactic correctness and flags those that display
inappropriately juxtaposed operators (e.g., a multiplication symbol followed immediately by a division
symbol), malformed numbers (e.g., a number containing two decimal points), or unbalanced parentheses.
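The three submission-time checks named above can be sketched as follows. This is a minimal illustration that assumes the response is available as a linear text string; the actual ME interface works on a graphical expression, and the function name `syntax_problems` and the decision to allow a minus sign after another operator (as a unary minus) are assumptions made for the example.

```python
import re

def syntax_problems(expr):
    """Return a list of syntax problems found in a linear expression string."""
    problems = []
    text = expr.replace(" ", "")

    # Unbalanced parentheses: depth may never go negative and must end at 0.
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        problems.append("unbalanced parentheses")

    # Malformed numbers: a run of digits and points with two or more points.
    for token in re.findall(r"[\d.]+", text):
        if token.count(".") > 1:
            problems.append(f"malformed number: {token}")

    # Juxtaposed operators: any operator followed directly by +, * or /.
    # A following '-' is tolerated here as a unary minus (an assumption).
    if re.search(r"[+\-*/][+*/]", text):
        problems.append("juxtaposed operators")

    return problems

print(syntax_problems("(m - 2*p)*(n - 2*p)/4"))
print(syntax_problems("3 */ 4"))
print(syntax_problems("1.2.3 + x"))
print(syntax_problems("(a + b"))
```

Returning a list rather than stopping at the first problem lets an interface flag every issue at once when the examinee submits.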
The ME interface obviously requires some orientation. To accomplish this, a brief tutorial is used
to familiarize examinees with the response type prior to taking the test. The tutorial introduces the symbol
palette and demonstrates how examinees can formulate expressions using the Subscript and Exponent
buttons, the variable and constants menu (accessed by pressing the a-z key shown in Figure 1), and other
features.
Although every effort was made to design an ME interface that required minimal computer skill,
building an expression with the interface is still a more complex task than writing one with paper and
pencil. For this reason, facility with the ME interface could well produce an unwanted performance
effect. Preliminary evidence provided by Bennett, Steffen, Singley, Morley and Jacquemin (1997) seems
to indicate that ME tasks do not introduce any more construct-irrelevant variance than do other task
types. These investigators compared the functioning of ME items to other computer-delivered item types,
including standard multiple-choice questions, questions requiring entry of numeric values, and questions
asking the examinee to shade portions of a coordinate system. Their results showed that ME items have
roughly the same distribution of difficulty as these other response types. In addition, ME questions had
item-total correlations similar to those for the other items. Third, ME items took no longer to answer than
other constructed-response problems written to measure mathematical modeling skills (though both types
took longer than multiple-choice modeling questions). Finally, ME showed gender differences
comparable to those for the other quantitative questions.
Whereas the data provided by Bennett et al. (1997) are encouraging, they provide only an indirect
evaluation of whether the ME interface introduces irrelevant variance. In the current study, our goal was
to test more directly the hypothesis that individual differences in facility with the ME interface affect
performance on computer-based mathematical tests.
Method
Participants
We recruited 226 volunteers from 10 colleges and universities located in different regions of the
United States to participate in this study. Of these individuals, 48 were eliminated because they either
were not enrolled in quantitatively oriented undergraduate majors or they were not close to making the
transition to graduate school. Of the 178 remaining participants, 57% were college seniors and 43% were
first-year graduate students. Thirty-six percent of the participants were women and 79% were U.S.
citizens. The racial/ethnic distribution of the sample was 58% White, 15% Asian American, 13%
Hispanic, 7% other, and 5% Black. Most participants (53%) reported an undergraduate major in
engineering, with the remainder distributed among mathematics (23%), physical science (16%), and
computer science (8%). The largest group (47%) indicated an intention to pursue a master’s degree, while
many (35%) said they would be pursuing doctorates.
Of the 178 students in the sample, 75 (42%) reported a score from the quantitative section of the
Graduate Record Examinations (GRE®) General Test. Of these 75 participants, most (71%) were first-
year graduate students, and very likely a more select group than the sample as a whole. The mean score
of those participants reporting GRE scores was 759 (SD = 41), which is substantially above the average
scores for all of their undergraduate fields. For example, in our sample, engineering majors had a mean
GRE quantitative score of 760, whereas in the 1995-96 academic year, students intending graduate study
in engineering scored a mean of 687 (Graduate Record Examinations Board, 1997).
All but one of our participants reported an undergraduate grade-point average (UGPA). UGPA
data were reported in six categories ranging from “Below 1.5” to “3.5-4.0,” with the latter marking the
high end of the scale. Most participants reported a UGPA of either 3.5-4.0 (41%) or 3.0-3.49 (33%).
Instruments
Mathematical Expression test. Two 16-item ME tests were created for the study. These tests were
designed to contain equal proportions of easy and difficult items, based on both mathematics content and
the procedural complexity of entering the response.
Edit/entry test. This computer-based test was designed to measure participants’ skill in using the
ME interface. The test consisted of five editing items and five entry items. Editing items required the
examinee to modify a given mathematical expression to match a given example. Entry items asked the
examinee to enter a given expression. Editing and entry items were designed to cover a range of
difficulty, with emphasis on mathematical expressions that were somewhat more complex than those that
would normally appear on an operational mathematical reasoning test.
Questionnaire and interview. Participants also completed a questionnaire about their personal
background, computer experience, perception of the ME tasks, and plans for graduate study. A debriefing
interview was conducted to ensure that important information about the interface was not overlooked and
to respond to any questions or concerns subjects may have had.
Procedure
Each examinee took part in a three-hour session, for which they received $45. All individuals
took both ME tests, one on paper and the other on computer, with one hour allotted for each test. Students
were assigned randomly to one of four order conditions:
• ME test 1 on computer, ME test 2 on paper, edit/entry test
• ME test 2 on computer, ME test 1 on paper, edit/entry test
• ME test 1 on paper, ME test 2 on computer, edit/entry test
• ME test 2 on paper, ME test 1 on computer, edit/entry test
The edit/entry test was administered after the ME tests to avoid providing additional practice to
students before taking the computer-based ME test. The session concluded with the questionnaire and
debriefing interview.
Data Analysis
To locate evidence of irrelevant variance due to the ME interface, we conducted several analyses.
The first set of analyses was targeted at determining the extent to which the paper-and-pencil ME test
forms were approximately equivalent to their computer-delivered counterparts. To the extent that they
were equivalent, we presumed the case for irrelevant variance would be considerably weakened.
To assess equivalence, we first compared coefficient alpha reliabilities across test modes --
computer versus paper-and-pencil -- within each ME test form. Second, we compared mean scores
resulting from different test modes within test forms, and vice versa. For the former comparison, we used
a between-subjects one-way analysis of variance for each test form, with ME scores as the dependent
variable and test mode as the independent variable. For the latter comparison, we used a between-subjects
one-way analysis of variance for each test mode, with ME score as the dependent variable and test form
as the independent variable.
Third, we looked at speededness across test modes within each test form, computing the
proportion of students completing the test and the proportion reaching all but the last item. These
measures are, at best, a very loose approximation of speededness and one that is not precisely comparable
across test modes, because in computer mode, we required examinees to respond to an item before they
could be presented with another item -- something we could not control on the paper version. As a result,
participants’ skipping behavior is readily detected on paper as a blank response; on computer, omits are
less obvious as test takers could skip questions simply by making any response.
Fourth, we compared the pattern of relations of the paper-and-pencil and computer test modes
with other variables, including edit/entry scores, GRE quantitative scores, undergraduate major (coded as
engineering vs. other), gender, and level of education (college senior vs. first-year graduate student).1 For
this and subsequent analyses, we combined ME scores across test forms within computer and paper-and-
pencil test modes to increase statistical power. To achieve this combination, we first standardized
participants’ ME scores for each 16-item test form within each mode, and then we collapsed them across
the order conditions.
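The standardization step described above can be sketched as follows. The record layout, the function name `standardize_within_cells`, and the toy scores are hypothetical; the report does not say whether the sample or population standard deviation was used, so the sample version is assumed here.

```python
from statistics import mean, stdev

def standardize_within_cells(records):
    """Convert raw scores to z-scores within each form-by-mode cell.

    records: iterable of (participant, form, mode, raw_score).
    Returns {(participant, mode): z}, so scores from the two forms can be
    pooled within a mode.
    """
    cells = {}
    for pid, form, mode, score in records:
        cells.setdefault((form, mode), []).append((pid, score))

    z_scores = {}
    for (form, mode), rows in cells.items():
        values = [score for _, score in rows]
        m, s = mean(values), stdev(values)  # sample SD (an assumption)
        for pid, score in rows:
            z_scores[(pid, mode)] = (score - m) / s
    return z_scores

# Hypothetical raw scores: form 2 is harder, so its raw scores run lower.
records = [
    ("A", 1, "computer", 12), ("B", 1, "computer", 8), ("C", 1, "computer", 10),
    ("D", 2, "computer", 9), ("E", 2, "computer", 5), ("F", 2, "computer", 7),
]
z = standardize_within_cells(records)
print(z[("A", "computer")], z[("D", "computer")])
```

In the toy data, participants A and D have different raw scores but the same standing within their own form, which is what allows the two forms to be combined despite a difference in difficulty.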
For our second set of analyses, we used hierarchical multiple regression to examine the extent to
which skill in using the ME interface was directly related to performance on the computer-based ME test.
For this analysis, we used ME score on the computer-delivered test as the dependent variable. We first
entered paper-and-pencil ME score into the equation, followed by background information -- major
(coded as engineering vs. other), level of education (college senior vs. first-year graduate student), and
gender -- to control for any group differences in computer-based ME performance. Finally, we entered
edit/entry score -- our measure of mechanical skill in responding to the computer-based ME test. Here,
we presumed that any significant effect for edit/entry score, after controlling for paper-and-pencil ME
score and background information, would suggest construct-irrelevant variance due to lack of facility
with the ME interface.
1 We used “engineering versus other” for undergraduate major because just over half of our sample indicated an engineering major.
Results
Table 2 shows mean performance and coefficient alpha reliabilities for both test forms for both
the computer-based and paper-based ME tests, and for the edit/entry measure. The reliabilities for the ME
tests ranged from .79 to .85, with no indication of differences between the computer-delivered and paper-
and-pencil versions. The reliability of the 10-item edit/entry task was .72.
Analyses of the mean scores showed no performance differences between the computer and
paper test versions (F[1,176] = .55, p > .05 for the first paper-and-pencil form vs. the first computer-
based form; F[1,176] = .29, p > .05 for the second paper-and-pencil form vs. the second computer-based
form). There were mean differences, however, between the two ME paper-and-pencil forms (F[1,176] =
25.99, p < .001), and between the two forms delivered on computer (F[1,176] = 22.48, p < .001),
suggesting that one form was harder than the other. Eta-squared was computed for each within-mode
comparison and revealed effect sizes of .13 and .11 for paper-based and computer-based tests,
respectively. According to Cohen (1988), these eta-squares are characterized as medium effect sizes.
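For a one-way analysis of variance, eta-squared can be recovered from the F ratio and its degrees of freedom, because eta-squared is the between-groups sum of squares divided by the total sum of squares. The short sketch below applies that identity to the two between-form F values reported above; the function name `eta_squared` is an assumption made for the example.

```python
def eta_squared(f_value, df_between, df_within):
    """Eta-squared for a one-way ANOVA, computed from F and its df.

    eta^2 = SS_between / (SS_between + SS_within)
          = (F * df_between) / (F * df_between + df_within)
    """
    return (f_value * df_between) / (f_value * df_between + df_within)

paper = eta_squared(25.99, 1, 176)     # form 1 vs. form 2 on paper
computer = eta_squared(22.48, 1, 176)  # form 1 vs. form 2 on computer
print(round(paper, 2), round(computer, 2))  # prints: 0.13 0.11
```

Both values agree with the effect sizes given in the text.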
With respect to timing, 98% of those taking the paper version of ME test 1 finished the test,
compared with 85% of those taking that test on computer, a statistically significant difference (z = 3.06, p
< .01). For ME test 2, 85% completed the paper version and 90% finished the computer version, which
was not a significant difference (z = -.91, p > .05). Regarding the percentages of participants who reached
all but the last item on each test, the differences were significant for both ME forms, but in opposite
directions. For ME test 1, 100% of those taking the paper version reached the next to last item, while 93%
of those taking the computerized test went that far (z = 2.53, p < .05). For ME test 2, 87% of examinees
taking the paper-and-pencil test completed the penultimate question, while 96% of those taking the
computer-based test did so (z = -2.12, p < .05).
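A comparison of completion rates like those above can be sketched as a pooled two-proportion test. The report gives percentages rather than counts and does not name the exact test variant, so both the pooled formula and the counts used below (87 and 76 finishers out of 89, chosen to match roughly 98% and 85%) are assumptions.

```python
from math import sqrt

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z statistic for independent groups."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / standard_error

# ME test 1: paper finishers vs. computer finishers (assumed counts).
z_test1 = two_proportion_z(87, 89, 76, 89)
print(round(z_test1, 2))
```

With these assumed counts the statistic lands near, though not exactly on, the reported value; the gap is expected given that the underlying counts and the precise formula are not stated.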
Table 3 shows correlations found among ME score, edit/entry test score, and various external
criteria after combining the standardized scores on the two ME forms. The observed correlation between
the ME paper-based and computer-based scores was .78; corrected for attenuation, that value was .97,
suggesting that the two modes were measuring the same construct.2 Consistent with this suggestion is
that the ME computer-based and paper-based scores also showed the same pattern of relations with
external criteria; no statistically significant differences were found between the correlation of the ME
computer-delivered test with any given external variable, and the correlation of the ME paper-and-pencil
test with the same external variable (t range = -.40 to 1.69, df range = 72 to 175). Both ME versions were
significantly related to UGPA, GRE quantitative score, gender, and level of education. Similarly, both
ME tests were unrelated to the edit/entry test or to undergraduate major. Finally, the edit/entry test was
unrelated to any measure of accomplishment -- GRE quantitative score, UGPA, or level of
education -- suggesting that, although reliable, the construct it measured was generally irrelevant to academic
study.
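The correction for attenuation mentioned above (and detailed in footnote 2) divides the observed correlation by the square root of the product of the two reliabilities. The sketch below follows the footnote's steps using the coefficient alphas from Table 2. The per-order correlations are not reported, so the pooled value of .78 from Table 3 is substituted, which is an assumption; the function names are also hypothetical.

```python
from math import atanh, sqrt, tanh

def geometric_mean_reliability(alpha_1, alpha_2):
    """Geometric mean of two coefficient alpha reliabilities."""
    return sqrt(alpha_1 * alpha_2)

def mean_correlation(correlations):
    """Average correlations through Fisher's r-to-z transformation."""
    z_values = [atanh(r) for r in correlations]
    return tanh(sum(z_values) / len(z_values))

def disattenuate(r_xy, rel_x, rel_y):
    """Correlation corrected for unreliability in both measures."""
    return r_xy / sqrt(rel_x * rel_y)

paper_rel = geometric_mean_reliability(0.83, 0.80)     # paper forms 1 and 2
computer_rel = geometric_mean_reliability(0.85, 0.79)  # computer forms 1 and 2
corrected = disattenuate(0.78, paper_rel, computer_rel)
print(round(corrected, 2))  # prints: 0.95
```

The result is slightly below the reported .97, which is consistent with the report having used the r-to-z mean of the four per-order correlations rather than the pooled .78.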
Table 4 presents the results of regressing computer-based ME score on paper-based ME score,
background variables, and the edit/entry test. The paper-based ME score accounted for 61% of the
variance in computer-delivered ME score (F[1,176] = 272.92, p < .001). Adding the background
information accounted for another 3% of the variance. Finally, and most importantly, no significant
variance was attributable to the edit/entry measure.3
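The significance of each block in a hierarchical regression is conventionally judged with an F test on the increment in R-squared. The sketch below applies that standard formula to the rounded values in Table 4; because the inputs are rounded to two decimals, the results only approximate the statistics the authors computed from the raw data. The function name and argument names are assumptions made for the example.

```python
def f_for_r2_increment(r2_reduced, r2_full, n, predictors_full, predictors_added):
    """F test for the R-squared gained by adding a block of predictors.

    F = ((R2_full - R2_reduced) / k) / ((1 - R2_full) / (n - p - 1)),
    where k is the number of predictors added and p is the number of
    predictors in the full model. Returns (F, df_numerator, df_denominator).
    """
    df_num = predictors_added
    df_den = n - predictors_full - 1
    f_value = ((r2_full - r2_reduced) / df_num) / ((1 - r2_full) / df_den)
    return f_value, df_num, df_den

# Block 1: paper-based ME score alone (R2 = .61, n = 178).
block1 = f_for_r2_increment(0.0, 0.61, 178, 1, 1)
# Block 2: gender, undergraduate major, and level of education added.
block2 = f_for_r2_increment(0.61, 0.64, 178, 4, 3)
print(block1)
print(block2)
```

For block 1 the formula yields an F on 1 and 176 degrees of freedom that is close to the reported 272.92, with the small difference attributable to rounding of R-squared.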
Compiled responses to the ME interface questionnaire can be found in the Appendix. With
respect to computer familiarity, all participants indicated using a computer almost daily, and all but one
indicated almost always using a mouse. Regarding the computer-based format, 57% found it easy to use
the computer to take the ME test, 42% found it somewhat difficult, and 2% thought it was very difficult.
Of those who found it somewhat or very difficult, the difficulty cited by the largest portion of participants
(29%) was that the on-screen palette was hard to use. When asked if they had difficulty entering
fractions, exponents/subscripts, or expressions involving square roots, 48% said that they had no
difficulty with any of these functions, but 30% cited problems with entering fractions.
2 The correction for attenuation requires a reliability for each measure and the correlation between the two measures. Because there were two paper-based ME forms and two computer-based ME forms, we estimated a reliability for the two paper-and-pencil measures by taking the (geometric) mean of their coefficient alpha reliabilities, and then estimated a reliability for the two computer-delivered measures in the same way. To estimate the relationship between the computer-delivered and paper-and-pencil measures, we computed the paper-computer correlation for each of the four administration orders and then took the mean of these four values using the r-to-z transformation.
3 We reran this regression including participants who had been eliminated because they either did not have quantitatively oriented undergraduate majors or they were not close to making the transition to graduate school. Even with this larger and more diverse sample (n = 219), the results were substantively identical to those presented here.
Polled as to whether they would prefer to take an ME test on computer or paper, 77% opted for
paper-and-pencil and only 7% chose computer. Consistent with this preference, 44% of
participants felt that taking the test on computer was more tiring than taking it on paper, compared with
15% who found it more tiring on paper and 41% who believed the two modes were equivalent. Finally,
48% thought that, had the test been real, they would have been more anxious about taking the computer-
delivered test than they would the paper-and-pencil form; 43% would have felt about as anxious
either way, and only 8% would have felt less anxious with the computer version.
Conclusion
This study found no strong evidence to support the hypothesis that individual differences in
facility with the ME computer interface would affect performance on open-ended, computerized
mathematics tasks. Mean performance, reliability, and relations with other variables were closely similar
for both paper-and-pencil and computerized test modes. Although one computer-based test form appeared
speeded relative to its paper-and-pencil counterpart, the reverse was true for the second test form,
weakening any claim that speededness might be a result of lack of interface familiarity. Regression
results also showed no signs of irrelevant variance connected with the ME interface. Our edit/entry test
added nothing to the prediction of computer-based mathematical performance and, indeed, had about the
same level of zero-order relationship to the computer-based ME test as it did to the paper-and-pencil one.
These results complement the indirect evidence, reported by Bennett et al. (1997), that ME items function
similarly to other computer-based response types (including multiple-choice) written to test advanced
mathematical content.
Whereas the statistical evidence does not support the presence of an interface competency effect,
examinee perceptions did suggest that the interface was not always easy to use. This perception came
through most clearly with respect to the use of the on-screen palette, the method by which examinees
create mathematical expressions. Using this palette is clearly more time-consuming and cumbersome than
writing an expression by hand, especially if the expression is a complex one.
To better understand this phenomenon, we retrospectively sampled examinee paper-and-pencil
responses and then tried to enter them on computer, finding that some paper responses were, in fact, too
long for the on-screen answer box (see Table 5). We suppose that some examinees tried to enter such
expressions on the computer-based ME test, but were forced to reformulate them to make them fit the
required frame. If this is so, these individuals were able to complete this reformulation quickly enough to
avoid a negative impact on their scores (which we otherwise should have detected in our statistical
analyses). With more stringent time limits than those imposed here, however, an effect might well have
appeared.
The fact that some students had difficulty with the interface suggests that we should continue our
efforts to improve it, or at least that we should make sure time limits are generous enough to allow for the
mechanics of responding using the interface. In the end, however, it is hard to envision a mouse-driven
interface that is as natural for entering mathematical expressions as paper and pencil. Given that, the ideal
solution may be handwriting the expression on some digital surface that recognizes free-form symbolic
input and that is connected to the computing device on which the testing software resides. This concept is
evident in today’s personal digital assistants, which recognize a form of textual entry.
While the current findings provide some insights, this study had several limitations. First, the
sample size was relatively small, so marginal effects could not easily be detected. Second, for those who
did report GRE quantitative scores, the mean was unusually high. Thus, our findings may not be
generalizable to students with lower mathematical ability levels; such students might experience greater
difficulty with the ME interface. Third, our failure to prove the irrelevant variance hypothesis does not
confirm that such contamination is absent, as the null hypothesis cannot be proven.
Finally, this study needs to be viewed as one part of a larger validation program. The study is
meaningful only in the context of theoretical rationales and empirical results that converge to support a
larger validity argument (Messick, 1989). As a response type, ME is characteristic of a growing class of
open-ended computer-based tasks. The larger validity argument for these tasks begins with the contention
that, by their open-ended nature, they replicate some of the complexity inherent in the problems
encountered in academic and work settings. At the same time, however, our renditions of these tasks can
add irrelevant complexity in, among other things, the way we structure the human-computer interaction.
This research highlights the need to approach with care how we render those tasks and illustrates one
method of monitoring the success of our development efforts.
Tables and Figures
Table 1. A Mathematical Expression Key and Example Responses
Mathematical Expression key
    $\frac{(m - 2p)(n - 2p)}{4}$
Some example correct responses
    $\frac{(n - 2p)(m - 2p)}{4}$
    $.25(-2p + m)(-2p + n)$
    $p^2 - pn/2 - pm/2 + mn/4$
Table 2. Means, Standard Deviations, and Coefficient Alpha Reliabilities for Mathematical Expression and Edit/Entry Tests

Test                  Mean     Standard deviation    Coefficient alpha
ME test 1
  Computer-based      10.07    4.09                  .85
  Paper-based         10.51    3.83                  .83
ME test 2
  Computer-based       7.34    3.58                  .79
  Paper-based          7.63    3.70                  .80
Edit/entry test        6.29    2.56                  .72

Note. Each Mathematical Expression (ME) test contained 16 items. The edit/entry test included 10 items. Eighty-nine participants took each ME test, while all 178 participants took the edit/entry test.
Table 3. Correlations Between the Mathematical Expression Test, the Edit/Entry Test, and Other Variables

                          ME --     Edit/             GRE
                          paper     entry             quantitative   Undergraduate            Level of
                          version   test     UGPA     score          major           Gender   education
ME -- computer version    .78**     .08      .46**    .55**          -.11            .25**    .41**
ME -- paper version                 .10      .43**    .52**          -.13            .?7**    .36**
Edit/entry test                              .09      .18            -.08            .09      .12
UGPA                                                  [row values missing from the extracted source]
GRE quantitative score                                               -.03            .15      .26*
Undergraduate major                                                                  -.14     .13
Gender                                                                                        .21**

Note. All correlations are based on a sample size of 177-178, except for those with GRE quantitative score, which are based on 75 participants. Undergraduate major was coded as engineering (0) versus other (1). Gender was coded as female (0) versus male (1). Level of education was coded as college senior (0) versus first-year graduate student (1).
* p < .05. ** p < .01.
Table 4. Hierarchical Multiple Regression of Computer-Based Mathematical Expression Scores on Paper-and-Pencil Scores, Background Variables, and Edit/Entry Test*

Block and independent variable    R2        Increment in R2
1. ME -- paper version            .61***    .61***
2. Background data                .64***    .03**
     Gender
     Undergraduate major
     Level of education
3. Edit/entry test                .64***    .00

* n = 178. ** p < .01. *** p < .001.
Table 5. An Example Paper-and-Pencil Response That Would Not Have Fit in the Computer-Based Mathematical Expression Answer Box
Example paper response
$$\left(c_1 - \frac{c_1+c_2+c_3+c_4}{4}\right)^2 + \left(c_2 - \frac{c_1+c_2+c_3+c_4}{4}\right)^2 + \left(c_3 - \frac{c_1+c_2+c_3+c_4}{4}\right)^2 + \left(c_4 - \frac{c_1+c_2+c_3+c_4}{4}\right)^2$$
Figure 1. The Mathematical Expression interface with an example item and a correct response. Copyright (c) Educational Testing Service, 1996.
References
Bennett, R. E. (1993). On the meanings of constructed response. In R. E. Bennett & W. C. Ward (Eds.), Construction vs. choice in cognitive measurement: Issues in constructed response, performance testing, and portfolio assessment (pp. 1-27). Hillsdale, NJ: Lawrence Erlbaum Associates.
Bennett, R. E., Steffen, M., Singley, M. K., Morley, M., & Jacquemin, D. (1997). Evaluating an automatically scorable, open-ended response type for measuring mathematical reasoning in computer-adaptive tests. Journal of Educational Measurement, 34, 163-177.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Graduate Record Examinations Board. (1997). Sex, race, ethnicity, and performance on the GRE General Test: A technical report. Princeton, NJ: Educational Testing Service.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York: Macmillan.
Author Note
Correspondence concerning this article should be addressed to Ann Gallagher, MS 17R,
Educational Testing Service, Princeton, NJ 08541; or [email protected].
Appendix
Mathematical Expression Interface Questionnaire
The following set of questions asks about your reaction to the computer administration of the Mathematical Expression items.

                                                                                      N       %
1. In answering the questions on this test, how easy was it to use the computer?
   1. Easy (go to question 3)                                                       101    56.7
   2. Somewhat difficult                                                             74    41.6
   3. Very difficult                                                                  3     1.7
2. If you found it “Somewhat difficult” or “Very difficult” to use the computer:
   (Circle all that apply.)
   1. The computer screens were confusing (difficult to interpret).                  13     7.3
   2. The on-screen keyboard made it difficult to change or enter my answer.         52    29.2
   3. The mouse was hard to use.                                                     14     7.9
   4. Any other problems:
      a. Tool was slow                                                               16     9.0
      b. Rather skip problems than review                                            41    23.0
      c. Wanted hints to solve                                                        1     0.6
      d. Usually write directly on problem                                            8     4.5
      e. Required extra time to copy from paper to computer                          11     6.2
      f. Hand-eye coordination problem                                                1     0.6
      g. Restrictions on format                                                      14     7.9
      h. Rather use real keyboard                                                    22    12.4
      i. Tiring on eyes                                                               1     0.6
      j. Difficult reading from screen and paper                                      2     1.1
      k. Wished for scrap paper                                                       1     0.6
3. Did you have difficulties entering any of the following? (Circle all that apply.)
   1. I had no difficulties entering any of the following.                           86    48.3
   2. Fractions                                                                      54    30.3
   3. Exponents and/or subscripts                                                    31    17.4
   4. Expressions involving square roots                                             20    11.2
                                                                                      N       %
4. Did you have any trouble seeing the words and/or symbols on the screen?
   1. No (go to question 6)                                                         168    94.4
   2. Yes                                                                             9     5.1
   3. Did not answer                                                                  1     0.6
5. If you answered “Yes,” which of the following were problems for you?
   (Circle all that apply.)
   1. The size of the type                                                            2     1.1
   2. Too many words on each screen                                                   3     1.7
   3. The contrast (the brightness of the letters against a dark background)          5     2.8
   4. The lighting in the room causing glare on the screen                            4     2.2
6. Which statement(s) reflect your reaction(s) to the explanation of how to use
   the mouse? (Circle all that apply.)
   1. Adequate explanation; I wouldn’t change it                                     40    22.5
   2. I already knew the information presented from past experience                 137    77.0
   3. Too long, too much information                                                 22    12.4
   4. Too little opportunity to practice                                              4     2.2
   5. Information is not clear                                                        4     2.2
7. If you could take a computer test or a paper-and-pencil test that covered
   the same material as this test, which would you prefer to take?
   1. Computer test                                                                  13     7.3
   2. No preference; either is fine                                                  28    15.7
   3. Paper-and-pencil test                                                         137    77.0
8. Compared to answering paper-and-pencil questions of the same length, these
   computerized questions were:
   1. Less tiring than answering paper-and-pencil questions                          27    15.2
   2. About as tiring as answering paper-and-pencil questions                        73    41.0
   3. More tiring than answering paper-and-pencil questions                          78    43.8
                                                                                      N       %
9. If this computer-based test had been a real test (one that counted), how anxious
   would you have been compared with taking a real paper-and-pencil test?
   1. Less anxious than taking a paper-and-pencil test                               15     8.4
   2. About as anxious as taking a paper-and-pencil test                             77    43.3
   3. More anxious than taking a paper-and-pencil test                               86    48.3

The next set of questions asks about your computer experience.

9. Have you used a computer before?
   1. Yes                                                                           177    99.4
   2. No (go to page 4, the “Background Questions” section.)                          1     0.6
10. For what kinds of activities do you use a computer? (Circle all that apply.)
   1. Graphics                                                                      122    68.5
   2. Games                                                                         135    78.8
   3. Statistical Analysis                                                           87    48.9
   4. Spreadsheets                                                                  126    70.8
   5. Database Management                                                            49    27.5
   6. Word Processing                                                               165    92.7
   7. Other (programming)                                                           134    75.3
11. For which of the following have you used a computer? (Circle all that apply.)
   1. School                                                                        173    97.2
   2. Work                                                                          143    80.3
   3. Personal                                                                      157    88.2
   4. Hobbies                                                                       101    56.7
12. How often do you use a computer?
   1. Routinely (almost daily use)                                                  178   100.0
   2. Regularly (some time each week)                                                 0     0.0
   3. Rarely (only a few times in the last five years)                                0     0.0
                                                                                      N       %
13. When you use a computer, how often do you use a mouse?
   1. Routinely (almost always uses the mouse to perform functions)                 177    99.4
   2. Regularly (sometimes uses the mouse to perform functions)                       1     0.6
   3. Rarely (usually uses the keyboard to perform functions)                         0     0.0
   4. Never                                                                           0     0.0
14. Do you own a personal computer?
   1. Yes, IBM/IBM Compatible                                                        93    52.2
   2. Yes, Mac/Apple                                                                 13     7.3
   3. Yes, Other                                                                      3     1.7
   4. No                                                                             69    38.8
15. If you answered “No” to the previous question, do you have a personal
    computer available for your use?
   1. Yes                                                                            62    34.8
   2. No                                                                              7     3.9
   3. Did not answer                                                                109    61.2
Background Questions

                                                                                      N       %
1. Gender
   1. Male                                                                           64    36.0
   2. Female                                                                        114    64.0
2. Do you understand English as well as or better than any other language?
   1. Yes                                                                           160    89.9
   2. No                                                                             18    10.1
3. How do you describe yourself?
   1. African-American/Afro-American/Black (non-Hispanic)                             9     5.1
   2. American Indian/Native American/Alaska Native                                   0     0.0
   3. Asian American/Pacific American/Pacific Islander American                      27    15.2
   4. Caucasian/White (non-Hispanic)                                                103    57.9
   5. Hispanic/Latino/Chicano/Mexican American/Puerto Rican                          23    12.9
   6. Other                                                                          13     7.3
   7. Did not answer                                                                  3     1.7
4. Are you a U.S. citizen or resident alien?
   1. Yes                                                                           140    78.7
   2. No                                                                             38    21.3
5. Please indicate any permanent disabilities you have (circle all that apply)
   1. None                                                                          130    73.0
   2. Physical disability                                                             1     0.6
   3. Learning disability                                                             1     0.6
   4. Deafness or other hearing impairment                                            0     0.0
   5. Visual impairment (other than blindness) including glasses or contact lenses   43    24.2
   6. Blindness                                                                       0     0.0
   7. Did not answer                                                                  3     1.7
                                                                                      N       %
6. What is your current educational status?
   1. Senior                                                                        101    56.7
   2. First-year graduate student                                                    69    38.8
   3. Summer after senior year                                                        8     4.5
7. Undergraduate Major
   1. Electrical Engineering                                                         27    15.2
   2. Chemical Engineering                                                           13     7.3
   3. Mechanical Engineering                                                         23    12.9
   4. Civil Engineering                                                              14     7.9
   5. Industrial Engineering                                                          4     2.2
   6. Other Engineering                                                              13     7.3
   7. Computer Science                                                               15     8.4
   8. Mathematics                                                                    40    22.5
   9. Physical Sciences                                                              29    16.3
8. What is your overall undergraduate grade point average to date?
   (based on a system where 4.0 = A)
   1. 3.5 - 4.0                                                                      73    41.0
   2. 3.0 - 3.49                                                                     59    33.1
   3. 2.5 - 2.99                                                                     35    19.7
   4. 2.0 - 2.49                                                                      9     5.1
   5. 1.5 - 1.99                                                                      1     0.6
   6. Below 1.5                                                                       0     0.0
   7. Did not answer                                                                  1     0.6
9. Are you a graduate student or do you plan to apply to graduate school?
   1. Yes                                                                           150    84.3
   2. No (go to question 11)                                                         28    15.7
                                                                                      N       %
10. If YES, in which of the following major fields?
   1. Electrical Engineering                                                         22    12.4
   2. Chemical Engineering                                                            7     3.9
   3. Mechanical Engineering                                                         13     7.3
   4. Civil Engineering                                                               9     5.1
   5. Industrial Engineering                                                          8     4.5
   6. Other Engineering                                                              10     5.6
   7. Computer Science                                                               13     7.3
   8. Mathematics                                                                    25    14.0
   9. Physical Sciences                                                              24    13.5
   10. Biological Sciences                                                            4     2.2
   11. Economics                                                                      1     0.6
   12. Other                                                                         14     7.9
   13. Did not answer                                                                28    15.7
11. If you plan to apply to graduate school, which graduate degree will you seek?
   1. Master's degree                                                                83    46.6
   2. Doctoral degree                                                                63    35.4
   3. Did not answer                                                                 32    18.0