
PERSONNEL PSYCHOLOGY 2004, 57, 1003–1034

SELECTION TESTING VIA THE INTERNET: PRACTICAL CONSIDERATIONS AND EXPLORATORY EMPIRICAL FINDINGS

DENISE POTOSKY
Great Valley School of Graduate Professional Studies

The Pennsylvania State University

PHILIP BOBKO
Department of Management

Gettysburg College

Despite a growing body of applied research on using the Internet for some human resource management practices, few studies have provided equivalence information or practical lessons concerning selection testing via the Internet. We identify several issues associated with measurement and validity, the role of several individual characteristics, respondents' reactions and behaviors, and other considerations concerning Internet test administration. We also report results from an exploratory study of the correlation between paper-and-pencil and Internet-administered cognitively oriented selection tests (including timed and untimed, proctored tests). Our empirical results suggest modest degrees of cross-mode equivalence for an untimed situational judgment test (r = .84) and for a timed cognitive ability test (r = .60). Further, some types of items (math, verbal, spatial) in the timed cognitive ability test seem to play a differential role in the reduced cross-mode equivalence. New issues regarding the perception of, and reaction to, items presented via the Internet are presented, and a variety of practical issues are derived and discussed.

The authors gratefully acknowledge the support of Penn State Great Valley School of Graduate Professional Studies throughout this research project. We also thank Carol Kalman, Sihar Oz, Chris Bobko, Katie Fitzgerald, Bill Gill, and Juris Piece for their assistance during study administration and data collection.

Correspondence and request for reprints should be addressed to Denise Potosky, The Pennsylvania State University, Great Valley School of Graduate Professional Studies, 30 E. Swedesford Rd., Malvern, PA 19355; [email protected].

COPYRIGHT © 2004 BLACKWELL PUBLISHING, INC.

A small but steadily growing body of research has focused on measurement issues related to Internet applications used in human resource management. In his recent review of research on new technology used in selection, Anderson (2003) noted that "the most central question (in terms of equivalence) is whether use of new technology produces the same quantity and quality of applicants for an organization" (p. 126). Few studies, however, have examined the equivalence of Internet-administered

and traditional tests, and there is a notable lack of published research that provides the kind of equivalence information most relevant to top-down selection decisions. Only one study to date has reported the Internet versus paper-and-pencil cross-mode correlation for a personality test (Salgado & Moscoso, 2003), and no studies have reported the cross-mode correlation for cognitive ability tests and/or other tests common to selection practices.

The shortage of published research that includes equivalence estimates for selection tests administered via the Internet has not hindered the development of Web-based applicant screening tools and services. Nor has it lessened the demand for Web-based selection tools or the need for practical guidance for implementing selection processes over the Internet. The technological capabilities for recruiting and screening applicants over the Internet are continually improving (see Jones & Dages, 2003), and test publishers have been developing Internet versions of many commonly used selection tests. Organizations that post job openings and recruit applicants over the Internet might anticipate enhanced convenience, cost effectiveness, and efficiency (Jones & Dages, 2003), provided that applicants can be effectively (and securely) evaluated via the Internet. Individuals who post their resumes and apply for jobs over the Internet might also appreciate the potentially expedited selection process that Internet-based testing can offer.

This study, involving paper-and-pencil and Internet administration of two cognitively oriented tests to a working adult sample in a simulated selection context, had several purposes. Based on both quantitative and qualitative analyses, we wanted to identify emergent issues associated with measurement and validity, the role of certain individual characteristics in Web-based testing, respondents' reactions and test-taking behaviors regarding Internet-based tests, and practical matters associated with Internet-based test administration. Given that research to date has not reported cross-mode correlations for cognitive ability or situational judgment tests, we also wanted to report the cross-mode correlations as well as the mean score comparisons that we found.

Several aspects surrounding the equivalence and administrative issues associated with Web-based tests were anticipated based upon available Web-based selection research as well as research that examined other computerized tests. We provide a brief review of this literature in order to frame our research questions for the cognitively oriented selection tests used in this study. We also present an array of broader issues concerning the individual characteristics of computer and Internet users that may be a factor in performance on Web-based tests.


Research on the Cross-Mode Equivalence of Internet and Paper-and-Pencil Tests

The majority of studies focused on Internet-administered measures to date have been concerned with survey administration and data collection via electronic mail and the World Wide Web (e.g., Bruvold, Comer, & Rospert, 1990; Cho & LaRose, 1999; Foster Thompson, Surface, Martin, & Sanders, 2003; Schmidt, 1997; Simsek & Veiga, 2001; Sproull, 1986; Stanton, 1998; Stanton & Rogelberg, 2001a; Tse, 1998). Recent literature reviews by Anderson (2003) and Lievens and Harris (2003) have focused on issues more directly related to human resource selection, and this literature includes studies of respondents' reactions to Internet recruitment and selection practices (e.g., Chapman & Webster, 2003; Harris, Van Hoye, & Lievens, 2003; Reynolds, Sinar, & McClough, 2000; Stanton, 1998; Stanton & Rogelberg, 2001b). Very few studies have focused on selection test score equivalence issues.

Untimed selection tests. The research literature on the equivalence of Internet and paper-and-pencil selection tests consists of a limited number of equivalence studies that used a between-subjects study design in order to compare cross-mode means, distributions of scores, and factor structures for untimed tests (Buchanan & Smith, 1999; Cronk & West, 2002; Davis, 1999; McManus & Ferguson, 2003; Pasveer & Ellard, 1998; Ployhart, Weekley, Holtz, & Kemp, 2003). We found only one equivalence study that incorporated a repeated measures design to report cross-mode (Web-based vs. paper-and-pencil) correlations for a measure commonly used in selection practices, that is, the Big Five personality dimensions (Salgado & Moscoso, 2003). The distinction between repeated measures and between-group comparisons is important because between-group comparisons are less informative for questions of test equivalence and for top-down selection procedures in which rank order is a primary concern (Mead & Drasgow, 1993; Potosky & Bobko, 1997). Still, given the paucity of selection research focused on Internet versus paper-and-pencil equivalence, results from both types of comparisons helped us to establish our expectations for the current investigation.

Equivalence results based upon between-group comparisons of untimed Internet versus paper-and-pencil tests are somewhat mixed. In one study, for example, Pasveer and Ellard (1998) found similar factor structures and internal consistency estimates across modes of administration but a smaller mean and standard deviation in the Web-based version of the self-trust scale administered. Buchanan and Smith (1999) compared 963 Internet-based responses to 224 traditional, paper-and-pencil responses by undergraduate volunteers to a revised version of Snyder's Self-Monitoring


Scale (SMS-R; Gangstead & Snyder, 1985). Their results showed similar factor structures and a smaller mean score, but a larger standard deviation and internal consistency estimate, for the Web-based version of the scale. In another study, Ployhart et al. (2003) compared scores on personality, biodata, and situational judgment measures administered to job applicants via the Internet (N = 2356) versus paper-and-pencil versions administered to job applicants (N = 2544) and to job incumbents (N = 425). Their results also showed smaller means and larger internal consistency estimates in the Web-based versions of all measures. In addition, in their study the variances for the conscientiousness personality measure and the situational judgment test were greater in the Web-based applicant group than for either paper-and-pencil group. Davis (1999) reported a larger internal consistency estimate, but a larger mean and a smaller standard deviation, for the Web-based version of the Ruminative Responses Scale. Cronk and West (2002) also reported a larger mean in a Web-administered scale, in this case the Visions of Morality scale, but results regarding the standard deviation comparisons varied according to where respondents took the Web-based version of the scale. In sum, results from the small number of studies employing between-groups comparisons suggest that factor structures and internal consistency estimates are probably not substantially altered when untimed tests are administered via the Internet. Results pertaining to mean scores and the distribution of scores are less conclusive.

In the one published repeated measures study noted above, Salgado and Moscoso (2003) reported high correlations (greater than or equal to .93) for each of the five factors of the Big Five personality dimensions administered to a sample of 162 undergraduate students enrolled in a Spanish university. The high correlations reported by Salgado and Moscoso are encouraging, especially in light of the fact that prior research has established high degrees of equivalence between computerized and paper-and-pencil untimed noncognitive tests (e.g., personality tests [King & Miles, 1995; Potosky & Bobko, 1997] and attitudinal measures [Donovan, Drasgow, & Probst, 2000]). Overall, we anticipated high correspondence between responses to an untimed, Internet-administered situational judgment test and its paper-and-pencil counterpart.

Timed selection tests. This study explored relatively uncharted territory with respect to timed selection tests. In fact, although prior research has examined the equivalence of timed computerized versus paper-and-pencil selection tests, we found no published equivalence studies that report the cross-mode correlation specifically for timed Internet-administered tests. As a result, we relied upon what we know from the computerized testing literature to form our expectations for the current investigation.


The research literature that has compared computerized versus paper-and-pencil cognitive ability tests and other timed tests suggests less than perfect cross-mode equivalence (Burke & Normand, 1987; Greaud & Green, 1986; Mead & Drasgow, 1993; Silver & Bennett, 1987; Van de Vijver & Harsveld, 1994). For example, Van de Vijver and Harsveld (1994) reported quicker but less accurate responding for a computerized version of the General Aptitude Test Battery (GATB). However, Mead and Drasgow (1993) reported a meta-analytic, disattenuated cross-mode correlation of .97 for timed power tests (i.e., assessment of ability subject to some practical time limitation) and .72 for speeded tests (i.e., measures of how quickly one can answer fairly straightforward and relatively homogeneous items). Overall, research on cross-mode equivalence for computerized tests has suggested that if testing procedures look and feel the same, results across the different modes of administration can be very similar (Booth-Kewley, Edwards, & Rosenfeld, 1992; King & Miles, 1995).
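For readers less familiar with the term, a disattenuated correlation is an observed correlation corrected for unreliability in each measure. The conventional correction-for-attenuation formula, stated here only for reference and not as a reproduction of Mead and Drasgow's exact meta-analytic procedure, is

    \hat{\rho} = \frac{r_{pw}}{\sqrt{r_{pp}\, r_{ww}}}

where r_{pw} is the observed paper-and-pencil versus computerized correlation and r_{pp} and r_{ww} are the reliability estimates for the two administration modes.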

It is conceivable that results associated with timed Internet-administered tests might not replicate results for all other types of computer-administered tests. For example, the Internet test administration platform used in this study did not download test items to the computers used by study participants, but instead, everything occurred over the Internet. As a result, the Web technology used to display test items in this study required pages (i.e., screens) to load during testing. "Load speed" depends not only upon the processing speed and performance factors of each computer used to take the test but also upon the method of connection to the Internet. In addition, the speed/capacity of the Internet server where the test resides can be affected by the volume of information exchange at the time of testing.

In this study, participants were presented with the same test items, in the same order, for each version (paper-and-pencil and Internet). In addition, study participants were able to skip items and to backtrack within the test at their discretion for both modes of administration. Given these efforts to create an Internet test that looked very similar to the paper-and-pencil version, as well as the fact that there was little time between repeated administrations of the otherwise identical test items, we expected at least a moderate, if not high, degree of cross-mode correlation. On the other hand, in light of our understanding that Internet administration needs to adopt different conventions for keeping time and for displaying test items, we did not anticipate perfect cross-mode correlations (e.g., a disattenuated r = 1.00) for the timed cognitive ability test administered in this study.

Item differences. Despite the fact that the items were the same across modes of administration, we contemplated the potential consequences


associated with viewing different types of items on the computer screen rather than on paper—a topic that has not been addressed in the computerized testing literature. For example, because spatial reasoning problems typically require respondents to look at a picture and decipher what they see, a block counting task incorporated into the timed cognitive ability test in our study was potentially perceptually more interactive than the task of reading words or numbers. Indeed, when the researchers took each version of the tests prior to our study, we found ourselves wanting to use a pencil to count blocks and to estimate hidden blocks in the pictures. It was not possible to make marks on the computer screen as a problem-solving aid, and allowing a pencil and scratch paper in the Internet mode would impose a difference in the prescribed testing protocol.

We also considered potential differences in how test takers might respond to arithmetic reasoning items versus vocabulary items presented in the timed cognitive ability test. On one hand, reading vocabulary words on a computer screen should be an ordinary task for individuals who would apply for jobs over the Internet and/or who would agree to participate in job screening procedures administered via the Internet. Perhaps vocabulary items would produce the highest cross-mode correlation of the types of items presented via the Internet. On the other hand, we also considered the potential for math ability items to have the highest cross-mode correlation. The act of calculating an answer may add to respondents' decisiveness, especially upon finding their answer in the multiple-choice list of responses for that item. That is, after putting forth the effort to calculate an answer, it seemed likely that respondents would remember their response to each item from one mode of test administration to the other (i.e., enhanced recall effects). In addition, responding to arithmetic reasoning items seems less subjective than responding to the vocabulary items, which may be more open to interpretation than numeric information. When presented with four choices, a respondent might find two words that could mean the same as (or the opposite of) the focal word and then guess or perhaps add some context for selecting the best response. For the math problems, there was one correct answer, and incorrect responses were more likely due to mistakes than to alternative interpretations. In light of these competing considerations, we did not make an a priori differential prediction about the cross-mode rs for the vocabulary and math subscales—although as suggested in the prior paragraph, we did expect the cross-mode correlation to be lowest for the spatial reasoning subscale.

Individual Characteristics and Performance on Web-Based Tests

A large body of research has been conducted on the characteristics of computer users. From this research, we anticipated that certain individual


characteristics (e.g., computer experience, computer playfulness, beliefs about computers) may play a role in the way that individuals respond to Web-administered tests. For example, it has been reported that individuals' understanding of and experience with using computers is related to their attitudes towards computers (Potosky & Bobko, 1998; 2001) and to the playfulness with which they approach computers (Webster & Martocchio, 1992). In a recent study of undergraduate students' perceptions of a computerized in-basket selection exercise, computer experience was shown to influence individuals' perceptions of the selection process (Wiechmann & Ryan, 2003). Because Internet-based testing requires at least minimal use of computers, we included measures of computer-related characteristics in our study. However, because the "point and click" Web-based tests in this study were self-administered and easy to use, we did not expect that individuals with more computer experience would necessarily perform better on the Web-based tests.

Computer self-efficacy beliefs have been shown to be positively related to reactions to computers (Compeau & Higgins, 1995; Webster & Martocchio, 1995) and to performance using computers (Gist, Schwoerer, & Rosen, 1989; Potosky, 2002). Derived from the social cognitive theory of self-regulation (Bandura, 1991; 1997), self-efficacy refers to an individual's belief in his or her capabilities regarding the performance of a specific task or set of behaviors. In addition to computer self-efficacy, it is useful to consider Internet self-efficacy. Prior studies have suggested that novice Internet users are less comfortable and more uncertain with the Internet and may perceive the Internet, like computers, as complicated (Graphic, Visualization, and Usability Center, 1999; Katz & Aspden, 1997). Individuals with higher degrees of Internet self-efficacy may approach Web-based tests more positively and may perform better on Web tests than individuals with lower degrees of Internet efficacy. Some prior research on Internet self-efficacy has focused on the performance of specific Internet tasks, such as creating Web addresses or searching for sources over the World Wide Web (Nahl, 1996; Ren, 1999), and found positive relationships between efficacy beliefs and task performance. Eastin and LaRose (2000) examined expected relationships between Internet self-efficacy and broader measures such as Internet use and stress. Their results suggested that Internet self-efficacy is positively correlated with Internet use (measured in hours spent online) and negatively correlated with Internet stress (measured as the perception that frustration and problems are likely to be encountered when using the Internet). We included measures of computer self-efficacy as well as Internet self-efficacy in this study, with the expectation that these efficacy beliefs may be related to reactions or possibly test performance.


Reactions to Internet Testing

Our study also provided us the opportunity to investigate test takers' reactions to cognitively oriented selection tests administered via the Internet. Prior research has suggested that negative consequences can occur if applicants dislike a selection process (Gilliland & Cherry, 2000), and some studies that compared computerized versus paper-and-pencil tests reported that computerized tests required more time to complete and produced stronger feelings of time pressure (Mead & Drasgow, 1993; Vispoel, Boo, & Bleiler, 2001). In addition to studies on reactions to computerized testing, some recent research has examined applicant reactions to Internet-based testing that included personality and situational judgment tests (e.g., Salgado & Moscoso, 2003; Sinar & Reynolds, 2001). For example, Salgado and Moscoso (2003) reported that respondents in the Internet-administered condition perceived the personality test more positively than those in the paper-and-pencil condition. We were careful to observe and document participants' reactions to testing throughout the course of this unique study, and participants also completed a brief survey of their reactions to the study.

Research Expectations

Our research expectations regarding cross-mode equivalence, potentially relevant individual characteristics, and participants' reactions to Internet testing were consistent with the literature reviewed above. Our expectations were also shaped by what we understood a priori about the particular tests used in this study, the Test of Learning Ability (TLA; Richardson, Bellows, & Henry, Inc., 1989) and a situational judgment test. The TLA is a timed test of general cognitive ability that is similar in format to the Army General Classification Test (AGCT; see Harrell, 1992) and to the Wonderlic Personnel Test (Wonderlic, 1992). The TLA comprises three subscales: spatial reasoning (block counting), mathematical ability (arithmetic), and verbal ability (vocabulary). The situational judgment test consisted of 10 multiple-choice items taken from the Manager Profile Record, Part II (Manager Profile Record, Richardson, Bellows, & Henry, Inc.; see also Pederson, 1984). The 10 items were intended for operational use in an initial, Web-based screen; particular items were chosen by the test developer to maintain the validity and representativeness of the full measure.

Cross-mode equivalence. Based upon the equivalence literature for timed and untimed tests reviewed above, our first expectation regarding cross-mode equivalence was that we would obtain a moderate cross-mode


correlation for the timed TLA (i.e., cross-mode .5 < r < .9), but a higher degree of equivalence for the untimed situational judgment test.

Hypothesis 1: There will be a moderate cross-mode correlation between Internet-administered and paper-and-pencil administered versions of the timed cognitive ability test and a higher cross-mode correlation for untimed situational judgment items.

Given the potential distinctions in the way respondents may perceive and interact with items presented via the computer screen, we anticipated that the cross-mode correlation for the block counting items would be lower than the cross-mode correlation for vocabulary and math items.

Hypothesis 2: The cross-mode correlation (between paper-and-pencil and Web-based administration modes) for spatial ability (i.e., block counting) items will be lower than the cross-mode correlations for vocabulary and math ability items.

Individual characteristics. We attempted to include as many of the relevant individual characteristics from the computerized testing literature as possible in this exploratory investigation. We included measures of computer experience, hours spent using computers, hours spent using the Internet, computer efficacy beliefs, Internet efficacy beliefs, computer playfulness, beliefs about surveillance when using computers, and beliefs about the honesty detection capabilities of computers. Because some studies have also suggested that age and gender may be related to computer experience (see Lloyd & Gressard, 1984; Pope-Davis & Twing, 1991; Temple & Lips, 1989), we included age and gender variables as well. We examined potential relationships between all variables included in the study and Internet test scores and reactions.

Based upon the literature reviewed, we expected that computer efficacy and Internet efficacy beliefs would play a role in test performance, affecting scores on the Web-administered versions of the tests. We also anticipated that individuals with high levels of Internet self-efficacy would react more positively to the Internet testing situation than those with low self-efficacy beliefs. Those with low degrees of Internet self-efficacy might prefer the more traditional, paper-and-pencil versions of the tests.

Hypothesis 3: Computer and Internet self-efficacy beliefs will be positively related to scores on tests administered via the Internet and to reactions to test-administration mode.

Practical observations. As noted earlier, given the newness of Web-based selection technology, we were interested in the way test takers respond to Internet-administered tests, and we anticipated that Internet selection testing would encompass its own distinguishing aspects relevant to test administration. In addition to a brief assessment of participants'


self-reported reactions, we supervised test administration and documented a variety of qualitative aspects associated with implementing selection tests in Web-based environments. These qualitative aspects included observations of respondents' test-taking behaviors, identification of issues unique to administering timed tests via the Internet, and consideration of general aspects of Web-based test format and design. We present practical and operational issues of Internet testing in the discussion section below.

Method

Study Participants

Study participants were graduate students, continuing education students, and, in some cases, university employees. The sample consisted of 37 male and 28 female respondents; 55 Caucasian, 7 Asian, and 3 African American respondents. The average age of study participants was 35 years (SD = 9.45), and the vast majority of the participants (91%) were employed at the time of the study. Occupational titles provided by participants included several computer or information technology titles, several managerial titles, an engineer, a chemist, five human resource specialists, three staff assistants, a social worker, and two homemakers. Eight participants described themselves as students, and four listed themselves as unemployed. Although not all job titles represented positions likely to be tested via the Internet, this occupationally diverse, adult sample appeared to be indicative of the population of those likely to use the Internet to apply for jobs. Many study participants represented qualified applicants for the technical and professional types of jobs that are frequently advertised on the Internet. Almost half of the participants (48%, or 31 individuals) indicated that they had applied for a job via the Internet in the past.

Procedure

Information about the study and a request for participation were announced in various graduate class meetings and training courses. Data collection sessions were scheduled so that only small numbers of individuals (i.e., 5 to 10) would be online at any one period of time. Although several different computer labs were used during test administration, depending on lab availability, all computers used in this study had at least an Intel Pentium II 300 MHz processor and 64 MB of memory. All participants accessed the testing Web site via Microsoft Internet Explorer using a T3 Internet connection, one of the fastest connections available.

Participants were told that the purpose of this study was to gain further understanding of how the Internet might be used to administer cognitive


ability tests associated with selection decisions. They were told that they would be asked to complete paper-and-pencil as well as Internet versions of the tests, and they were paid $20 each for their participation in the study. In addition, although we could not actually "hire" the best applicants, in order to encourage participants to do their best on the tests, participants were told that the top four scorers on the timed test would be selected to receive a $50 bonus payment.

The first author coordinated and directed each testing session, throughout which she followed a scripted set of instructions to participants. Participants used a code number (e.g., the last four digits of their social security numbers) for all measures completed and, therefore, could not be identified by name. After providing their consent and expressing their understanding of and interest in trying for the "top score" bonus payment, participants completed a background survey comprising demographic questions as well as scales designed to assess computer and Internet use, experience, attitudes, playfulness, and beliefs. In order to counterbalance the order of test administration mode, participants were randomly assigned to either Condition A (paper-and-pencil administration first) or Condition B (Web administration first). Condition B participants went to a computer lab to begin Web-based testing, while Condition A participants were asked to remain in the room to take the paper-and-pencil tests.

In order to ensure the security of test items and to enable observation of test takers' experiences, our study used a supervised setting for both the paper-and-pencil and Internet test administrations. As explained by Ployhart et al. (2003), proctored Internet tests are widely used by several large organizations because they allow organizations to take advantage of the benefits of Internet tests without compromising test security and proper identification of test takers.

In both conditions, the TLA was presented prior to the situational judgment test. In the paper-and-pencil condition, the administrator read aloud the written instructions for the TLA provided by RBH, Inc. and used a timer with an alarm audible to study participants to aid in ending the timed test. The administrator then read the instructions for the situational judgment test out loud as participants read them, after which participants responded to the items at their own pace. Upon entering the computer lab, each study participant was given an instruction sheet regarding the logon procedure for the Web site. Test instructions were provided on screen, online via the testing Web site. A lab attendant observed study participants and assisted individuals who had questions but did not take initiative to organize or instruct study participants. That is, the Internet tests were self-administered but supervised. The Internet version of the TLA incorporated its own timer, which meant that participants in the Web condition did not necessarily begin and end the timed test as a group. (Note that the


timer within the TLA test stopped while Web pages were uploaded.) Upon completion of the TLA, the instructions for the next test appeared on the screen. Again, the situational judgment test was untimed.

After completing either the paper-and-pencil version or the Web version of the tests, participants switched such that they completed the alternative version of the tests either in the initial room or in a computer lab. Given the time required to complete the untimed situational judgment test, to move between test administration rooms, and to listen to and/or read through instructions for testing, for most study participants there was about a 30 to 40 minute interval between the two test administration modes. When finished with testing, participants completed a short set of follow-up questions designed to assess their reactions.

Measures Administered

Background questionnaire. A background survey was administered via paper-and-pencil to all study participants prior to testing. This questionnaire included the Computer Understanding and Experience (CUE) Scale (Potosky & Bobko, 1998), a 12-item measure of general computer experience. A sample item from this measure is "I know how to install software on a personal computer." The internal consistency estimate for CUE items obtained from our sample was α = .86. Computer self-efficacy was assessed with five items (α = .67) similar in specificity to computer efficacy measures developed by Hill, Smith, and Mann (1987) and Webster and Martocchio (1992). Examples of items are "I do not have much confidence in my computer ability" and "Computer errors are very difficult to fix." Six items assessed Internet self-efficacy beliefs, for example, "I believe that using the Internet is something I do well" and "I am able to learn new Internet tasks quickly" (α = .74). Also included was a seven-item measure of computer playfulness (Webster & Martocchio, 1992), which assessed the playfulness with which individuals interact with computers (α = .88). Similar to the computer beliefs measure(s) used in Potosky and Bobko (2001), six items that measured participants' beliefs regarding privacy and computer monitoring (e.g., "It is likely that someone can monitor what a person types on computers that are linked to one another") and five items that assessed beliefs regarding the ability of computers to detect honesty (e.g., "There are computer programs that can determine the honesty of responses to questions presented by computerized tests") were also included in the background survey. In addition, several single items requested background information on computer and Internet use (e.g., "hours per day spent using computers" and "hours per day actively accessing the Internet") and demographics.
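The α values reported in this section are internal consistency (coefficient alpha) estimates. For reference only, and not as a claim about any special computation in this study, the conventional formula is

    \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right)

where k is the number of items, \sigma_i^2 is the variance of item i, and \sigma_X^2 is the variance of the total scale score.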


Test of Learning Ability (TLA). The Test of Learning Ability is a 54-item, multiple-choice, timed test of cognitive ability developed by Richardson, Bellows, & Henry, Inc. that assesses spatial reasoning (block counting), mathematical ability (arithmetic), and verbal ability (vocabulary). As noted earlier, the TLA is a measure of general cognitive ability and is similar in format to the Army General Classification Test (AGCT; see Harrell, 1992). Respondents are allowed 12 minutes for this test.

Situational judgment. Ten situational judgment items, adapted from the Manager Profile Record, Part II (Manager Profile Record, Richardson, Bellows, & Henry, Inc.; see also Pederson, 1984), were administered in this study. As noted earlier, the 10 situational judgment items were selected by the test developer to maintain the validity and representativeness of the full measure. Each item presented a statement or description, and respondents were asked to choose what they believed to be the best response to the statement from a set of multiple-choice response options. Only one response option was selected from the four to eight choices presented for each item. Zero to four points were allocated to the response selected for each item; the maximum possible score on this test was 29.

Reactions to testing. At the end of testing, we assessed study participants' reactions by asking for their responses to 11 questions, presented as a paper-and-pencil survey. For each administration mode, participants were asked about their enjoyment, satisfaction with their effort, the extent to which they felt pressure while completing the tests, and the extent to which they felt monitored during testing. Three additional questions asked each participant to indicate the likelihood that he or she would be one of the four top scorers (to be selected for the $50 bonus), the extent to which the possibility of receiving the bonus award was desirable, and the degree to which he or she thought that participating in the study was interesting. Study participants were asked to mark an "X" in one of five spaces, bounded by extremely and not at all, in order to respond to each item. Study participants were encouraged to provide comments either by writing them on the reaction form or by talking to the test administrator, who documented their responses.

Missing Data

This study used state-of-the-art Internet testing technology that was commercially developed, owned, and operated by a private testing corporation. Complete data were not available for the Web-based responses of 11 participants, however, and the investigators documented Internet or computer-related problems for seven of these cases. For example, a few subjects were kicked out of the test or the time for uploading a new screen


took so long that the screen froze. In four instances, however, there was no record of difficulties during test administration, but data and scores were unavailable from the commercial Internet test provider. As a result, the sample size was reduced to 54.

We did obtain paper-and-pencil scores for all 65 study participants, and we compared the scores and background information for the 11 individuals for whom Web-based scores were unavailable to the 54 participants for whom complete data were obtained. No significant systematic differences were observed between these two subgroups.

Results

Table 1 shows the means, standard deviations, and correlations between the measures identified in this study, including the subscales of the TLA, for each mode of administration. Note that test respondents understood the simulated selection context of their participation in this study. As a check on the salience of the bonus award, which was intended to encourage participants to do their best on the tests so that they might be selected as one of the top four scorers, participants indicated that they found the possibility of the bonus desirable (M = 4.09, where 5 = extremely desirable; SD = 1.20).

Cross-Mode Equivalence

Our first proposition suggested that moderate cross-mode correlations would be obtained for the TLA, and higher cross-mode correlations would be found for the untimed situational judgment test. Overall, these expectations were supported. As reported in Table 1, the correlation between the Internet version and the paper-and-pencil version of the TLA was r = .60 (p < .001). Although the correlation is positive and significant, r = .60 is insufficient to suggest that the Web-based version of the TLA will exactly replicate the selection validity (or rank ordering of candidates) of the paper-and-pencil version (see McCornack, 1956). Norms from the TLA manual indicate that although the population mean is less than we obtained here, the standard deviation is not different (8.5 vs. the values of 6.7 and 8.5 we obtained). Further, consistent with our expectations, a larger cross-mode correlation was obtained for the situational judgment test (r = .84, p < .001).
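To make the rank-ordering point concrete, the brief simulation below is illustrative only and was not part of the analyses reported here. It draws paper-and-pencil and Web scores from an assumed bivariate normal distribution with a cross-mode correlation of .60 and estimates how often the same candidates fall in a "top four" cut under both modes; under these assumptions the expected overlap is typically only one or two of the four candidates, which is the practical sense in which r = .60 does not reproduce top-down rank ordering.

    import numpy as np

    # Illustrative sketch only: assumes bivariate normal scores with the
    # observed cross-mode correlation; actual TLA score distributions may differ.
    rng = np.random.default_rng(0)
    n_candidates, n_reps, cross_mode_r, top_k = 54, 10_000, 0.60, 4
    cov = [[1.0, cross_mode_r], [cross_mode_r, 1.0]]

    overlap = []
    for _ in range(n_reps):
        scores = rng.multivariate_normal([0.0, 0.0], cov, size=n_candidates)
        top_paper = set(np.argsort(scores[:, 0])[-top_k:])
        top_web = set(np.argsort(scores[:, 1])[-top_k:])
        overlap.append(len(top_paper & top_web) / top_k)

    print(f"Mean proportion of the top {top_k} selected under both modes: "
          f"{np.mean(overlap):.2f}")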

Paired means t-tests were also used to compare potential mean differences in scores for Web-administered versus paper-and-pencil administered tests.


TABLE 1
Descriptive Statistics and Cross-Mode Correlations for the TLA and the Situational Judgment Tests

                    M      SD    n    1       2       3       4       5       6       7       8       9      10
 1 pptotal        40.55   6.71   65   1.00
 2 Webtotal       39.19   8.51   54    .60*** 1.00
 3 ppblock        13.43   2.26   65    .81***  .38**  1.00
 4 Webblock       12.74   2.90   54    .46***  .87***  .44    1.00
 5 ppvocab        13.94   2.58   65    .73***  .41**   .37     .23    1.00
 6 Webvocab       13.48   3.01   54    .52***  .85***  .23     .66***  .58*** 1.00
 7 ppmath         13.19   3.35   65    .90***  .61***  .67***  .44***  .45***  .42**  1.00
 8 Webmath        12.96   3.60   54    .62***  .92***  .35     .71***  .31*    .73***  .74*** 1.00
 9 ppjudgment     17.97   3.52   65    .13     .22     .00     .29*    .14     .15     .15     .15    1.00
10 Webjudgment    17.59   3.68   57    .20     .20     .02     .23     .20     .11     .22     .20     .84*** 1.00

Notes. Variable names prefixed with "pp" refer to tests administered via paper-and-pencil. Variable names beginning with "Web" refer to tests administered via the Internet. "Total" refers to the total score on the TLA; "block" refers to the spatial reasoning (block counting) subscale of the TLA; "vocab" refers to the verbal reasoning (vocabulary) subscale of the TLA; "math" refers to the mathematical reasoning subscale of the TLA. "Judgment" refers to the situational judgment measure. Sample size varies according to the number of usable test scores obtained for each mode of administration.
***p < .001. **p < .01. *p < .05.


T-test results indicated that mean scores on the Web version of the TLA (M = 39.19, SD = 8.51) were significantly lower than mean scores on the paper-and-pencil version of the TLA (M = 40.55, SD = 6.71; t = 2.27, p < .05, df = 53). Mean scores for the Internet-administered situational judgment test (M = 17.59, SD = 3.68) did not differ from those administered via paper-and-pencil (M = 17.97, SD = 3.52). The standard deviations for Internet test scores were greater than their paper-and-pencil counterparts, but this difference was not significant.

We examined whether test scores improved the second time tests were taken. In order to test for such practice effects, a paired means t-test was conducted for Time 2 versus Time 1 test scores. Scores were higher the second time the TLA was administered, when averaged across modes (t = 5.16, p < .01, df = 53). In contrast, for the untimed situational judgment scores, the paired means t-test did not suggest significant differences between Time 2 and Time 1 (t = .77, p = .44, df = 56).
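For readers who wish to run this kind of repeated measures comparison on their own data, a minimal sketch follows; the score arrays are hypothetical examples, not the study data.

    import numpy as np
    from scipy import stats

    # Hypothetical paired scores: one entry per respondent who completed both modes.
    paper_scores = np.array([41, 38, 45, 36, 40, 43, 39, 44])
    web_scores = np.array([39, 37, 46, 33, 38, 44, 36, 41])

    # Paired (repeated measures) t-test on the mode difference.
    t_stat, p_value = stats.ttest_rel(paper_scores, web_scores)

    # Cross-mode correlation, the equivalence index emphasized in this article.
    r, r_p = stats.pearsonr(paper_scores, web_scores)

    print(f"paired t = {t_stat:.2f}, p = {p_value:.3f}; cross-mode r = {r:.2f}")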

Proposition 2 was that the cross-mode correlation for the block counting subscale of the TLA would be lower than the cross-mode correlations for the other two subscales. As shown in Table 1, the cross-mode correlations for each subscale were in the anticipated rank order (math r = .74, vocabulary r = .58, and block counting r = .44). Using the test statistic for nonoverlapping correlations and an appropriate adjustment for nonindependence recommended by Raghunathan, Rosenthal, and Rubin (1996), we compared the cross-mode correlation for the spatial reasoning (block counting) subscale to the cross-mode correlation obtained for each of the other two subscales (verbal and math). Although the block-counting cross-mode correlation (r = .44) was lower than that of the vocabulary subscale (r = .58), the difference between these two correlations was not significant (z = 1.025). On the other hand, the cross-mode correlation obtained for the math subscale (r = .74) was significantly greater than the cross-mode correlation for the block counting subscale (z = 2.70, p < .01, one-tailed), suggesting partial support for our second proposition. Respondents exhibited the highest consistency of responses for math items. It may be worth noting, however, that the cross-mode correlation for the math subscale was not significantly different from that of the verbal subscale in our sample (z = 1.59).
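The sketch below is illustrative only and was not part of the analyses reported here; it contrasts two cross-mode correlations using the simple Fisher z test for independent correlations, omitting the nonindependence adjustment of Raghunathan et al. (1996), and therefore yields somewhat smaller z values than those reported above.

    import numpy as np
    from scipy import stats

    def compare_correlations(r1, r2, n1, n2):
        """Fisher z test for the difference between two correlations, treating
        them as if they came from independent samples. (The analysis reported
        above additionally adjusted for the nonindependence that arises when
        both correlations are computed on the same respondents.)"""
        z1, z2 = np.arctanh(r1), np.arctanh(r2)
        se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
        z = (z1 - z2) / se
        return z, 1 - stats.norm.cdf(z)  # one-tailed p value

    # Math vs. block counting cross-mode correlations from Table 1 (n = 54).
    z, p = compare_correlations(0.74, 0.44, 54, 54)
    print(f"z = {z:.2f}, one-tailed p = {p:.3f}")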

Individual Characteristics

Table 2 provides means, standard deviations, and correlations between test scores, scores on individual characteristics measures, and reaction items. Results indicate that several computer and Internet characteristics were interrelated. For example, computer efficacy and Internet efficacy were positively related to each other (r = .42, p < .01) and also to computer understanding and experience (CUE) scores, computer playfulness, and hours per day spent using computers.


TABLE 2
Individual Characteristics Measured

(For each variable, M, SD, and n are followed by its correlations, in order, with variables 1, 2, 3, … up to the diagonal.)

Test scores:
 1 pptotal             40.55   6.71  65 | 1.00
 2 Webtotal            39.19   8.51  54 | .60** 1.00
 3 ppjudgment          17.97   3.52  65 | .13 .22 1.00
 4 Webjudgment         17.59   3.68  57 | .20 .20 .84** 1.00
Individual characteristics:
 5 Internet efficacy    3.81    .77  65 | .33** .34* .09 .18 1.00
 6 Computer efficacy    4.09    .55  65 | .15 .21 −.11 .00 .42** 1.00
 7 Comp verify          2.19    .70  65 | −.20 −.11 −.18 −.09 −.25* −.10 1.00
 8 Comp watch           3.87    .67  65 | .08 .11 .08 .13 .30* .29* −.01 1.00
 9 CUE                  4.05    .57  65 | .33** .40** .08 .19 .70** .56** −.09 .21 1.00
10 Playfulness          5.48   1.08  65 | .19 .24 .02 .10 .59** .53** −.34** .03 .61** 1.00
11 Age                 35.35   9.45  65 | .05 −.27* .16 .17 −.18 −.10 −.17 −.09 −.03 −.11 1.00
12 Comp hours           3.83    .91  65 | .18 .25 −.05 −.03 .42** .49** −.24 .27* .53** .43** −.06 1.00
13 Internet hours       2.06    .92  65 | .24 .25 −.18 −.13 .45** .13 −.04 −.04 .51** .33** −.06 .46** 1.00
Reactions to testing:
14 Enjoyed pp           3.38    .96  63 | .30* .24 .18 .27* .04 −.11 −.10 −.11 .09 −.03 .05 .03 .25* 1.00
15 Satisfied w/pp       3.51    .90  63 | .11 .28* .26* .25 .12 −.09 .03 .03 .09 .07 −.05 .10 .22 .49** 1.00
16 Pressure pp          3.18   1.12  63 | −.15 −.22 .12 −.03 −.07 .08 .13 .13 −.06 .06 .15 .00 −.07 −.17 −.17 1.00
17 Felt monitored pp    2.16   1.22  63 | −.06 −.01 −.11 −.13 −.15 −.01 .22 .13 −.17 −.13 −.08 −.03 −.16 −.25 −.18 .51** 1.00
18 Enjoyed Web          3.79   1.02  63 | −.09 −.16 −.15 −.03 .24 .38** .03 .17 .21 .32** −.19 .27* .15 .07 .03 −.15 −.13 1.00
19 Satisfied w/Web      3.65   1.08  63 | .02 −.01 .00 .05 .37** .16 −.06 .18 .36** .40** −.08 .36** .32** .16 .39** −.06 .01 .62** 1.00
20 Pressure Web         2.71   1.28  63 | −.05 −.07 .16 .04 −.15 −.22 .14 −.08 −.18 −.28* −.03 −.14 −.18 .09 .06 .50** .19 −.38** −.34** 1.00
21 Felt monitored Web   1.89    .99  63 | .10 .17 .09 −.04 .08 −.20 −.16 −.06 .05 .03 −.05 .01 .11 −.01 −.15 .12 .07 −.46** −.26* .37** 1.00
22 High score likely    3.03   1.16  63 | .27 .44** .10 .09 .25* .18 .11 −.06 .33** .06 −.07 .20 .34** .22 .39** −.08 −.02 .17 .42** .01 −.11 1.00
23 Desirable bonus      4.10   1.20  63 | .11 .21 .17 .04 −.05 −.14 −.06 −.36** −.09 .15 .00 −.04 .13 .08 .16 .06 −.02 −.09 .06 .11 .13 .38** 1.00
24 Interesting study    4.38    .68  63 | .00 .00 .21 .16 .13 .06 −.10 −.10 .11 .35** −.16 .12 .18 .39** .50** .04 −.11 .21 .42** −.10 −.22 .21 .21

Notes. "Comp verify" refers to the belief that computers can verify the truth of responses; "comp watch" refers to the belief that a person can use computers to monitor participants' test-taking behavior; CUE = Computer Understanding and Experience scale; "playfulness" refers to computer playfulness. All correlations greater than .41 are significant at p < .001.
**p < .01. *p < .05.


Internet efficacy beliefs were positively correlated with hours per day actively accessing the Internet (r = .45, p < .001) and hours per day spent using computers (r = .42, p < .001). Interestingly, computer efficacy beliefs were significantly related to hours spent using computers (r = .49, p < .001) but were not related to Internet access hours. These results suggest that Internet experience and efficacy are related to computer experience and efficacy but that computer experience and efficacy may represent broader constructs that are distinct from Internet use and efficacy.

Computer understanding and experience, as measured by the CUE scale, was related to performance on both the Web-administered (r = .40, p < .05) as well as the paper-and-pencil version (r = .33, p < .05) of the TLA. In addition, although age was unrelated to scores on the paper-and-pencil version of the TLA, age was significantly, negatively correlated with Web-based TLA scores (r = −.27, p < .05). Younger study participants performed better on the Web-based TLA than did older participants. We considered the possibility that computer experience might explain the observed differences in Web-based test scores. We computed the partial correlation between age and Web-based TLA scores, partialling out CUE scores. Because the correlation between age and the Web-based TLA scores does not drop much after partialling out CUE scores, computer experience does not appear to be the explanatory factor here. In addition, we noted that the correlation between Internet efficacy and age was negative (r = −.18, ns) and the correlation between Internet hours and age was negative and significant (r = −.33, p < .01), suggesting that regardless of their computer experience, older participants may have been less confident or familiar with Internet tests. There were no significant effects associated with gender in our sample.
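For reference, the first-order partial correlation described here follows the standard formula; substituting the rounded Table 2 values (age with Web TLA = −.27, age with CUE = −.03, Web TLA with CUE = .40) illustrates why the association barely changes:

    r_{12 \cdot 3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{(1 - r_{13}^{2})(1 - r_{23}^{2})}} = \frac{-.27 - (-.03)(.40)}{\sqrt{(1 - .03^{2})(1 - .40^{2})}} \approx -.28

(These are rounded tabled values, so the exact computation in the study may differ slightly.)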

Our third proposition suggested that computer and Internet efficacy beliefs would be related to test scores and to participants' self-reported reactions. Computer efficacy beliefs were unrelated to test scores. Although Internet efficacy beliefs were related to test scores on the TLA, they were positively correlated with both the paper-and-pencil and Web-administered versions of the test. Internet efficacy beliefs were not related to the situational judgment scores for either administration mode. When we regressed each set of Web-based scores on their paper-and-pencil counterpart scores and Internet efficacy scores, Internet efficacy scores did not explain any additional variance in Internet-administered test scores.

As shown in Table 2, Internet efficacy beliefs were significantly correlated with two specific reaction items: satisfaction with participants' effort on the tests administered via the Internet (r = .37, p < .01) and the self-rated likelihood of attaining one of the four highest scores on the timed test (r = .25, p < .05). Internet efficacy beliefs were not related to enjoyment or to feelings of pressure during testing. In sum, our third proposition was not supported in terms of test scores and received only limited support with regard to reactions.


On average, participants reported that they enjoyed taking the Internet tests more than the paper-and-pencil tests (M = 3.79, SD = 1.02 for the Internet tests; M = 3.38, SD = .96 for the paper-and-pencil tests; paired t = 2.27, p < .05). In addition, participants indicated that they felt more pressure while completing the timed paper-and-pencil test (M = 3.17, SD = 1.12) than they did while completing the timed test administered via the Internet (M = 2.71, SD = 1.28; paired t = 3.05, p < .01). No significant mean differences were found in terms of participants' satisfaction with their effort on the tests or their perceptions of being monitored while taking the tests.

Some of the open-ended comments we documented indicated that several participants believed that the Internet version of the tests went faster, although in fact the Internet-administered tests actually required more time to complete (i.e., given the additional time required for screens to load). As one respondent wrote, "I could use the mouse quickly, but on paper the letters seemed slower, I had to find my place, and switch between the test and the answer sheet."

Not all study participants preferred the Web version of the tests, however. As one participant expressed after completing the Web test, "That was weird." Not all participants felt comfortable reading from the computer monitor, and some felt frustrated as they waited for screen pages to load. Some were skeptical regarding the accuracy of the timer for the Internet-administered TLA. Others mentioned their "doubled" anxiety regarding testing via the Internet. This was unfamiliar territory for some respondents, and implications are discussed in more detail below.

Discussion

We anticipated that Internet-administered selection tests would be characterized by unique practical and operational issues, distinct from testing issues that arise in paper-and-pencil and other computerized test administration modes. Derived from our notes taken throughout this study, we organized our observations into the following categories: practical administrative issues specific to timed Web-based tests, respondents' test-taking behaviors, and design considerations for Internet tests. These observations are discussed below and are also summarized in Table 3. In addition to these considerations for practice, we also discuss our empirical findings.

Practical Observations

Issues specific to timed tests. Timed (speeded) tests present a special set of considerations for Internet administration. A computer program can certainly track the allotted time for power or speeded tests.


TABLE 3
Summary of Practical Observations from Timed and Untimed Internet Test Administration

Practical issues specific to timed Web-based tests:
• Virtual (Internet) time is not equal to actual time passed during test administration.
• The time required for Web pages to load can vary considerably within a test and from one user to another. Sources of variation include graphics in the test, Internet connection speed, computer processing speed and performance, and Internet server speed and capacity.
• Practitioners should investigate the method used to adjust the timer in Internet tests and ensure that this method is appropriate to test item format and to users' method of accessing the test.
• Even in a monitored setting, an on-screen timer during an Internet test may not present the same time pressure that a human proctor may present during timed p&p tests.
• "Load time" may allow some cognitive breathing space that is not available during timed p&p tests. However, "load time" may break cognitive flow or concentration for some individuals.

Respondents' test-taking behaviors:
• The option of employing alternative test-taking strategies may be less apparent to respondents in Web-based tests.
• The distinctions between tests and their instructions may be less apparent when a battery of tests is administered online.
• Respondents may not attend to test instructions and ancillary information presented online in the same way they attend to this information on p&p tests.
• The role of the proctor with regard to clarifying test instructions is different in self-administered Internet tests. Reading instructions aloud, audio files, and required user actions might enhance cross-mode equivalence.
• Respondents may have strong preferences or habits regarding methods of responding to Web-based test items (e.g., mouse click vs. keystroke).

Web-based test design considerations:
• Navigation information and frames for Internet tests may result in fewer items per screen and more pages for each test.
• The use of color and graphics on Internet tests may increase load time, but it may also enhance test-taker reactions and influence test performance.
• Equivalent formats and procedures across administration modes do not ensure cross-mode equivalent scores.
• User expectations and human factors considerations warrant consideration when designing Internet-administered selection tests.
• Few would argue against the ease of taking Web-based tests. More research is needed, however, to better understand the role that age and other demographic differences may play in Internet testing.

Note. In the table above, "p&p" stands for "paper-and-pencil."

However, “virtual” Internet time is not necessarily equal to actual time passed during test administration. Only so many items can be presented per viewable screen, and screens (or Web pages) take time to “load” for respondents to read. As noted earlier, the test administration platform used in this study did not download the tests to a local computer, which resulted in relatively long load times. Load time can vary according to how the computer is connected to the Internet (e.g., via a 56K telephone modem, a T1, T2, or T3 connection, or high-speed cable), the processing speed and performance quality of the computer used to take a test, and the speed and capacity of the Internet server upon which the test resides. In addition, load time may depend on the graphics or other features included in a Web-based test (e.g., in our study, images of blocks to count in the TLA were more elaborate than simple words and, therefore, these screens took somewhat longer to load). Current technology for establishing timer adjustments is not completely adaptive to these variables; instead, a formula adjustment is applied to all pages and/or test takers in the same way. For example, we checked with several commercial Internet test providers, and the present convention is to have the timer add back the average number of seconds it takes for the screens within a test to load. By controlling access to the Internet tests in our study (i.e., by using computer labs), we were able to control several of the aforementioned sources of variation in load time, but not all of them.
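
To make the timer convention described above concrete, the sketch below shows one way a countdown timer could credit back a fixed, average page-load allowance each time a new screen of items is served. This is our own illustration, not any vendor's actual implementation; the class, method names, and the 4-second allowance are assumptions.

```typescript
// Minimal sketch (illustrative only) of a test timer that "adds back" an
// average page-load allowance, per the convention described in the text.
class AdjustedTestTimer {
  private remainingMs: number;

  constructor(
    totalTestTimeMs: number,
    private averageLoadAllowanceMs: number // assumed mean load time per screen
  ) {
    this.remainingMs = totalTestTimeMs;
  }

  // Called repeatedly while a screen of items is displayed.
  tick(elapsedMs: number): void {
    this.remainingMs -= elapsedMs;
  }

  // Called when a new screen finishes loading: the same fixed allowance is
  // credited to every test taker, regardless of the load time each one
  // actually experienced -- the source of imprecision noted in the text.
  creditPageLoad(): void {
    this.remainingMs += this.averageLoadAllowanceMs;
  }

  get timeExpired(): boolean {
    return this.remainingMs <= 0;
  }
}

// Example: a hypothetical 12-minute timed test whose screens load in about
// 4 seconds on average.
const testTimer = new AdjustedTestTimer(12 * 60 * 1000, 4000);
testTimer.creditPageLoad(); // first screen has loaded
testTimer.tick(1000);       // one second of testing elapses
```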

Practitioners who administer timed Internet tests should understand the method used to adjust the timer within the test and, if the sources of variation in load time described above are a concern, should consider using test administration settings that enable some control over the computers used for testing. However, as we observed, Internet test administration in a computer lab setting may not eliminate all sources of load time variability. Administrators of timed computerized tests may need to select testing platform options (e.g., computerized but not Web-based, Web-based but downloaded to a local computer, or fully Internet administered) that complement load time and timer considerations.

Test-taker reactions present another issue for timed Web-based tests. In our study, the timer appeared at the bottom of the screen once the timed test began (i.e., after reading the instructions, respondents clicked “Begin”). Many test takers asked about the load time and whether it was counted against their test time. The instructions noted that the test was timed and also noted that the timer would adjust for load time. However, most respondents did not seem to recall this information during the actual test. As one respondent put it, “The timer was apparent. Did the timer stop/readjust?” Others took a moment to notice the timer and watched to see that it adjusted while taking the tests. Some respondents went so far as to try to decipher how the timer adjusted for load time. Interestingly, despite these documented observations, the reaction survey administered at the end of testing suggests that study participants felt less time pressure while taking the Internet tests than they did when taking the paper-and-pencil tests.

For those respondents who understood that the timer would account for the time it took for screens to load, the “load time” may have allowed some “cognitive breathing space” that was not available during the paper-and-pencil versions of the timed tests. On the other hand, for some respondents, this additional time may have broken the cognitive “flow” of ideas within an individual’s test-taking process and could have been a distraction or an interruption during the test. Whether the load time on Web-based tests is viewed by respondents as an opportunity or a distraction is an interesting question for future research on testing.

Respondents’ test-taking behaviors. One of the things we observed in this study was that the test-taking strategy adopted by respondents could vary according to mode of administration. On a paper-and-pencil test, more items can be presented on a page (using smaller font sizes), and respondents may find it easier to quickly estimate time constraints, types of items, and so forth for the whole test. In addition, few paper-and-pencil tests restrict respondents from turning pages within a test while taking the test. For example, on the TLA, a test taker might choose to answer all block problems first, then work on vocabulary items, then math items, regardless of the order in which the items are presented. In contrast, the presentation of test items administered via the Internet is limited by font and screen size. In our study, three to six items were presented on a screen at a time. Even with the option of moving to previous and next screens within a Web-administered test, moving around in a test to complete certain types of items was harder and more time consuming via the Internet. More important, the option of employing alternative test-taking strategies (e.g., other than responding to each item in the order presented) may be less apparent to respondents taking Web-based tests. Respondents may be more likely to take the test in the order in which items are presented. This is good for consistency of administration, but not necessarily for equivalence with paper-and-pencil tests.

A similar issue concerns how test takers perceive test instructions. Professionally developed paper-and-pencil tests usually have cover pages, instruction pages, and headers on each page that may reference the titles of the tests, all of which suggest the tests’ purpose. In our study, the Web-based tests included descriptions and instructions very similar to those of the paper-and-pencil tests, but no equivalent to the cover page was provided, and the pages within the test did not provide similarly evident headers or footers. Little is known about the extent to which test takers may rely on the redundancy of instructions and test titles in the presentation of items on a test. In addition, when a battery of online tests is given, the distinctions between tests may become blurred without such titles, headers, and footers. In practice, test administrators who opt to use online versions of tests developed for paper-and-pencil administration may want to incorporate such orientation cues into the formatted Web pages.

Respondents’ attention to instructions for the tests may be different when instructions are read on-screen versus on paper and/or by a test administrator. For example, the online test instructions asked participants to maximize their computer screens prior to the beginning of each test, and the instructions for the online TLA clearly stated that the timer would add time to adjust for the moments required for pages to load on the screen. Some individuals attended to the information provided in these instructions, but some did not. Questions about how to take the test or about the method used to keep time never came up in the paper-and-pencil condition. Perhaps instructions for testing are more likely to be skimmed than carefully read when test respondents are reading on their own and/or when reading the instructions on a computer screen.

Our observations regarding adherence to test instructions call into question the meaning of the term “proctored setting” as it applies to Web-based test administration. Although we found no documented protocol for proctoring Web-based tests, we contacted the Society for Human Resource Management (SHRM) Assessment Center, which offers a catalog of online employment testing resources, to ask whether proctors of Web-based assessments (including available timed cognitive ability tests) typically read instructions aloud to test takers. We were informed that this is generally not the case and that there are no specific guidelines that would encourage test administrators to do so (A. Vassar, SHRM Online Assessment Center, personal communication, February 25, 2004). However, if test equivalence is an issue, we suggest that practitioners who administer Web-based tests in a proctored setting consider reading instructions aloud to test takers (as appropriate) until further study can provide more guidance on this matter. In addition, more consideration should be given to the presentation of instructions in unproctored Internet tests. Perhaps including an audio file that reads instructions aloud or requiring user action to confirm understanding of instructions would be helpful.
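
The following fragment sketches the “required user action” idea mentioned above: the Begin button stays disabled until the test taker explicitly confirms having read the instructions. It is a hypothetical illustration (the element IDs and page structure are assumed), not part of the platform used in this study.

```typescript
// Hypothetical sketch: require an explicit confirmation of the instructions
// before a self-administered Web test can begin.
const confirmBox = document.getElementById("confirm-instructions") as HTMLInputElement;
const beginButton = document.getElementById("begin-test") as HTMLButtonElement;

beginButton.disabled = true; // cannot start until instructions are acknowledged

confirmBox.addEventListener("change", () => {
  // Enable the Begin button only while the confirmation box is checked.
  beginButton.disabled = !confirmBox.checked;
});

beginButton.addEventListener("click", () => {
  // Log the acknowledgment so administrators can verify it later.
  console.log(`Instructions confirmed at ${new Date().toISOString()}`);
});
```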

The primary role of the proctor in Internet tests is to ensure the identity of the test taker and to monitor against cheating during the test. Despite the convenience and cost savings associated with remote administration of Internet tests, test takers are often required to come to a supervised testing location because there is currently no way to completely prevent test takers from cheating or copying items during testing, there is no way to guarantee that another person won’t be looking over the test taker’s shoulder, and options for authenticating the identity of the test taker from remote locations are limited. Perhaps the role of the proctor in Web-based testing needs to be more carefully matched to that of the proctor in traditional paper-and-pencil tests. More work is needed in both practice and research, however, to increase our understanding of test takers’ behaviors and reactions and to develop creative adaptations to the testing process in order to take advantage of the self-administered aspects of Web-based tests.

Some respondents indicated that they preferred to use the keypad to respond to online items rather than use the mouse to click on their selected response. In addition, on the Web site used in our study, the mouse did not always have a presence on the screen; that is, until the respondent “clicked” on something, the cursor was not visible. Some respondents attempted to use the keypad, regardless of their preference for or familiarity with the keypad. Decisions about how item responses are submitted (e.g., by mouse click, by keystroke, by voice, or by light pen) are ultimately made by test developers, and these decisions may influence test outcomes. Not only do test takers have preferences regarding these options for responding, but failure to accommodate different methods of online responding may affect response speed and accuracy.
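
As an illustration of accommodating both habits, the sketch below accepts an item response either from a mouse click on an option or from the corresponding number key. The selectors, item identifier, and recording function are hypothetical and are not drawn from the platform used in this study.

```typescript
// Hypothetical sketch: let test takers respond by mouse click or by keypad.
function recordResponse(itemId: string, optionIndex: number): void {
  // In a real platform this would submit the response to the test server.
  console.log(`Item ${itemId}: selected option ${optionIndex + 1}`);
}

const options = Array.from(
  document.querySelectorAll<HTMLButtonElement>("#item-12 .option")
);

// Mouse path: each option button records its own index when clicked.
options.forEach((button, index) => {
  button.addEventListener("click", () => recordResponse("item-12", index));
});

// Keyboard path: pressing "1" through "9" selects the corresponding option.
document.addEventListener("keydown", (event: KeyboardEvent) => {
  const pressed = Number.parseInt(event.key, 10);
  if (!Number.isNaN(pressed) && pressed >= 1 && pressed <= options.length) {
    recordResponse("item-12", pressed - 1);
  }
});
```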

Design considerations for Internet tests. Our exploration of Internet testing prompted consideration of some issues that may be taken for granted when using traditional paper-and-pencil tests. One issue pertains to the trade-off between font size and viewing space. For example, in order to maintain “equivalence,” in our study the Internet test used a 12-point font size. The navigation information (e.g., the toolbar) and frames for screen content meant fewer items per screen and more “pages” for each test. An alternative format could include many more items per Web page, but only so many items would be viewable at once. Test takers would have to “scroll down” in order to view more items, and this might require more familiarity with Web formats and applications. Choices regarding the most appropriate font size in relation to the “viewable screen” of test items may deserve more attention and study.
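
As a rough, purely hypothetical illustration of this trade-off, the calculation below shows how the number of screens (“pages”) grows as larger fonts increase the vertical space each item needs; none of the figures are taken from the tests used in this study.

```typescript
// Back-of-the-envelope sketch of the font-size / viewing-space trade-off.
// All inputs are assumed values, not measurements from the study platform.
function pagesNeeded(
  totalItems: number,
  viewportHeightPx: number,
  navigationOverheadPx: number, // toolbar, frames, timer display
  itemHeightPx: number          // grows as the font size grows
): { itemsPerScreen: number; pages: number } {
  const usableHeight = viewportHeightPx - navigationOverheadPx;
  const itemsPerScreen = Math.max(1, Math.floor(usableHeight / itemHeightPx));
  return { itemsPerScreen, pages: Math.ceil(totalItems / itemsPerScreen) };
}

// E.g., 54 items, a 768-px viewport, 200 px of navigation overhead, and
// 90-px items yields 6 items per screen and 9 pages.
console.log(pagesNeeded(54, 768, 200, 90));
```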

Another general issue relates to the use of color and graphics on the computer screen. In our study, the online testing screens, like their paper-and-pencil counterparts, had white backgrounds and black lettering. The use of graphics, logos, and color is generally minimized on paper tests, probably due to cost and printing considerations. The cost of color is less of an issue for Internet-administered tests. The potential effects of color, graphics, and perhaps other enhancements such as audio or flash animation on Internet test performance, as well as applicants’ reactions to such tests, warrant future investigation.

We made every effort to present the same items in the same way across the two modes of administration examined in this study. Yet we do not know the proper protocol for Internet test administration that would ensure equivalence with professionally developed paper tests. Given our experiences in this exploratory study, perhaps more attention is needed to the design of Internet-administered selection tests that yield equivalent scores despite dissimilar presentation of items. On the other hand, rather than simply introducing sources of format inequivalence, the inherent differences associated with Web-based applications used in Internet tests may offer an opportunity to improve the quality of assessment and the experience of test takers. Users’ expectations about the Internet and about Web-based testing, as well as human factors considerations, might warrant the use of different font sizes, colored backgrounds, and other graphics in Web-based tests. Such modifications may ultimately make scores (despite different testing procedures) more equivalent.

Empirical Results

Our empirical findings suggest that Internet-administered cognitive ability tests are about as equivalent to their paper-and-pencil counterparts as are other computer-administered tests. Our results are encouraging in that we encountered few surprises regarding how respondents scored when taking a Web-based test. The untimed situational judgment test produced a moderately high (r = .84) cross-mode correlation. Although we could not locate precise test–retest reliability data for the Manager Profile Record (Part II), from which the situational judgment test used here was taken, we did find a test–retest reliability coefficient (r = .81) for a very similar situational judgment test, the Supervisory Profile Record, Part II (The RBH Supervisory Profile Record, p. 148). Considering that the cross-mode correlation obtained in this study is, in fact, an indicator of test–retest reliability, the similarity of the magnitude of these coefficients (r = .81 and r = .84) suggests that the unreliability associated with the situational judgment measure is probably not due to differences in administration mode.

The timed cognitive ability test demonstrated less equivalence, as indicated by the relatively low cross-mode correlations shown in Table 1. This result is consistent, however, with the computer-versus-paper-and-pencil research literature, which has suggested lower degrees of equivalence for timed cognitive ability tests than for untimed tests (Burke & Normand, 1987; Mazzeo & Harvey, 1988; Mead & Drasgow, 1993; Van de Vijver & Harsveld, 1994). Further, the TLA is a derivative of the Army General Classification Test (AGCT; see Harrell, 1992), which has a reported reliability of at least .95. This suggests that the cross-mode correlations obtained here are not much attenuated by unreliability.
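
For readers who want to verify this point, the classical correction for attenuation shows how much unreliability alone could depress an observed cross-mode correlation. The formula below is stated generically; it is a standard psychometric result rather than an analysis reported in this study.

```latex
% Spearman correction for attenuation: r_xx and r_yy are the reliabilities
% of the two administration modes, and r_xy is the observed cross-mode correlation.
\[
  r_{\text{corrected}} \;=\; \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}
\]
% With reliabilities near .95 in each mode, the denominator is roughly .95,
% so correcting for unreliability raises the observed correlation only slightly.
```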

Not only did we examine cross-mode equivalence on an untimed test as well as a timed test, but we also looked at performance on three different subscales within the timed test. As expected, concerns about equivalence may be more applicable to measures of spatial reasoning or other measures that require visual perception. Our findings regarding block-counting items are consistent with mean differences on timed ability tests that include graphics (Mazzeo & Harvey, 1988; Van de Vijver & Harsveld, 1994). As noted earlier, given the ease of incorporating graphics and other item formats and presentation options that Web-based applications offer, future investigation is warranted.

The observation that the cross-mode correlations for the TLA subscales are lower than the correlations between subscales within each mode may raise questions about the constructs assessed in each administration mode. Indeed, one reviewer noted that the within-mode correlations were higher for Internet-administered scores. Given our sample size and the exploratory intent of this study, it is inappropriate to draw conclusions about the convergent validity of the TLA subscales. Future study is needed to explore the extent to which Internet test administration introduces systematic variation in test scores and the possibility that the Internet mode may interact with and alter the traits being measured.

For both the TLA and the situational judgment tests administered, mean scores were lower and standard deviations were larger for the Internet administration mode. This finding is consistent with results reported by Buchanan and Smith (1999) and Ployhart et al. (2003). If tests administered via the Internet were found to be equivalent to their validated paper-and-pencil counterparts, practitioners would likely choose to administer tests using one mode or the other, but not both. Mean differences are important, however, if test takers have some choice regarding mode of test administration or when test administrators elect to offer a test in alternative formats. In these circumstances, practitioners may need to establish cut scores and test norms for the specific test to be administered online, relative to its paper-and-pencil counterpart.
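
If scores must be compared or cut scores carried across modes, one simple illustration of how this could be done is linear equating based on the mode-specific means and standard deviations; this sketch is offered as an example, not as a procedure used or endorsed in this study.

```latex
% Linear equating sketch: map a Web-administered score x_w onto the
% paper-and-pencil scale using the means (mu) and standard deviations (sigma)
% observed in each administration mode.
\[
  x_{p} \;=\; \mu_{p} \;+\; \frac{\sigma_{p}}{\sigma_{w}}\left(x_{w} - \mu_{w}\right)
\]
% A cut score established from paper-and-pencil norms can be converted to the
% Internet scale by applying the inverse transformation.
```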

One reason why practitioners might consider offering test takers a choice regarding administration mode pertains to age. The observed negative correlation between age and performance on the timed Internet test may present some cause for concern in terms of age-related adverse impact. In our sample, older individuals reported fewer hours spent using the Internet. Although the correlations were not significant, the relationships between Internet self-efficacy and age and between enjoyment of the Internet testing format and age were negative. Age did not seem to be a factor in the shorter, untimed situational judgment test. Because Internet tests are self-administered, perhaps offering practice items with supportive feedback to help acclimate users to the Web-based format would enhance task-specific self-efficacy or improve reactions to the Internet test format. We speculate that human factors considerations, including larger font sizes and the ability to choose alternate ways to respond to questions (e.g., by keying responses rather than using a mouse), might further accommodate the needs of older respondents. More research is needed to better understand the extent to which age, test-taker perceptions, and preferences may affect Internet test performance.

Limitations

Of course, no study is without limitations. In our case, the small sample size may account for some of our empirical findings. For a complete test of psychometric equivalence, one would desire substantially larger sample sizes. However, our purpose was to explore and document as many aspects as possible of using the Internet to administer selection tests. Our finding of consistently lower mean scores and larger variances for the Internet-administered tests is consistent with results recently reported by Ployhart et al. (2003). The cross-mode correlations we report are illustrative, not definitive, and the fact that they fit well with prior research and expectations lends credence to our findings. The low cross-mode correlations observed for the timed test provide an important caveat to those who might optimistically assume cross-mode equivalence for all types of tests used in personnel selection.

The loss of 11 sets of scores in the Internet administration in this study was unfortunate and is consistent with reports from other researchers who have conducted Web-based research. For example, in their Web-based survey, Chapman and Webster (2003) also commented that “despite extensive beta testing of the Web site using a variety of browsers and platforms, several people reported that they had difficulty with the Web site and could not complete the online version” (pp. 114–115). In our study, even the use of a commercially developed and administered Internet test site, as well as a proctored setting, did not immunize our efforts from cyberdifficulties. It is unlikely that connection speed was a problem in our study. However, the performance of one computer versus another in the labs used could have been one source of the observed variation in “load time.” In addition, the performance and location of the Internet server may have played a role in lost data. We caution that unproctored, individual access to Internet tests may potentially introduce even more variation and risk in terms of data transfer.

We also note that the effects of delivery system failure, or even of “technical difficulties,” on test performance have not been researched. We speculate, however, that these events are “negatively reinforcing” and may interact with individual differences such as efficacy and motivation. Our experience in testing tells us that people, including job applicants, do not like taking tests, and some may lack confidence when doing so. Technical problems with the delivery mode present a type of feedback to test takers and may increase frustration and debilitate performance. At the same time, people seem to like taking Web-based tests (Anderson, 2003). Perhaps this positive affect can alleviate some of the negative reactions or stress experienced when problems occur.

Another limitation concerns the short period of time between our two administration modes (ranging from about 30 minutes up to an hour). At the same time, it is likely that a more extended period between repeated test administrations would attenuate the cross-mode correlations even further. It would be useful to know what happens to cross-mode correlations when memory effects can be expected to have less of an effect. We note that these limitations refer generally to our empirical findings, not our qualitative observations.

Conclusion

Internet selection testing, especially supervised testing as conducted in this study, is becoming widespread in public and private organizations (Ployhart et al., 2003), and equivalence with the paper-and-pencil versions of the tests administered is a salient concern. Some caution seems warranted as organizations endeavor to convert established timed paper-and-pencil measures of cognitive ability to Internet-administered tests. This study demonstrated that the untimed situational judgment test administered shows promising cross-mode equivalence but that the cross-mode correlation for the timed test of cognitive ability was considerably lower. Nonequivalence in cross-mode correlations could be of practical concern when top-down selection is invoked because different candidates might be selected depending upon the mode used. In addition, nonequivalence in means could be of practical concern when an organization uses a set cut score (based on paper-and-pencil analyses) for selection.

Despite our efforts to make the Internet-administered tests as similar in content as possible to their paper-and-pencil counterparts, our experience in this research suggests that respondents’ perceptions about the Internet, their reactions to the Internet test(s), and technological features of the specific administration platform employed may influence test performance. The potential effect of the interaction between respondents and administration mode on the construct validity of the measures administered is an important topic for future study.

Additional research may also be needed to investigate potential age-related adverse impact for Internet tests, as well as adverse impact associated with other variables not included in this study (e.g., socioeconomic status and potential differential familiarity with computers and/or Internet access). Research is also needed to establish the criterion validity of Internet tests administered in proctored and unproctored settings. At a minimum, validation studies seeking equivalence may need to consider one test at a time. In addition, further study of test-taking strategies in the online environment, of Web design features and test formatting issues applicable to Internet testing, and of test takers’ perceptions, preferences, and reactions will enrich our understanding of the opportunities and limitations associated with Internet-administered selection tests.

REFERENCES

Anderson N. (2003). Applicant and recruiter reactions to new technology in selection: A critical review and agenda for future research. International Journal of Selection and Assessment, 11, 121–136.
Bandura A. (1991). Social cognitive theory of self-regulation. Organizational Behavior and Human Decision Processes, 50, 248–287.
Bandura A. (1997). Self-efficacy: The exercise of control. New York: Freeman.
Booth-Kewley S, Edwards JE, Rosenfeld P. (1992). Impression management, social desirability, and computer administration of attitude questionnaires: Does the computer make a difference? Journal of Applied Psychology, 77, 563–566.
Bruvold NT, Comer JM, Rospert AM. (1990). Interactive effects of major response facilitators. Decision Sciences, 21, 551–562.
Buchanan T, Smith JL. (1999). Using the Internet for psychological research: Personality testing on the World Wide Web. British Journal of Psychology, 90, 125–144.
Burke MJ, Normand J. (1987). Computerized psychological testing: Overview and critique. Professional Psychology: Research and Practice, 18, 42–51.
Chapman DS, Webster J. (2003). The use of technologies in the recruiting, screening, and selection processes for job candidates. International Journal of Selection and Assessment, 11, 113–120.
Cho H, LaRose R. (1999). Privacy issues in Internet surveys. Social Science Computer Review, 17, 421–434.
Compeau DR, Higgins CA. (1995). Computer self-efficacy: Development of a measure and initial test. MIS Quarterly, 19, 189–211.
Cronk BC, West JL. (2002). Personality research on the Internet: A comparison of Web-based and traditional instruments in take-home and in-class settings. Behavior Research Methods, Instruments, and Computers, 34, 572–577.
Davis RN. (1999). Web-based administration of a personality questionnaire: Comparison with traditional methods. Behavior Research Methods, Instruments, and Computers, 31, 177–180.
Donovan MA, Drasgow F, Probst RM. (2000). Does computerizing paper-and-pencil job attitude scales make a difference? New IRT analyses offer insight. Journal of Applied Psychology, 85(2), 305–313.
Eastin MS, LaRose R. (2000). Internet self-efficacy and the psychology of the digital divide. Journal of Computer-Mediated Communication, 6. [Online]: http://www.ascusc.org/jcmc/vol6/issue1/eastin.html
Foster Thompson L, Surface EA, Martin DL, Sanders MG. (2003). From paper to pixels: Moving personnel surveys to the Web. PERSONNEL PSYCHOLOGY, 56, 197–227.
Gangstead SW, Snyder M. (1985). ‘To carve nature at its joints’: On the existence of discrete classes in personality. Psychological Review, 92, 317–340.
Gilliland SW, Cherry B. (2000). Managing “customers” of selection processes. In Kehoe JF (Ed.), Managing selection in changing organizations (pp. 158–196). San Francisco: Jossey-Bass.
Gist ME, Schwoerer C, Rosen B. (1989). Effects of alternative training methods on self-efficacy and performance in computer software training. Journal of Applied Psychology, 74(6), 884–891.
Greaud VA, Green BF. (1986). Equivalence of conventional and computer presentation of speed tests. Applied Psychological Measurement, 10, 23–34.
Graphic, Visualization and Usability Center (GVU). (1999). GVU's Tenth Annual WWW User's Survey. Atlanta, GA: Georgia Institute of Technology. [Online] http://www.cc.gatech.edu/gvu/user surveys/
Harrell TW. (1992). Some history of the Army General Classification Test. Journal of Applied Psychology, 77(6), 875–878.
Harris MM, Van Hoye G, Lievans F. (2003). Privacy and attitudes towards Internet-based selection systems: A cross-cultural comparison. International Journal of Selection and Assessment, 11, 230–236.
Hill T, Smith ND, Mann MF. (1987). Role of efficacy expectations in predicting the decision to use advanced technologies: The case of computers. Journal of Applied Psychology, 72(2), 307–313.
Jones JW, Dages KD. (2003). Technology trends in staffing and assessment: A practice note. International Journal of Selection and Assessment, 11, 247–252.
Katz JE, Aspden P. (1997). A nation of strangers? Communications of the ACM, 40(12), 81–86.
King WC, Miles EW. (1995). A quasi-experimental assessment of the effect of computerizing noncognitive pencil-and-paper measurements: A test of measurement equivalence. Journal of Applied Psychology, 80, 643–651.
Lievans F, Harris MM. (2003). Research on Internet recruitment and testing: Current status and future directions. In Cooper CL, Robertson IT (Eds.), International Review of Industrial and Organizational Psychology. Chichester, UK: Wiley.
Lloyd B, Gressard C. (1984). Reliability and factorial validity of computer attitude scales. Educational and Psychological Measurement, 44, 501–505.
Mazzeo J, Harvey AL. (1988). The equivalence of scores from automated and conventional educational and psychological tests: A review of the literature. College Board Report No. 88-8, ETS RR No. 88-21.
McCornack RL. (1956). A criticism of studies comparing item-weighting methods. Journal of Applied Psychology, 40(5), 343–344.
McManus MA, Ferguson MW. (2003). Biodata, personality, and demographic differences of recruits from three sources. International Journal of Selection and Assessment, 11, 175–183.
Mead AD, Drasgow F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests: A meta-analysis. Psychological Bulletin, 114(3), 449–458.
Nahl D. (1996). Affective monitoring of Internet learners: Perceived self-efficacy and success. Journal of American Society for Information Sciences, 33, 100–109.
Pasveer KA, Ellard JH. (1998). The making of a personality inventory: Help from the WWW. Behavior Research Methods, Instruments, and Computers, 30, 309–313.
Pederson K. (1984). Manager Profile Record. In Hogan S, Hogan R (Eds.), Business and Industry Testing: Current Practices and Test Reviews (pp. 360–362). Austin, TX: Pro-Ed.
Ployhart RE, Weekley JA, Holtz BC, Kemp C. (2003). Web-based and paper-and-pencil testing of applicants in a proctored setting: Are personality, biodata, and situational judgment tests comparable? PERSONNEL PSYCHOLOGY, 56, 733–752.
Pope-Davis DB, Twing JS. (1991). The effects of age, gender, and experience on measures of attitude regarding computers. Computers in Human Behavior, 7, 333–339.
Potosky D. (2002). A field study of computer efficacy beliefs as an outcome of training: The role of computer playfulness, computer knowledge, and performance during training. Computers in Human Behavior, 18, 241–255.
Potosky D, Bobko P. (1997). Computer versus paper-and-pencil administration mode and response distortion in noncognitive selection tests. Journal of Applied Psychology, 82(2), 293–299.
Potosky D, Bobko P. (1998). The Computer Understanding and Experience (CUE) scale: A self-report measure of computer experience. Computers in Human Behavior, 14(2), 337–348.
Potosky D, Bobko P. (2001). A model for predicting computer experience from attitudes toward computers. Journal of Business and Psychology, 15(3), 391–404.
Raghunathan TE, Rosenthal R, Rubin DB. (1996). Comparing correlated but nonoverlapping correlations. Psychological Methods, 1(1), 178–183.
Ren W. (1999). Self-efficacy and the search for government information. Reference and User Service Quarterly, 38, 283–291.
Reynolds DH, Sinar EF, McClough AC. (2000, April). Evaluation of an Internet-based selection procedure. In Mondragon NJ (Chair), Beyond the demo: The empirical nature of technology-based assessments. Symposium presented at the 15th Annual Conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.
Richardson, Bellows, Henry & Co., Inc. (1989). Test of Learning Ability. Washington, DC: Author.
Richardson, Bellows, Henry & Co., Inc. (n.d.). Manager Profile Record. Washington, DC: Author.
Richardson, Bellows, Henry & Co., Inc. (n.d.). The RBH Supervisory Profile Record. Washington, DC: Author.
Salgado JF, Moscoso S. (2003). Internet-based personality testing: Equivalence of measures and assessees’ perceptions and reactions. International Journal of Selection and Assessment, 11, 194–203.
Schmidt WC. (1997). World-Wide Web survey research: Benefits, potential problems, and solutions. Behavior Research Methods, Instruments, and Computers, 29, 272–279.
Silver EM, Bennett C. (1987). Modifications of the Minnesota Clerical Test to predict performance on video display terminals. Journal of Applied Psychology, 72, 153–155.
Simsek Z, Veiga JF. (2001). A primer on Internet organizational surveys. Organizational Research Methods, 4(3), 218–235.
Sinar EF, Reynolds DH. (2001, April). Applicant reactions to Internet-based selection techniques. Paper presented at the 16th Annual Conference of the Society for Industrial and Organizational Psychology, San Diego, CA.
Sproull LS. (1986). Using electronic mail for data collection in organizational research. Academy of Management Journal, 29, 159–169.
Stanton JM. (1998). An empirical assessment of data collection using the Internet. PERSONNEL PSYCHOLOGY, 51, 709–725.
Stanton JM, Rogelberg SG. (2001a). Using Internet/Intranet Web pages to collect organizational research data. Organizational Research Methods, 4(3), 200–217.
Stanton JM, Rogelberg SG. (2001b, April). Challenges and obstacles in conducting employment testing via the Internet. Symposium paper presented at the 16th Annual Conference of the Society for Industrial and Organizational Psychology, San Diego, CA.
Temple L, Lips HM. (1989). Gender differences and similarities in attitudes toward computers. Computers in Human Behavior, 5, 215–226.
Tse ACB. (1998). Comparing the response rate, response speed and response quality of two methods of sending questionnaires: E-mail vs. mail. Journal of the Market Research Society, 40, 353–361.
Webster J, Martocchio JJ. (1992, June). Microcomputer playfulness: Development of a measure with workplace implications. MIS Quarterly, 16, 201–226.
Webster J, Martocchio JJ. (1995). The differential effects of software training previews on training outcomes. Journal of Management, 21, 757–787.
Wiechmann D, Ryan AM. (2003). Reactions to computerized testing in selection contexts. International Journal of Selection and Assessment, 11, 215–229.
Wonderlic Personnel Test, Inc. (1992). Wonderlic Personnel Test User's Manual. Libertyville, IL: Author.
Van de Vijver FJR, Harsveld M. (1994). The incomplete equivalence of the paper-and-pencil and computerized versions of the General Aptitude Test Battery. Journal of Applied Psychology, 79(6), 852–859.
Vispoel WP, Boo J, Bleiler T. (2001). Computerized and paper-and-pencil versions of the Rosenberg self-esteem scale: A comparison of psychometric features and respondent preferences. Educational and Psychological Measurement, 61, 461–474.