
Forming Composites of Cognitive Ability and Alternative Measures to Predict Job Performance and Reduce Adverse Impact: Corrected Estimates and Realistic Expectations

Denise Potosky*, Pennsylvania State University
Philip Bobko, Gettysburg College
Philip L. Roth, Clemson University

Although there has been empirical attention paid to the criterion-related validity of predictor composites, there has been much less attention paid to the standardized ethnic group differences associated with these composites. One important area of inquiry in predictor composite research is the influence of adding predictors to a test of general mental ability. The limited empirical literature on this practice is mixed, but the prevailing expectation is that there is likely to be higher validity and less adverse impact. Unfortunately, much of the previous work is limited by the presence of inaccurate validity and standardized ethnic group difference values. In this analysis we formed meta-analytic matrices to more accurately estimate the validity and standardized ethnic group differences of several composites that combine a measure of cognitive ability with measures of conscientiousness, a structured interview, or biodata. While results were somewhat complex, we found that adding alternative predictors does not result in a situation in which validity automatically goes up and adverse impact potential automatically goes down. In fact, the reductions in adverse impact (if any) from adding "non-cognitive" predictors were more modest than much of the literature suggests.

Introduction

There has been a great deal of research about the criterion-related validity of predictors of job performance (e.g., Hunter & Hunter, 1984). For over 80 years, such research has dealt with individual predictors of performance (Schmidt & Hunter, 1998). There has also begun to be substantial interest in the validity of composites of predictors. For example, two sets of researchers examined the uncorrected validity of composites (Bobko, Roth, & Potosky, 1999; Schmitt, Rogers, Chan, Sheppard, & Jennings, 1997), and one set of researchers examined the regression-weighted validity of predictor composites in which the individual predictor validities were corrected for research artifacts such as range restriction and criterion unreliability (Schmidt & Hunter, 1998).

Less is known about the adverse impact and adverse impact potential of predictor composites (Salgado, Viswesvaran, & Ones, 2001). Although the issue has been addressed from several avenues, psychometric theory (Sackett & Ellingson, 1997), primary studies (Pulakos & Schmitt, 1996; Ryan, Ployhart, & Friedel, 1998), and meta-analyses (Bobko et al., 1999; Schmitt et al., 1997) provide mixed evidence and recommendations regarding the effectiveness of adding alternative predictors to measures of general mental ability. Despite mixed evidence, the view that adding other predictors to measures of general mental ability will increase validity and decrease adverse impact appears to have received moderate support in the literature (e.g., Pulakos & Schmitt, 1996). Unfortunately, many studies in this area suffer from methodological limitations such as basing estimates of predictor validity and predictor adverse impact potential on job incumbents. This means that estimates of composite validity and composite adverse impact potential will be biased because they are likely to be range restricted.

*Address for correspondence: Denise Potosky, Great Valley School of Graduate Professional Studies, Pennsylvania State University, 30 E. Swedesford Rd., Malvern, PA 19355. E-mail: [email protected].

The primary purpose of this manuscript is to examine the validity, adverse impact potential, and estimated adverse impact of composite predictors of job performance. We use matrices that have been corrected for range restriction and criterion unreliability in order to illustrate realistic expectations and the importance of obtaining more accurate estimates of validity and ethnic group differences. For illustrative purposes, we focus our analyses on white American and African American ethnic group differences. Before progressing, we define two key terms: adverse impact and adverse impact potential.

Adverse impact refers to violation of the 4/5ths rule based on the Uniform Guidelines (1978). Adverse impact occurs when the selection ratio of the "minority" group is less than 4/5ths (or 80%) of the selection ratio of the group with the highest selection rate (often thought of as the "majority" group; see also Bobko & Roth, 2004). Adverse impact potential refers to the standardized ethnic group difference (d) associated with a given predictor of job performance. The d statistic is computed by subtracting the mean of the focal minority group from the mean of the majority group in the numerator; the denominator is the sample-weighted average standard deviation of the minority and majority groups. For example, a d of .5 indicates that the majority group scored, on average, one half of an averaged standard deviation higher than the minority group.
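In symbols, the definition just given can be restated as follows (note that this is the sample-weighted-average-SD convention described above; some studies instead pool the group variances, so the exact denominator can vary across sources):

\[ d = \frac{\bar{X}_{maj} - \bar{X}_{min}}{s_w}, \qquad s_w = \frac{n_{maj}\, s_{maj} + n_{min}\, s_{min}}{n_{maj} + n_{min}} \]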

The Importance of Predictor Composites

Predictor composites have important perceived advantages over the use of single predictors. One such potential advantage is that regression-weighted validity is increased. For example, Schmidt and Hunter (1998) show that a composite of cognitive ability (r = .51) and a work sample test (r = .54) is more valid (R = .63) than either measure alone. These researchers show increased validities for combinations of predictors such as cognitive ability and conscientiousness (R = .60), cognitive ability and structured interviews (R = .63), and cognitive ability and biodata (R = .52) when regression weighting approaches are used. Similar advantages of composites are noted by Cortina, Goldstein, Payne, Davison, and Gilliland (2000).

A second potential advantage is that adding more predictors to a measure of cognitive ability could reduce adverse impact, which is important from both a legal and a social perspective. Socially, Hakel (1998) points out the need for considering more predictors than general mental ability (g) in employee selection. He notes "there is a national quest for a level playing field for employee selection, and I cannot imagine that a model based on g alone will turn out to be sufficient" (p. 212). Legally, organizations may feel substantial pressure to consider a composite of a test of cognitive ability and some additional alternative measure in lieu of using a measure of cognitive ability by itself.

There is a fairly widespread belief among researchers that selection composites of cognitive ability and alternative predictors such as structured interviews, biodata, or personality tests should typically reduce adverse impact relative to when cognitive ability tests are used alone (Ryan et al., 1998). As a specific example of this belief, some authors advocate using biodata in conjunction with cognitive ability to minimize adverse impact (Stokes, Mumford, & Owens, 1994). Other researchers have noted that ". . . the optimal combination of cognitive and non-cognitive selection has the potential to improve both validity and the equality of selection rates" between various ethnic groups (Kehoe, 2002, p. 104).

It is our belief that the line of reasoning that a combination of cognitive measures and "non-cognitive" selection measures will likely reduce adverse impact (and at the same time increase validity) deserves substantial empirical investigation. If the logic is correct, organizations can move forward to reduce adverse impact and increase validity. If the logic is problematic, decision-makers should know this so that they can seek other ways to reduce adverse impact and increase validity.

Previous Predictor Composite Research on Standardized Ethnic Group Differences

Most of the existing literature that compares validity and/or adverse impact of predictors of job performance makes comparisons between individual predictors. Examples of such literature include comparing a large variety of predictors to cognitive ability on a "one predictor to one predictor" level (e.g., Hough, Oswald, & Ployhart, 2001; Reilly & Chao, 1982; Reilly & Warech, 1994). As noted by Salgado et al. (2001), there has been much less attention to the adverse impact potential and adverse impact of composites (again, we note the important work of Schmidt & Hunter, 1998, on composite validities).

The work on adverse impact of composites can be summarized by examining psychometric theory studies, primary studies, and meta-analytic studies. We note that there is a pattern of mixed findings and recommendations regarding the usefulness of adding alternative predictors to general mental ability. However, the overall synthesis tends toward an expectation that adding alternative predictors to measures of general mental ability will reduce adverse impact.


Sackett and Ellingson (1997) illustrated that while it is intuitively appealing to add a predictor with a low d to a test of cognitive ability, results may not mitigate adverse impact potential as much as some researchers might believe. For example, adding a predictor with a d of .00 to a predictor with a d of 1.00 (and assuming the predictors are uncorrelated) results in a unit-weighted composite d of .71, or a decrease of .29 in adverse impact potential. The diminution of d is less than expected on an intuitive basis.
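The algebra behind this result is the variance of a sum of standardized predictors: for two predictors with standardized differences d_1 and d_2 and intercorrelation r_12, the unit-weighted composite difference is

\[ d_{composite} = \frac{d_1 + d_2}{\sqrt{2(1 + r_{12})}} \]

so with d_1 = 1.00, d_2 = .00, and r_12 = 0, the composite d is 1.00/\sqrt{2} \approx .71, as stated.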

Another stream of research in this area is based on primary studies. In one study, the authors reported that the strategy of adding a structured interview, a measure of conscientiousness, and a work simulation to a measure of verbal ability decreased d from 1.03 for verbal ability alone as a predictor to .63 for a composite of the predictors (Pulakos & Schmitt, 1996). This represents a 39% decrease in d. Unfortunately, this study was conducted on job incumbents and the predictors may have been subject to range restriction. We address the influence of range restriction below.

A second primary study examined the influence of adding personality variables to a measure of cognitive ability when subjects were job applicants (Ryan et al., 1998). Given the focus on applicants, the results of this study were not affected by range restriction. Ryan et al. found that adding a composite of three dimensions of personality (service orientation, stress tolerance, and reliability) to a test of verbal ability did not greatly decrease adverse impact. The adverse impact ratios were virtually unchanged in a police sample at a variety of selection ratios. Adverse impact ratios for a firefighter sample were also highly similar when comparing a test of verbal ability to a composite of the test of verbal ability and the three dimensions of personality.

There are two factors that potentially limit the generalizability of Ryan et al. (1998), however. First, many of the analyses of the role of personality were based on a composite of three dimensions of personality (which were then added to verbal ability). The unit-weighted composite d of the personality measures was .19 for the firefighter sample and .34 for the police sample. Such values are larger than d's in the range of .0 to .10 commonly assumed and reported for personality constructs (Mount & Barrick, 1995; Ones & Viswesvaran, 1998). Hence, these measures may have been less effective at reducing adverse impact than other measures of personality would have been. Second, sample-specific variations in standard deviations and score distributions could have influenced the results from this primary study.

Meta-analytic work conducted to date provides a mixed picture of the effectiveness of adding alternative predictors to general mental ability. Schmitt et al. (1997) suggested that adding a structured interview, a measure of conscientiousness, and a biodata form to a measure of general mental ability decreased d from 1.0 to .60 (or a 40% drop). An updated version of this matrix by Bobko et al. (1999) showed that d dropped from 1.0 to .76 (or a 24% drop). It is important to note that the updated version of the matrix, using more accurate values, resulted in a smaller decrease in d.

Bobko et al. (1999) also examined adverse impact ratios simulated from their meta-analytic matrix and noted that adverse impact ratios did not decline substantially with the addition of the three alternative predictors. We also note that these researchers used uncorrected correlation coefficients to parallel the work of Schmitt et al. (1997). However, this still allows range restriction artifacts to influence their covariance estimates.

Despite mixed evidence of the effectiveness of adding alternative predictors in adverse impact reduction, there are influential summaries of this literature that suggest applied psychologists are likely to see less adverse impact from this practice. In an influential and often-cited review, several researchers review much of the above evidence and write "With improvements in interviews, and better methods for documenting job-related experience, valid methods for measuring less cognitively oriented constructs are becoming available. When these constructs are included in test batteries, there is often less adverse impact" (Sackett, Schmitt, Ellingson, & Kabin, 2001, p. 315, italics added for emphasis; also recall comments from Kehoe, 2002, above).

The Influence of Research Artifacts

By definition, a composite of selection devices implies that several "tests" are given to a group of individuals so that a single overall score can be computed for each individual. In a primary study, the requirements for data collection are that each individual is given each test and a composite score is computed. When simulating composite selection from a meta-analytic matrix, we should therefore study composites applied to an applicant population that has not been prescreened on another predictor. This allows research estimates to be most accurate relative to the use of composites on actual job applicants. Such a perspective suggests validities and standardized ethnic group differences should be corrected for potential range restriction. In our analyses below, we also correct our validities for criterion unreliability to provide the most accurate estimates of operational validity without the influence of research artifacts (Hunter & Schmidt, 2004).

The effect of research artifacts upon the formation of a composite is a major limitation of the above streams of research (although this criticism does not apply to Ryan et al., 1998, or Sackett & Ellingson, 1997). In the case of primary studies and meta-analytic studies, there are two major problems with using observed (uncorrected) values in the study of composites.

First, range restriction, based on previous selection, results in standardized ethnic group differences (d) for many individual predictors that are too small relative to applicant population parameters (for further explanation, see Bobko, Roth, & Bobko, 2001). In fact, based on Monte Carlo simulations, substantial range restriction that results from the selection process can result in estimates of d that are 30–70% too small (Roth, Bobko, Switzer, & Dean, 2001). One empirical study demonstrated that the d for a structured interview that was corrected for range restriction was .46, higher than previous meta-analytic estimates that had been subject to range restriction (see Roth, Van Iddekinge, Huffcutt, Eidson, & Bobko, 2002). Further, past meta-analytic investigations have computed the estimated d value for biodata based on two concurrent samples (Gandy, Dye, & Maclane, 1994; Pulakos & Schmitt, 1996). We note below that the biodata d's for applicant samples are much larger.

The range restriction from previous selection or concurrent samples is especially problematic for studying composites in general and composites involving cognitive ability in particular. Past meta-analytic investigations (e.g., Bobko et al., 1999; Schmitt et al., 1997) used d = 1.00 for cognitive ability. A subsequent meta-analysis has suggested that more accurate values of d for applicants are .72 for medium complexity jobs and .86 for low complexity jobs (Roth, Bevier, Bobko, Switzer, & Tyler, 2001). Thus, the d's for some alternative predictors were too small (because they were range restricted) and the d for cognitive ability was too large. In short, the field has not used accurate and realistic "inputs" to meta-analytic matrices.

A second, related major problem with using observed values concerns the use of range-restricted validities. It is well known that range-restricted estimates of validity underestimate the operational validity of predictors (Hunter & Schmidt, 2004). For example, a composite validity computed from a restricted validity for cognitive ability and a restricted validity for a structured interview is likely to yield a value that is too low. The individual regression weights for forming the composite are also likely to be affected.

These concerns apply directly to the study of composites from previous research. For example, one set of researchers (Bobko et al., 1999) used a validity estimate of .30 for cognitive ability and an estimate of .28 for biodata to parallel the calculations of previous studies. However, the validities were differentially restricted such that the corrected validities were .51 for cognitive ability and .32 for biodata. Thus, the weights for calculating both composite d's and composite validities were biased relative to estimates based on corrected values.

One might be tempted to dismiss the above concerns about accurate weighting on the grounds that concurrent estimates of validity are similar to predictive estimates of validity. For example, there is evidence of equivalent results for predictive and concurrent validity studies for the construct of cognitive ability (Barrett, Phillips, & Alexander, 1981). These authors reviewed literature for measures of cognitive ability and concluded that the observed validity coefficients for concurrent studies were similar to the observed validity coefficients for predictive studies. However, their conclusions apply to statistics based on job incumbents. In contrast, our focus is on the applicant level of analysis in order to compare r, d, and composite validity for individuals applying for jobs at the first hurdle of selection.

Contributions to the Literature

We believe that a meta-analytic approach will allow us to make a contribution to the literature on the validity, adverse impact potential, and adverse impact of predictor composites. First, we can empirically extend the analysis of composite d's by Sackett and Ellingson (1997) with meta-analytic data matrices containing more accurate values. We also extend the work of Ryan et al. (1998) by using meta-analysis that cumulates work across a number of studies, with several different predictors, and minimizes the influence of sample-specific variance. Finally, we extend the work of Bobko et al. (1999) by using corrected and updated standardized ethnic group differences and validity estimates. It is our hope to provide individuals with an accurate summary of the influence of forming predictor composites on validity and adverse impact, and to demonstrate the trade-offs inherent in such decisions.

Cognitive Ability and Conscientiousness Composite: Study 1

In this section, we examine a composite of a test of cognitive ability and a measure of conscientiousness. Several principles guided our selection of the most appropriate d and r estimates for this meta-analytic matrix (Viswesvaran & Ones, 1995). First, our criterion was performance per se within a given job. There are interesting arguments for use of measures of progression (e.g., promotion, salary, organizational level "points," etc.; Carlson, Scullen, Schmidt, Rothstein, & Erwin, 1999). However, we tried to maintain a focus on performance given its importance as a criterion in the Uniform Guidelines (1978), and also in order to have a more clearly defined dependent variable. Second, we looked for primary studies or meta-analyses that reported within-job correlations and then cumulated them. We focused on within-job studies to avoid having extraneous or confounding across-job variance in our population estimates (Ostroff & Harrison, 1999). Third, we attempted to control the influence of job complexity, i.e., the informational demands of a job. For example, job complexity can influence the validity of cognitive ability tests, as more complex jobs are associated with higher validities (Hunter & Hunter, 1984). Complexity can also influence the standardized ethnic group difference of cognitive ability tests (Roth, BeVier et al., 2001). We focused on medium-complexity jobs. Table 1 shows our values for a matrix of the relationships between a test of cognitive ability, a measure of conscientiousness, and job performance.


The best estimate for the corrected validity of cognitive ability (g) for medium complexity jobs is .51 (Schmidt & Hunter, 1998). This estimate is also presented in Hunter and Hunter (1984) and is based on a meta-analysis of more than 500 studies of the cognitive ability-performance relationship. The figure of .51 is corrected for both range restriction and criterion unreliability and, thus, is suitable for our analyses.

The best estimate of the conscientiousness-job performance relationship is .22, from Hurtz and Donovan (2000). We chose this value as it was computed only on criteria measuring job performance. Further, the value of .22 has been corrected for range restriction and criterion unreliability, but not predictor unreliability. We did not choose to include the value of .31 from Mount and Barrick (1995) because it included studies using a variety of criterion measures such as promotion, organizational level, and turnover, as well as overall performance. As noted by a reviewer, Mount and Barrick also used multiple scales to construct their conscientiousness measure. This method might produce a more reliable measure of conscientiousness than is actually used in practice, and such a measure might therefore overestimate the typical validity observed for conscientiousness.

In arriving at an estimate of the correlation between conscientiousness and cognitive ability, we did not include data from concurrent validity studies (e.g., McHenry, Hough, Toquam, Hanson, & Ashworth, 1990). We found only one study that reported a correlation from a sample of job applicants (r = .12 in Ryan et al., 1998). However, we also found six other coefficients from samples of the "general population" that reported correlations between self-report measures of conscientiousness and general mental ability that we believe were likely to be unaffected by range restriction. Three coefficients were found in Ackerman and Heggestad (1997). We also retrieved several other studies from Cortina et al. (2000) that included Gully, Payne, Kiechel, and Whiteman (2000), Gully, Phillips, Beaubien, and Payne (1998), and Phillips and Gully (1997). The resulting meta-analytic estimate is .03 (K = 7, N = 6759). Such a figure agrees with the literature that estimates a relatively low correlation between measures of personality and cognitive ability (e.g., Mount & Barrick, 1995).

We used the meta-analytic value of d = .72 for applicant-level differences between African Americans and white Americans for paper and pencil tests of cognitive ability within medium complexity jobs (Roth, BeVier et al., 2001, Table 2). Our applicant d-value of .72 is somewhat lower than the typically used value of d = 1.00, which was based on narrative reviews of the literature (e.g., Hunter & Hunter, 1984).

Table 1. Effect size estimates for cognitive ability, conscientiousness, structured interviews, and biodata (1)

Cognitive ability
  Correlation with job performance: r = .51 (K = 515, N not specified); Schmidt and Hunter (1998)
  Standardized ethnic group difference: d = .72 (K = 18, N = 31,990); Roth, BeVier et al. (2001)

Conscientiousness
  Correlation with job performance: r = .22 (K = 42, N = 7342); Hurtz and Donovan (2000)
  Correlation with cognitive ability: r = .03 (K = 7, N = 6759); Ackerman and Heggestad (1997), Gully et al. (1998, 2000), Ryan et al. (1998)
  Standardized ethnic group difference: d = .06 (K = 3, N = 4545); Ryan et al. (1998), Wonderlic Inc. (2000)

Structured interview
  Correlation with job performance: r = .48 (K = 149 (2), N = 18,524); Huffcutt and Arthur (1994), McDaniel et al. (1994)
  Correlation with cognitive ability: r = .31 (K = 21, N = 8817); Huffcutt et al. (1996)
  Standardized ethnic group difference: d = .31 (K = 21, N = 8817); Huffcutt and Roth (1998)

Biodata
  Correlation with job performance: r = .32 (K = 5, N = 11,332); Rothstein et al. (1990)
  Correlation with cognitive ability: r = .37 (K = 2, N = 5475); Dean (1999), Kriska (2001)
  Standardized ethnic group difference: d = .57 (K = 2, N = 6115); Dean (1999), Kriska (2001)

Notes: (1) Studies used to calculate the corrected correlations are noted after the corrected correlations. Numbers obtained from each of these studies are discussed in the text of this paper. (2) We recognize that there may be substantial overlap in the primary studies used for the three meta-analyses.


The value of d for conscientiousness was somewhat more difficult to estimate. We first examined the literature for a value based upon job applicants. We noted the value of d = .09 for a sample of over 700,000 job applicants who took integrity tests (Ones & Viswesvaran, 1998); however, the construct of integrity contains concepts other than just conscientiousness. We did, however, note the value of d = .08 for conscientiousness from a sample of MBA students (N = 814) for the Personal Characteristics Inventory conscientiousness scale (Wonderlic Inc., 2000) and two d's of .06 from samples of job applicants in Ryan et al. (1998) (N's of 2210 and 1521). We meta-analyzed these values and found an average d of .06 (K = 3, N = 4545). We believe the value of .06 reflects the general consensus that personality factors are associated with relatively small standardized ethnic group differences (Hough, 1998; Hough et al., 2001; Mount & Barrick, 1995).

In addition to estimating both d and R for composites, we also estimated the adverse impact that would accompany such figures. This required estimates of the proportion of African American and white American applicants. Bobko et al. (1999) used the ratio of 80%-20% to parallel the work of Schmitt et al. (1997). Instead, we used labor statistics for relevant figures. The African American and white American proportions of employed individuals in the U.S. workforce for December 1999, seasonally adjusted, were 11.9% African American and 88.1% white (Bureau of Labor Statistics, 2000). Other possible proportions are noted in the discussion. As in Schmitt et al. (1997) and Bobko et al. (1999), our meta-analytic correlation matrix was used to generate a regression equation predicting job performance, as well as the multiple R for the composite. The standardized ethnic group differences in individual predictors were used to calculate the standardized predicted difference in composite scores, i.e., to generate the composite d. We used the composite d to calculate adverse impact at a range of selection ratios, as shown in Table 2.
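To make this procedure concrete, the following is a minimal sketch in Python (our own illustration, not code from the original study). It derives regression weights from the meta-analytic matrix, computes the composite R and composite d, and then computes adverse impact ratios under the assumptions stated in the text: normal scores, equal within-group standard deviations, and applicant proportions of .881/.119. The Study 1 inputs from Table 1 are used, and the printed values should approximately reproduce the corresponding Table 2 entries, give or take rounding.

import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

# Meta-analytic inputs for Study 1 (Table 1): cognitive ability and conscientiousness.
R_xx = np.array([[1.00, 0.03],
                 [0.03, 1.00]])   # predictor intercorrelations
r_xy = np.array([0.51, 0.22])     # corrected validities against job performance
d_x  = np.array([0.72, 0.06])     # applicant-level standardized group differences

# Regression weights for the composite, and the multiple correlation R.
b = np.linalg.solve(R_xx, r_xy)
R = np.sqrt(r_xy @ b)

# Composite d: weighted group mean difference scaled by the composite SD.
d_comp = (b @ d_x) / np.sqrt(b @ R_xx @ b)
print(f"R = {R:.2f}, composite d = {d_comp:.2f}")   # roughly R = .55, d = .68

# Adverse impact ratio at a given overall selection ratio, assuming normal
# scores, equal group SDs, majority mean 0, and minority mean -d_comp.
P_MAJ, P_MIN = 0.881, 0.119

def ai_ratio(selection_ratio, d_c):
    # Solve for the common cutoff c that yields the desired overall selection ratio.
    f = lambda c: P_MAJ * norm.sf(c) + P_MIN * norm.sf(c + d_c) - selection_ratio
    c = brentq(f, -6.0, 6.0)
    return norm.sf(c + d_c) / norm.sf(c)   # minority rate over majority rate

for sr in (0.1, 0.5, 0.9):
    print(f"SR = {sr:.1f}: adverse impact ratio = {ai_ratio(sr, d_comp):.2f}")

Substituting the Study 2 or Study 3 inputs from Table 1 for r_xy, d_x, and the off-diagonal intercorrelation reproduces the other columns of Table 2 in the same way.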

The results of study 1 are reported in Table 2. We use a test of cognitive ability as a benchmark (see Kehoe, 2002; Schmidt & Hunter, 1998). As noted above, the validity for cognitive ability is .51 and the applicant d for cognitive ability alone is .72. We also show that for a single measure of cognitive ability, adverse impact occurs at all selection ratios below .9 (see the second column of Table 2). This means that adverse impact is estimated to occur unless one hires 90% or more of the applicants. Note that we assumed normality as well as equal predictor standard deviations in the two groups compared.

Adding a test of conscientiousness to form a regression-weighted composite results in a d of .68, a 5% reduction relative to a test of cognitive ability alone (d = .72). The adverse impact ratios for the composite improved by .01 to .03 when compared to a measure of cognitive ability alone. As with cognitive ability alone, adverse impact is estimated to occur unless one hires 90% of the applicants (see Table 2, which reports results in increments of .10 in the overall selection ratio). Overall, the regression-weighted composite incorporating conscientiousness does not make a large difference in the degree of expected adverse impact, regardless of the selection ratio.

The regression-weighted composite provided an increased validity of .55, which is quite similar to the value of .60 from Schmidt and Hunter (1998). The difference may be traced to our use of the conscientiousness validity coefficient from the most recent meta-analysis of this topic by Hurtz and Donovan (2000).

Table 2. Results of analyses for cognitive ability and the regression-weighted composites of Studies 1-3

                             Cognitive      Study 1               Study 2                  Study 3
                             ability test   (conscientiousness)   (structured interview)   (biodata)

Standardized group difference and validity
  d                          .72            .68                   .65                      .78
  r (R)                      .51            .55                   .61                      .53

Adverse impact ratios, by selection ratio
  .1                         .25            .25                   .28                      .25
  .2                         .30            .31                   .36                      .26
  .3                         .36            .39                   .45                      .33
  .4                         .44            .46                   .47                      .39
  .5                         .48            .50                   .54                      .46
  .6                         .56            .57                   .59                      .51
  .7                         .63            .64                   .66                      .60
  .8                         .71            .71                   .74                      .68
  .9                         .80 (1)        .82                   .83                      .78

Note: (1) Adverse impact ratios (as defined by the 4/5ths rule) of .80 or higher denote that adverse impact has been eliminated.

Cognitive Ability and a Structured Interview: Study 2

We also examined the possibility of using a structured interview in a composite with a measure of cognitive ability. Before progressing, we note that the use of an interview for all applicants does raise substantial practical concerns. The interview is a time- and labor-intensive predictor to administer. Thus, some organizational decision-makers may not wish to use it in certain selection scenarios. However, we consider its use here for a number of reasons. First, it has been suggested as an alternative predictor to cognitive ability, as well as being used with cognitive ability in a composite (e.g., Schmitt et al., 1997). Second, it has relatively large estimated validity. As such, it will be associated with a larger regression weight in forming a composite and it could have a larger influence on the multiple R and d. Third, organizations may use interviews early in the selection process (e.g., college recruiting).

The relevant meta-analytic values are presented in Table 1. The validity and standardized ethnic group differences for cognitive ability tests from study 1 were incorporated here. Our best estimate of the structured interview-job performance relationship is .48. It is based on the work of Huffcutt and Arthur (1994) and McDaniel, Whetzel, Schmidt, and Maurer (1994). Given our focus on job performance as the criterion, we opted not to include Weisner and Cronshaw's (1988) meta-analytic validity estimate in our demonstration. Weisner and Cronshaw's estimate appears to have combined training success measures and job performance measures, with no subanalyses by criterion type.

We estimated the structured interview-cognitive ability correlation from Huffcutt, Roth, and McDaniel (1996). We corrected their observed value of r = .23 for highly structured interviews for range restriction (u = .74), based on Huffcutt and Arthur (1994), resulting in a value of .31. The value of d = .31 in Table 1 represents our best estimate of standardized African American-white American ethnic group differences for structured interviews. Previous researchers used the value of d = .23 based on the work of Huffcutt and Roth (1998). We corrected the value of .23 for range restriction, also based on u = .74.
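For readers who wish to reproduce this step, a minimal sketch of the standard correction for direct range restriction (Thorndike's Case II; see Hunter & Schmidt, 2004) follows. We note as an assumption that this is the variant applied here, and small rounding differences from the published .31 are possible; the analogous correction for d is developed in Bobko, Roth, and Bobko (2001).

def correct_range_restriction(r_obs: float, u: float) -> float:
    # Thorndike Case II: corrects an observed correlation for direct range
    # restriction; u is the ratio of restricted to unrestricted predictor SDs.
    return (r_obs / u) / (1 - r_obs**2 + (r_obs / u) ** 2) ** 0.5

# Observed interview-cognitive ability r of .23 with u = .74:
print(round(correct_range_restriction(0.23, 0.74), 2))  # ~.30, near the reported .31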

The results of our analysis for study 2 are presented in Table 2. Adding an interview to a measure of cognitive ability by regression weighting results in a d of .65 (a 10% reduction in d). Adverse impact ratios are improved by .03 to .09 when compared to a measure of cognitive ability used alone. Once again, adverse impact occurs unless one hires 90% of the applicants. Composite validity was estimated to be .61.

Cognitive Ability and Biodata: Study 3

Forming a matrix to study the composite of cognitive ability and biodata was difficult given the relatively small number of published studies in this area. The values we used for this matrix are shown in Table 1. Rothstein, Schmidt, Erwin, Owens, and Sparks (1990) reported a biodata validity corrected for range restriction and criterion unreliability. They reported a corrected validity of .32 for the job of supervisor (which is defined as a medium complexity job by Hunter, Schmidt, & Judiesch, 1990). Other sources of biodata validity reported in Bobko et al. (1999) and Schmitt et al. (1997) do not report such correction information.

We were able to find two studies that reported applicant-level data for the correlation between biodata and cognitive ability. A recent within-job study reported a correlation of .42 (N = 3599) for a job in a large federal agency (Dean, 1999). Although the biodata measure was administered after a cognitive ability screen, the author was able to correct the correlation for range restriction to estimate the applicant group correlation. Kriska (2001) described a situation in which all applicants took both a measure of cognitive ability and a biodata instrument. The applicant-level r was .26. We used sample weights to average these values for a final estimate of .37 (N = 5475).

We found two studies that reported biodata applicant-level statistics. Dean (1999) reported a range restriction corrected d of .73 (N = 3599). Kriska (2001) reported an applicant d of .27 (N = 1876). We averaged these two values for a final estimate of .57. We note that the value of .57 is substantially higher than the value of d = .33 used by Bobko et al. (1999) and Schmitt et al. (1997), which was largely based on Gandy et al. (1994); the Gandy et al. (1994) estimate was based on incumbents (d = .35). We note that we found only two applicant-level biodata d's despite multiple literature searches and repeated requests to leading biodata consulting firms for data. Unfortunately, our requests for applicant-level information met with no substantive replies.
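The averaging used here (and for the biodata-cognitive ability correlation above) is a simple sample-size-weighted mean; a brief sketch using the figures reported in the text:

def n_weighted_mean(effects, ns):
    # Sample-size-weighted average of effect sizes.
    return sum(e * n for e, n in zip(effects, ns)) / sum(ns)

print(round(n_weighted_mean([0.42, 0.26], [3599, 1876]), 2))  # biodata-g r: 0.37
print(round(n_weighted_mean([0.73, 0.27], [3599, 1876]), 2))  # biodata d: 0.57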

The results for study 3 are shown in Table 2. Adding a biodata measure to a measure of cognitive ability results in a regression-weighted composite d of .78. This is the largest composite d for any of our three analyses and is notable because it is larger than the d for cognitive ability alone (an 8% increase in d). Adverse impact ratios were also .02 to .06 worse than for cognitive ability alone. This may be at least partially due to the moderate correlation between cognitive ability and biodata as well as a somewhat lower validity coefficient for biodata relative to g. The results of this analysis may serve as a caution to decision-makers, since adding an alternative predictor with a moderate d and intercorrelation with cognitive ability may actually exacerbate adverse impact potential.

We caution the reader that even though the N's in most "cells" of our matrix are moderate, there are only two studies that report unrestricted d's for biodata and two unrestricted correlations between biodata and cognitive ability. Thus, our results are not definitive.

Validity was estimated to be .53 for the regression-weighted composite. The moderate correlation between the two independent variables led to a small increment in validity when using regression weights. Given our relatively small sample sizes noted above, we also conducted one set of additional analyses. We wanted to examine what might happen to the composite d under the best circumstances that we could expect from the available biodata information on applicants. To do this, we set the biodata d and the biodata-g intercorrelation at the "optimistic" levels of .27 and .26, respectively. That is, we set the values based on the work of Kriska (2001) alone. This lowers the biodata correlation with g and lowers the d associated with biodata; in both cases, this should serve to decrease the composite d as much as possible given available data. When we computed the biodata-g composite under these circumstances, the d was .69 (a 3% decrease in d) and the regression-weighted composite validity was .55. Thus, results show only a slight decrease in d under the "best of circumstances" and an increase in composite d when all the available data are considered.

Discussion

The Adverse Impact and Validity of Composites

The results of our analyses are interesting and complex. For a conscientiousness-g composite, regression weights lead to a slight reduction in adverse impact potential (from d = .72 to d = .68), and the composite does have slightly increased validity relative to a test of cognitive ability alone (from r = .51 to R = .55). For the interview-g composite, there was an increase in validity (from r = .51 to R = .61) and a decrease in adverse impact potential (d decreases from .72 to .65). In terms of the biodata and g composite, adverse impact potential is increased relative to a test of cognitive ability alone. Our "bottom line" interpretation is that adding alternative predictors does not automatically result in a "win-win" situation in which validity automatically goes up and adverse impact potential automatically goes down. Overall, it appears that a strategy of adding an alternative predictor to a measure of cognitive ability in order to reduce adverse impact often results in relatively modest decreases in adverse impact potential. For example, d decreases about 5% when adding a measure of conscientiousness and 10% when adding a structured interview using regression weighting. As well, the adverse impact ratios in Table 2 are almost always less than .80, regardless of the composite used. Thus, our meta-analytic results appear to be similar to the primary study results of Ryan et al. (1998) in that adverse impact was not greatly reduced. Together, these studies begin to raise an interesting question of just how likely it is for organizations to actually avoid adverse impact in selection systems.

Other Factors in Decisions to Use Selection Composites

While we have tried to provide accurate estimates of validity and adverse impact, other factors can be important in such decisions. In terms of conscientiousness, decision-makers will need to think through the issue of faking and its possible implications (e.g., Douglas, McDaniel, & Snell, 1996; Ellingson, Sackett, & Hough, 1999; Ellingson, Smith, & Sackett, 2001; Graham, McDaniel, Douglas, & Snell, 2002; Ones, Viswesvaran, & Reiss, 1996). We note that not all researchers view faking on conscientiousness scales as a problem (cf. Ellingson et al., 1999; Ones et al., 1996). Decision-makers may also wish to consider the practicality or feasibility of administering certain predictors. Structured interviews have important cost and feasibility concerns when interviewing a large number of individuals. Further, one might need to think through the utility of such actions if the jobs in question were not associated with at least moderate economic returns (i.e., the standard deviation of job performance in dollars). Also, biodata measures are typically costly to develop and may require periodic "re-keying," in addition to the fact that biodata d's may be larger than some individuals had thought. Finally, it might also prove interesting to see if the pattern of results is robust to weighting schemes (e.g., regression weights versus unit weights).

Implications of Accurate Estimates of Standardized Ethnic Group Differences

Accurate estimates of parameters are important to both researchers and decision-makers. Assume that a group of decision-makers is designing a selection system for medium complexity jobs using a composite of g and an alternative predictor. They might mistakenly assume that the d value of the alternative is zero (or close to zero). For example, we have seen published articles (e.g., Cascio & Phillips, 1979) and technical reports suggesting this description characterizes work sample tests. Assuming equal validity, no intercorrelation of these variables, an applicant d of .72 for g, and a d of .00 for work samples, the composite d would be expected to be .51. However, as noted below, there is reason to suggest the d may be at least .48 for work sample tests. Still assuming no intercorrelation between predictors, if d = .48 for work sample tests, the expected composite d increases to .85, an increase of .13 relative to the d for cognitive ability alone. Alternatively, assuming an intercorrelation of .3 for the two previously mentioned predictors (d = .72 and d = .48, respectively) would lead to a d of .75. Such estimates might surprise decision-makers upon implementation and have negative unanticipated consequences for the organization.
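These figures follow from the unit-weighted composite formula given earlier (equal validities imply equal weights), as a quick check:

\[ d_c = \frac{.72 + .00}{\sqrt{2}} \approx .51, \qquad d_c = \frac{.72 + .48}{\sqrt{2}} \approx .85, \qquad d_c = \frac{.72 + .48}{\sqrt{2(1 + .3)}} \approx .74 \]

(the last value rounds slightly below the .75 reported in the text).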

As new data become available, interested readers may be able to incrementally add other values for d and r to arrive at different meta-analytic estimates from those presented in this paper. It is essential, however, that researchers and practitioners in fact consider the issues and estimates presented in this paper, rather than simply assume that adding predictors will automatically reduce adverse impact. In particular, when forming a composite with g, the characteristics of the alternative predictor are quite important. It appears that the alternative must have little correlation with cognitive ability (to achieve the slight reduction in adverse impact associated with conscientiousness) or the alternative must have high validity (i.e., the structured interview; see also Schmitt et al., 1997). Alternative predictors with sizeable d's or moderate correlations with cognitive ability might actually increase adverse impact potential.

Another important implication for decision-makers and researchers is the need to consider the adverse impact potential and adverse impact of various predictors at the same level of analysis. If decision-makers are considering how to design a selection system to choose among applicants, it is important to make sure that all psychometric estimates are applicable to that same level of analysis. If this issue is ignored, decision-makers using restricted d's to estimate applicant d's may find they have substantially higher levels of adverse impact than they had projected.

Limitations

We note two sets of limitations to this work. First, there is a lack of published research in several areas used in our calculations. We had only two applicant-level studies in the biodata ethnic group differences cell and the biodata-cognitive ability correlation cell (though total sample sizes exceeded 5000). We also did not have a great deal of data on applicant ethnic group differences in conscientiousness. Second, our work is limited by the fact that we used unrestricted d's and unrestricted predictor intercorrelation values for only four major predictors of job performance in our field. Other predictors such as situational judgment tests or work samples were not included, nor were other portions of the job performance domain such as contextual performance. These limitations give rise to the extensive future research needs noted below.

Future Research Needs

Ethnic Group Differences Research. In comparison to a fairly sizeable body of literature that can be brought to bear on composite validity (e.g., see Schmidt & Hunter, 1998), there is much more research needed in terms of the ethnic group differences of various predictors. Once this basic research is done, researchers could begin to study many more composites meta-analytically. While there is now a published meta-analysis of differences on paper and pencil measures of cognitive ability (Roth, BeVier et al., 2001), we were surprised by the lack of a major applicant-level meta-analysis of conscientiousness and other factors in the Five Factor Model of personality. Although the current consensus is that ethnic group differences are small (e.g., Mount & Barrick, 1995), a meta-analytic effort would be helpful to precisely document estimates with state of the art empirical methods. We suggest that such future work clearly articulate findings by constructs (e.g., conscientiousness studies that do not mix integrity and conscientiousness) and methods when possible. It is also important to report values separately for applicants and job incumbents.

We also note a need to examine the unrestricted d's of other predictors of performance noted in Schmidt and Hunter (1998), such as job knowledge tests, training and experience records (using a behavioral consistency approach), job experience measures, and training and experience records using a "points approach." For example, Bernardin's (1984) work on standardized ethnic group differences for measures of job knowledge reported a d of .42. However, this work was done to examine job knowledge measures as criteria rather than predictors, and the studies mainly included job incumbents, so we do not know the unrestricted d for job applicants. We are also aware of Schmitt, Clause, and Pulakos' (1996) work that resulted in a d of .38 for a category of job knowledge, work sample, and situational judgment tests. Again, most of these studies were conducted on incumbents. The authors are not aware of any ethnic group difference studies of training and experience records or other measures of experience for unrestricted samples of applicants. There are also a number of predictors of job performance reviewed by Schmidt and Hunter (1998) that are more focused on methods than on particular constructs. For example, structured interviews can be designed to focus on a variety of constructs (Huffcutt, Conway, Roth, & Stone, 2001), including cognitive ability. For another example, a work sample test focusing on complex material could load more highly on cognitive ability than a work sample test focusing on psycho-motor skills. In investigations of all of the "method-based" predictors, we urge researchers to work towards a focus on the constructs that are designed to be measured by such predictors, as the constructs could substantially influence results. There is still substantial work to be done to more accurately articulate the unrestricted d's for "method-based" predictors.

Research on Predictor Intercorrelations. There are also many pressing needs in the area of predictor intercorrelations. Schmidt and Hunter (1998) noted that there were not enough data on predictor intercorrelations to study composites of alternative predictors. The situation has not changed greatly in the few years since their work was published. For example, one obvious need for research is to provide an applicant-level estimate of the biodata-conscientiousness correlation to facilitate investigation of such a composite. There are also needs in virtually every combination of predictors studied by Schmidt and Hunter (1998), and we refer the reader to their Table 1 for an extensive list. While such work is hardly likely to be considered "glamorous" by researchers, it can be important to meta-analytically examine both the validity and standardized ethnic group differences of a variety of predictors of job performance and to understand the psychometric characteristics of these predictors in composites that involve even more than two predictors.

Other Research. While not explicitly a part of ethnic group differences research, we also urge research into how managers view the trade-off between validity and standardized ethnic group differences (and adverse impact). If adverse impact is viewed as a relatively continuous variable, what is the functional relationship between levels of adverse impact and managerial evaluations of selection systems? We suggest that Multiattribute Utility Analysis (Roth & Bobko, 1997) is one way to investigate these perceptions.

A reviewer also noted that, given our focus within a particular level of job complexity, the proportions of white Americans and African Americans may differ from the values we generated and used. We call for future investigative work delineating the degree to which job complexity level influences such a statistic.

Finally, we urge more research into multiple hurdle selection systems (as per Sackett & Roth, 1996). We hope the applicant-level estimates of r and d in this manuscript, and those in the future, will allow individuals to model multiple hurdle selection systems. Such researchers might start with applicant values and then examine how varying selection ratios and various hurdles might influence validity, ethnic group differences, and adverse impact.

In sum, we have emphasized the importance of applicant-level analysis for constructing and evaluating composites within selection research, and we found that adding alternative predictors does not result in a situation in which validity automatically goes up and adverse impact potential automatically goes down. We believe that such analyses allow the best possible comparison among predictors in many situations and should facilitate future selection system modeling and development. We urge future research into many of the issues noted above and look forward to such studies.

Acknowledgement

We would like to thank S. David Kriska for his help in providing some data for this project.

References

Ackerman, P.L. and Heggestad, E.D. (1997) Intelligence, person-ality, and interests: Evidence for overlapping traits. Psychologi-cal Bulletin, 121, 219–236.

Barrett, G.V., Phillips, J.S. and Alexander, R.A. (1981) Concurrentand predictive validity designs: A critical reanalysis. Journal ofApplied Psychology, 66, 1–6.

Bernardin, H.J. (1984) An analysis of black-white differences in jobperformance. Paper presented at the Academy of Managementmeetings, Boston, MA.

Bobko, P. and Roth, P. (2004) The four-fifths rule for assessingadverse impact: An arithmetic, intuitive, and logical analysis ofthe rule and implications for future research. In J. Martocchio

(Ed.), Research in personnel and human resources management(Vol. 23, pp. 177–197). Amsterdam: Elsevier Press.

Bobko, P., Roth, P.L. and Bobko, C. (2001) Correcting the effect sizeof d for range restriction and unreliability. OrganizationalResearch Methods, 4, 46–61.

Bobko, P., Roth, P.L. and Potosky, D. (1999) Derivation andimplications of a meta-analytic matrix incorporating cognitiveability, alternative predictors and job performance. PersonnelPsychology, 52, 561–589.

Bureau of Labor Statistics. (2000) Employment and earnings, 46(1), 13–14 (Chart A-4).

Carlson, K.D., Scullen, S.E., Schmidt, F.L., Rothstein, H. and Erwin, F. (1999) Generalizable biographical data validity can be achieved without multi-organizational development and keying. Personnel Psychology, 52, 731–753.

Cascio, W.F. and Phillips, N.F. (1979) Performance testing: A rose among thorns. Personnel Psychology, 32, 751–766.

Cortina, J.M., Goldstein, N.B., Payne, S.C., Davison, H.K. and Gilliland, S.W. (2000) The incremental validity of interview scores over and above cognitive ability and conscientiousness scores. Personnel Psychology, 53, 325–351.

Dean, M. (1999) On biodata construct validity, criterion validity and adverse impact. Unpublished doctoral dissertation, Louisiana State University.

Douglas, E.F., McDaniel, M.A. and Snell, A.F. (1996) The validity of non-cognitive measures decays when applicants fake. Paper presented at the 11th Annual Conference of the Academy of Management, Cincinnati.

Ellingson, J.E., Sackett, P.R. and Hough, L.M. (1999) Social desirability corrections in personality measurement: Issues of applicant comparison and construct validity. Journal of Applied Psychology, 84, 155–166.

Ellingson, J.E., Smith, D.B. and Sackett, P.R. (2001) Investigating the influence of social desirability on personality factor structure. Journal of Applied Psychology, 86, 122–133.

Gandy, J.A., Dye, D.A. and MacLane, C.N. (1994) Federal government selection: The individual achievement record. In G.S. Stokes, M.D. Mumford and W.A. Owens (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 275–310). Palo Alto, CA: CPP Books.

Graham, K.E., McDaniel, M.A., Douglas, E.F. and Snell, A.F. (2002) Biodata validity decay and score inflation with faking: Do item attributes explain variance across items? Journal of Business and Psychology, 16, 573–592.

Gully, S.M., Payne, S.C., Kiechel, K.L. and Whiteman, J.K. (2000) The impact of error training and individual differences on training outcomes: An attribute-treatment interaction perspective. Working manuscript.

Gully, S.M., Phillips, J.M., Beaubien, J.M. and Payne, S.C. (1998, April) Effects of individual differences in goal orientation and self-regulatory tendencies on learning. In S.M. Gully and J.E. Mathieu (Chairs), Individual differences, learning, motivation, and training outcomes. Symposium presented at the 13th Annual Conference of the Society for Industrial and Organizational Psychology, Dallas, TX.

Hakel, M.D. (1998) Beyond multiple choice: Evaluating alternatives to traditional testing for selection. Mahwah, NJ: Erlbaum.

Hough, L. (1998) Personality at work: Issues and evidence. In M. Hakel (Ed.), Beyond multiple choice: Evaluating alternatives to traditional testing for selection (pp. 131–159). Mahwah, NJ: Erlbaum.

Hough, L., Oswald, F. and Ployhart, R. (2001) Adverse impact and group differences in constructs, assessment tools, and personnel selection procedures: Issues and lessons learned. International Journal of Selection and Assessment, 9, 152–194.

Huffcutt, A.I. and Arthur, W. Jr. (1994) Hunter and Hunter (1984) revisited: Interview validity for entry-level jobs. Journal of Applied Psychology, 79, 184–190.

Huffcutt, A.I., Conway, J., Roth, P.L. and Stone, N. (2001) Identification and meta-analysis of constructs measured in employment interviews. Journal of Applied Psychology, 86, 897–913.

Huffcutt, A.I. and Roth, P.L. (1998) Racial group differences in interview evaluations. Journal of Applied Psychology, 83, 179–189.

Huffcutt, A.I., Roth, P.L. and McDaniel, M.A. (1996) A meta-analytic investigation of cognitive ability in employment interview evaluations: Moderating characteristics and implications for incremental validity. Journal of Applied Psychology, 81, 459–473.

Hunter, J.E. and Hunter, R.F. (1984) Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72–98.

Hunter, J.E. and Schmidt, F.L. (2004) Methods of meta-analysis: Correcting for error and bias in research findings (2nd Edn.). Newbury Park: Sage.

Hunter, J.E., Schmidt, F.L. and Judiesch, M.K. (1990) Individual differences in output variability as a function of job complexity. Journal of Applied Psychology, 75, 28–42.

Hurtz, G.M. and Donovan, J.J. (2000) Personality and job performance: The Big Five revisited. Journal of Applied Psychology, 85, 869–879.

Kehoe, J.F. (2002) General mental ability and selection in private sector organizations: A commentary. Human Performance, 15, 97–106.

Kriska, S.D. (2001, August) The validity-adverse impact trade-off: Real data and mathematical model estimates. Paper presented at the Society for Industrial and Organizational Psychology meetings, San Diego, CA.

McHenry, J., Hough, L., Toquam, J., Hanson, M. and Ashworth, S. (1990) Project A validity results: The relationship between predictor and criterion domains. Personnel Psychology, 43, 335–354.

McDaniel, M.A., Whetzel, D.L., Schmidt, F.L. and Maurer, S.D. (1994) The validity of employment interviews: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79, 599–616.

Mount, M.K. and Barrick, M. (1995) The Big Five personality dimensions: Implications for research and practice in human resources management. Research in Personnel and Human Resources Management, 13, 823–854.

Ones, D. and Viswesvaran, C. (1998) Gender, age, and race differences on overt integrity tests: Results across four large-scale job applicant data sets. Journal of Applied Psychology, 83, 35–42.

Ones, D.S., Viswesvaran, C. and Reiss, A.D. (1996) Role of social desirability in personality testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660–679.

Ostroff, C. and Harrison, D.A. (1999) Meta-analysis, level of analysis, and best estimates of population correlations: Cautions for interpreting meta-analytic results in organizational behavior. Journal of Applied Psychology, 84, 260–270.

Phillips, J.M. and Gully, S.M. (1997) Role of goal orientation, ability, need for achievement, and locus of control in the self-efficacy and goal setting process. Journal of Applied Psychology, 82, 792–802.

Pulakos, E. and Schmitt, N. (1996) An evaluation of two strategies for reducing adverse impact and their effects on criterion-related validity. Human Performance, 9, 241–258.

Reilly, R.R. and Chao, G. (1982) Validity and fairness of some alternative employee selection procedures. Personnel Psychology, 35, 1–62.

Reilly, R.R. and Warech, M.A. (1994) The validity and fairness of alternatives to cognitive ability tests. In L. Wing and B. Gifford (Eds.), Policy issues in employment testing. Boston: Kluwer.

Roth, P.L., BeVier, C.A., Bobko, P., Switzer, F.S. III and Tyler, P. (2001) Ethnic group differences in cognitive ability in employment and education settings: A meta-analysis. Personnel Psychology, 54, 297–330.

Roth, P.L. and Bobko, P. (1997) A research agenda for multi-attribute utility analysis in human resource management. Human Resource Management Review, 7, 341–368.

Roth, P.L., Bobko, P., Switzer, F.S. III and Dean, M.A. (2001) Prior selection causes biased estimates of standardized ethnic group differences: Simulation and analysis. Personnel Psychology, 54, 591–617.

Roth, P.L., Van Iddekinge, C.H., Huffcutt, A.I., Eidson, C.E. Jr. and Bobko, P. (2002) Correcting for range restriction in structured interview ethnic group differences: The values may be larger than researchers thought. Journal of Applied Psychology, 87, 369–376.

Rothstein, H., Schmidt, F.L., Erwin, F., Owens, W. and Sparks, C.P. (1990) Biographical data in employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75, 174–184.

Ryan, A.M., Ployhart, R.E. and Friedel, L.A. (1998) Using personality testing to reduce adverse impact: A cautionary note. Journal of Applied Psychology, 83, 298–307.

Sackett, P.R. and Ellingson, J. (1997) The effects of forming multi-predictor composites on group differences and adverse impact. Personnel Psychology, 50, 707–721.

Sackett, P.R. and Roth, L. (1996) Multi-stage selection strategies: A Monte Carlo investigation of effects on performance and minority hiring. Personnel Psychology, 49, 1–18.

Sackett, P.R., Schmitt, N., Ellingson, J.E. and Kabin, M.B. (2001) High-stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative-action world. American Psychologist, 56, 302–318.

Salgado, J.F., Viswesvaran, C. and Ones, D.S. (2001) Predictors used for personnel selection: An overview of constructs, methods, and techniques. In N. Anderson, D. Ones, H. Sinangil and C. Viswesvaran (Eds.), Handbook of industrial, work, & organizational psychology (pp. 165–199). London: Sage.

Schmidt, F.L. and Hunter, J.E. (1998) The validity of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274.

Schmitt, N., Clause, C.S. and Pulakos, E.D. (1996) Subgroup differences associated with different measures of some common job relevant constructs. In C.L. Cooper and I.T. Robertson (Eds.), International review of industrial and organizational psychology (Vol. 11, pp. 115–137). Chichester, UK: John Wiley.

Schmitt, N., Rogers, W., Chan, D., Sheppard, L. and Jennings, D. (1997) Adverse impact and predictive efficiency of various predictor combinations. Journal of Applied Psychology, 82, 719–730.


Stokes, G.S., Mumford, M.D. and Owens, W.A. (1994) Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction. Palo Alto, CA: Consulting Psychologists Press.

U.S. Equal Employment Opportunity Commission, U.S. Civil Service Commission, U.S. Department of Labor, U.S. Department of Justice. (1978) Uniform guidelines on employee selection procedures. Federal Register, 43, 38295–38309.

Viswesvaran, C. and Ones, D.S. (1995) Theory testing: Combining psychometric meta-analysis and structural equations modeling. Personnel Psychology, 48, 865–886.

Wiesner, W. and Cronshaw, S. (1988) The moderating impact of interview format and degree of structure on the validity of the employment interview. Journal of Occupational Psychology, 61, 275–290.

Wonderlic Inc. (2000) Manual for the Personal Characteristics Inventory. Libertyville, IL: Wonderlic.
