Forming Composites of Cognitive Ability and Alternative Measures to Predict Job
Performance and Reduce Adverse Impact: Corrected Estimates and Realistic Expectations
Denise Potosky*, Pennsylvania State University
Philip Bobko, Gettysburg College
Philip L. Roth, Clemson University
Although there has been empirical attention paid to the criterion-related validity of predictor composites, there has been much less attention paid to the standardized ethnic group differences associated with these composites. One important area of inquiry in predictor composite research is the influence of adding predictors to a test of general mental ability. The limited empirical literature on this practice is mixed, but the prevailing expectation is that there is likely to be higher validity and less adverse impact. Unfortunately, much of the previous work is limited by the presence of inaccurate validity and standardized ethnic group difference values. In this analysis we formed meta-analytic matrices to more accurately estimate the validity and standardized ethnic group differences of several composites that combine a measure of cognitive ability with measures of conscientiousness, a structured interview, or biodata. While results were somewhat complex, we found that adding alternative predictors does not result in a situation in which validity automatically goes up and adverse impact potential automatically goes down. In fact, the reductions in adverse impact (if any) from adding “non-cognitive” predictors were more modest than much of the literature suggests.
Introduction
There has been a great deal of research about the criterion-related validity of predictors of job performance (e.g., Hunter & Hunter, 1984). For over 80 years, such research has dealt with individual predictors of performance (Schmidt & Hunter, 1998). Interest in the validity of composites of predictors has also grown substantially. For example, two sets of researchers examined the uncorrected validity of composites (Bobko, Roth, & Potosky, 1999; Schmitt, Rogers, Chan, Sheppard, & Jennings, 1997), and one set of researchers examined the regression-weighted validity of predictor composites in which the individual predictor validities were corrected for research artifacts such as range restriction and criterion unreliability (Schmidt & Hunter, 1998).
Less is known about the adverse impact and adverse impact potential of predictor composites (Salgado, Viswesvaran, & Ones, 2001). The issue has been addressed from several avenues, including psychometric theory (Sackett & Ellingson, 1997), primary studies (Pulakos & Schmitt, 1996; Ryan, Ployhart, & Friedel, 1998), and meta-analyses (Bobko et al., 1999; Schmitt et al., 1997), but these sources provide mixed evidence and recommendations regarding the effectiveness of adding alternative predictors to measures of general mental ability. Despite this mixed evidence, the view that adding other predictors to measures of general mental ability will increase validity and decrease adverse impact appears to have received moderate support in the literature (e.g., Pulakos & Schmitt, 1996). Unfortunately, many studies in this area suffer from methodological limitations, such as basing estimates of predictor validity and predictor adverse impact potential on job incumbents. As a result, estimates of composite validity and composite adverse impact potential will be biased, because incumbent samples are likely to be range restricted.

*Address for correspondence: Denise Potosky, Great Valley School of Graduate Professional Studies, Pennsylvania State University, 30 E. Swedesford Rd., Malvern, PA 19355. E-mail: [email protected].
International Journal of Selection and Assessment, Volume 13, Number 4, December 2005
© 2005 The Authors. Journal compilation © 2005 Blackwell Publishing Ltd, 9600 Garsington Road, Oxford, OX4 2DQ, UK and 350 Main St, Malden, MA 02148, USA
The primary purpose of this manuscript is to examine the validity, adverse impact potential, and estimated adverse impact of composite predictors of job performance. We use matrices that have been corrected for range restriction and criterion unreliability in order to illustrate realistic expectations and the importance of obtaining more accurate estimates of validity and ethnic group differences. For illustrative purposes, we focus our analyses on white American and African American ethnic group differences. Before progressing, we define two key terms: adverse impact and adverse impact potential.
Adverse impact refers to the 4/5ths rule based on the Uniform Guidelines (1978). Adverse impact occurs when the selection ratio of the “minority” group is less than 4/5ths (or 80%) of the selection ratio of the group with the highest selection rate (often thought of as the “majority” group; see also Bobko & Roth, 2004). Adverse impact potential refers to the standardized ethnic group difference (d) associated with a given predictor of job performance. The d statistic is computed by subtracting the mean of the focal minority group from the mean of the majority group in the numerator. The denominator is the sample-weighted average standard deviation of the minority and majority groups. For example, a d of .5 indicates that the majority group scored, on average, one half of an averaged standard deviation higher than the minority group.
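The two definitions above can be made concrete with a minimal Python sketch. The function names and the numbers in the example are hypothetical. We read “sample-weighted average standard deviation” literally as an N-weighted mean of the two group SDs (pooling the variances is a common alternative), and we assume the majority group has the highest selection rate.

```python
def standardized_difference(mean_majority, sd_majority, n_majority,
                            mean_minority, sd_minority, n_minority):
    """Standardized ethnic group difference d, as defined in the text.

    Numerator: majority mean minus focal minority mean.
    Denominator: N-weighted average of the two group SDs (an assumption;
    pooling variances instead gives the same answer when the SDs are equal).
    """
    weighted_sd = ((n_majority * sd_majority + n_minority * sd_minority)
                   / (n_majority + n_minority))
    return (mean_majority - mean_minority) / weighted_sd


def adverse_impact_ratio(selection_rate_minority, selection_rate_majority):
    """Minority selection rate divided by the (assumed highest) majority rate."""
    return selection_rate_minority / selection_rate_majority


def shows_adverse_impact(selection_rate_minority, selection_rate_majority):
    """True when the ratio falls below 4/5 (80%), i.e., the 4/5ths rule is violated."""
    return adverse_impact_ratio(selection_rate_minority, selection_rate_majority) < 0.8


# Hypothetical numbers for illustration only.
print(round(standardized_difference(105, 10, 200, 100, 10, 50), 2))  # 0.5
print(shows_adverse_impact(0.30, 0.50))  # ratio .60 -> True
print(shows_adverse_impact(0.45, 0.50))  # ratio .90 -> False
```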
The Importance of Predictor Composites
Predictor composites have important perceived advantages over the use of single predictors. One such potential advantage is that regression-weighted validity is increased. For example, Schmidt and Hunter (1998) show that a composite of cognitive ability (r = .51) and a work sample test (r = .54) is more valid than either measure alone (R = .63). These researchers show increased validities for combinations of predictors such as cognitive ability and conscientiousness (R = .60), cognitive ability and structured interviews (R = .63), and cognitive ability and biodata (R = .52) when regression-weighting approaches are used. Similar advantages of composites are noted by Cortina, Goldstein, Payne, Davison, and Gilliland (2000).
A second potential advantage is that adding more predictors to a measure of cognitive ability could reduce adverse impact, which is important from both a legal and a social perspective. Socially, Hakel (1998) points out the need for considering more predictors than general mental ability (g) in employee selection. He notes “there is a national quest for a level playing field for employee selection, and I cannot imagine that a model based on g alone will turn out to be sufficient” (p. 212). Legally, organizations may feel substantial pressure to consider a composite of a cognitive ability test and some additional alternative measure in lieu of using a cognitive ability measure by itself.
There is a fairly widespread belief among researchers that selection composites of cognitive ability and alternative predictors such as structured interviews, biodata, or personality tests should typically reduce adverse impact relative to the use of cognitive ability tests alone (Ryan et al., 1998). As a specific example of this belief, some authors advocate using biodata in conjunction with cognitive ability to minimize adverse impact (Stokes, Mumford, & Owens, 1994). Other researchers have noted that “. . . the optimal combination of cognitive and non-cognitive selection has the potential to improve both validity and the equality of selection rates” between various ethnic groups (Kehoe, 2002, p. 104).
It is our belief that the line of reasoning that a combination of cognitive measures and “non-cognitive” selection measures will likely reduce adverse impact (and at the same time increase validity) deserves substantial empirical investigation. If the logic is correct, organizations can move forward to reduce adverse impact and increase validity. If the logic is problematic, decision-makers should know this so that they can seek other ways to reduce adverse impact and increase validity.
Previous Predictor Composite Research on Standardized Ethnic Group Differences
Most of the existing literature that compares validity and/or adverse impact of predictors of job performance makes comparisons between individual predictors. Examples of such literature include comparing a large variety of predictors to cognitive ability on a “one predictor to one predictor” level (e.g., Hough, Oswald, & Ployhart, 2001; Reilly & Chao, 1982; Reilly & Warech, 1994). As noted by Salgado et al. (2001), there has been much less attention to the adverse impact potential and adverse impact of composites (again, we note the important work of Schmidt & Hunter, 1998, on composite validities).
The work on adverse impact of composites can be
summarized by examining psychometric theory studies,
primary studies, and meta-analytic studies. We note that
there is a pattern of mixed findings and recommendations
regarding the usefulness of adding alternative predictors to
general mental ability. However, the overall synthesis tends
toward an expectation that adding alternative predictors
to measures of general mental ability will reduce adverse
impact.
Sackett and Ellingson (1997) illustrated that while it is intuitively appealing to add a predictor with a low d to a test of cognitive ability, the result may not mitigate adverse impact potential as much as some researchers might believe. For example, adding a predictor with a d of .00 to a predictor with a d of 1.00 (and assuming the predictors are uncorrelated) results in a unit-weighted composite d of .71, a decrease of only .29 in adverse impact potential. This reduction is smaller than intuition suggests: simply averaging the two d's would imply a composite d of .50.
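A minimal sketch of the arithmetic behind this example, assuming standardized predictors with equal within-group SDs, so that the SD of a unit-weighted sum is the square root of the sum of all entries in the predictor intercorrelation matrix (equivalently, the familiar form with k predictors and average intercorrelation r̄ is Σd / √(k + k(k − 1)r̄)). The function name is ours, not Sackett and Ellingson's.

```python
import math


def unit_weighted_composite_d(ds, corr):
    """Composite d for a unit-weighted sum of standardized predictors.

    ds   : list of predictor d values
    corr : full predictor intercorrelation matrix (list of lists, 1s on diagonal)
    The numerator is the sum of the d's; the denominator is the SD of the
    unit-weighted sum, sqrt(sum of every entry in corr).
    """
    numerator = sum(ds)
    variance_of_sum = sum(sum(row) for row in corr)
    return numerator / math.sqrt(variance_of_sum)


# The example in the text: d = 1.00 and d = .00, uncorrelated predictors.
print(round(unit_weighted_composite_d([1.00, 0.00], [[1.0, 0.0], [0.0, 1.0]]), 2))  # 0.71
```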
Another stream of research in this area is based on primary studies. In one study, the authors reported that adding a structured interview, a measure of conscientiousness, and a work simulation to a measure of verbal ability decreased d from 1.03 for verbal ability alone to .63 for a composite of the predictors (Pulakos & Schmitt, 1996). This represents a 39% decrease in d. Unfortunately, this study was conducted on job incumbents, and the predictors may have been subject to range restriction. We address the influence of range restriction below.
A second primary study examined the influence of
adding personality variables to a measure of cognitive
ability when subjects were job applicants (Ryan et al.,
1998). Given the focus on applicants, the results of this
study were not affected by range restriction. Ryan et al.
found that adding a composite of three dimensions of
personality (service orientation, stress tolerance, and
reliability) to a test of verbal ability did not greatly decrease
adverse impact. The adverse impact ratios were virtually
unchanged in a police sample at a variety of selection
ratios. Adverse impact ratios for a firefighter sample were
also highly similar when comparing a test of verbal ability
to a composite of the test of verbal ability and the three
dimensions of personality.
There are two factors that potentially limit the generalizability of Ryan et al. (1998), however. First, many of the analyses of the role of personality were based on a composite of three dimensions of personality (which was then added to verbal ability). The unit-weighted composite d of the personality measures was .19 for the firefighter sample and .34 for the police sample. Such values are larger than the d's in the range of .00 to .10 commonly assumed and reported for personality constructs (Mount & Barrick, 1995; Ones & Viswesvaran, 1998). Hence, these personality composites may have been less effective at reducing adverse impact than other measures of personality would be. Second, sample-specific variations in standard deviations and score distributions could have influenced the results from this primary study.
Meta-analytic work conducted to date provides a mixed
picture of the effectiveness of adding alternative predictors to
general mental ability. Schmitt et al. (1997) suggested that
adding a structured interview, a measure of conscientiousness,
and a biodata form to a measure of general mental ability
decreased d from 1.0 to .60 (or a 40% drop). An updated
version of this matrix by Bobko et al. (1999) showed that d
dropped from 1.0 to .76 (or a 24% drop). It is important to
note that the updated version of the matrix, using more
accurate values, resulted in a smaller decrease in d.
Bobko et al. (1999) also examined adverse impact ratios simulated from their meta-analytic matrix and noted that adverse impact did not decline substantially with the addition of the three alternative predictors. We also note that these researchers used uncorrected correlation coefficients to parallel the work of Schmitt et al. (1997). However, using uncorrected coefficients still allows range restriction artifacts to influence their covariance estimates.
Despite mixed evidence of the effectiveness of adding alternative predictors in adverse impact reduction, there are influential summaries of this literature that suggest applied psychologists are likely to see less adverse impact from this practice. In an influential and often-cited review, several researchers review much of the above evidence and write “With improvements in interviews, and better methods for documenting job-related experience, valid methods for measuring less cognitively oriented constructs are becoming available. When these constructs are included in test batteries, there is often less adverse impact” (Sackett, Schmitt, Ellingson, & Kabin, 2001, p. 315, italics added for emphasis; also recall comments from Kehoe, 2002, above).
The Influence of Research Artifacts
By definition, a composite of selection devices implies that several “tests” are given to a group of individuals so that a single overall score can be computed for each individual. In a primary study, the requirements for data collection are that each individual is given each test and a composite score is computed. When simulating composite selection from a meta-analytic matrix, we should therefore study composites applied to an applicant population that has not been prescreened on another predictor. This makes research estimates as accurate as possible relative to the use of composites on actual job applicants. Such a perspective suggests that validities and standardized ethnic group differences should be corrected for potential range restriction. In our analyses below, we also correct our validities for criterion unreliability to provide the most accurate estimates of operational validity without the influence of research artifacts (Hunter & Schmidt, 2004).
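For reference, the standard correction for criterion unreliability divides an observed validity by the square root of the criterion reliability. The sketch below is a minimal illustration with hypothetical input values and a hypothetical function name; it does not reproduce the particular artifact values used in the sources we cite.

```python
def correct_for_criterion_unreliability(r_observed, criterion_reliability):
    """Disattenuate a validity coefficient for criterion unreliability only.

    r_c = r_observed / sqrt(r_yy). Predictor unreliability is deliberately
    left uncorrected, as is usual when estimating operational validity.
    """
    if not 0 < criterion_reliability <= 1:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_observed / criterion_reliability ** 0.5


# Hypothetical values: observed r = .40, criterion reliability = .64.
print(round(correct_for_criterion_unreliability(0.40, 0.64), 2))  # 0.5
```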
The effect of research artifacts upon the formation of a
composite is a major limitation of the above streams of
research (although this criticism does not apply to Ryan
et al., 1998 or Sackett & Ellingson, 1997). In the case of
primary studies and meta-analytic studies, there are two
major problems with using observed (uncorrected) values
in the study of composites.
First, range restriction, based on previous selection, results in standardized ethnic group differences (d) for many individual predictors that are too small relative to applicant population parameters (for further explanation, see Bobko, Roth, & Bobko, 2001). In fact, based on Monte Carlo simulations, substantial range restriction that results from the selection process can produce estimates of d that are 30–70% too small (Roth, Bobko, Switzer, & Dean, 2001). One empirical study demonstrated that the d for a structured interview, corrected for range restriction, was .46, higher than previous meta-analytic estimates that had been subject to range restriction (see Roth, Van Iddekinge, Huffcutt, Eidson, & Bobko, 2002). Similarly, past meta-analytic investigations have computed the estimated d value for biodata based on two concurrent samples (Gandy, Dye, & Maclane, 1994; Pulakos & Schmitt, 1996). We note below that the biodata d's for applicant samples are much larger.
The range restriction from previous selection or concurrent samples is especially problematic for studying composites in general and composites involving cognitive ability in particular. Past meta-analytic investigations (e.g., Bobko et al., 1999; Schmitt et al., 1997) used d = 1.00 for cognitive ability. A subsequent meta-analysis has suggested that more accurate values of d for applicants are .72 for medium complexity jobs and .86 for low complexity jobs (Roth, Bevier, Bobko, Switzer, & Tyler, 2001). Thus, the d's for some alternative predictors were too small (because they were range restricted), and the d for cognitive ability was too large. As a result, the field has not used accurate and realistic “inputs” into meta-analytic matrices.
A second related major problem with using observed
values concerns the use of range-restricted validities. It
is well-known that range-restricted estimates of validity
underestimate the operational validity of predictors
(Hunter & Schmidt, 2004). For example, a composite
validity computed from a restricted validity for cognitive
ability and a restricted validity for a structured interview is
likely to yield a value that is too low. The individual
regression weights for forming the composite are also likely
to be affected.
These concerns apply directly to the study of composites
from previous research. For example, one set of researchers
(Bobko et al., 1999) used a validity estimate of .30 for
cognitive ability and an estimate of .28 for biodata to
parallel the calculations of previous studies. However, the
validities were differentially restricted such that the
corrected validities were .51 for cognitive ability and .32
for biodata. Thus, the weights for calculating both
composite d’s and composite validities were biased relative
to estimates based on corrected values.
One might be tempted to dismiss the above concerns
about accurate weighting on the grounds that concurrent
estimates of validity are similar to predictive estimates of
validity. For example, there is evidence of equivalent results
for predictive and concurrent validity studies for the
construct of cognitive ability (Barrett, Phillips, & Alex-
ander, 1981). These authors reviewed literature for
measures of cognitive ability and concluded that the
observed validity coefficients for concurrent studies were
similar to the observed validity coefficients for predictive
studies. However, their conclusions apply to statistics
based on job incumbents. In contrast, our focus is on the
applicant level of analysis in order to compare r, d, and
composite validity for individuals applying for jobs at the
first hurdle of selection.
Contributions to the Literature
We believe that a meta-analytic approach will allow us to
make a contribution to the literature on the validity,
adverse impact potential, and adverse impact of predictor
composites. First, we can empirically extend the analysis of
composite d’s by Sackett and Ellingson (1997) with meta-
analytic data matrices containing more accurate values. We
also extend the work of Ryan et al. (1998) by using meta-
analysis that cumulates work across a number of studies,
with several different predictors, and minimizes the
influence of sample-specific variance. Finally, we extend
the work of Bobko et al. (1999) by using corrected and
updated standardized ethnic group differences and validity
estimates. We hope to provide an accurate summary of how forming predictor composites influences validity and adverse impact, and to demonstrate the trade-offs inherent in such decisions.
Cognitive Ability and Conscientiousness Composite: Study 1
In this section, we examine a composite of a test of
cognitive ability and a measure of conscientiousness.
Several principles guided our selection of the most
appropriate d and R estimates for this meta-analytic matrix
(Viswesvaran & Ones, 1995). First, our criterion was
performance per se within a given job. There are interesting
arguments for use of measures of progression (e.g.,
promotion, salary, organizational level ‘‘points,’’ etc.;
Carlson, Scullen, Schmidt, Rothstein, & Erwin, 1999).
However, we tried to maintain a focus on performance
given its importance as a criterion in the Uniform Guidelines (1978), and also in order to have a more clearly
defined dependent variable. Second, we looked for primary
studies or meta-analyses that reported within-job correla-
tions and then cumulated them. We focused on within-job
studies to avoid having extraneous or confounding across-
job variance in our population estimates (Ostroff &
Harrison, 1999). Third, we attempted to control the
influence of job complexity, i.e., the informational
demands of a job. For example, job complexity can
influence the validity of cognitive ability tests as more
complex jobs are associated with higher validities (Hunter
& Hunter, 1984). Complexity can also influence the
standardized ethnic group difference of cognitive ability
tests (Roth, BeVier et al., 2001). We focused on medium-
complexity jobs. Table 1 shows our values for a matrix
of the relationships between a test of cognitive ability,
a measure of conscientiousness, and job performance.
The best estimate for the corrected validity of cognitive ability (g) for medium complexity jobs is .51 (Schmidt & Hunter, 1998). This estimate is also presented in Hunter and Hunter (1984) and is based on a meta-analysis of more than 500 studies of the cognitive ability–performance relationship. The figure of .51 is corrected for both range restriction and criterion unreliability and, thus, is suitable for our analyses.
The best estimate of the conscientiousness–job performance relationship is .22 from Hurtz and Donovan (2000). We chose this value because it was computed only on criteria measuring job performance. Further, the value of .22 has been corrected for range restriction and criterion unreliability, but not predictor unreliability. We chose not to include the value of .31 from Mount and Barrick (1995) because it included studies using a variety of criterion measures, such as promotion, organizational level, and turnover, as well as overall performance. As noted by a reviewer, Mount and Barrick also used multiple scales to construct their conscientiousness measure. This method might produce a more reliable measure of conscientiousness than is actually used in practice, and such a measure might therefore overestimate the typical validity observed for conscientiousness.
In arriving at an estimate of the correlation between conscientiousness and cognitive ability, we did not include data from concurrent validity studies (e.g., McHenry, Hough, Toquam, Hanson, & Ashworth, 1990). We found only one study that reported a correlation from a sample of job applicants (r = .12 in Ryan et al., 1998). However, we also found six other coefficients from samples of the “general population” that reported correlations between self-report measures of conscientiousness and general mental ability that we believe were likely to be unaffected by range restriction. Three coefficients were found in Ackerman and Heggestad (1997). We also retrieved several other studies from Cortina et al. (2000), including Gully, Payne, Kiechel, and Whiteman (2000), Gully, Phillips, Beaubien, and Payne (1998), and Phillips and Gully (1997). The resulting meta-analytic estimate is .03 (K = 7, N = 6759). Such a figure agrees with the literature that estimates a relatively low correlation between measures of personality and cognitive ability (e.g., Mount & Barrick, 1995).
We used the meta-analytic value of d = .72 for applicant-level differences between African Americans and white Americans for paper-and-pencil tests of cognitive ability within medium complexity jobs (Roth, BeVier et al., 2001, Table 2). Our applicant d-value of .72 is somewhat lower than the typically used value of d = 1.00, which was based on narrative reviews of the literature (e.g., Hunter & Hunter, 1984).
Table 1. Effect size estimates for cognitive ability, conscientiousness, structured interviews, and biodata¹
Columns: correlation with job performance; correlation with cognitive ability; standardized ethnic group difference (d).

Cognitive ability
  Job performance: r = .51 (K = 515; N not specified); Schmidt and Hunter (1998)
  Standardized ethnic group difference: d = .72 (K = 18; N = 31,990); Roth, BeVier et al. (2001)
Conscientiousness
  Job performance: r = .22 (K = 42; N = 7342); Hurtz and Donovan (2000)
  Cognitive ability: r = .03 (K = 7; N = 6759); Ackerman and Heggestad (1997), Gully et al. (1998, 2000), Ryan et al. (1998)
  Standardized ethnic group difference: d = .06 (K = 3; N = 4545); Ryan et al. (1998), Wonderlic Inc. (2000)
Structured interview
  Job performance: r = .48 (K = 1492; N = 18,524); Huffcutt and Arthur (1994), McDaniel et al. (1994)
  Cognitive ability: r = .31 (K = 21; N = 8817); Huffcutt et al. (1996)
  Standardized ethnic group difference: d = .31 (K = 21; N = 8817); Huffcutt and Roth (1998)
Biodata
  Job performance: r = .32 (K = 5; N = 11,332); Rothstein et al. (1990)
  Cognitive ability: r = .37 (K = 2; N = 5475); Dean (1999), Kriska (2001)
  Standardized ethnic group difference: d = .57 (K = 2; N = 6115); Dean (1999), Kriska (2001)

Notes: ¹ Studies used to calculate the corrected correlations are listed with each corrected correlation. Numbers obtained from each of these studies are discussed in the text of this paper. ² We recognize that there may be substantial overlap in the primary studies used for the three meta-analyses.
The value of d for conscientiousness was somewhat more difficult to estimate. We first examined the literature for a value based upon job applicants. We noted the value of d = .09 for a sample of over 700,000 job applicants who took integrity tests (Ones & Viswesvaran, 1998). However, the construct of integrity encompasses more than conscientiousness. We therefore also noted the value of d = .08 for conscientiousness from a sample of MBA students (N = 814) for the Personal Characteristics Inventory conscientiousness scale (Wonderlic Inc., 2000) and two d's of .06 from samples of job applicants in Ryan et al. (1998) (N's of 2210 and 1521). We meta-analyzed these values and found an average d of .06 (K = 3, N = 4545). We believe the value of .06 reflects the general consensus that personality factors are associated with relatively small standardized ethnic group differences (Hough, 1998; Hough et al., 2001; Mount & Barrick, 1995).
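A minimal sketch of how the three conscientiousness d values above combine into one estimate, assuming a bare-bones, sample-size-weighted mean. The weighting scheme is our assumption (the exact meta-analytic procedure is not spelled out here), and the function name is hypothetical.

```python
def sample_weighted_mean(effects, ns):
    """Bare-bones (sample-size-weighted) mean effect size across K studies.

    Returns (weighted mean, K, total N).
    """
    total_n = sum(ns)
    weighted_mean = sum(e * n for e, n in zip(effects, ns)) / total_n
    return weighted_mean, len(effects), total_n


# Values reported in the text: the Wonderlic Inc. (2000) MBA sample (d = .08)
# and the two Ryan et al. (1998) applicant samples (d = .06 each).
mean_d, k, n = sample_weighted_mean([0.08, 0.06, 0.06], [814, 2210, 1521])
print(k, n, round(mean_d, 2))  # 3 4545 0.06
```

With these inputs the weighted mean is about .064, which rounds to the .06 reported above.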
In addition to estimating both d and R for composites, we also estimate the adverse impact that would accompany such figures. This required estimates of the proportions of African American and white American applicants. Bobko et al. (1999) used the ratio of 80%–20% to parallel the work of Schmitt et al. (1997). Instead, we used labor statistics to obtain the relevant figures. The African American and white American proportions of employed individuals in the U.S. workforce for December 1999, seasonally adjusted, were 11.9% African American and 88.1% white (Bureau of Labor Statistics, 2000). Other possible proportions are noted in the discussion. As in Schmitt et al. (1997) and Bobko et al. (1999), our meta-analytic correlation matrix was used to generate a regression equation predicting job performance, as well as the multiple R for the composite. The standardized ethnic group differences in the individual predictors were then used to calculate the standardized difference in predicted composite scores (i.e., the composite d). We used the composite d to calculate adverse impact at a range of selection ratios, as shown in Table 2.
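The procedure just described can be sketched for two-predictor composites, assuming standardized predictors, equal within-group SDs, and the closed-form standardized regression weights for two predictors. The function name is ours, and this is an illustration rather than the authors' code; the inputs are the Table 1 values.

```python
import math


def two_predictor_composite(r_y1, r_y2, r_12, d_1, d_2):
    """Regression-weighted composite of two standardized predictors.

    r_y1, r_y2 : (corrected) validities of predictors 1 and 2
    r_12       : predictor intercorrelation
    d_1, d_2   : standardized ethnic group differences of each predictor
    Returns (beta_1, beta_2, multiple R, composite d).
    """
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    # Variance of the weighted composite, b'Rb (equals R squared for regression weights).
    composite_var = beta_1 ** 2 + beta_2 ** 2 + 2 * beta_1 * beta_2 * r_12
    multiple_r = math.sqrt(beta_1 * r_y1 + beta_2 * r_y2)
    # Composite d: weighted mean difference over the composite SD (equal group SDs assumed).
    composite_d = (beta_1 * d_1 + beta_2 * d_2) / math.sqrt(composite_var)
    return beta_1, beta_2, multiple_r, composite_d


# Cognitive ability (r = .51, d = .72) paired with each alternative, from Table 1:
# (r_y1, r_y2, r_12, d_1, d_2)
studies = {
    "conscientiousness": (0.51, 0.22, 0.03, 0.72, 0.06),
    "structured interview": (0.51, 0.48, 0.31, 0.72, 0.31),
    "biodata": (0.51, 0.32, 0.37, 0.72, 0.57),
}
for name, args in studies.items():
    _, _, R, d = two_predictor_composite(*args)
    print(f"{name}: R = {R:.2f}, d = {d:.2f}")
```

Run as written, this prints R of 0.55, 0.61, and 0.53 and composite d of 0.68, 0.65, and 0.78 for the three pairings, matching the corresponding entries in Table 2. Because the composite variance under regression weights equals R², the composite d here is simply the weighted sum of the d's divided by R.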
The results of Study 1 are reported in Table 2. We use a test of cognitive ability as a benchmark (see Kehoe, 2002; Schmidt & Hunter, 1998). As noted above, the validity for cognitive ability is .51 and the applicant d for cognitive ability alone is .72. We also show that for a single measure of cognitive ability, adverse impact occurs at all selection ratios below .9 (see the second column of Table 2). This means that adverse impact is estimated to occur unless one hires 90% or more of the applicants. Note that we assumed normality as well as equal predictor standard deviations in the two groups compared.
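Given a composite d, adverse impact ratios at a chosen overall selection ratio can be simulated under the assumptions just stated (normal scores, equal SDs). The sketch below is one way to do this, assuming top-down selection on a single cutoff and the 11.9% minority share used in Study 1; the function names are hypothetical, and because the authors' exact procedure and rounding are not given here, its output may differ slightly from Table 2.

```python
import math


def normal_sf(x):
    """Upper-tail probability P(Z > x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def adverse_impact_at_selection_ratio(d, selection_ratio, minority_share=0.119):
    """Adverse impact ratio for a single cutoff on a composite with group difference d.

    Assumes majority scores ~ N(0, 1) and minority scores ~ N(-d, 1).
    The cutoff is found by bisection so that the overall selection ratio
    across both groups equals `selection_ratio`.
    """
    def overall_rate(cut):
        return ((1 - minority_share) * normal_sf(cut)
                + minority_share * normal_sf(cut + d))

    lo, hi = -10.0, 10.0  # overall_rate decreases as the cutoff rises
    for _ in range(200):
        mid = (lo + hi) / 2
        if overall_rate(mid) > selection_ratio:
            lo = mid  # too many selected: raise the cutoff
        else:
            hi = mid
    cut = (lo + hi) / 2
    # Minority selection rate divided by majority selection rate.
    return normal_sf(cut + d) / normal_sf(cut)


# Cognitive ability alone (d = .72) at a few overall selection ratios.
for sr in (0.1, 0.5, 0.9):
    print(sr, round(adverse_impact_at_selection_ratio(0.72, sr), 2))
```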
Adding a test of conscientiousness to form a regression-weighted composite results in a d of .68, a 5% reduction relative to a test of cognitive ability (d = .72). The adverse impact ratios for the composite improved by no more than .03 when compared to a measure of cognitive ability alone. As with cognitive ability alone, adverse impact is estimated to occur unless one hires 90% of the applicants (see Table 2, which reports results in increments of .10 in the overall selection ratio). Overall, the regression-weighted composite incorporating conscientiousness does not make large differences in the degree of expected adverse impact, regardless of the selection ratio.
The regression-weighted composite provided an increased validity of .55, which is quite similar to the value of .60 from Schmidt and Hunter (1998). The difference may be traced to our use of the conscientiousness validity coefficient from the most recent meta-analysis of this topic, by Hurtz and Donovan (2000).

Table 2. Results of analyses for cognitive ability and conscientiousness composite
Columns: CA = cognitive ability test alone; S1 = Study 1 regression composite (conscientiousness); S2 = Study 2 regression composite (structured interview); S3 = Study 3 regression composite (biodata).

                                    CA     S1     S2     S3
Standardized group difference (d)   .72    .68    .65    .78
Validity, r (R)                     .51    .55    .61    .53
Adverse impact ratios, by selection ratio
  .1                                .25    .25    .28    .25
  .2                                .30    .31    .36    .26
  .3                                .36    .39    .45    .33
  .4                                .44    .46    .47    .39
  .5                                .48    .50    .54    .46
  .6                                .56    .57    .59    .51
  .7                                .63    .64    .66    .60
  .8                                .71    .71    .74    .68
  .9                                .80¹   .82    .83    .78

Note: ¹ Adverse impact ratios (as defined by the 4/5ths rule) of .80 or above denote that adverse impact has been eliminated.
Cognitive Ability and a Structured Interview: Study 2
We also examined the possibility of using a structured interview in a composite with a measure of cognitive ability. Before progressing, we note that using an interview for all applicants raises substantial practical concerns. The interview is a time- and labor-intensive predictor to administer. Thus, some organizational decision-makers may not wish to use it in certain selection scenarios. However, we consider its use here for several reasons. First, it has been suggested as an alternative predictor to cognitive ability, as well as being used with cognitive ability in a composite (e.g., Schmitt et al., 1997). Second, it has a relatively large estimated validity. As such, it will receive a larger regression weight in forming a composite and could have a larger influence on multiple R and d. Third, organizations may use interviews early in the selection process (e.g., college recruiting).
The relevant meta-analytic values are presented in Table 1. The validity and standardized ethnic group difference for cognitive ability tests from Study 1 were incorporated here. Our best estimate of the structured interview–job performance relationship is .48. It is based on the work of Huffcutt and Arthur (1994) and McDaniel, Whetzel, Schmidt, and Maurer (1994). Given our focus on job performance as the criterion, we opted not to include Wiesner and Cronshaw's (1988) meta-analytic validity estimate in our demonstration. Wiesner and Cronshaw's estimate appears to have combined training success measures and job performance measures, with no subanalyses by criterion type.
We estimated the structured interview – cognitive ability correlation from Huffcutt, Roth, and McDaniel (1996). We corrected their observed value of r = .23 for highly structured interviews for range restriction (u = .74, based on Huffcutt and Arthur, 1994), resulting in a value of .31. The value of d = .31 in Table 1 represents our best estimate of the standardized African American–white American ethnic group difference for structured interviews. Previous researchers used the value of d = .23 based on the work of Huffcutt and Roth (1998). We corrected this value of .23 for range restriction, also using u = .74.
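The range restriction step above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' computation: it applies the Thorndike Case II formula for direct range restriction, reading u = .74 as the ratio of restricted to unrestricted predictor standard deviations, and it corrects d by converting it to a point-biserial r, correcting that r, and converting back (one common route; cf. Bobko, Roth, & Bobko, 2001). The function names are illustrative, and the group proportion p is a hypothetical input; for a d this small the corrected value barely depends on it.

```python
import math


def correct_r_case2(r: float, u: float) -> float:
    """Thorndike Case II correction for direct range restriction.

    u is the ratio of restricted to unrestricted predictor SDs (u < 1 when restricted).
    """
    big_u = 1.0 / u  # ratio of unrestricted to restricted SD
    return r * big_u / math.sqrt(1.0 + r * r * (big_u * big_u - 1.0))


def d_to_r(d: float, p: float) -> float:
    """Convert a standardized group difference d to a point-biserial r.

    p is the proportion of the sample in one of the two groups (assumed input).
    """
    return d / math.sqrt(d * d + 1.0 / (p * (1.0 - p)))


def r_to_d(r: float, p: float) -> float:
    """Inverse of d_to_r for the same group proportion p."""
    return r / math.sqrt(p * (1.0 - p) * (1.0 - r * r))


def correct_d(d: float, u: float, p: float) -> float:
    """Correct d for range restriction by routing it through the point-biserial r."""
    return r_to_d(correct_r_case2(d_to_r(d, p), u), p)


if __name__ == "__main__":
    # Observed interview d of .23 with u = .74; p = .5 is a hypothetical group split.
    print(f"corrected d: {correct_d(0.23, 0.74, 0.5):.2f}")  # prints "corrected d: 0.31"
    # Observed interview-ability r of .23 with the same u.
    print(f"corrected r: {correct_r_case2(0.23, 0.74):.2f}")  # prints "corrected r: 0.30"
```

The d correction reproduces the .31 in the text. Applied to the observed interview–ability r of .23, the full Case II formula returns about .30 rather than the reported .31 (dividing by u alone would give .31), so the authors' exact procedure or unrounded inputs may differ slightly.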
The results of our analysis for study 2 are presented in
Table 2. Adding an interview to a measure of cognitive
ability by regression-weighting results in a d of .65 (a 10%
reduction in d). Adverse impact ratios are improved by .03
to .09 when compared to a measure of cognitive ability
used alone. Once again, adverse impact occurs unless one
hires 90% of the applicants. Composite validity was
estimated to be .61.
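The Study 2 composite values can be reproduced from the inputs given in this section using the standard formulas for a regression-weighted composite of standardized predictors. The sketch below is ours rather than the authors' code, and the function name is illustrative. It assumes unit within-group predictor variances and the same predictor intercorrelation in both groups, and it takes the cognitive ability values (validity .51, d = .72) from the Discussion.

```python
import math


def two_predictor_composite(r1, r2, r12, d1, d2):
    """Regression-weighted composite of two standardized predictors.

    r1, r2: predictor-criterion validities; r12: predictor intercorrelation;
    d1, d2: standardized ethnic group differences on each predictor.
    Returns (beta1, beta2, multiple R, composite d).
    """
    det = 1.0 - r12 * r12
    b1 = (r1 - r12 * r2) / det  # standardized regression weights
    b2 = (r2 - r12 * r1) / det
    sd_c = math.sqrt(b1 * b1 + b2 * b2 + 2.0 * b1 * b2 * r12)  # SD of the weighted sum
    multiple_r = (b1 * r1 + b2 * r2) / sd_c  # composite-criterion correlation
    composite_d = (b1 * d1 + b2 * d2) / sd_c  # weighted mean difference in composite SD units
    return b1, b2, multiple_r, composite_d


if __name__ == "__main__":
    # Study 2: cognitive ability (r = .51, d = .72) plus structured interview (r = .48, d = .31),
    # with an ability-interview intercorrelation of .31.
    _, _, big_r, d_c = two_predictor_composite(0.51, 0.48, 0.31, 0.72, 0.31)
    print(f"R = {big_r:.2f}, d = {d_c:.2f}")  # prints "R = 0.61, d = 0.65"
```

Because the weights are regression weights, the composite variance equals R², which is why the same SD term appears in both ratios.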
Cognitive Ability and Biodata: Study 3
Forming a matrix to study the composite of cognitive ability and biodata was difficult given the relatively small number of published studies in this area. The values we used for this matrix are shown in Table 1. Rothstein, Schmidt, Erwin, Owens, and Sparks (1990) reported biodata validities corrected for range restriction and criterion unreliability, including a corrected validity of .32 for the job of supervisor (defined as a medium complexity job by Hunter, Schmidt, & Judiesch, 1990). The other sources of biodata validity cited in Bobko et al. (1999) and Schmitt et al. (1997) do not provide such correction information.
We were able to find two studies that reported applicant-level data for the correlation between biodata and cognitive ability. A recent within-job study reported a correlation of .42 (N = 3599) for a job in a large federal agency (Dean, 1999). Although the biodata measure was administered after a cognitive ability screen, the author was able to correct the correlation for range restriction to estimate the applicant group correlation. Kriska (2001) described a situation in which all applicants took both a measure of cognitive ability and a biodata instrument; the applicant-level r was .26. We used sample weights to average these values, for a final estimate of .37 (N = 5475).
We found two studies that reported applicant-level biodata statistics. Dean (1999) reported a range restriction corrected d of .73 (N = 3599). Kriska (2001) reported an applicant d of .27 (N = 1876). We averaged these two values, weighting by sample size, for a final estimate of .57. We note that the value of .57 is substantially higher than the value of d = .33 used by Bobko et al. (1999) and Schmitt et al. (1997), which was largely based on Gandy et al. (1994). Gandy et al.’s (1994) estimate was based on incumbents (d = .35). We found only two applicant-level biodata d’s despite multiple literature searches and repeated requests to leading biodata consulting firms for data; unfortunately, our requests for applicant-level information met with no substantive replies.
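Both pooled biodata estimates can be checked with a sample-size-weighted mean, the simplest form of meta-analytic averaging. This sketch assumes that Kriska’s (2001) correlation rests on the same N = 1876 as the d, which is consistent with the pooled N of 5475 reported above; the helper name is illustrative. An unweighted mean of the two d’s would be .50 rather than .57.

```python
def sample_weighted_mean(values, ns):
    """Return the sample-size-weighted mean of study values and the total N."""
    if len(values) != len(ns) or not values:
        raise ValueError("values and ns must be non-empty and of equal length")
    total_n = sum(ns)
    return sum(v * n for v, n in zip(values, ns)) / total_n, total_n


if __name__ == "__main__":
    # Dean (1999) and Kriska (2001), applicant-level values.
    r_bar, n_total = sample_weighted_mean([0.42, 0.26], [3599, 1876])
    d_bar, _ = sample_weighted_mean([0.73, 0.27], [3599, 1876])
    print(f"biodata-g r = {r_bar:.2f} (N = {n_total})")  # prints "biodata-g r = 0.37 (N = 5475)"
    print(f"biodata d = {d_bar:.2f}")  # prints "biodata d = 0.57"
```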
The results for study 3 are shown in Table 2. Adding a biodata measure to a measure of cognitive ability results in a regression-weighted composite d of .78. This is the largest composite d for any of our three analyses and is notable because it is larger than the d for cognitive ability alone (8% increase in d). Adverse impact ratios were also .02 to .06 worse than for cognitive ability alone. This may be at least partially due to the moderate correlation between cognitive ability and biodata as well as a somewhat lower validity coefficient for biodata relative to g. The results of this analysis may serve as a caution to decision-makers, since adding an alternative predictor with a moderate d and intercorrelation with cognitive ability may actually exacerbate adverse impact potential.
310 DENISE POTOSKY, PHILIP BOBKO AND PHILIP L. ROTH
International Journal of Selection and Assessment © 2005 The Authors. Journal compilation © Blackwell Publishing Ltd. 2005
We caution the reader that even though the N’s in most “cells” of our matrix are moderate, there are only two studies that report unrestricted d’s for biodata and two unrestricted correlations between biodata and cognitive ability. Thus, our results are not definitive.
Validity was estimated to be .53 for the regression-weighted composite. The moderate correlation between the two predictors led to only a small increment in validity when using regression weights. Given the relatively small sample sizes noted above, we also conducted one set of additional analyses to examine what might happen to the composite d under the best circumstances we could expect from the available applicant biodata. To do this, we set the biodata d and the biodata–g intercorrelation at the “optimistic” levels of .27 and .26, respectively; that is, we based both values on the work of Kriska (2001) alone. This lowers both the biodata correlation with g and the d associated with biodata, and both changes should decrease the composite d as much as the available data allow. When we computed the biodata–g composite under these circumstances, the d was .69 (a 3% decrease in d) and the regression-weighted composite validity was .55. Thus, the results show only a slight decrease in d under the “best of circumstances” and an increase in composite d when all the available data are considered.
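The Study 3 results, including the “optimistic” scenario, can be approximated with the same regression-weighted composite formulas. As a sketch, this assumes standardized predictors with unit within-group variances and a common intercorrelation across groups; the biodata validity (.32), biodata–g correlation (.37), and biodata d (.57) come from this section, the cognitive ability values (r = .51, d = .72) from the Discussion, and the function name is illustrative. The main analysis reproduces R = .53 and d = .78. For the optimistic inputs the sketch returns R of about .55 and d of about .70 rather than the .69 reported, presumably because the authors worked from unrounded Table 1 values.

```python
import math


def composite_r_and_d(r1, r2, r12, d1, d2):
    """Multiple R and composite d for a regression-weighted two-predictor composite.

    Predictor 1 is cognitive ability (g); predictor 2 is the alternative predictor.
    """
    det = 1.0 - r12 * r12
    b1 = (r1 - r12 * r2) / det  # standardized regression weight for g
    b2 = (r2 - r12 * r1) / det  # standardized regression weight for biodata
    sd_c = math.sqrt(b1 * b1 + b2 * b2 + 2.0 * b1 * b2 * r12)  # SD of the weighted sum
    return (b1 * r1 + b2 * r2) / sd_c, (b1 * d1 + b2 * d2) / sd_c


if __name__ == "__main__":
    # (r_g, r_bio, r_g_bio, d_g, d_bio) for each scenario.
    scenarios = {
        "all available data": (0.51, 0.32, 0.37, 0.72, 0.57),
        "optimistic (Kriska, 2001 only)": (0.51, 0.32, 0.26, 0.72, 0.27),
    }
    for label, args in scenarios.items():
        big_r, d_c = composite_r_and_d(*args)
        print(f"{label}: R = {big_r:.2f}, d = {d_c:.2f}")
```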
Discussion
The Adverse Impact and Validity of Composites
The results of our analyses are interesting and complex. For a conscientiousness–g composite, regression weights lead to a slight reduction in adverse impact potential (from d = .72 to d = .68), and the composite has slightly higher validity than a test of cognitive ability alone (from r = .51 to R = .55). For the interview–g composite, there was an increase in validity (from r = .51 to R = .61) and a decrease in adverse impact potential (d decreases from .72 to .65). For the biodata–g composite, adverse impact potential increased relative to a test of cognitive ability alone. Our “bottom line” interpretation is that adding alternative predictors does not automatically result in a “win–win” situation in which validity goes up and adverse impact potential goes down. Overall, a strategy of adding an alternative predictor to a measure of cognitive ability in order to reduce adverse impact appears often to yield relatively modest decreases in adverse impact potential. For example, d decreases about 5% when adding a measure of conscientiousness and 10% when adding a structured interview using regression weighting. As well, the adverse impact ratios in Table 2 are almost always less than .80, regardless of the composite used. Thus, our meta-analytic results appear similar to the primary study results of Ryan et al. (1998) in that adverse impact was not greatly reduced. Together, these studies raise an interesting question of just how likely it is for organizations to actually avoid adverse impact in selection systems.
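An adverse impact ratio compares the minority selection rate with the majority selection rate, and a ratio below .80 signals adverse impact under the four-fifths rule. The sketch below shows one standard way to turn a composite d into such a ratio for top-down selection. It is illustrative only: it assumes normally distributed composite scores with equal variances in both groups, the function names are hypothetical, and the minority share of the applicant pool (20%) is a hypothetical value rather than the proportion the authors used, so its output will not match Table 2.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and SD 1


def pass_rate(cutoff: float, mean: float) -> float:
    """Share of a unit-variance normal group with the given mean scoring above cutoff."""
    return 1.0 - STD_NORMAL.cdf(cutoff - mean)


def adverse_impact_ratio(d: float, overall_sr: float, p_minority: float,
                         tol: float = 1e-10) -> float:
    """Minority/majority selection-rate ratio under top-down selection on one composite.

    Majority scores ~ N(0, 1), minority scores ~ N(-d, 1). The cutoff is found by
    bisection so that the pooled applicant pool has selection ratio overall_sr.
    """
    def overall_rate(cutoff: float) -> float:
        return ((1.0 - p_minority) * pass_rate(cutoff, 0.0)
                + p_minority * pass_rate(cutoff, -d))

    lo, hi = -10.0, 10.0  # overall_rate falls as the cutoff rises
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if overall_rate(mid) > overall_sr:
            lo = mid  # too many selected: move the cutoff up
        else:
            hi = mid
    cutoff = 0.5 * (lo + hi)
    return pass_rate(cutoff, -d) / pass_rate(cutoff, 0.0)


if __name__ == "__main__":
    P_MINORITY = 0.20  # hypothetical applicant share, not taken from the paper
    for sr in (0.1, 0.3, 0.5, 0.7, 0.9):
        ratios = [adverse_impact_ratio(d, sr, P_MINORITY) for d in (0.72, 0.65)]
        print(f"SR = {sr:.1f}: g alone {ratios[0]:.2f}, interview-g composite {ratios[1]:.2f}")
```

Under these assumptions the ratio rises toward 1 as the overall selection ratio grows, which mirrors the pattern in the text that adverse impact persists unless a large share of applicants is hired.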
Other Factors in Decisions to Use Selection Composites
While we have tried to provide accurate estimates of validity and adverse impact, other factors can be important in such decisions. In terms of conscientiousness, decision-makers will need to think through the issue of faking and its possible implications (e.g., Douglas, McDaniel, & Snell, 1996; Ellingson, Sackett, & Hough, 1999; Ellingson, Smith, & Sackett, 2001; Graham, McDaniel, Douglas, & Snell, 2002; Ones, Viswesvaran, & Reiss, 1996). We note that not all researchers view faking on conscientiousness scales as a problem (cf. Ellingson et al., 1999; Ones et al., 1996). Decision-makers may also wish to consider the practicality or feasibility of administering certain predictors. Structured interviews raise important cost and feasibility concerns when a large number of individuals must be interviewed. Further, one might need to think through the utility of such actions if the jobs in question were not associated with at least moderate economic returns (i.e., the standard deviation of job performance in dollars). Biodata measures, for their part, are typically costly to develop and may require periodic “re-keying,” and biodata d’s may be larger than some had thought. Finally, it might also prove interesting to see whether the pattern of results is robust to different weighting schemes (e.g., regression weights versus unit weights).
Implications of Accurate Estimates of Standardized Ethnic Group Differences
Accurate estimates of parameters are important to both researchers and decision-makers. Assume that a group of decision-makers is designing a selection system for medium complexity jobs using a composite of g and an alternative predictor. They might mistakenly assume that the d value of the alternative is zero (or close to zero). For example, we have seen published articles (e.g., Cascio & Phillips, 1979) and technical reports suggesting that this description characterizes work sample tests. Assuming equal validity, no intercorrelation between these variables, an applicant d of .72 for g, and a d of .00 for work samples, the composite d would be expected to be .51. However, as noted below, there is reason to suggest the d may be at least .48 for work sample tests. Still assuming no intercorrelation between predictors, if d = .48 for work sample tests, the expected composite d increases to .85, an increase of .13 relative to the d for cognitive ability alone. Alternatively, assuming an intercorrelation of .3 for the two previously mentioned predictors (d = .72 and d = .48, respectively) would lead
to a d of .75. Such estimates might surprise decision-makers
upon implementation and have negative unanticipated
consequences for the organization.
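The work sample illustration follows from the usual formula for the d of an equally weighted composite of standardized predictors (cf. Sackett & Ellingson, 1997): the sum of the predictor d’s divided by the standard deviation of the sum. With equal validities and a common intercorrelation the regression weights are equal, so this unit-weighted form applies. The sketch assumes unit within-group variances, and the function name is illustrative. It returns .51 and .85 for the first two cases and about .74 for the correlated case, a shade below the .75 in the text, presumably because of rounding in the inputs.

```python
import math


def unit_weighted_composite_d(ds, r_bar):
    """d of an equally weighted sum of k standardized predictors.

    ds: predictor d's; r_bar: common predictor intercorrelation.
    The SD of the sum is sqrt(k + k(k - 1) * r_bar).
    """
    k = len(ds)
    return sum(ds) / math.sqrt(k + k * (k - 1) * r_bar)


if __name__ == "__main__":
    cases = [
        ("g + work sample (d = .00), r = 0", [0.72, 0.00], 0.0),
        ("g + work sample (d = .48), r = 0", [0.72, 0.48], 0.0),
        ("g + work sample (d = .48), r = .3", [0.72, 0.48], 0.3),
    ]
    for label, ds, r_bar in cases:
        print(f"{label}: composite d = {unit_weighted_composite_d(ds, r_bar):.2f}")
```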
As new data become available, interested readers may be able to incrementally add other values for d and r to arrive at meta-analytic estimates different from those presented in this paper. It is essential, however, that researchers and practitioners consider the issues and estimates presented in this paper, rather than simply assume that adding predictors will automatically reduce adverse impact. In particular, when forming a composite with g, the characteristics of the alternative predictor are quite important. It appears that the alternative must either have little correlation with cognitive ability (to achieve the slight reduction in adverse impact associated with conscientiousness) or have high validity (as with the structured interview; see also Schmitt et al., 1997). Alternative predictors with sizeable d’s or moderate correlations with cognitive ability might actually increase adverse impact potential.
Another important implication for decision-makers and researchers is the need to consider the adverse impact potential and adverse impact of various predictors at the same level of analysis. If decision-makers are considering how to design a selection system to choose among applicants, it is important to make sure that all psychometric estimates are applicable to that same level of analysis. If this issue is ignored, decision-makers using restricted d’s to estimate applicant d’s may find they have substantially higher levels of adverse impact than they had projected.
Limitations
We note two sets of limitations to this work. First, there is a
lack of published research in several areas used in our
calculations. We had only two applicant level studies in the
biodata ethnic group differences cell and the biodata –
cognitive ability correlation cell (though total sample sizes
exceeded 5000). We also did not have a great deal of data
on applicant ethnic group differences in conscientiousness.
Our work is also limited by the fact that we used
unrestricted d’s and unrestricted predictor intercorrelation
values for only four major predictors of job performance in
our field. Other predictors such as situational judgment
tests or work samples were not included nor were other
portions of the job performance domain such as contextual
performance. These limitations give rise to extensive future
research needs noted below.
Future Research Needs
Ethnic Group Differences Research. In comparison to a fairly sizeable body of literature that can be brought to bear on composite validity (e.g., see Schmidt & Hunter, 1998), much more research is needed on the ethnic group differences of various predictors. Once this basic research is done, researchers could begin to study many more composites meta-analytically. While there is now a published meta-analysis of differences on paper-and-pencil measures of cognitive ability (Roth, BeVier et al., 2001), we were surprised by the lack of a major applicant-level meta-analysis of conscientiousness and the other factors in the Five Factor Model of personality. Although the current consensus is that ethnic group differences are small (e.g., Mount & Barrick, 1995), a meta-analytic effort would help document precise estimates with state-of-the-art empirical methods. We suggest that such future work clearly report findings by construct (e.g., conscientiousness studies that do not mix integrity and conscientiousness) and by method when possible. It is also important to report values separately for applicants and job incumbents.
We also note a need to examine the unrestricted d’s of other predictors of performance noted in Schmidt and Hunter (1998), such as job knowledge tests, training and experience records (using a behavioral consistency approach), job experience measures, and training and experience records using a “points approach.” For example, Bernardin’s (1984) work on standardized ethnic group differences for measures of job knowledge reported a d of .42. However, this work examined job knowledge measures as criteria rather than predictors, and the studies mainly included job incumbents, so we do not know the unrestricted d for job applicants. We are also aware of Schmitt, Clause, and Pulakos’ (1996) work that resulted in a d of .38 for a category of job knowledge, work sample, and situational judgment tests. Again, most of these studies were conducted on incumbents. The authors are not aware of any ethnic group difference studies of training and experience records or other measures of experience for unrestricted samples of applicants. There are also a number of predictors of job performance reviewed by Schmidt and Hunter (1998) that are more focused on methods than on particular constructs. For example, structured interviews can be designed to focus on a variety of constructs (Huffcutt, Conway, Roth, & Stone, 2001), including cognitive ability. As another example, a work sample test focusing on complex material could load more highly on cognitive ability than a work sample test focusing on psycho-motor skills. In investigations of all such “method-based” predictors, we urge researchers to focus on the constructs the predictors are designed to measure, as these constructs could substantially influence results. There is still substantial work to be done to more accurately articulate the unrestricted d’s for “method-based” predictors.
Research on Predictor Intercorrelations. There are
also many pressing needs in the area of predictor
intercorrelations. Schmidt and Hunter (1998) noted that
there were not enough data on predictor intercorrelations
to study composites of alternative predictors. The situation has not changed greatly in the few years since their work was published. For example, one obvious research need is an applicant-level estimate of the biodata–conscientiousness correlation to facilitate investigation of such a composite. There are also needs in virtually every combination of predictors studied by Schmidt and Hunter (1998), and we refer the reader to their Table 1 for an extensive list. While such work is hardly likely to be considered “glamorous” by researchers, it can be important to meta-analytically examine both the validity and standardized ethnic group differences of a variety of predictors of job performance, and to understand the psychometric characteristics of these predictors in composites that involve more than two predictors.
Other Research. While not explicitly a part of ethnic group differences research, we also urge research into how managers view the trade-off between validity and standardized ethnic group differences (and adverse impact). If adverse impact is viewed as a relatively continuous variable, what is the functional relationship between levels of adverse impact and managerial evaluations of selection systems? We suggest that Multi-attribute Utility Analysis (Roth & Bobko, 1997) is one way to investigate these perceptions.
A reviewer also noted that, given our focus on a particular level of job complexity, the proportions of white Americans and African Americans may differ from the values we generated and used. We call for future work delineating the degree to which job complexity level influences these proportions.
Finally, we urge more research into multiple hurdle
selection systems (as per Sackett & Roth, 1996). We hope
the applicant level estimates of r and d in this manuscript,
and those in the future, will allow individuals to model
multiple hurdle selection systems. Such researchers might
start with applicant values and then examine how varying
selection ratios and various hurdles might influence
validity, ethnic group differences, and adverse impact.
In sum, we have emphasized the importance of applicant
level analysis for constructing and evaluating composites
within selection research, and we found that adding
alternative predictors does not result in a situation in
which validity automatically goes up and adverse impact
potential automatically goes down. We believe that such
analyses allow the best possible comparison among
predictors in many situations and should facilitate future
selection system modeling and development. We urge
future research into many of the issues noted above and
look forward to such studies.
Acknowledgement
We would like to thank S. David Kriska for his help in
providing some data for this project.
References
Ackerman, P.L. and Heggestad, E.D. (1997) Intelligence, personality, and interests: Evidence for overlapping traits. Psychological Bulletin, 121, 219–236.
Barrett, G.V., Phillips, J.S. and Alexander, R.A. (1981) Concurrent and predictive validity designs: A critical reanalysis. Journal of Applied Psychology, 66, 1–6.
Bernardin, H.J. (1984) An analysis of black-white differences in job performance. Paper presented at the Academy of Management meetings, Boston, MA.
Bobko, P. and Roth, P. (2004) The four-fifths rule for assessing adverse impact: An arithmetic, intuitive, and logical analysis of the rule and implications for future research. In J. Martocchio (Ed.), Research in personnel and human resources management (Vol. 23, pp. 177–197). Amsterdam: Elsevier Press.
Bobko, P., Roth, P.L. and Bobko, C. (2001) Correcting the effect size of d for range restriction and unreliability. Organizational Research Methods, 4, 46–61.
Bobko, P., Roth, P.L. and Potosky, D. (1999) Derivation and implications of a meta-analytic matrix incorporating cognitive ability, alternative predictors and job performance. Personnel Psychology, 52, 561–589.
Bureau of Labor Statistics (2000) Employment and earnings, 46(1), 13–14 (Chart A-4).
Carlson, K.D., Scullen, S.E., Schmidt, F.L., Rothstein, H. and Erwin, F. (1999) Generalizable biographical data validity can be achieved without multi-organizational development and keying. Personnel Psychology, 52, 731–753.
Cascio, W.F. and Phillips, N.F. (1979) Performance testing: A rose among thorns. Personnel Psychology, 32, 751–766.
Cortina, J.M., Goldstein, N.B., Payne, S.C., Davison, H.K. and Gilliland, S.W. (2000) The incremental validity of interview scores over and above cognitive ability and conscientiousness scores. Personnel Psychology, 53, 325–351.
Dean, M. (1999) On biodata construct validity, criterion validity and adverse impact. Unpublished doctoral dissertation, Louisiana State University.
Douglas, E.F., McDaniel, M.A. and Snell, A.F. (1996) The validity of non-cognitive measures decays when applicants fake. Paper presented at the 11th Annual Conference of the Academy of Management, Cincinnati.
Ellingson, J.E., Sackett, P.R. and Hough, L.M. (1999) Social desirability corrections in personality measurement: Issues of applicant comparison and construct validity. Journal of Applied Psychology, 84, 155–166.
Ellingson, J.E., Smith, D.B. and Sackett, P.R. (2001) Investigating the influence of social desirability on personality factor structure. Journal of Applied Psychology, 86, 122–133.
Gandy, J.A., Dye, D.A. and MacLane, C.N. (1994) Federal government selection: The individual achievement record. In G.S. Stokes, M.D. Mumford and W.A. Owens (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 275–310). Palo Alto, CA: CPP Books.
Graham, K.E., McDaniel, M.A., Douglas, E.F. and Snell, A.F. (2002) Biodata validity decay and score inflation with faking: Do item attributes explain variance across items? Journal of Business and Psychology, 16, 573–592.
Gully, S.M., Payne, S.C., Kiechel, K.L. and Whiteman, J.K. (2000) The impact of error training and individual differences on training outcomes: An attribute treatment interaction perspective. Working manuscript.
Gully, S.M., Phillips, J.M., Beaubien, J.M. and Payne, S.C. (1998, April) Effects of individual differences in goal orientation and self-regulatory tendencies on learning. In S.M. Gully and J.E. Mathieu (Chairs), Individual differences, learning, motivation, and training outcomes. Symposium presented at the 13th Annual Conference of the Society for Industrial and Organizational Psychology, Dallas, TX.
Hakel, M.D. (1998) Beyond multiple choice: Evaluating alternatives to traditional testing for selection. Mahwah, NJ: Erlbaum.
Hough, L. (1998) Personality at work: Issues and evidence. In M. Hakel (Ed.), Beyond multiple choice: Evaluating alternatives to traditional testing for selection (pp. 131–159). Mahwah, NJ: Erlbaum.
Hough, L., Oswald, F. and Ployhart, R. (2001) Adverse impact and group differences in constructs, assessment tools, and personnel selection procedures: Issues and lessons learned. International Journal of Selection and Assessment, 9, 152–194.
Huffcutt, A.I. and Arthur, W. Jr. (1994) Hunter and Hunter (1984) revisited: Interview validity for entry-level jobs. Journal of Applied Psychology, 79, 184–190.
Huffcutt, A.I., Conway, J., Roth, P.L. and Stone, N. (2001) Identification and meta-analysis of constructs measured in employment interviews. Journal of Applied Psychology, 86, 897–913.
Huffcutt, A.I. and Roth, P.L. (1998) Racial group differences in interview evaluations. Journal of Applied Psychology, 83, 179–189.
Huffcutt, A.I., Roth, P.L. and McDaniel, M.A. (1996) A meta-analytic investigation of cognitive ability in employment interview evaluations: Moderating characteristics and implications for incremental validity. Journal of Applied Psychology, 81, 459–473.
Hunter, J.E. and Hunter, R.F. (1984) Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72–98.
Hunter, J.E. and Schmidt, F.L. (2004) Methods of meta-analysis: Correcting for error and bias in research findings (2nd Edn.). Newbury Park: Sage.
Hunter, J.E., Schmidt, F.L. and Judiesch, M.K. (1990) Individual differences in output variability as a function of job complexity. Journal of Applied Psychology, 75, 28–42.
Hurtz, G.M. and Donovan, J.J. (2000) Personality and job performance: The big five revisited. Journal of Applied Psychology, 85, 869–879.
Kehoe, J.F. (2002) General mental ability and selection in private sector organizations: A commentary. Human Performance, 15, 97–106.
Kriska, S.D. (2001, August) The validity-adverse impact trade-off: Real data and mathematical model estimates. Paper presented at the Society for Industrial and Organizational Psychology meetings, San Diego, CA.
McDaniel, M.A., Whetzel, D.L., Schmidt, F.L. and Maurer, S.D. (1994) The validity of employment interviews: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79, 599–616.
McHenry, J., Hough, L., Toquam, J., Hanson, M. and Ashworth, S. (1990) Project A validity results: The relationship between predictor and criterion domains. Personnel Psychology, 43, 335–354.
Mount, M.K. and Barrick, M. (1995) The Big Five personality dimensions: For research and practice in human resources management. Research in Personnel and Human Resources Management, 13, 823–854.
Ones, D. and Viswesvaran, C. (1998) Gender, age, and race differences on overt integrity tests: Results across four large-scale job applicant data sets. Journal of Applied Psychology, 83, 35–42.
Ones, D.S., Viswesvaran, C. and Reiss, A.D. (1996) Role of social desirability in personality testing for personnel selection: The red herring. Journal of Applied Psychology, 81, 660–679.
Ostroff, C. and Harrison, D.A. (1999) Meta-analysis: Level of analysis, and best estimates of population correlations: Cautions for interpreting meta-analytic results in organizational behavior. Journal of Applied Psychology, 84, 260–270.
Phillips, J.M. and Gully, S.M. (1997) Role of goal orientation, ability, need for achievement, and locus of control in the self-efficacy and goal setting process. Journal of Applied Psychology, 82, 792–802.
Pulakos, E. and Schmitt, N. (1996) An evaluation of two strategies for reducing adverse impact and their effects on criterion related validity. Human Performance, 9, 241–258.
Reilly, R.R. and Chao, G. (1982) Validity and fairness of some alternative employee selection procedures. Personnel Psychology, 35, 1–62.
Reilly, R.R. and Warech, M.A. (1994) The validity and fairness of alternatives to cognitive ability tests. In L. Wing and B. Gifford (Eds.), Policy issues in employment testing. Boston: Kluwer.
Roth, P.L., BeVier, C.A., Bobko, P., Switzer, F.S. III and Tyler, P. (2001) Ethnic group differences in cognitive ability in employment and education settings: A meta-analysis. Personnel Psychology, 54, 297–330.
Roth, P.L. and Bobko, P. (1997) A research agenda for multi-attribute utility analysis in human resource management. Human Resource Management Review, 7, 341–368.
Roth, P.L., Bobko, P., Switzer, F.S. III and Dean, M.A. (2001) Prior selection causes biased estimates of standardized ethnic group differences: Simulation and analysis. Personnel Psychology, 54, 591–617.
Roth, P.L., Van Iddekinge, C.H., Huffcutt, A.I., Eidson, C.E. Jr. and Bobko, P. (2002) Correcting for range restriction in structured interview ethnic group differences: The values may be larger than researchers thought. Journal of Applied Psychology, 87, 369–376.
Rothstein, H., Schmidt, F.L., Erwin, F., Owens, W. and Sparks, C.P. (1990) Biographical data in employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75, 174–184.
Ryan, A.M., Ployhart, R.E. and Friedel, L.A. (1998) Using personality testing to reduce adverse impact: A cautionary note. Journal of Applied Psychology, 83, 298–307.
Sackett, P.R. and Ellingson, J. (1997) The effects of forming multi-predictor composites on group differences and adverse impact. Personnel Psychology, 50, 707–721.
Sackett, P.R. and Roth, L. (1996) Multi-stage selection strategies: A Monte Carlo investigation of effects on performance and minority hiring. Personnel Psychology, 49, 1–18.
Sackett, P.R., Schmitt, N., Ellingson, J.E. and Kabin, M.E. (2001) High stakes testing in employment, credentialing, and higher education: Prospects in a post affirmative action world. American Psychologist, 56, 302–318.
Salgado, J.F., Viswesvaran, C. and Ones, D.S. (2001) Predictors used for personnel selection: An overview of constructs, methods, and techniques. In N. Anderson, D. Ones, H. Sinangil and C. Viswesvaran (Eds.), Handbook of industrial, work, & organizational psychology (pp. 165–199). London: Sage.
Schmidt, F.L. and Hunter, J.E. (1998) The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274.
Schmitt, N., Clause, C.S. and Pulakos, E.D. (1996) Subgroup differences associated with different measures of some common job relevant constructs. In C.L. Cooper and I.T. Robertson (Eds.), International review of industrial and organizational psychology (Vol. 11, pp. 115–137). Chichester, UK: John Wiley.
Schmitt, N., Rogers, W., Chan, D., Sheppard, L. and Jennings, D. (1997) Adverse impact and predictive efficiency of various predictor combinations. Journal of Applied Psychology, 82, 719–730.
Stokes, G.S., Mumford, M.D. and Owens, W.A. (1994) Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction. Palo Alto, CA: Consulting Psychologists Press.
U.S. Equal Employment Opportunity Commission, U.S. Civil Service Commission, U.S. Department of Labor, U.S. Department of Justice (1978) Uniform guidelines on employee selection procedures. Federal Register, 43, 38295–38309.
Viswesvaran, C. and Ones, D.S. (1995) Theory testing: Combining psychometric meta-analysis and structural equations modeling. Personnel Psychology, 48, 865–886.
Weisner, W. and Cronshaw, S. (1988) The moderating impact of interview format and degree of structure on the validity of the employment interview. Journal of Occupational and Organizational Psychology, 61, 275–290.
Wonderlic Inc. (2000) Manual for the personal characteristics inventory. Libertyville, IL: Wonderlic.