Applied Psychology in Human Resource Management, 6th Edition


Author: centru-limbi-straine

Post on 09-Sep-2014






Library of Congress Cataloging-in-Publication Data
Cascio, Wayne F.
Applied psychology in human resource management / Wayne F. Cascio and Herman Aguinis. -- 6th ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-13-148410-9
1. Personnel management--Psychological aspects. 2. Psychology, Industrial. 3. Personnel management--United States. 4. Psychology, Industrial--United States. I. Aguinis, Herman, 1966- II. Title.
HF5549.C297 2005
658.3'001'9--dc22 2004014002

Acquisitions Editor: Jennifer Simon
Editorial Director: Jeff Shelstad
Assistant Editor: Christine Genneken
Editorial Assistant: Richard Gomes
Marketing Manager: Shannon Moore
Marketing Assistant: Patrick Danzuso
Managing Editor: John Roberts
Cover Design: Bruce Kenselaar
Director, Image Resource Center: Melinda Reo
Manager, Rights and Permissions: Zina Arabia
Manager, Visual Research: Beth Brenzel
Manager, Cover Visual Research & Permissions: Karen Sanatar
Manager, Print Production: Christy M

CHAPTER 6 Measuring and Interpreting Individual Differences

[The standard error of measurement, $\sigma_{Meas}$,] is a useful statistic because it enables us to talk about an individual's true and error scores. Given an observed score, $\sigma_{Meas}$ enables us to estimate the range of score values that will, with a given probability, include the true score. In other words, we can establish confidence intervals. The $\sigma_{Meas}$ may be used similarly to determine the amount of variability to be expected upon retesting. To illustrate, assume the standard deviation of a group of observed scores is 7 and the reliability coefficient is .90. Then $\sigma_{Meas} = 7\sqrt{1 - .90} = 2.21$. Given an individual's score of 70, we can be 95 percent confident that on retesting the individual's score will be within about four points ($1.96\sigma_{Meas} = 1.96 \times 2.21 = 4.33$) of his original score and that his true score probably lies between $X \pm 1.96\sigma_{Meas}$, or 65.67 and 74.33.

In personnel psychology, the standard error of measurement is useful in three ways (Guion, 1965). First,
it can be used to determine whether the measures describing individuals differ significantly (e.g., assuming a five-point difference between applicants, if the $\sigma_{Meas}$ for the test is 6, the difference could certainly be attributed to chance). In fact, Gulliksen (1950) showed that the difference between the scores of two individuals on the same test should not be interpreted as significant unless it is equal to at least two standard errors of the difference (SED), where $SED = \sigma_{Meas}\sqrt{2}$. Second, it may be used to determine whether an individual measure is significantly different from some hypothetical true score. For example, assuming a cut score on a test is the true score, chances are two out of three that obtained scores will fall within $\pm 1\sigma_{Meas}$ of the cut score. Applicants within this range could have true scores above or below the cutting score; thus, the obtained score is "predicted" from a hypothetical true score. A third usage is to determine whether a test discriminates differently in different groups (e.g., high versus low ability). Assuming that the distribution of scores approaches normality and that obtained scores do not extend over the entire possible range, then $\sigma_{Meas}$ will be very nearly equal for high-score levels and for low-score levels (Guilford & Fruchter, 1978). On the other hand, when subscale scores are computed or when the test itself has peculiarities, the test may do a better job of discriminating at one part of the score range than at another. Under these circumstances, it is beneficial to report the $\sigma_{Meas}$ for score levels at or near the cut score. To do this, it is necessary to develop a scatter diagram that shows the relationship between two forms (or halves) of the same test. The standard deviations of the columns or rows at different score levels will indicate where predictions will have the greatest accuracy.

A final advantage of the $\sigma_{Meas}$ is that it forces one to think of test scores not as exact points, but rather as bands or ranges of scores. Since measurement error is present at least to some extent in all psychological measures, such a view is both sound and proper.
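The standard error of measurement, the 95 percent band around an observed score, and Gulliksen's standard error of the difference can be sketched in a few lines. This is a minimal illustration, assuming the classical formula $\sigma_{Meas} = \sigma_x\sqrt{1 - r_{xx}}$; the function names are hypothetical and are not taken from the text.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: observed-score SD times sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(score, sem, z=1.96):
    """Band of +/- z SEMs around an observed score (z=1.96 gives about 95%)."""
    half_width = z * sem
    return score - half_width, score + half_width


def standard_error_of_difference(sem):
    """SED for two scores on the same test: SEM times sqrt(2)."""
    return sem * math.sqrt(2.0)


# Values from the worked example: SD = 7, reliability = .90, score = 70.
sem = standard_error_of_measurement(sd=7, reliability=0.90)
low, high = confidence_band(70, sem)
sed = standard_error_of_difference(sem)

print(f"SEM = {sem:.2f}")
print(f"95% band = {low:.2f} to {high:.2f}")
print(f"SED = {sed:.2f}; scores should differ by at least {2 * sed:.2f}")
```

Because the sketch carries the unrounded SEM through the calculation, the band comes out as roughly 65.66 to 74.34 rather than the 65.67 to 74.33 obtained in the text from the rounded value 2.21.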
GENERALIZABILITY THEORY

The discussion of reliability presented thus far is the classical or traditional approach. A more recent statistical approach, termed generalizability theory, conceptualizes the reliability of a test score as the precision with which that score, or sample, represents a more generalized universe value of the score (Cronbach, Gleser, Nanda, & Rajaratnam, 1972; DeShon, 2002; Murphy & DeShon, 2000a, 2000b).

In generalizability theory, observations (for example, examinees' scores on tests) are seen as samples from a universe of admissible observations. The universe describes the conditions under which examinees can be observed or tested that produce results that are equivalent to some specified degree. An examinee's universe score is defined as the expected value of his or her observed scores over all admissible observations. The universe score is directly analogous to the true score used in classical reliability theory. Generalizability theory emphasizes that different universes exist and makes it the test publisher's responsibility to define carefully his or her universe. This definition is done in terms of facets or dimensions.

The use of generalizability theory involves conducting two types of research studies: a generalizability (G) study and a decision (D) study. A G study is done as part of the development of the measurement instrument. The main goal of the G study is to specify the degree to which test results are equivalent when obtained under different testing conditions. In simplified terms, a G study involves collecting data for examinees tested under specified conditions (that is, at various levels of specified facets), estimating variance components due to these facets and their interactions using analysis of variance, and producing coefficients of generalizability. A coefficient of generalizability is the ratio of universe-score variance to observed-score variance and is the counterpart of the reliability coefficient used in classical reliability theory. A test has not one generalizability coefficient, but many,
depending on the facets examined in the G study. The G study also provides information about how to estimate an examinee's universe score most accurately.

In a D study, the measurement instrument produces data to be used in making decisions or reaching conclusions, such as admitting people to programs. The information from the G study is used in interpreting the results of the D study and in reaching sound conclusions. Despite its statistical sophistication, however, generalizability theory has not replaced the classical theory of test reliability (Aiken, 1999).

Several recently published studies illustrate the use of the generalizability theory approach. As an illustration, Greguras, Robie, Schleicher, and Goff (2003) conducted a field study in which over 400 managers in a large telecommunications company were rated by their peers and subordinates using an instrument for both developmental and administrative purposes. Results showed that the combined rater and rater-by-ratee interaction effects were substantially larger than the person effect (i.e., the object being rated) for both the peer and the subordinate sources for both the developmental and the administrative conditions. However, the person effect accounted for a greater amount of variance for the subordinate raters when ratings were used for developmental as opposed to administrative purposes, and this result was not found for the peer raters. Thus, the application of generalizability theory revealed that subordinate ratings were of significantly better quality when made for developmental rather than administrative purposes, but the same was not true for peer ratings.

INTERPRETING THE RESULTS OF MEASUREMENT PROCEDURES

In personnel psychology, a knowledge of each person's individuality (his or her unique pattern of abilities, values, interests, and personality) is essential in programs designed to use human resources effectively. Such knowledge enables us to make predictions about how individuals are likely to behave in the future. In order to interpret the results of measurement procedures intelligently, however, we need some information about how relevant others have performed on the same procedure. For example, Sarah is applying for admission to an industrial arts program at a local vocational high school. As part of the admissions procedure, she is given a mechanical aptitude test. She obtains a raw score of 48 correct responses out of a possible 68. Is this score average, above average, or below average? In and of itself, the score of 48 is meaningless because psychological measurement is relative rather than absolute. In order to interpret Sarah's score meaningfully, we need to compare her raw score to the distribution of scores of relevant others, that is, persons of approximately the same age, sex,
and educational and regional background who were being tested for the same purpose. These persons make up a norm group.

Theoretically, there can be as many different norm groups as there are purposes for which a particular test is given and groups with different characteristics. Thus, Sarah's score of 48 may be about average when compared to the scores of her reference group, it might be distinctly above average when compared to the performance of a group of music majors, and it might represent markedly inferior performance in comparison to the performance of a group of instructor-mechanics. In short, norms must provide a relevant comparison group for the person being tested.

Immediately after the introduction of a testing or other measurement program, it may be necessary to use norms published in the test manual, but local norms (based on the scores of applicants in a specific organization or geographical area) should be prepared as soon as 100 or more cases become available. These norms should be revised from time to time as additional data accumulate (Ricks, 1971). In employment selection, local norms are especially desirable, since they are more representative and fit specific organizational purposes more precisely. Local norms allow comparisons between the applicant's score and those of her immediate competitors.

Up to this point, we have been referring to normative comparisons in terms of "average," "above average," or "below average." Obviously we need a more precise way of expressing each individual's position relative to the norm group. This is accomplished easily by converting raw scores into some relative measure, usually percentile ranks or standard scores. The percentile rank of a given raw score refers to the percentage of persons in the norm group who fall below it. Standard scores may be expressed either as z scores (i.e., the distance of each raw score from the mean in standard deviation units) or as some modification of the z score that eliminates negative numbers and decimal notation. A hypothetical norm table is presented in Table 6-4.
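The two definitions above (percentile rank as the percentage of the norm group falling below a raw score, and the z score as distance from the mean in standard deviation units) can be sketched as follows. The norm group here is a small made-up list used only for illustration, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption; a real norm table would rest on 100 or more cases.

```python
import statistics


def percentile_rank(raw, norm_scores):
    """Percentage of persons in the norm group who score below `raw`."""
    below = sum(1 for s in norm_scores if s < raw)
    return 100.0 * below / len(norm_scores)


def z_score(raw, norm_scores):
    """Distance of `raw` from the norm-group mean in SD units."""
    mean = statistics.mean(norm_scores)
    sd = statistics.pstdev(norm_scores)  # population SD (an assumption)
    return (raw - mean) / sd


# Hypothetical norm group of ten raw scores (illustrative only).
norm_group = [18, 22, 26, 26, 30, 30, 34, 34, 38, 42]

for raw in (22, 30, 38):
    pr = percentile_rank(raw, norm_group)
    z = z_score(raw, norm_group)
    print(f"raw={raw}  percentile rank={pr:.0f}  z={z:+.2f}")
```

With such a small group the percentile ranks move in steps of 10, which is one practical reason for waiting until enough cases have accumulated before preparing local norms.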
TABLE 6-4 (hypothetical norm table)

Raw Score   Percentile   z Score
   50           99         +2.2
   46           98         +2.0
   42           90         +1.3
   38           84         +1.0
   34           66         +0.4
   30           50          0.0
   26           34         -0.4
   22           16         -1.0
   18           10         -1.3
   14            2         -2.0
   10            1         -2.2

The relationships among percentile ranks, standard scores, and the normal curve are presented graphically in Figure 6-5.

[FIGURE 6-5: Normal curve showing the percent of cases under portions of the curve (0.13%, 2.14%, 13.59%, 34.13%, 34.13%, 13.59%, 2.14%, 0.13%), standard deviation units ($-4\sigma$ to $+4\sigma$), cumulative percentages (0.1%, 2.3%, 15.9%, 50.0%, 84.1%, 97.7%, 99.9%), percentile equivalents, and typical standard scores (z-scores and T-scores).]

Note that there are no raw scores on the baseline of the curve. The baseline is presented in a generalized form, marked off in standard deviation units. For example, if the mean of a distribution of scores is 30 and if the standard deviation is 8, then $\pm 1\sigma$ corresponds to 38 (30 + 8) and 22 (30 - 8), respectively. Also, since the total area under the curve represents the total distribution of scores, we can mark off subareas of the total corresponding to $\pm$ 1, 2, 3, and 4 standard deviations. The numbers in these subareas are percentages of the total number of people. Thus, in a normal distribution of scores, roughly two-thirds (68.26 percent) of all cases lie between $\pm 1$ standard deviation. This same area also includes scores that lie above the 16th percentile ($-1\sigma$) and below the 84th percentile ($+1\sigma$). In the previous example, if an individual scores 38, we may conclude that this score is $1\sigma$ above the mean and ranks at the 84th percentile of persons on whom the test was normed (provided the distribution of scores in the norm group approximates a normal curve).

Percentile ranks, while easy to compute and understand, suffer from two major limitations. First, they are ranks and, therefore, ordinal-level measures; they cannot legitimately be added, subtracted, multiplied,
or divided. Second, percentile ranks have a rectangular distribution, while test score distributions generally approximate the normal curve. Therefore, percentile units are not equivalent at all points along the scale. Note that on the percentile equivalents scale in Figure 6-5 the percentile distance between percentile ranks 5 and 10 (or 90 and 95) is distinctly greater than the distance between 45 and 50, although the numerical distances are the same. This tendency of percentile units to become progressively smaller toward the center of the scale causes special difficulties in the interpretation of change. Thus, the differences in achievement represented by a shift from 45 to 50 and from 94 to 99 are not equal on the percentile rank scale, since the distance from 45 to 50 is much smaller than that from 94 to 99. In short, if percentiles are used, greater weight should be given to rank differences at the extremes of the scale than to those at the center.

Standard scores, on the other hand, are interval-scale measures (which by definition possess equal-size units) and, therefore, can be subjected to the common arithmetic operations. In addition, they allow direct comparison of an individual's performance on different measures. For example, as part of a selection battery, three measures with the following means and standard deviations (in a sample of applicants) are used:

                                 Mean   Std. Deviation
Test 1 (scorable application)     30          5
Test 2 (written test)            500        100
Test 3 (interview)               100         10

Applicant A scores 35 on Test 1, 620 on Test 2, and 105 on Test 3. What does this tell us about his or her overall performance? Assuming each of the tests possesses some validity by itself, converting each of these scores to standard score form, we find that applicant A scores $(35 - 30)/5 = +1\sigma$ on Test 1, $(620 - 500)/100 = +1.2\sigma$ on Test 2, and $(105 - 100)/10 = +.5\sigma$ on Test 3. Applicant A appears to be a good bet.

One of the disadvantages of z scores, however, is that they involve decimals and negative numbers.
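The applicant A example can be reproduced with a short sketch. The means, standard deviations, and scores come from the example above; the dictionary layout and the unit-weighted average at the end are assumptions added for illustration, since the text does not say how the three standard scores should be combined.

```python
# (mean, standard deviation, applicant A's raw score) for each test,
# taken from the worked example in the text.
battery = {
    "Test 1 (scorable application)": (30, 5, 35),
    "Test 2 (written test)": (500, 100, 620),
    "Test 3 (interview)": (100, 10, 105),
}

# Convert each raw score to a z score: (raw - mean) / SD.
z_scores = {
    name: (raw - mean) / sd for name, (mean, sd, raw) in battery.items()
}

for name, z in z_scores.items():
    print(f"{name}: z = {z:+.1f}")

# Assumption: a simple unit-weighted average as one possible summary.
average_z = sum(z_scores.values()) / len(z_scores)
print(f"Unit-weighted average z = {average_z:+.2f}")
```

Putting the three measures on a common scale is what makes the comparison possible; the raw scores 35, 620, and 105 could not be averaged meaningfully.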
To avoid this, z scores may be transformed to a different scale by adding or multiplying by a constant. Although many such derived scores are commonly in use, most of them are based on z. One of the most popular is the Z scale, in which the mean and standard deviation are set equal to 50 and 10, respectively. The transformation is simply

$Z = 50 + 10z$   (6-12)

While Z does eliminate decimals and negative numbers, since it is a linear transformation, the shape of the transformed scores will be similar to that of the raw scores. If the distribution of the raw scores is skewed, the distribution of the transformed scores also will be skewed. This can be avoided by converting raw scores into normalized standard scores. To compute normalized standard scores, percentile ranks of raw scores are computed first. Then, from a table of areas under the normal curve, the z score corresponding to each percentile rank is located. In order to get rid of decimals and negative numbers, the z scores are transformed into T scores by the formula

$T = 50 + 10z$   (6-13)

Note that the right sides of Equations 6-12 and 6-13 are identical. The only difference is that T scores are normalized standard scores, whereas Z scores are simple linear transformations.

Normalized standard scores are satisfactory for most purposes, since they serve to smooth out sampling errors, but all distributions should not be normalized as a matter of course. Normalizing transformations should be carried out only when the sample is large and representative and when there is reason to believe that the deviation from normality results from defects in the measurement procedure rather than from characteristics of the sample or from other factors affecting the behavior under consideration (Anastasi, 1988). Of course, when the original distribution of scores is approximately normal, the linearly derived scores and the normalized scores will be quite similar.

Although we devoted extensive attention in this chapter to the concept of reliability,
the computation of reliability coefficients is a means to an end. The end is to produce scores that measure attributes consistently across time, forms of a measure, items within a measure, and raters. Consistent scores enable predictions and decisions that are accurate. Making accurate predictions and making correct decisions is particularly significant in employment contexts, where measurement procedures are used as vehicles for forecasting performance. The next chapter addresses the issue of validity, which concerns the accuracy of predictions and decisions based on tests, and is closely connected to the concept of reliability.

Discussion Questions

1. Why are psychological measures considered to be nominal or ordinal in nature?
2. Is it proper to speak of the reliability of a test? Why?
3. Which methods of estimating reliability produce the highest and lowest (most conservative) estimates?
4. Is interrater agreement the same as interrater reliability? Why?
5. What type of knowledge can be gathered through the application of item-response theory and generalizability theory?
6. What does the standard error of measurement tell the HR specialist?
7. What do test norms tell us? What do they not tell us?

CHAPTER 7 Validation and Use of Individual Differences Measures

At a Glance

Scores from measures of individual differences derive meaning only insofar as they can be related to other psychologically meaningful characteristics of behavior. The processes of gathering or evaluating the necessary data are called validation. So reliability is a necessary, but not a sufficient property for scores to be useful in HR research and practice.

Two issues are of primary concern in validation: what a test or other procedure measures and how well it measures. Evidence regarding validity can be assessed in several ways: by analyzing the procedure's content (content-related evidence), by relating scores on the procedure to measures of performance on some relevant criterion (predictive and concurrent evidence),
or by more thoroughly investigating the extent to which the procedure measures some psychological construct (construct-related evidence). When implementing empirical validation strategies, one needs to consider that group differences, range restriction, the test's position in the employment process, and the form of the test-predictor relationship can have a dramatic impact on the size of the obtained validity coefficient.

Additional strategies are available when local validation studies are not practically feasible, as in the case of small organizations. These include validity generalization, synthetic validity, and test transportability. These types of evidence are not mutually exclusive. On the contrary, convergence in results gathered using several lines of evidence should be sought and is highly desirable.

Although the validity of individual differences measures is fundamental to competent and useful HR practice, there is another, perhaps more urgent, reason why both public- and private-sector organizations are concerned about this issue. Legal guidelines on employee selection procedures require comprehensive, documented validity evidence for any procedure used as a basis for an employment decision if that procedure has an adverse impact on a protected group.

RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY

Theoretically it would be possible to develop a perfectly reliable measure whose scores were wholly uncorrelated with any other variable. Such a measure would have no practical value, nor could it be interpreted meaningfully, since its scores could be related to [...] estimates (Barrett et al., 1981; Schmitt et al., 1984). We hasten to add, however, that the concurrent design ignores the effects of motivation and job experience on ability. While the magnitude of these effects may be nonsignificant for cognitive ability tests, this is less likely to be the case with inventories (e.g., measures of attitudes or personality). Jennings (1953),
for example, demonstrated empirically that individuals who are secure in their jobs, who realize that their test scores will in no way affect their job standing, and who are participating in a research study are not motivated to the same degree as are applicants for jobs.

Concurrent designs also ignore the effect of job experience on the obtained validity coefficient. One of us once observed a group of police officers (whose average on-the-job experience was three years) completing several instruments as part of a concurrent study. One of the instruments was a measure of situational judgment, and a second was a measure of attitudes toward people. It is absurd to think that presently employed police officers who have been trained at a police academy and who have had three years' experience on the street will respond to a test of situational judgment or an inventory of attitudes in the same way as would applicants with no prior experience! People learn things in the course of doing a job, and events occur that may influence markedly their responses to predictor measures. Thus, validity may be enhanced or inhibited, with no way of knowing in advance the direction of such influences.

In summary, for cognitive ability tests, concurrent studies appear to provide useful estimates of empirical validity derived from predictive studies. Although this fact has been demonstrated empirically, additional research is clearly needed to help understand the reasons for this equivalence. On both conceptual and practical grounds, the different validity designs are not equivalent or interchangeable across situations (Guion & Cranny, 1982). Without explicit consideration of the influence of uncontrolled variables (e.g., range restriction, differences due to age, motivation, job experience) in a given situation, one cannot simply substitute a concurrent design for a predictive one.

Requirements of Criterion Measures in Predictive and Concurrent Studies

Any predictor measure will be no better than the criterion used to establish its validity.
And, as is true for predictors, anything that introduces random error into a set of criterion scores will reduce validity. All too often, unfortunately, it simply is assumed that criterion measures are relevant and valid. As Guion (1987) has pointed out, these two terms are different, and it is important to distinguish between them. A job-related construct is one chosen because it represents performance or behavior on the job that is valued by an employing organization. A construct-related criterion is one chosen because of its theoretical relationship, or lack of one, to the construct to be measured. "Does it work?" is a different question from "Does it measure what we wanted to measure?" Both questions are useful, and both call for criterion-related research. For example, a judgment of acceptable construct-related evidence of validity for subjective ratings might be based on high correlations of the ratings with production data or work samples and of independence from seniority or attendance data.

It is also important that criteria be reliable. As discussed in Chapter 6, although unreliability in the criterion can be corrected statistically, unreliability is no trifling matter. If ratings are the criteria and if supervisors are less consistent in rating some employees than in rating others, then criterion-related validity will suffer. Alternatively, if all employees are given identical ratings (e.g., "satisfactory"), then it is a case of trying to predict the unpredictable. A predictor cannot forecast differences in behavior on the job that do not exist according to supervisors!

Finally, we should beware of criterion contamination in criterion-related validity studies. It is absolutely essential that criterion data be gathered independently of predictor data and that no person who is involved in assigning criterion ratings have any knowledge of individuals' predictor scores. Brown (1979) demonstrated that failure to consider such sources of validity distortion can mislead completely researchers who are unfamiliar with the total selection and training process and with the specifics of the validity study in question.

FACTORS AFFECTING THE SIZE OF OBTAINED VALIDITY COEFFICIENTS

Range Enhancement

As we noted earlier, criterion-related evidence of validity varies with the characteristics of the group on whom the test is validated. In general, whenever a predictor is validated on a group that is more heterogeneous than the group for whom the predictor ultimately is intended, estimates of validity will be spuriously high. Suppose a test of spatial relations ability, originally intended as a screening device for engineering applicants, is validated by giving it to applicants for jobs as diverse as machinists, mechanics, tool crib attendants, and engineers in a certain firm. This group is considerably more heterogeneous than the group for whom the test was originally intended (engineering applicants only). Consequently, there will be much variance in the test scores (i.e., range enhancement), and it may look like the test is discriminating effectively. Comparison of validity coefficients using engineering applicants only with those obtained from the more heterogeneous group will demonstrate empirically the relative amount of overestimation.
Range Restriction

Conversely, because the size of the validity coefficient is a function of two variables, restricting the range (i.e., truncating or censoring) either of the predictor or of the criterion will serve to lower the size of the validity coefficient (see Figure 7-1).

In Figure 7-1, the relationship between the interview scores and the criterion data is linear, follows the elliptical shape of the bivariate normal distribution, and indicates a systematic positive relationship of about .50. Scores are censored neither in the predictor nor in the criterion, and scores are found in nearly all the possible categories from low to high. The correlation drops considerably, however, when only a limited group is considered, such as those scores falling to the right of line X. When such selection occurs, the points assume shapes that are not at all elliptical and indicate much lower correlations between predictors and criteria. It is tempting to conclude from this that selection effects on validity coefficients result from changes in the variance(s) of the variable(s). However, Alexander (1988) showed that such effects are more properly considered as nonrandom sampling that separately influences means, variances, and correlations of the variables.

[FIGURE 7-1: Effect of range restriction on correlation. Horizontal axis: interview score (predictor), low to high, with selection line X; the vertical axis label is not recoverable from the source.]

Range restriction can occur in the predictor when, for example, only applicants who have survived an initial screening are considered or when measures are used for selection prior to validation, so that criterion data are unavailable for low scorers who did not get hired. This is known as direct range restriction on the predictor. Indirect or incidental range restriction on the predictor occurs when an experimental predictor is administered to applicants, but is not used as a basis for selection decisions (Aguinis & Whitehead, 1997). Rather, applicants are selected in accordance with the procedure currently in use, which is likely correlated with the new predictor. Incidental range restriction is pervasive in validation research (Aguinis & Whitehead, 1997). Thorndike (1949) recognized this more than 55 years ago when he noted that range restriction "imposed by indirect selection on the basis of some variable other than the ones being compared ... appears by far the most common and most important one for any personnel selection research program" (p. 175). In both cases, low scorers who are hired may become disenchanted with the job and quit before criterion data can be collected, thus further restricting the range of available scores.

The range of scores also may be narrowed by preselection. Preselection occurs, for example, when a predictive validity study is undertaken after a group of individuals has been hired, but before criterion data become available for them. Estimates of the validity of the procedure will be lowered, since such employees represent a superior selection of all job applicants, thus curtailing the range of predictor scores and criterion data. In short, selection at the hiring point reduces the range of the predictor variable(s), and selection on the job or during training reduces the range of the criterion variable(s). Either type of restriction has the effect of lowering estimates of validity.
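The effect of direct range restriction on the predictor can be seen in a small simulation. This is a minimal sketch under stated assumptions: predictor and criterion scores are drawn from a bivariate normal distribution with a population correlation of .50 (the value mentioned for Figure 7-1), and "hiring" keeps only cases above an arbitrary cutoff of 0.5 standard deviations. The cutoff, sample size, seed, and function names are all illustrative choices, not values from the text.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, rho=0.5, cutoff=0.5, seed=0):
    """Return (r in the full sample, r among cases with predictor > cutoff)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)  # predictor score
        # Criterion built so that its population correlation with x is rho.
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        pairs.append((x, y))
    full_r = pearson_r(*zip(*pairs))
    selected = [(x, y) for x, y in pairs if x > cutoff]  # direct selection
    restricted_r = pearson_r(*zip(*selected))
    return full_r, restricted_r


full_r, restricted_r = simulate()
print(f"r in full applicant sample: {full_r:.2f}")
print(f"r among selected cases:     {restricted_r:.2f}")
```

The restricted coefficient comes out well below the full-sample value even though the underlying predictor-criterion relationship is unchanged, which is the point made above about selection at the hiring stage.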
In order to interpret validity coefficients properly, information on the degree of range restriction in either variable should be included. Fortunately, formulas are available that correct statistically for the various forms of range restriction (Sackett & Yang, 2000; Thorndike, 1949). There are three types of information that can be used to decide which correction formula to implement: (1) whether restriction occurs on the predictor, the criterion, or a third variable correlated with the predictor and/or criterion; (2) whether unrestricted variances for the relevant variables are known; and (3) whether the third variable, if involved, is measured or unmeasured. Sackett and Yang (2000) described 11 different range-restriction scenarios derived from combining these three types of information and presented equations and procedures that can be used for correcting validity coefficients in each situation. However, before implementing a correction, one should be clear about which variables have been subjected to direct and/or indirect selection because the incorrect application of formulas can lead to misleading corrected validity coefficients.

To correct for direct range restriction on the predictor when no third variable is involved, the appropriate formula is as follows (this formula can also be used to correct for direct range restriction on the criterion when no third variable is involved):

$r_u = \dfrac{r\,(S/s)}{\sqrt{1 - r^2 + r^2\,(S^2/s^2)}}$   (7-4)

where $r_u$ is the estimated validity coefficient in the unrestricted sample, $r$ is the obtained coefficient in the restricted sample, $S$ is the standard deviation of the unrestricted sample, and $s$ is the standard deviation of the restricted sample.

In practice, all of the information necessary to use Equation 7-4 may not be available. Thus, a second possible scenario is that selection takes place on one variable (either the predictor or the criterion), but the unrestricted variance is not known.
For example, this can happen to the criterion due to turnover or transfer before criterion data could be gathered. In this case, the appropriate formula is

$r_u = \sqrt{1 - \dfrac{s^2}{S^2}\,(1 - r^2)}$   (7-5)

where all symbols are defined as above.

In yet a third scenario, if incidental restriction takes place on third variable z and the unrestricted variance on z is known, the formula for the unrestricted correlation between x and y is

$r_u = \dfrac{r_{xy} + r_{zx}\,r_{zy}\,(S_z^2/s_z^2 - 1)}{\sqrt{1 + r_{zx}^2\,(S_z^2/s_z^2 - 1)}\;\sqrt{1 + r_{zy}^2\,(S_z^2/s_z^2 - 1)}}$   (7-6)

In practice, there may be range restriction scenarios that are more difficult to address with corrections. Such scenarios include (1) those where the unrestricted variance on the predictor, the criterion, or the third variable is unknown and (2) those where there is simultaneous or sequential restriction on multiple variables. Fortunately, there are procedures to address each of these types of situations.

Alexander, Alliger, and Hanges (1984) described an approach to address situations where unrestricted variances are not known. For example, assume that the scenario includes direct restriction on the predictor x, but the unrestricted variance on x is unknown. First, one computes Cohen's (1959) ratio, $s^2/(\bar{x} - k)^2$, where $s^2$ is the variance in the restricted sample, $\bar{x}$ is the mean of x for the restricted sample, and $k$ is an estimate of the lowest possible x value that could have occurred. Because this ratio has a unique value for any point of selection, it is possible to estimate the proportional reduction in the unrestricted variance (i.e., $s^2/S^2$) based on this ratio. Alexander et al. (1984) provided a table including various values for Cohen's ratio and the corresponding proportional reduction in variance. Based on the value shown in the table, one can compute an estimate of the unrestricted variance that can be used in Equation 7-4. This procedure can also be used to estimate the (unknown) unrestricted variance for third variable z, and this information can be used in Equation 7-6.

Regarding simultaneous or sequential restriction of multiple variables,
Lawley (1943) derived what is called the multivariate correction formula. The multivariate correction formula can be used when direct restriction (of one or two variables) and incidental restriction take place simultaneously. Also, the equation can be used repeatedly when restriction occurs on a sample that is already restricted. Although the implementation of the multivariate correction is fairly complex, Johnson and Ree (1994) developed the computer program RANGEJ, which makes this correction easy to implement.

In an empirical investigation of the accuracy of such statistical corrections, Lee, Miller, and Graham (1982) compared corrected and uncorrected estimates of validity for the Navy Basic Test Battery to the unrestricted true validity of the test. Groups of sailors were selected according to five different selection ratios. In all cases, the corrected coefficients better estimated the unrestricted true validity of the test. However, later research by Lee and Foley (1986) and Brown, Stout, Dalessio, and Crosby (1988) has shown that corrected correlations tend to fluctuate considerably from test score range to test score range, with higher validity coefficients at higher predictor score ranges. Indeed, if predictor-criterion relationships are actually nonlinear, but a linear relationship is assumed, application of the correction formulas will substantially overestimate the true population correlation. Also, in some instances, the sign of the validity coefficient can change after a correction is applied (Ree, Carretta, Earles, & Albert, 1994).

It is also worth noting that corrected correlations did not have a known sampling distribution until recently. However, Raju and Brand (2003) derived equations for the standard error of correlations corrected for unreliability both in the predictor and the criterion and for range restriction. So, it is now possible to assess the variability of corrected correlations, as well as to conduct tests of statistical significance with correlations subjected to a triple correction.
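The three univariate corrections presented as Equations 7-4 through 7-6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the numerical inputs are hypothetical, and keeping the sign of the observed r in the Equation 7-5 function is a convenience added here, not something the text specifies.

```python
import math


def correct_direct(r, sd_unrestricted, sd_restricted):
    """Equation 7-4: direct restriction with the unrestricted SD known."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / math.sqrt(1.0 - r**2 + (r**2) * (u**2))


def correct_variance_ratio(r, sd_unrestricted, sd_restricted):
    """Equation 7-5: correction based on the ratio of restricted to unrestricted variance."""
    ratio = (sd_restricted**2) / (sd_unrestricted**2)
    magnitude = math.sqrt(1.0 - ratio * (1.0 - r**2))
    # Assumption: the square root gives only a magnitude, so reuse the sign of r.
    return math.copysign(magnitude, r)


def correct_incidental(r_xy, r_zx, r_zy, sd_z_unrestricted, sd_z_restricted):
    """Equation 7-6: incidental restriction through a measured third variable z."""
    w = (sd_z_unrestricted**2) / (sd_z_restricted**2) - 1.0
    numerator = r_xy + r_zx * r_zy * w
    denominator = math.sqrt((1.0 + r_zx**2 * w) * (1.0 + r_zy**2 * w))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical values: observed r = .30, unrestricted SD twice the restricted SD.
    print(round(correct_direct(0.30, 2.0, 1.0), 3))
    print(round(correct_variance_ratio(0.30, 2.0, 1.0), 3))
    print(round(correct_incidental(0.30, 0.50, 0.40, 2.0, 1.0), 3))
```

When the restricted and unrestricted standard deviations are equal, each function returns the observed correlation unchanged, which is a quick way to check that a correction has been coded as intended.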
Although the test of statistical significance for the corrected correlation is robust and Type I error rates are kept at the prespecified level, the ability consistently to reject a false null hypothesis correctly remains questionable under certain conditions (i.e., statistical power does not reach adequate levels). The low power observed may be due to the fact that Raju and Brand's (2003) proposed significance test assumes that the corrected correlations are normally distributed. This assumption may not be tenable in many meta-analytic databases (Steel & Kammeyer-Mueller, 2002). Thus, "there is a definite need for developing new significance tests for correlations corrected for unreliability and range restriction" (Raju & Brand, 2003, p. 66).

Finally, we emphasize that corrections are appropriate only when they are justified based on the target population (i.e., the population to which one wishes to generalize the obtained corrected validity coefficient). For example, if one wishes to estimate the validity coefficient for future applicants for a job, but the coefficient was obtained using a sample of current employees (already selected) in a concurrent validity study, then it would be appropriate to use a correction. On the other hand, if one wishes to use the test for promotion purposes in a sample of similarly preselected employees, the correction would not be appropriate. In general, it is recommended that both corrected and uncorrected coefficients be reported, together with information on the type of correction that was implemented (AERA, APA, & NCME, 1999, p. 159). This is particularly important in situations when unmeasured variables play a large role (Sackett & Yang, 2000).

Position in the Employment Process

Estimates of validity based on predictive designs may differ depending on whether a measure of individual differences is used as an initial selection device or as a final hurdle.
This is because variance is maximized when the predictor is used as an initial device (i.e., a more heterogeneous group of individuals provides data) and variance is often restricted when the predictor is used later on in the selection process (i.e., a more homogeneous group of individuals provides data).

Form of the Predictor-Criterion Relationship

Scattergrams depicting the nature of the predictor-criterion relationship always should be inspected for extreme departures from the statistical assumptions on which the computed measure of relationship is based. If an assumed type of relationship does not correctly describe the data, validity will be underestimated. The computation of the Pearson product-moment correlation coefficient assumes that both variables are normally distributed, that the relationship is linear, and that when the bivariate distribution of scores (from low to high) is divided into segments, the column variances are equal. This is called homoscedasticity. In less technical terms, this means that the data points are evenly distributed throughout the regression line and the measure predicts as well at high score ranges as at low score ranges (Aguinis, Petersen, & Pierce, 1999; Aguinis & Pierce, 1998). In practice, researchers rarely check for compliance with these assumptions (Weinzimmer, Mone, & Alwan, 1994), and the assumptions often are not met. In one study (Kahneman & Ghiselli, 1962), approximately 40 percent of the validities examined were nonlinear and/or heteroscedastic. Generally, however, when scores on the two variables being related are normally distributed, they also are homoscedastic. Hence, if we can justify the normalizing of scores, we are very likely to have a relationship that is homoscedastic as well (Ghiselli et al., 1981).

CONSTRUCT-RELATED EVIDENCE
Neither content- nor criterion-related validity strategies have as their basic objective the understanding of a trait or construct that a test measures. Content-related evidence is concerned with the extent to which items cover the intended domain, and criterion-related evidence is concerned with the empirical relationship between a predictor and a criterion. Yet, in our quest for improved prediction, some sort of conceptual framework is required to organize and explain our data and to provide direction for further investigation. The conceptual framework specifies the meaning of the construct, distinguishes it from other constructs, and indicates how measures of the construct should relate to other variables (AERA, APA, & NCME, 1999). This is the function of construct-related evidence of validity. It provides the evidential basis for the interpretation of scores (Messick, 1995).

Validating inferences about a construct requires a demonstration that a test measures a specific construct that has been shown to be critical for job performance. Once this is accomplished, then inferences about job performance from test scores are, by logical implication, justified (Binning & Barrett, 1989). The focus is on a description of behavior that is broader and more abstract. Construct validation is not accomplished in a single study; it requires an accumulation of evidence derived from many different sources to determine the meaning of the test scores and an appraisal of their social consequences (Messick, 1995). It is, therefore, both a logical and an empirical process.

The process of construct validation begins with the formulation by the investigator of hypotheses about the characteristics of those with high scores on a particular measurement procedure, in contrast to those with low scores.
Viewed in their entirety, such hypotheses form a tentative theory about the nature of the construct the test or other procedure is believed to be measuring. These hypotheses then may be used to predict how people at different score levels on the test will behave on certain other tests or in certain defined situations.

Note that in this process the measurement procedure serves as a sign (Wernimont & Campbell, 1968), clarifying the nature of the behavioral domain of interest and thus the essential nature of the construct. The construct (e.g., mechanical comprehension, social power) is defined not by an isolated event, but rather by a nomological network, that is, a system of interrelated concepts, propositions, and laws that relates observable characteristics to other observables, observables to theoretical constructs, or one theoretical construct to another theoretical construct (Cronbach & Meehl, 1955). For example, for a measure of perceived supervisory social power (i.e., a supervisor's ability to influence a subordinate as perceived by the subordinate; Nesler, Aguinis, Quigley, Lee, & Tedeschi, 1999), one needs to specify the antecedents and the consequents of this construct. The nomological network may include antecedents such as the display of specific nonverbal behaviors (e.g., making direct eye contact leading to a female, but not a male, supervisor being perceived as having high coercive power; Aguinis & Henle, 2001; Aguinis, Simonsen, & Pierce, 1998) and a resulting dissatisfactory relationship with his or her subordinate, which, in turn, may adversely affect the subordinate's job performance (Aguinis, Nesler, Quigley, Lee, & Tedeschi, 1996).

Information relevant either to the construct or to the theory surrounding the construct may be gathered from a wide variety of sources. Each can yield hypotheses that enrich the definition of a construct. Among these sources of evidence are the following:

1. Questions asked of test takers about their performance strategies or responses to particular items or questions asked of raters about the reasons for their ratings (AERA, APA, & NCME, 1999; Messick, 1995).
2. Analyses of the internal consistency of the measurement procedure.

3. Expert judgment that the content or behavioral domain being sampled by the procedure pertains to the construct in question. Sometimes this has led to a confusion between content and construct validities, but, since content validity deals with inferences about test construction, while construct validity involves inferences about test scores, content validity, at best, is one type of evidence of construct validity (Tenopyr, 1977). Thus, in one study (Schoenfeldt, Schoenfeldt, Acker, & Perlson, 1976), reading was measured directly from actual materials read on the job rather than through an inferential chain from various presumed indicators (e.g., a verbal ability score from an intelligence test). Test tasks and job tasks matched so well that there was little question that common constructs underlay performance on both.

4. Correlations of a new procedure (purportedly a measure of some construct) with established measures of the same construct.

5. Factor analyses of a group of procedures, demonstrating which of them share common variance and thus measure the same construct (e.g., Shore & Tetrick, 1991).

6. Structural equation modeling (e.g., using such software packages as AMOS, EQS, or LISREL) that allows the testing of a measurement model that links observed variables to underlying constructs and the testing of a structural model of the relationships among constructs (e.g., Pierce, Aguinis, & Adams, 2000). For example, Vance, Coovert, MacCallum, and Hedge (1989) used this approach to enhance understanding of how alternative predictors (ability, experience, and supervisor support) relate to different types of criteria (e.g., self, supervisor, and peer ratings; work sample performance; and training success) across three categories of tasks (installation of engine parts, inspection of components, and forms completion). Such understanding might profitably be used to develop a generalizable task taxonomy.

7. Ability of the scores derived from a measurement procedure to separate naturally occurring or experimentally contrived groups (group differentiation) or to demonstrate relationships between differences in scores and other variables on which the groups differ.

8. Demonstrations of systematic relationships between scores from a particular procedure and measures of behavior in situations where the construct of interest is thought to be an important variable. For example, a paper-and-pencil instrument designed to measure anxiety can be administered to a group of individuals who subsequently are put through an anxiety-arousing situation such as a final examination. The paper-and-pencil test scores would then be correlated with the physiological measures of anxiety expression during the exam. A positive relationship from such an experiment would provide evidence that test scores do reflect anxiety tendencies.

9. Convergent and discriminant validation. This purpose is closely related to procedures 3 and 4. Not only should scores that purportedly measure some construct be related to scores on other measures of the same construct (convergent validation), but also they should be unrelated to scores on instruments that are not supposed to be measures of that construct (discriminant validation).

A systematic experimental procedure for analyzing convergent and discriminant validities has been proposed by Campbell and Fiske (1959). They pointed out that any test (or other measurement procedure) is really a trait-method unit; that is, a test measures a given trait by a single method. Therefore, since we want to know the relative contributions of trait and method variance to test scores, we must study more than one trait (e.g., dominance, affiliation) and use more than one method (e.g., peer ratings, interviews). Such studies are possible using a multitrait-multimethod (MTMM) matrix (see Figure 7-2).

FIGURE 7-2 Example of a multitrait-multimethod matrix.
                    Method 1        Method 2
Traits              A1      B1      A2      B2
Method 1    A1      a
            B1      b
Method 2    A2      c
            B2      d

An MTMM matrix is simply a table displaying the correlations among (a) the same trait measured by the same method, (b) different traits measured by the same method, (c) the same trait measured by different methods, and (d) different traits measured by different methods. The procedure can be used to study any number and variety of traits measured by any method. In order to obtain satisfactory evidence for the validity of a construct, the (c) correlations (convergent validities) should be larger than zero and high enough to encourage further study. In addition, the (c) correlations should be higher than the (b) and (d) correlations (discriminant validity).

For example, if the correlation between interview (method 1) ratings of two supposedly different traits (e.g., assertiveness and emotional stability) is higher than the correlation between interview (method 1) ratings and written test (method 2) scores that supposedly measure the same trait (e.g., assertiveness), then the validity of the interview ratings as a measure of the construct "assertiveness" would be seriously questioned.

Note that in this approach reliability is estimated by two measures of the same trait using the same method (in Figure 7-2, the (a) correlations), while validity is defined as the extent of agreement between two measures of the same trait using different methods (in Figure 7-2, the (c) correlations). Once again, this shows that the concepts of reliability and validity are intrinsically connected and a good understanding of both is needed to gather construct-related validity evidence.

Although the logic of this method is intuitively compelling, it does have certain limitations, principally, (1) the lack of quantifiable criteria, (2) the inability to account for differential reliability, and (3) the implicit assumptions underlying the procedure (Schmitt & Stults, 1986).
One such assumption is the requirement of maximally dissimilar or uncorrelated methods, since, if the correlation between methods is 0.0, shared method variance cannot affect the assessment of shared trait variance.

When methods are correlated, however, confirmatory factor analysis should be used. Using this method, researchers can define models that propose trait or method factors (or both) a priori and then test the ability of such models to fit the data. The parameter estimates and ability of alternative models to fit the data are used to assess convergent and discriminant validity and method-halo effects. In fact, when methods are correlated, use of confirmatory factor analysis instead of the MTMM approach may actually reverse conclusions drawn in prior studies (Williams, Cote, & Buckley, 1989).

When analysis begins with multiple indicators of each Trait X Method combination, second-order or hierarchical confirmatory factor analysis (HCFA) should be used (Marsh & Hocevar, 1988). In this approach, first-order factors defined by multiple items or subscales are hypothesized for each scale, and the method and trait factors are proposed as second-order factors.

HCFA supports several important inferences about the latent structure underlying MTMM data beyond those permitted by traditional confirmatory factor analysis (Lance, Teachout, & Donnelly, 1992):

1. A satisfactory first-order factor model establishes that indicators have been assigned correctly to Trait X Method units.

2. Given a satisfactory measurement model, HCFA separates measurement error from unique systematic variance. They remain confounded in traditional confirmatory factor analyses of MTMM data.

3. HCFA permits inferences regarding the extent to which traits and measurement methods are correlated.
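Setting the factor-analytic refinements aside, the basic Campbell and Fiske (1959) comparisons among the (b), (c), and (d) entries of an MTMM matrix can be sketched as follows. This is a simplified illustration, not the authors' procedure: the correlation values are hypothetical, and requiring every (c) value to exceed every (b) and (d) value is a deliberately strict reading of "the (c) correlations should be higher than the (b) and (d) correlations." The (a) entries (reliabilities) are omitted.

```python
# Hypothetical measures: each label maps to a (trait, method) pair.
MEASURES = {"A1": ("A", 1), "B1": ("B", 1), "A2": ("A", 2), "B2": ("B", 2)}

# Hypothetical correlations between distinct measures (illustrative values only).
CORRELATIONS = {
    ("A1", "B1"): 0.30,
    ("A1", "A2"): 0.60,
    ("A1", "B2"): 0.15,
    ("B1", "A2"): 0.20,
    ("B1", "B2"): 0.55,
    ("A2", "B2"): 0.35,
}


def entry_type(first, second):
    """Classify a pair of distinct measures as a (b), (c), or (d) entry."""
    trait_1, method_1 = MEASURES[first]
    trait_2, method_2 = MEASURES[second]
    if trait_1 == trait_2 and method_1 != method_2:
        return "c"  # same trait, different methods (convergent validity)
    if trait_1 != trait_2 and method_1 == method_2:
        return "b"  # different traits, same method
    return "d"      # different traits, different methods


def group_entries(correlations):
    """Collect the correlations under their (b), (c), or (d) label."""
    groups = {"b": [], "c": [], "d": []}
    for (first, second), r in correlations.items():
        groups[entry_type(first, second)].append(r)
    return groups


def check_validity_pattern(groups):
    """Return (convergent_ok, discriminant_ok) under the strict reading above."""
    convergent_ok = all(r > 0 for r in groups["c"])
    discriminant_ok = min(groups["c"]) > max(groups["b"] + groups["d"])
    return convergent_ok, discriminant_ok


groups = group_entries(CORRELATIONS)
print(groups)
print(check_validity_pattern(groups))
```

A check like this only summarizes the pattern of correlations; it does not address the limitations noted above, such as differential reliability or correlated methods.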
Illustration

A construct validation paradigm designed to study predictor-job performance linkages in the Navy recruiter's job was presented by Borman, Rosse, and Abrahams (1980) and refined and extended by Pulakos, Borman, and Hough (1988). Their approach is described here, since it illustrates nicely interrelationships among the sources of construct-related evidence presented earlier. Factor analyses of personality and vocational interest items that proved valid in a previous Navy recruiter test validation study yielded several factors that were interpreted as underlying constructs (e.g., selling skills, human relations skills), suggesting individual differences potentially important for success on the recruiter job. New items, selected or written to tap these constructs, along with the items found valid in the previous recruiter study, were administered to a separate sample of Navy recruiters. Peer and supervisory performance ratings also were gathered for these recruiters.

Data analyses indicated good convergent and discriminant validities in measuring many of the constructs. For about half the constructs, the addition of new items enhanced validity against the performance criteria. This approach (i.e., attempting to discover, understand, and then confirm individual differences constructs that are important for effectiveness on a job) is a workable strategy for enhancing our understanding of predictor-criterion relationships and an important contribution to personnel selection research.

CROSS-VALIDATION

The prediction of criteria using test scores is often implemented by assuming a linear and additive relationship between the predictors (i.e., various tests) and the criterion. These relationships are typically operationalized using ordinary least squares (OLS) regression, in which weights are assigned to the predictors so that the difference between observed criterion scores and predicted criterion scores is minimized (see Appendix B).
The assumption that regression weights obtained from one sample can be used with other samples with a similar level of predictive effectiveness is not true in most situations. Specifically, the computation of regression weights is affected by idiosyncrasies of the sample on which they are computed, and it capitalizes on chance factors so that prediction is optimized in the sample. Thus, when weights computed in one sample (i.e., current employees) are used with a second sample from the same population (i.e., job applicants), the multiple correlation coefficient is likely to be smaller. This phenomenon has been labeled shrinkage (Larson, 1931). Shrinkage is likely to be especially large when (1) initial validation samples are small (and, therefore, have larger sampling errors), (2) a "shotgun" approach is used (i.e., when a miscellaneous set of questions is assembled with little regard to their relevance to criterion behavior and when all items subsequently are retained that yield significant positive or negative correlations with a criterion), and (3) when the number of predictors increases (due to chance factors operating in the validation sample). Shrinkage is likely to be less when items are chosen on the basis of previously formed hypotheses derived from psychological theory or on the basis of past studies showing a clear relationship with the criterion (Anastasi, 1988).

Given the possibility of shrinkage, an important question is the extent to which weights derived from a sample cross-validate (i.e., generalize). Cross-validity (i.e., ρ_c) refers to whether the weights derived from one sample can predict outcomes to the same degree in the population as a whole or in other samples drawn from the same population. If cross-validity is low, the use of assessment tools and prediction systems derived from one sample may not be appropriate in other samples from the same population. Unfortunately, it seems researchers are not aware of this issue.
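To make shrinkage concrete, a formula-based estimate of cross-validity can be computed directly from the sample R², the sample size N, and the number of predictors k. The sketch below assumes the Browne (1975) estimator with the familiar "adjusted R²" as the intermediate population estimate (the quantities this section labels Equations 7-7 and 7-8). The function names and input values are hypothetical, and estimating ρ⁴ by simply squaring the adjusted R² is a simplifying assumption made here.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: estimate of the squared population multiple correlation."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)


def browne_cross_validity(r2, n, k):
    """Estimated squared cross-validity from sample R-squared, N, and k predictors."""
    # Assumption: negative adjusted values are floored at zero before use.
    rho2 = max(adjusted_r2(r2, n, k), 0.0)
    # Assumption: rho^4 is approximated by squaring the adjusted R-squared.
    rho4 = rho2**2
    return ((n - k - 3) * rho4 + rho2) / ((n - 2 * k - 2) * rho2 + k)


if __name__ == "__main__":
    # Hypothetical study: R^2 = .25 with N = 100 applicants and k = 5 predictors.
    r2, n, k = 0.25, 100, 5
    print(round(adjusted_r2(r2, n, k), 3))            # about 0.210
    print(round(browne_cross_validity(r2, n, k), 3))  # about 0.182
```

In this hypothetical case the estimate drops from .25 in the sample to roughly .21 after the adjustment and to roughly .18 for cross-validity, which illustrates why the adjusted value alone understates shrinkage.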
A review of articles published in Academy of Management Journal, Administrative Science Quarterly, and Strategic Management Journal between January 1990 and December 1995 found that none of the articles reviewed reported empirical or formula-based cross-validation estimates (St. John & Roth, 1999). Fortunately, there are procedures available to compute cross-validity. Cascio and Aguinis (2001) provided detailed information on two types of approaches: empirical and statistical.

Empirical Cross-Validation

The empirical strategy consists of fitting a regression model in a sample and using the resulting regression weights with a second independent cross-validation sample. The multiple correlation coefficient obtained by applying the weights from the first (i.e., "derivation") sample to the second (i.e., "cross-validation") sample is used as an estimate of ρ_c. Alternatively, only one sample is used, but it is divided into two subsamples, thus creating a derivation subsample and a cross-validation subsample. This is known as a single-sample strategy.

Statistical Cross-Validation

The statistical strategy consists of adjusting the sample-based multiple correlation coefficient (R) by a function of sample size (N) and the number of predictors (k). Numerous formulas are available to implement the statistical strategy (Raju, Bilgic, Edwards, & Fleer, 1997). The most commonly implemented formula to estimate cross-validity (i.e., ρ_c) is the following (Browne, 1975):

$$\rho_c^2 = \frac{(N - k - 3)\,\rho^4 + \rho^2}{(N - 2k - 2)\,\rho^2 + k} \tag{7-7}$$

where ρ is the population multiple correlation. The squared multiple correlation in the population, ρ², can be computed as follows:

$$\rho^2 = 1 - \frac{N - 1}{N - k - 1}\,(1 - R^2) \tag{7-8}$$

Note that Equation 7-8 is what most computer outputs label "adjusted R²" and is only an intermediate step in computing cross-validity (i.e., Equation 7-7). Equation 7-8 does not directly address the capitalization on chance in the sample at hand and addresses the issue of shrinkage only partially by adjusting the multiple correlation coefficient based on the sample size and the number of predictors in the regression model (St. John & Roth, 1999). Unfortunately, there is confusion regarding estimators of ρ² and ρ_c², as documented by Kromrey and Hines (1995, pp. 902-903). The obtained "adjusted R²" does not address the issue of prediction optimization due to sample idiosyncrasies and, therefore, underestimates the shrinkage. The use of Equation 7-7 in combination with Equation 7-8 addresses this issue.

Comparison of Empirical and Statistical Strategies

Cascio and Aguinis (2001) reviewed empirical and statistical approaches and concluded that logistical considerations, as well as the cost associated with the conduct of empirical cross-validation studies, can be quite demanding. In addition, there seem to be no advantages to implementing empirical cross-validation strategies. Regarding statistical approaches, the most comprehensive comparison of various formulae available to date was conducted by Raju, Bilgic, Edwards, and Fleer (1999), who investigated 11 cross-validity estimation procedures. The overall conclusion of this body of research is that Equation 7-7 provides accurate results as long as the total sample size is greater than 40.

The lesson should be obvious. Cross-validation, including rescaling and reweighting of items if necessary, should be continual (we recommend it annually), for as values change, jobs change, and people change, so also do the appropriateness and usefulness of inferences made from test scores.

GATHERING VALIDITY EVIDENCE WHEN LOCAL VALIDATION IS NOT FEASIBLE

In many cases, local validation may not be feasible due to logistics or practical constraints. For example, small organizations find it extremely difficult to conduct criterion-related and construct-related validity studies. Only one or, at most, several persons occupy each job in the firm, and, over a period of several years, only a few more may be hired. Obviously, the sample sizes available do not permit adequate predictive studies to be undertaken. Fortunately,
there are several strategies available to gather validity evidence in such situations. These include synthetic validity, test transportability, and validity generalization.

Synthetic Validity

Synthetic validity (Balma, 1959) is the process of inferring validity in a specific situation from a systematic analysis of jobs into their elements, a determination of test validity for these elements, and a combination or synthesis of the elemental validities into a whole (Johnson, Carter, Davison, & Oliver, 2001). The procedure has a certain logical appeal. As was pointed out in Chapter 4, criteria are multidimensional and complex, and, if the various dimensions of job performance are independent, each predictor in a battery may be validated against the aspect of job performance it is designed to measure. Such an analysis lends meaning to the predictor scores in terms of the multiple dimensions of criterion behavior. Although there are several operationalizations of synthetic validity (Jeanneret, 1992), all the available procedures are based on the common characteristic of using available information about a job to gather evidence regarding the job-relatedness of a test (Hoffman & McPhail, 1998).

For example, the jobs clerk, industrial products salesperson, teamster, and teacher are different, but the teacher and salesperson probably share a basic requirement of verbal fluency; the clerk and teamster, manual dexterity; the teacher and clerk, numerical aptitude; and the salesperson and teamster, mechanical aptitude. Although no one test or other predictor is valid for the total job, tests are available to measure the more basic job aptitudes required. To determine which tests to use in selecting persons for any particular job, however, one first must analyze the job into its elements and specify common behavioral requirements across jobs. Knowing these elements, one then can derive the particular statistical weight attached to each element (the size of the weight is a function of the importance of the element to overall job performance).
When the statistical weights are combined with the test element validities, it is possible not only to determine which tests to use, but also to estimate the expected predictiveness of the tests for the job in question. Thus, a "synthesized valid battery" of tests may be constructed for each job. The Position Analysis Questionnaire (McCormick, Jeanneret, & Mecham, 1972), a job analysis instrument that includes generalized behaviors required in work situations, routinely makes synthetic validity predictions for each job analyzed. Predictions are based on the General Aptitude Test Battery (12 tests that measure aptitudes in the following areas: intelligence, verbal aptitude, numerical aptitude, spatial aptitude, form perception, clerical perception, motor coordination, finger dexterity, and manual dexterity).

Research to date has demonstrated that synthetic validation is feasible (Jeanneret, 1992) and legally acceptable (Trattner, 1982) and that the resulting coefficients are comparable to (albeit slightly lower than) validity coefficients resulting from criterion-related validation research (Hoffman & McPhail, 1998). In addition, Hollenbeck and Whitener (1988) showed that the order of validation and aggregation need not be fixed. That is, it is possible to aggregate across job elements and elemental performance ratings and then to assess test-job performance relationships empirically. Doing so reduces the sample sizes required for synthetic validity and may allow more small businesses to use this procedure.

Test Transportability

Test transportability is another strategy available to gather validity evidence when a local validation study is not feasible. The Uniform Guidelines on Employee Selection Procedures (1978) notes that, to be able to use a test that has been used elsewhere locally without the need for a local validation study, evidence must be provided regarding the following (Hoffman & McPhail,
1998):

• The results of a criterion-related validity study conducted at another location
• The results of a test fairness analysis based on a study conducted at another location where technically feasible (test fairness is discussed in detail in Chapter 8)
• The degree of similarity between the job performed by incumbents locally and that performed at the location where the test has been used previously; this can be accomplished by using task- or worker-oriented job analysis data (Hoffman, 1999; job analysis is discussed in detail in Chapter 9)
• The degree of similarity between the applicants in the prior and local settings

Given that data collected in other locations are needed, many situations are likely to preclude gathering validity evidence under the test transportability rubric. On the other hand, the test transportability option is a good possibility when a test publisher has taken the necessary steps to include this option while conducting the original validation research (Hoffman & McPhail, 1998).

Validity Generalization

A meta-analysis is a literature review that is quantitative as opposed to narrative in nature (Hedges & Olkin, 1985; Huffcutt, 2002; Hunter & Schmidt, 1990; Rothstein, McDaniel, & Borenstein, 2002). The goals of a meta-analysis are to understand the relationship between two variables across studies and the variability of this relationship across studies (Aguinis & Pierce, 1998). In personnel psychology, meta-analysis has been used extensively to provide a quantitative integration of validity coefficients computed in different samples. The application of meta-analysis to the employment testing literature was seen as necessary, given the considerable variability from study to study in observed validity coefficients and the fact that some coefficients are statistically significant, whereas others are not (Schmidt & Hunter, 1977), even when jobs and tests appear to be similar or essentially identical (Schmidt & Hunter, 2003a).
If, in fact, validity coefficients vary from employer to employer, region to region, across time periods, and so forth, the situation specificity hypothesis would be true, local empirical validation would be required in each situation, and it would be impossible to develop general principles and theories that are necessary to take the field beyond a mere technology to the status of a science (Guion, 1976). Meta-analyses conducted with the goal of testing the situational specificity hypothesis have been labeled validity generalization (VG) studies (Schmidt & Hunter, 2003b).

VG studies have been applied to over 500 bodies of research in employment selection, each one representing a different predictor-criterion combination (Schmidt & Hunter, 2003b). Rothstein (2003) reviewed several such studies demonstrating validity generalization for such diverse predictors as grade point average (Roth, BeVier, Switzer, & Schippmann, 1996), biodata (Rothstein, Schmidt, Erwin, Owens, & Sparks, 1990), and job experience (McDaniel, Schmidt, & Hunter, 1988). But note that there is a slight difference between testing whether a validity coefficient generalizes and whether the situation-specificity hypothesis is true (Murphy, 2000, 2003). The VG question is answered by obtaining a mean validity coefficient across studies and comparing it to some standard (e.g., if 90 percent of validity coefficients are greater than .10, then validity generalizes). The situation-specificity question is answered by obtaining a measure of variability (e.g., SD) of the distribution of validity coefficients across studies. Validity may generalize because most coefficients are greater than a preset standard, but there still may be substantial variability in the coefficients across studies (and, in this case, there is a need to search for moderator variables that can explain this variance; Aguinis & Pierce, 1998).

If a VG study concludes that validity for a specific test-performance relationship generalizes, then this information can be used in lieu of a local validation study.
This allows small organizations to implement tests that have been used elsewhere without the need to collect data locally. However, there is still a need to understand the job duties in the local organization. In addition, sole reliance on VG evidence to support test use is probably premature.

CHAPTER 7 Validation and Use of Individual Differences Measures

A review of the legal status of VG (Cascio & Aguinis, 2004) revealed that only three cases that relied on VG have reached the appeals-court level, and courts do not always accept VG evidence. For example, in Bernard v. Gulf Oil Corp. (1989), the court refused VG evidence by disallowing the argument that validity coefficients from two positions within the same organization indicate that the same selection battery would apply to other jobs within the company without further analysis of the other jobs. Based on this and other evidence, Landy (2003) concluded that "anyone considering the possibility of invoking VG as the sole defense for a test or test type might want to seriously consider including additional defenses (e.g., transportability analyses) and would be well advised to know the essential duties of the job in question, and in its local manifestation, well" (p. 189).
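The distinction drawn earlier between the VG question (do most coefficients exceed a preset standard?) and the situation-specificity question (how much do coefficients vary across studies?) can be illustrated with a minimal Python sketch. The coefficients below are hypothetical values invented for illustration, the function name `summarize_validities` is an assumption, and the sketch works on raw observed coefficients only. A real VG study would also correct for sampling error and other statistical artifacts, which this example does not attempt.

```python
from statistics import mean, pstdev


def summarize_validities(coefficients, standard=0.10, required_share=0.90):
    """Summarize validity coefficients from separate studies.

    standard and required_share follow the text's example rule:
    validity generalizes if 90 percent of coefficients exceed .10.
    """
    share_above = sum(r > standard for r in coefficients) / len(coefficients)
    return {
        "mean_r": mean(coefficients),          # central tendency across studies
        "sd_r": pstdev(coefficients),          # variability (situation specificity)
        "share_above_standard": share_above,   # VG criterion
        "generalizes": share_above >= required_share,
    }


# Hypothetical coefficients, invented for illustration only.
hypothetical_r = [0.12, 0.18, 0.25, 0.31, 0.22, 0.45, 0.15, 0.38, 0.27, 0.52]
summary = summarize_validities(hypothetical_r)
print(summary)
```

In this made-up set every coefficient exceeds .10, so validity would be said to generalize, yet the standard deviation is far from zero. That is the case in which a search for moderator variables may still be warranted.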

[FIGURE 8-1 Positive validity. Horizontal axis: predictor score (Reject, Accept).]

... quadrants 1 and 3, with relatively few dots in quadrants 2 and 4, positive validity exists. If the relationship were negative (e.g., the relationship between the predictor conscientiousness and the criterion counterproductive behaviors), most of the dots would fall in quadrants 2 and 4. Figure 8-1 shows that the relationship is positive and people with high (low) predictor scores also tend to have high (low) criterion scores. In investigating differential validity for groups (e.g., ethnic minority and ethnic nonminority), if the joint distribution of predictor and criterion scores is similar throughout the scatterplot in each group, as in Figure 8-1, no problem exists, and use of the predictor can be continued. On the other hand, if the joint distribution of predictor and criterion scores is similar for each group, but circular, as in Figure 8-2, there is also no differential validity, but the predictor is useless because it supplies no information of a predictive nature. So there is no point in investigating differential validity in the absence of an overall pattern of predictor-criterion scores that allows for the prediction of relevant criteria.

[FIGURE 8-2 Zero validity. Horizontal axis: predictor score (Reject, Accept).]

CHAPTER 8 Fairness in Employment Decisions

Differential Validity and Adverse Impact

An important consideration in assessing differential validity is whether the test in question produces adverse impact. The Uniform Guidelines (1978) state that a "selection rate for any race, sex, or ethnic group which is less than four-fifths (4/5) (or eighty percent) of the rate for the group with the highest rate will generally be regarded by the Federal enforcement agencies as evidence of adverse impact, while a greater than four-fifths rate will generally not be regarded by Federal enforcement agencies as evidence of adverse impact" (p. 123). In other words, adverse impact means that members of one group are selected at substantially greater rates than members of another group. To understand whether this is the case, one then compares selection ratios across the groups under consideration. For example, assume that the applicant pool consists of 300 ethnic minorities and 500 nonminorities. Further, assume that 30 minorities are hired, for a selection ratio of $SR_1 = 30/300 = .10$, and that 100 nonminorities are hired, for a selection ratio of $SR_2 = 100/500 = .20$. The adverse impact ratio is $SR_1/SR_2 = .50$, which is substantially smaller than the suggested .80 ratio.

Let's consider various scenarios relating differential validity with adverse impact. The ideas for many of the following diagrams are derived from Barrett (1967) and represent various combinations of the concepts illustrated in Figures 8-1 and 8-2.

Figure 8-3 is an example of a differential predictor-criterion relationship that is legal and appropriate. In this figure, validity for the minority and nonminority groups is equivalent, but the minority group scores lower on the predictor and does poorer on the job (of course, the situation could be reversed). In this instance, the very same factors that depress test scores may also serve to depress job performance scores. Thus, adverse impact is defensible in this case, since minorities do poorer on what the organization considers a relevant and important measure of job success.
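The four-fifths comparison worked through above (30 of 300 minority applicants hired versus 100 of 500 nonminority applicants) can be sketched in a few lines of Python. The function names and the dictionary layout are assumptions made for this illustration. The .80 threshold is the one quoted from the Uniform Guidelines, and the applicant counts are the ones used in the text.

```python
def selection_ratio(hired, applicants):
    """Proportion of applicants in a group who were hired."""
    return hired / applicants


def adverse_impact_ratios(rates):
    """Divide each group's selection ratio by the highest group's ratio."""
    highest = max(rates.values())
    return {group: rate / highest for group, rate in rates.items()}


# Counts taken from the example in the text.
rates = {
    "minority": selection_ratio(30, 300),       # SR1
    "nonminority": selection_ratio(100, 500),   # SR2
}
ratios = adverse_impact_ratios(rates)

# A ratio below .80 (four-fifths) is generally regarded as evidence of adverse impact.
flagged = {group: ratio < 0.80 for group, ratio in ratios.items()}
print(ratios, flagged)
```

Comparing every group with the group that has the highest selection ratio, rather than hard-coding one reference group, keeps the sketch usable when more than two groups are involved.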
On the other hand, government regulatory agencies probably would want evidence that the criterion was relevant, important, and not itself subject to bias. Moreover, alternative criteria that result in less adverse impact would have to be considered, along with the possibility that some third factor (e.g., length of service) did not cause the observed difference in job performance (Byham & Spitzer, 1971).

[FIGURE 8-3 Valid with adverse impact. Horizontal axis: predictor score (Reject, Accept).]

An additional possibility, shown in Figure 8-4, is a predictor that is valid for the combined group, but invalid for each group separately. In fact, there are several situations where the validity coefficient is zero or near zero for each of the groups, but the validity coefficient in both groups combined is moderate or even large (Ree, Carretta, & Earles, 1999). In most cases where no validity exists for either group individually, errors in selection would result from using the predictor without validation or from failing to test for differential validity in the first place. The predictor in this case becomes solely a crude measure of the grouping variable (e.g., ethnicity) (Bartlett & O'Leary, 1969). This is the most clear-cut case of using selection measures to discriminate in terms of race, sex, or any other unlawful basis. Moreover, it is unethical to use a selection device that has not been validated (see Appendix A).

[FIGURE 8-4 Valid for entire group; invalid for each group separately. Horizontal axis: predictor score (Reject, Accept).]

It also is possible to demonstrate equal validity in the two groups combined with unequal predictor means or criterion means and the presence or absence of adverse impact. These situations, presented in Figures 8-5 and 8-6, highlight the need to examine differential prediction, as well as differential validity.

In Figure 8-5, members of the minority group would not be as likely to be selected, even though the probability of success on the job for the two groups is essentially equal. Under these conditions, an alternative strategy is to use separate cut scores in each group based on predictor performance, while the expectancy of job performance success remains equal. Thus, a Hispanic candidate with a score of 65 on an interview may have a 75 percent chance of success on the job. A white candidate with a score of 75 might have the same 75 percent probability of success on the job. Although this situation might appear disturbing initially, remember that the predictor (e.g., a selection interview) is being used simply as a vehicle to forecast the likelihood of successful job performance.

[FIGURE 8-5 Equal validity, unequal predictor means. Horizontal axis: predictor score (Reject, Accept).]

The primary focus is on job performance rather than on predictor performance. Even though interview scores may mean different things for different groups, as long as the expectancy of success on the job is equal for the two (or more) groups, the use of separate cut scores is justified. Indeed, the reporting of an expectancy score for each candidate is one recommendation made by a National Academy of Sciences panel with respect to the interpretation of scores on the General Aptitude Test Battery (Hartigan & Wigdor, 1989). A legal caveat exists, however. In the United States, it is illegal to use different selection rules for identifiable groups in some contexts (Sackett & Wilk, 1994).

Figure 8-6 depicts a situation where, although there is no noticeable difference in predictor scores, nonminority group members tend to perform better on the job than minority group members (or vice versa). If predictions were based on the combined sample, the result would be a systematic underprediction for nonminorities and a systematic overprediction for minorities, although there is no adverse impact. Thus, in this situation, the failure to use different selection rules (which would yield more accurate prediction for both groups) may put minority persons in jobs where their probability of success is low and where their resulting performance only provides additional evidence that helps maintain prejudice (Bartlett & O'Leary, 1969). The nonminority individuals also suffer. If a test is used as a placement device, for example, since nonminority performance is systematically underpredicted, these individuals may well be placed in jobs that do not make the fullest use of their talents.

In Figure 8-7, no differences between the groups exist either on predictor or on criterion scores; yet the predictor has validity only for the nonminority group. Hence, the selection measure should be used only with the nonminority group, since the job performance of minorities cannot be predicted accurately.
If the measure were used to select both minority and nonminority applicants, no adverse impact would be found, since approximately the same proportion of applicants would be hired from each group. However, more nonminority members would succeed on the job, thereby reinforcing past stereotypes about minority groups and hindering future attempts at equal employment opportunity.

In our final example (see Figure 8-8), the two groups differ in mean criterion performance as well as in validity. The predictor might be used to select nonminority applicants, but should not be used to select minority applicants. Moreover, the cut score or decision rule used to select nonminority applicants must be derived solely from the nonminority group, not from the combined group. If the minority group (for whom the predictor is not valid) is included, overall validity will be lowered, as will the overall mean criterion score. Predictions will be less accurate because the standard error of estimate will be inflated. As in the previous example, the organization should use the selection measure only for the nonminority group (taking into account the caveat above about legal standards) while continuing to search for a predictor that accurately forecasts minority job performance.

In summary, numerous possibilities exist when heterogeneous groups are combined in making predictions. When differential validity exists, the use of a single regression line, cut score, or decision rule can lead to serious errors in prediction. While one legitimately may question the use of race or gender as a variable in selection, the problem is really one of distinguishing between performance on the selection measure and performance on the job (Guion, 1965).
If the basis for hiring is expected job performance and if different selection rules are used to improve the prediction of expected job performance rather than to discriminate on the basis of race, gender, and so on, then this procedure appears both legal and appropriate. Nevertheless, the implementation of differential systems is difficult in practice because the fairness of any procedure that uses different standards for different groups is likely to be viewed with suspicion ("More," 1989).

Differential Validity: The Evidence

Let us be clear at the outset that evidence of differential validity provides information only on whether a selection device should be used to make comparisons within groups. Evidence of unfair discrimination between subgroups cannot be inferred from differences in validity alone; mean job performance also must be considered. In other words, a selection procedure may be fair and yet predict performance inaccurately, or it may discriminate unfairly and yet predict performance within a given subgroup with appreciable accuracy (Kirkpatrick, Ewen, Barrett, & Katzell, 1968).

In discussing differential validity, we must first specify the criteria under which differential validity can be said to exist at all. Thus, Boehm (1972) distinguished between differential and single-group validity. Differential validity exists when (1) there is a significant difference between the validity coefficients obtained for two subgroups (e.g., ethnicity or gender) and (2) the correlations found in one or both of these groups are significantly different from zero. Related to, but different from, differential validity is single-group validity, in which a given predictor exhibits validity significantly different from zero for one group only and there is no significant difference between the two validity coefficients.

Humphreys (1973) has pointed out that single-group validity is not equivalent to differential validity, nor can it be viewed as a means of assessing differential validity.
The logic underlying this distinction is clear: To determine whether two correlations differ from each other, they must be compared directly with each other. In addition, a serious statistical flaw in the single-group validity paradigm is that the sample size is typically smaller for the minority group, which reduces the chances that a statistically significant validity coefficient will be found in this group. Thus, the appropriate statistical test is a test of the null hypothesis of zero difference between the sample-based estimates of the population validity coefficients. However, statistical power is low for such a test, and this makes a Type II error (i.e., not rejecting the null hypothesis when it is false) more likely. Therefore, the researcher who unwisely does not compute statistical power and plan research accordingly is likely to err on the side of too few differences. For example, if the true validities in the populations to be compared are .50 and .30, but both are attenuated by a criterion with a reliability of .7, then even without any range restriction at all, one must have 528 persons in each group to yield a 90 percent chance of detecting the existing differential validity at alpha = .05 (for more on this, see Trattner & O'Leary, 1980).

The sample sizes typically used in any one study are, therefore, inadequate to provide a meaningful test of the differential validity hypothesis. However, higher statistical power is possible if validity coefficients are cumulated across studies, which can be done using meta-analysis (as discussed in Chapter 7). The bulk of the evidence suggests that statistically significant differential validity is the exception rather than the rule (Schmidt, 1988; Schmidt & Hunter, 1981; Wigdor & Garner, 1982). In a comprehensive review and analysis of 866 black-white employment test validity pairs, Hunter, Schmidt, and Hunter (1979) concluded that findings of apparent differential validity in samples are produced by the operation of chance and a number of statistical artifacts. True differential validity probably does not exist.
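To give a feel for why such large samples are needed, the following Python sketch estimates the per-group sample size for detecting a difference between two independent correlations. It rests on assumptions that are not taken from the text: it uses the common Fisher z approximation (in which the difference between two transformed correlations has variance 2/(n - 3) when both groups have n persons), and the function names are made up for this illustration. Because Trattner and O'Leary's (1980) procedure may differ in its details, the result should be read as an order-of-magnitude check and is not expected to reproduce the 528 figure exactly.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def attenuate(true_validity, criterion_reliability):
    """Observed validity after attenuation by criterion unreliability."""
    return true_validity * sqrt(criterion_reliability)


def n_per_group(r1, r2, alpha=0.05, power=0.90, two_tailed=True):
    """Approximate n per group to detect r1 != r2 (Fisher z approximation)."""
    effect = abs(atanh(r1) - atanh(r2))            # difference in Fisher z units
    normal = NormalDist()
    tail_area = alpha / 2 if two_tailed else alpha
    z_alpha = normal.inv_cdf(1 - tail_area)        # critical value
    z_beta = normal.inv_cdf(power)                 # quantile for desired power
    return ceil(2 * ((z_alpha + z_beta) / effect) ** 2 + 3)


# Values from the text: true validities .50 and .30, criterion reliability .7.
r1 = attenuate(0.50, 0.7)
r2 = attenuate(0.30, 0.7)
n_two_tailed = n_per_group(r1, r2, two_tailed=True)
n_one_tailed = n_per_group(r1, r2, two_tailed=False)
print(n_two_tailed, n_one_tailed)
```

Under these assumptions both the one-tailed and the two-tailed estimates run to several hundred persons per group, which is consistent with the point that single studies rarely have enough power to test the differential validity hypothesis.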
In addition, no support was found for the suggestion by Boehm (1972) and Bray and Moses (1972) that findings of validity differences by race are associated with the use of subjective criteria (ratings, rankings, etc.) and that validity differences seldom occur when more objective criteria are used.

Similar analyses of 1,337 pairs of validity coefficients from employment and educational tests for Hispanic Americans showed no evidence of differential validity (Schmidt, Pearlman, & Hunter, 1980). Differential validity for males and females also has been examined. Schmitt, Mellon, and Bylenga (1978) examined 6,219 pairs of validity coefficients for males and females (predominantly dealing with educational outcomes) and found that validity coefficients for females were slightly (< .05 correlation units), but significantly, larger than coefficients for males. Validities for males exceeded those for females only when predictors were less cognitive in nature, such as high school experience variables. Schmitt et al. (1978) concluded: "The magnitude of the difference between male and female validities is very small and may make only trivial differences in most practical situations" (p. 150).

In summary, available research evidence indicates that the existence of differential validity in well-controlled studies is rare. Adequate controls include large enough sample sizes in each subgroup to achieve statistical power of at least .80; selection of predictors based on their logical relevance to the criterion behavior to be predicted; unbiased, relevant, and reliable criteria; and cross-validation of results.

ASSESSING DIFFERENTIAL PREDICTION AND MODERATOR VARIABLES

The possibility of predictive bias in selection procedures is a central issue in any discussion of fairness and equal employment opportunity (EEO). As we noted earlier, these issues require a consideration of the equivalence of prediction systems for different groups.
Analyses of possible differences in slopes or intercepts in subgroup regression lines result in more thorough investigations of predictive bias than does analysis of differential validity alone because the overall regression line determines how a test is used for prediction. Lack of differential validity, in and of itself, does not assure lack of predictive bias.

Specifically, the Standards (AERA, APA, & NCME, 1999) note: "When empirical studies of differential prediction of a criterion for members of different groups are conducted, they should include regression equations (or an appropriate equivalent) computed separately for each group or treatment under consideration or an analysis in which the group or treatment variables are entered as moderator variables" (Standard 7.6, p. 82). In other words, when there is differential prediction based on a grouping variable such as gender or ethnicity, this grouping variable is called a moderator. Similarly, the 1978 Uniform Guidelines on Employee Selection Procedures (Ledvinka, 1979) adopt what is known as the Cleary (1968) model of fairness:

A test is biased for members of a subgroup of the population if, in the prediction of a criterion for which the test was designed, consistent nonzero errors of prediction are made for members of the subgroup. In other words, the test is biased if the criterion score predicted from the common regression line is consistently too high or too low for members of the subgroup. With this definition of bias, there may be a connotation of "unfair," particularly if the use of the test produces a prediction that is too low. If the test is used for selection, members of a subgroup may be rejected when they were capable of adequate performance. (p. 115)

In Figure 8-3, although there are two separate ellipses, one for the minority group and one for the nonminority, a single regression line may be cast for both groups. So this test would demonstrate lack of differential prediction or predictive bias. In Figure 8-6, however,
the manner in which the position of the regression line is computed clearly does make a difference. If a single regression line is cast for both groups (assuming they are equal in size), criterion scores for the nonminority group consistently will be underpredicted, while those of the minority group consistently will be overpredicted. In this situation, there is differential prediction, and the use of a single regression line is inappropriate, but it is the nonminority group that is affected adversely. While the slopes of the two regression lines are parallel, the intercepts are different. Therefore, the same predictor score has a different predictive meaning in the two groups. A third situation is presented in Figure 8-8. Here the slopes are not parallel. As we noted earlier, the predictor clearly is inappropriate for the minority group in this situation. When the regression lines are not parallel, the predicted performance scores differ for individuals with identical test scores. Under these circumstances, once it is determined where the regression lines cross, the amount of over- or underprediction depends on the position of a predictor score in its distribution.

So far, we have discussed the issue of differential prediction graphically. However, a more formal statistical procedure is available. As noted in Principles for the Validation and Use of Personnel Selection Procedures (SIOP, 2003), "testing for predictive bias involves using moderated multiple regression, where the criterion measure is regressed on the predictor score, subgroup membership, and an interaction term between the two" (p. 32). In symbols, and assuming differential prediction is tested for two groups (e.g., minority and nonminority), the moderated multiple regression (MMR) model is the following:

$$\hat{Y} = a + b_1 X + b_2 Z + b_3 X \cdot Z \qquad (8\text{-}1)$$

where $\hat{Y}$ is the predicted value for the criterion $Y$, $a$ is the least-squares estimate of the intercept, $b_1$ is the least-squares estimate of the population regression coefficient for the predictor $X$, $b_2$ is the least-squares estimate of the population regression coefficient for the moderator $Z$, and $b_3$ is the least-squares estimate of the population regression coefficient for the product term, which carries information about the moderating effect of $Z$ (Aguinis, 2004b). The moderator $Z$ is a categorical variable that represents the binary subgrouping variable under consideration. MMR can also be used for situations involving more than two groups (e.g., three categories based on ethnicity). To do so, it is necessary to include $k - 1$ $Z$ variables (or code variables) in the model, where $k$ is the number of groups being compared (see Aguinis, 2004b, for details).

Aguinis (2004b) described the MMR procedure in detail, covering such issues as the impact of using dummy coding (e.g., minority: 1, nonminority: 0) versus other types of coding on the interpretation of results. Assuming dummy coding is used, the statistical significance of $b_3$, which tests the null hypothesis that $\beta_3 = 0$, indicates whether the slope of the criterion on the predictor differs across groups. The statistical significance of $b_2$, which tests the null hypothesis that $\beta_2 = 0$, indicates whether groups differ regarding the intercept. Alternatively, one can test whether the addition of the product term to an equation, including the first-order effects of $X$ and $Z$, ...

JOB ANALYSIS WORKSHEET

NAME OF EMPLOYEE:
DATE:
CLASSIFICATION:
ANALYST:
DEPARTMENT:
DIVISION:
LENGTH OF TIME IN JOB:
LENGTH OF TIME WITH ORGANIZATION:

A description of what the classification duties currently are and what is actually needed to do the job. No indications need be made of experiences, abilities, or training acquired after employment.

1. General summary of job (primary duties):

2. Job tasks (tasks with X in front indicate observed duties; use actual examples, indicate frequency, consequences of error (0-10),
difficulty (0-10), training received, supervision).

3. How detailed are assignments? Describe the form work comes in decisions that ...

Average salary method: Because most estimates of $SD_y$ seem to fluctuate between 40 and 70 percent of mean salary, 40 percent of mean salary can be used as a low (i.e., conservative) estimate for $SD_y$, and 70 percent of mean salary can be used as a high (i.e., liberal) estimate (Schmidt & Hunter, 1983). Subsequent work by Hunter, Schmidt, and Judiesch (1990) demonstrated that these figures are not fixed, and, instead, they covary with job complexity (the information-processing requirements of jobs).

| Model | Utility Index | Data Requirements | Distinctive Assumptions |
|---|---|---|---|
| Taylor-Russell (1939) | Increase in percentage successful in selected group | Validity, base rate, selection ratio | All selectees classified either as successful or unsuccessful. |
| Naylor-Shine (1965) | Increase in mean criterion score of selected group | Validity, selection ratio | Equal criterion performance by all members of each group; cost of selection = $0. |
| Brogden-Cronbach-Gleser (1965) | Increase in dollar payoff of selected group | Validity, selection ratio, criterion standard deviation in dollars | Validity linearly related to utility; cost of selection = $0. |

Note: All three models assume a validity coefficient based on present employees (concurrent validity).
Source: Cascio, W. F. Responding to the demand for accountability: A critical analysis of three utility models. Organizational Behavior and Human Performance, 1980, 25, 32-45. Copyright 1980 with permission from Elsevier.

Cascio-Ramos estimate of performance in dollars (CREPID): This method involves decomposing a job into its key tasks, weighting these tasks by importance, and computing the "relative worth" of each task by multiplying the weights by average salary (Cascio & Ramos, 1986).
Then performance data from each employee are used to multiply the rating obtained for each task by the relative worth of that task. Finally, these numbers are added together to produce the "total worth" of each employee, and the distribution of all the total-worth scores is used to obtain $SD_y$. Refinements of this procedure have also been proposed (Edwards, Frederick, & Burke, 1988; Orr, Sackett, & Mercer, 1989).

Superior equivalents and system effectiveness techniques: These methods consider the changes in the numbers and performance levels of system units that lead to increased aggregate performance (Eaton, Wing, & Mitchell, 1985). The superior equivalents technique consists of estimating how many superior (85th percentile) performers would be needed to produce the output of a fixed number of average (50th percentile) performers. The system effectiveness technique is based on the