Measurement, MANA 4328, Dr. Jeanne Michalski [email protected]
TRANSCRIPT
Employment Tests
Employment Test: An objective and standardized measure of a sample of behavior that is used to gauge a person’s knowledge, skills, abilities, and other characteristics (KSAOs) in relation to other individuals.
Pre-employment testing has the potential for lawsuits.
Classification of Employment Tests
Cognitive Ability Tests
  Aptitude tests: Measures of a person’s capacity to learn or acquire skills.
  Achievement tests: Measures of what a person knows or can do right now.
Personality and Interest Inventories
  “Big Five” personality factors: Extroversion, agreeableness, conscientiousness, neuroticism, openness to experience.
Classification of Employment Tests (cont’d)
Physical Ability Tests
  Must be related to the essential functions of the job.
Job Knowledge Tests
  An achievement test that measures a person’s level of understanding about a particular job.
Work Sample Tests
  Require the applicant to perform tasks that are actually a part of the work required on the job.
Reliability: Basic Concepts
Observed score = true score + error
Error is anything that impacts test scores that is not the characteristic being measured
Reliability measures error
The lower the error, the better the measure
Things that can be observed are easier to measure than things that are inferred
Basic Concepts of Measurement
1. Variability and comparing test scores
   Mean / Standard Deviation
2. Correlation coefficients
3. Standard Error of Measurement
4. The Normal Curve
   Many people taking a test
   Z scores and Percentiles
EEOC Uniform Guidelines
Reliability – consistency of the measure
If the same person takes the test again, will he/she earn the same score?
Potential contaminations:
  Test taker’s physical or mental state
  Environmental factors
  Test forms
  Multiple raters
Reliability Test Methods
Test-retest
Alternate or parallel form
Inter-rater
Internal consistency
Methods of calculating correlations between test items, administrations, or scoring.
Correlation
How strongly are two variables related?
Correlation coefficient (r)
  Ranges from -1.00 to 1.00
  Shared variation = r²
  If two variables are correlated at r = .6, then they share .6² = .36, or 36%, of the total variance.
Illustrated using scatter plots
Used to test consistency and accuracy of a measure
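The shared-variance rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `pearson_r` and `shared_variance` and the two score lists are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient (r) between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

def shared_variance(r):
    """Proportion of variance two variables share: r squared."""
    return r ** 2

# Hypothetical scores for five people tested on two occasions.
time1 = [40, 45, 50, 55, 60]
time2 = [42, 44, 53, 54, 61]

r = pearson_r(time1, time2)
print(f"r = {r:.2f}, shared variance = {shared_variance(r):.2f}")

# The slide's example: r = .6 gives 36% shared variance.
print(round(shared_variance(0.6), 2))  # 0.36
```

The same `pearson_r` call would apply to test-retest, alternate-form, or inter-rater data, since each method correlates two sets of scores.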
Correlation Scatterplots
Figure 5.3
Summary of Types of Reliability
                                  Compare scores within T1     Compare scores across T1 and T2
Objective Measures (test items)   Internal consistency or      Test-retest
                                  alternate form
Subjective Ratings                Interrater: compare          Intrarater: compare same
                                  different raters             rater at different times
Standard Error of Measure (SEM)
Estimate of the potential error for an individual test score
Uses variability AND reliability to establish a confidence interval around a score
95% Confidence Interval (CI) means that if one person took the test 100 times, about 95 of the scores would fall within the upper and lower bounds.
SEM = SD × √(1 − reliability)
There is only about a 5% chance that a score observed outside the CI is due to measurement error alone, so such differences are treated as “significant”.
Standard Error of Measure (SEM)
SEM = SD * √ (1- reliability)
Assume a mathematical ability test has a reliability of .9 and a standard deviation of 10:
SEM = 10 * √ (1- .9) = 3.16
If an applicant scores a 50, the SEM is the degree to which the score would vary if she were retested on another day.
Plus or minus 2 SEM gives you a ~95% confidence interval.
50 + 2(3.16) = 56.32
50 – 2(3.16) = 43.68
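The worked example above can be checked with a short Python sketch. The function names `sem` and `confidence_interval` are illustrative assumptions; the inputs (SD of 10, reliability of .9, score of 50) come from the slide.

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measure: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

def confidence_interval(score, sd, reliability, n_sem=2):
    """Band of +/- n_sem SEMs around a score (2 SEMs is roughly 95%)."""
    margin = n_sem * sem(sd, reliability)
    return score - margin, score + margin

low, high = confidence_interval(50, sd=10, reliability=0.9)
print(round(sem(10, 0.9), 2))         # 3.16
print(round(low, 2), round(high, 2))  # 43.68 56.32
```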
Standard Error of Measure
If an applicant scores 2 points above a passing score and the SEM is 3.16 – then there is a good chance of making a bad selection choice.
If two applicants score within 2 points of one another and the SEM is 3.16 then it is possible that the difference is due to chance.
Standard Error of Measure
The higher the reliability, the lower the SEM
Std. Dev.   r     SEM
10          .96   2
10          .84   4
10          .75   5
10          .51   7
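The table can be reproduced from the SEM formula. The following is a small sketch; the function name `standard_error_of_measure` and the `rows` list are illustrative, and only the SD and reliability values are taken from the slide.

```python
from math import sqrt

def standard_error_of_measure(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)

SD = 10
# One (SD, reliability, SEM) row per reliability value on the slide.
rows = [(SD, r, round(standard_error_of_measure(SD, r), 2))
        for r in (0.96, 0.84, 0.75, 0.51)]

print("Std. Dev.   r     SEM")
for sd, r, sem_value in rows:
    print(f"{sd:<11} {r:.2f}  {sem_value:.0f}")
```

With the SD held at 10, the SEM grows from 2 to 7 as reliability drops from .96 to .51.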
Confidence Intervals
        Jim (40)            Mary (50)           Jen (60)
SEM     -2 SEM    +2 SEM    -2 SEM    +2 SEM    -2 SEM    +2 SEM
2       36        44        46        54        56        64
4       32        48        42        58        52        68
Do the applicants differ when SEM = 2?
Do the applicants differ when SEM = 4?
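One way to answer the two questions is to check whether the ±2 SEM bands overlap. The sketch below assumes that non-overlapping bands are read as a real difference between applicants, which is a simplification; the function names `band`, `overlaps`, and `overlapping_pairs` are hypothetical.

```python
from itertools import combinations

def band(score, sem, n_sem=2):
    """Lower and upper bounds of a +/- n_sem SEM band around a score."""
    return score - n_sem * sem, score + n_sem * sem

def overlaps(a, b):
    """True if two (low, high) bands share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

def overlapping_pairs(scores, sem):
    """Pairs of applicants whose bands overlap at the given SEM."""
    bands = {name: band(score, sem) for name, score in scores.items()}
    return [(p, q) for p, q in combinations(bands, 2)
            if overlaps(bands[p], bands[q])]

scores = {"Jim": 40, "Mary": 50, "Jen": 60}
for sem in (2, 4):
    print(f"SEM = {sem}: overlapping pairs = {overlapping_pairs(scores, sem)}")
```

With SEM = 2 no bands overlap, so all three applicants differ. With SEM = 4, Jim and Mary overlap and Mary and Jen overlap, so only Jim and Jen can be told apart.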
Validity
Accuracy of the measure
Are you measuring what you intend to measure?
OR
Does the test measure a characteristic related to job performance?
Types of test validity
  Criterion – test predicts job performance (predictive or concurrent)
  Content – test representative of the job
Approaches to Validation
Content validity
  The extent to which a selection instrument, such as a test, adequately samples the knowledge and skills needed to perform a particular job.
  Example: typing tests, driver’s license examinations, work sample
Construct validity
  The extent to which a selection tool measures a theoretical construct or trait.
  Example: creative arts tests, honesty tests
Approaches to Validation
Criterion-related Validity
  The extent to which a selection tool predicts, or significantly correlates with, important elements of work behavior.
  A high score indicates high job performance potential; a low score is predictive of low job performance.
Two types of Criterion-related validity
  Concurrent Validity
  Predictive Validity
Approaches to Validation
Concurrent Validity
  The extent to which test scores (or other predictor information) match criterion data obtained at about the same time from current employees.
  High or low test scores for employees match their respective job performance.
Predictive Validity
  The extent to which applicants’ test scores match criterion data obtained from those applicants/employees after they have been on the job for some indefinite period.
  A high or low test score at hiring predicts high or low job performance at a point in time after hiring.
Tests of Criterion-Related Validity
Predictive validity (“Future Employee or Follow-up Method”)
  Time 1: Test applicants
  Time 2 (6-12 mos. later): Performance of hires
Concurrent validity (“Present Employee Method”)
  Time 1: Test existing employees AND measure performance
Types of Validity
[Diagram: Job Duties, KSAs, Selection Tests, and Job Performance, with links labeled Criterion-Related and Content-Related]
Reliability vs. Validity
Validity Coefficients
  Reject below .11
  Very useful above .21
  Rarely exceed .40
Reliability Coefficients
  Reject below .70
  Very useful above .90
  Rarely approach 1.00
Why the difference?
More About Comparing Scores
The Normal Curve (not to scale)
Z Scores              -3     -2     -1     0      +1     +2     +3
Rounded Percentiles   .1%    2%     16%    50%    84%    98%    99.9%
Variability
How did an individual score compared to others?
How to compare scores across different tests?
            Test 1   Test 1   Test 2   Test 2
            Bob      Jim      Sue      Linda
Raw Score   49       47       49       47
Mean        48       48       46       46
Std. Dev    2.5      2.5      .80      .80
Z Score = (Score – Mean) / Std. Dev
Z Score or “Standard” Score
            Test 1   Test 1   Test 2   Test 2
            Bob      Jim      Sue      Linda
Raw Score   49       47       49       47
Mean        48       48       46       46
Std. Dev    2.5      2.5      .80      .80
Z score     .4       -.4      3.75     1.25
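The z scores in the table follow from (score minus mean) divided by standard deviation. A minimal Python sketch, with the function name `z_score` and the `applicants` dictionary as illustrative assumptions:

```python
def z_score(raw, mean, sd):
    """Standard score: how many standard deviations a raw score is from the mean."""
    return (raw - mean) / sd

# (raw score, test mean, test standard deviation) taken from the table.
applicants = {
    "Bob": (49, 48, 2.5),    # Test 1
    "Jim": (47, 48, 2.5),    # Test 1
    "Sue": (49, 46, 0.80),   # Test 2
    "Linda": (47, 46, 0.80), # Test 2
}

z_scores = {name: round(z_score(*values), 2) for name, values in applicants.items()}
print(z_scores)
```

Bob and Sue have the same raw score of 49, but Sue's z score is far higher because Test 2 has a lower mean and a much smaller standard deviation.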
[Figure: The Normal Curve (not to scale), with Jim, Bob, Linda, and Sue marked from lowest to highest z score]
Z scores and Percentiles
Look up z scores on a “standard normal table”
  Corresponds to proportion of area under normal curve
Linda has z score of 1.25
  Standard normal table = .8944
  Percentile score of 89.44%
  Linda scored better than 89.44% of test takers
Z score Percentile
3.0 99.9%
2.0 97.7%
1.0 84.1%
0.0 50.0%
-1.0 15.9%
-2.0 2.3%
-3.0 .1%
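A standard normal table lookup can also be done in code. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function name `percentile_from_z` is an illustrative assumption.

```python
from statistics import NormalDist

def percentile_from_z(z):
    """Percent of the standard normal curve that lies below a z score."""
    return NormalDist().cdf(z) * 100

# Reproduce the z score to percentile table.
for z in (3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0):
    print(f"{z:>5.1f}  {percentile_from_z(z):.1f}%")
```

The printed values match the table to one decimal place, and the same function works for any z score between the tabled rows.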
[Figure: Proportion Under the Normal Curve (not to scale), with Jim, Bob, Linda, and Sue marked]