Physical Ability Testing and Practical Examinations:
They Fought the Law and the Law
Won
Nikki Shepherd Eatchel, M.A.
Vice President, Test Development
Thomson Prometric

Robin Rome, Esq.
Vice President, Legal and Contracts
Thomson Prometric
2006 Annual Conference
Alexandria, Virginia
Council on Licensure, Enforcement and Regulation
Expect the Unexpected: Are We Clearly Prepared?
Presented at the 2006 CLEAR Annual Conference, September 14-16, Alexandria, Virginia
Physical Ability Testing and Practical Exams
Goals for today’s presentation:
• Outline the major risk factors for physical ability and practical examinations
• Recommend specific developmental activities and other measures that will help withstand a legal challenge
• Provide recommendations for evaluating exams developed by you or for you
Physical Ability and Practical Exams: Challenges to Validity
Although all employment, certification, and
licensure testing is certainly open to challenge,
exams designed to physically assess a candidate’s
performance on specific job skills and tasks are often
more vulnerable to challenge than objective written
exams.
Physical Ability and Practical Exams: Challenges to Validity
Examples of physical ability and practical exams:
• Firefighter certification
• Police officer pre-employment
• Nursing practical for licensure
• Corporate product certification
• Food safety practical for licensure
Physical Ability and Practical Exams: Challenges to Validity
Why are physical ability and practical exams more vulnerable to challenge?
• Reliance on exam rater judgments regarding how a task was performed introduces the possibility of error in the assessment of the skill or task (human error)
• When only one rater is used to assess a candidate, the likelihood of disagreement between the rater and the candidate increases
• Physical ability exams typically have greater adverse impact upon protected groups than the written exams involved in an employment, certification, or licensure process (though practical exams do not tend to show the same pattern)
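The adverse-impact concern in the last bullet is commonly screened with the EEOC's four-fifths rule: if a group's selection rate is less than 80% of the rate for the highest-scoring group, adverse impact is generally inferred. A minimal sketch of that screen (the applicant and pass counts are hypothetical illustration data):

```python
# Four-fifths (80%) rule screen for adverse impact, per the Uniform Guidelines.
# Applicant and pass counts below are hypothetical.
rates = {
    "group_a": 80 / 100,   # 100 tested, 80 passed
    "group_b": 48 / 100,   # 100 tested, 48 passed
}

highest = max(rates.values())  # selection rate of the most successful group
for group, rate in rates.items():
    impact_ratio = rate / highest
    flagged = impact_ratio < 0.8  # below four-fifths of the top rate
    print(f"{group}: selection rate {rate:.2f}, "
          f"impact ratio {impact_ratio:.2f}, adverse impact flagged: {flagged}")
```

A flagged ratio is only a screening signal, not a legal conclusion; it is the point at which validity evidence for the exam becomes critical.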
Standards Used For Exam Evaluation
There are two sets of standards that are often used to guide the development and evaluation of exams:
Standards for Educational and Psychological Testing, 1999
Developed jointly by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME)

Uniform Guidelines on Employee Selection Procedures, 1978
Developed by the Equal Employment Opportunity Commission (EEOC)
Standards Used For Exam Evaluation
Although both sets of standards contain valuable information regarding the development process (and both should be considered when developing a testing program), courts more frequently refer to the Uniform Guidelines as the resource for evaluating exams.
The Uniform Guidelines are “entitled to great deference” by courts deciding whether selection devices such as physical ability or practical tests comply with Title VII.
Griggs v. Duke Power Co., 401 U.S. 424, 434 (1971)
Physical Ability and Practical Exams: Challenges to Validity
What are the aspects of an examination that are most likely to be scrutinized if the validity of a physical ability or practical exam is challenged?
Physical Ability and Practical Exams: Challenges to Validity
Job Analysis
Criterion-Related Validity
Cut Score
Rater Training
Candidate Appeal Process
Physical Ability and Practical Exams: Job Analysis
A job analysis is crucial in establishing that the content of the physical ability or practical exam is valid. Key components of the job analysis include:
Content Validity
Validity Generalization
Adequate and Diverse Sample Sizes
Job Analysis - Content Validity
Although there are multiple validity methods that can be used during the test development process, the foundation for acceptable development practice continues to reside with traditional content validity methods.
Supplemental validity methods are typically seen as beneficial, yet not sufficient, when courts evaluate testing processes.
Job Analysis - Content Validity
When evidence of validity based on test content is presented, the rationale for defining and describing a specific job content domain in a particular way (e.g., in terms of tasks to be performed or knowledge, skills, abilities, or other personal characteristics) should be stated clearly.
Standard 14.9
A job analysis is necessary to identify the knowledge, skills and abilities necessary for successful job performance.
A selection procedure can be supported by a content validity strategy to the extent that it is a representative sample of the content of the job.
Guidelines, 29 CFR 1607.14(C)(1)
Job Analysis - Content Validity: Case Study
Williams v. Ford
Facts
• Class action claiming that pre-employment test for unskilled hourly production workers, Hourly Selection System Test Battery (HSSTB), discriminated against African Americans.
• Physical/practical parts of the HSSTB measured parts assembly, visual speed and accuracy, and precision/manual dexterity.
Job Analysis - Content Validity: Case Study
Williams v. Ford (cont’d)
Plaintiff’s Position
Disparate impact discrimination, i.e., African Americans failed or scored lower on the test in disproportionately high numbers when compared to whites.
The HSSTB was not content valid because the job analysis failed to demonstrate a clear linkage between the test and specific job requirements.
Ford’s Position
HSSTB was content valid as supported by a job analysis.
Job analysis consisted of:
– Supervisor identification of job inventories
– Supervisor rating of importance of job requirements and job abilities identified in the inventories
– Analysis of reliability ratings and data to identify key job requirements
– Development of test to measure skills needed to perform the job requirements rated as "important"
Job Analysis - Content Validity: Case Study
Williams v. Ford (cont’d)
Holding
Ford demonstrated that the HSSTB was content valid.
Reasoning
• Ford had the burden of showing that the HSSTB was job related:
“[Must show] by professionally acceptable methods, [that the test is] predictive or significantly correlated with important elements of work behavior that comprise or are relevant to the job or jobs for which the candidates are being evaluated.” Williams v. Ford, 187 F.3d 533, 539 (6th Cir. 1999).
• Ford met this burden by showing that the HSSTB was content valid – It used a professional test developer to conduct a job analysis that complied with the EEOC Guidelines.
Job Analysis - Validity Generalization
An issue often referred to in test development is validity generalization. Validity generalization is defined as:
“Applying validity evidence obtained in one or more situations to other similar situations on the basis of simultaneous estimation, meta-analysis, or synthetic validation arguments.”
Standards, 1999, p. 184
Job Analysis - Validity Generalization
Transfer of validity work from one demographic and/or geographic area to another, while certainly possible when based on good initial validity work and a clear delineation of the original and secondary populations, has not been well received by courts as a defensible practice.

This has typically been due to lack of appropriate documentation regarding the similarity of both the populations involved with the generalization and the interpretations resulting from the instrument.
Job Analysis - Validity Generalization: Case Study
Legault v. aRusso
Facts
• Challenge to physical abilities tests used to select fire department recruits.
• Selection process included a four-part pass/fail physical abilities test involving climbing a ladder, moving a ladder from a fire engine, running 1.5 miles in 12 minutes, and carrying and pulling a fire hose. It also included a separate physical abilities test focusing on a balance beam, second hose pull and obstacle course.
Job Analysis - Validity Generalization: Case Study
Legault v. aRusso (cont’d)
Holding
Fire department failed to show the physical abilities tests were job related.
Reasoning
• The job analysis relied on by the fire department was neither current nor specific:
- Validity was not supported by a “several-year-old job specification that describe[d] the firefighter’s general duties.” Legault v. aRusso, 842 F. Supp. 1479, 1488 (D.N.H. 1994)
- Validity was not supported by a specification identifying only general tasks (e.g., "strenuous physical exertion," "operating equipment and appurtenances of heavy apparatus," etc.). The specification also failed to break these tasks into component skills, assess their relative importance or indicate the level of proficiency required.
• The physical abilities tests were not valid simply because they were similar to those used by other cities - There was no evidence these similar tests were validated and “follow the leader is not an acceptable means of test validation.” Legault, 842 F. Supp. at 1488.
Job Analysis – Adequate and Diverse Sample Sizes
Adequate and diverse sample sizes are a necessity for ensuring validity and increasing the defensibility of an exam.
“A description of how the research sample compares with the relevant labor market or work force, . . ., and a discussion of the likely effects on validity of differences between the sample and the relevant labor market or work force, are also desirable. Descriptions of educational levels, length of service, and age are also desirable.”
“Whether the study is predictive or concurrent, the sample subjects should insofar as feasible be representative of the candidates normally available in the relevant labor market for the job or group of jobs in question . . .”
Uniform Guidelines
Job Analysis – Adequate and Diverse Sample Sizes: Case Study
Blake v. City of Los Angeles
Facts
• Female applicants challenged the police department’s height requirement and physical abilities test.
• Applicants were required to be 5’6’’ and to pass a physical abilities test including scaling a wall, hanging, weight dragging and endurance within specific parameters.
Job Analysis – Adequate and Diverse Sample Sizes: Case Study
Blake v. City of Los Angeles (cont’d)
Plaintiffs’ Position
Challenged the methodology and findings of validation studies presented by the City.
The validation studies relating to the height requirement did not include the individuals whom the police department was seeking to reject, i.e., those under 5’6.”
The validation studies relating to the physical abilities test did not include those who failed the test and tested only success during academy training, not success on the job.
The City’s Position
The height requirement was job related – Offered validation studies correlating height to performance:
– Questionnaire showing that taller officers tend to use more force and experience less suspect resistance
– Simulations demonstrating that taller officers performed bar-arm control better than shorter officers
The physical abilities test was job related – Offered validation studies correlating skills tested to measures of success during academy training and on the job requirements (e.g., foot pursuit, field shooting and emergency rescue).
Job Analysis – Adequate and Diverse Sample Sizes: Case Study
Blake v. City of Los Angeles (cont’d)
Holding
The validation studies did not demonstrate that the height requirement and physical abilities tests were job related.
Reasoning
• The validation studies did not reflect an adequate and diverse sampling.
• The City failed to demonstrate the height requirement was job related because persons shorter than 5'6" were not included in the validation study (the study included individuals from 5'8" to 6'2").
• The City failed to demonstrate the physical abilities test was job related because the validation study relied on measures of training success without showing that those measures were significantly related to job performance.
Criterion-Related Validity
When possible, the collection of criterion-related validity evidence is extremely helpful in the defense of a physical ability test or practical exam.
A criterion-related study "should consist of empirical data demonstrating that the selection procedure is predictive of or significantly correlated with important elements of job performance."
Guidelines, 29 CFR 1607.5(B)
Criterion-Related Validity
The goal of criterion-related validity is to show a significant relationship between how candidates perform on an exam and how they subsequently perform on the job (with higher scores resulting in better performance).
This can be accomplished through the use of concurrent or predictive criterion-related validity.
– Job Ratings
– Promotional Exams
– Etc.
Job Analysis – Criterion-Related Validity: Case Study
Zamlen v. City of Cleveland
Facts
• Female plaintiffs challenged the rank-order and physical abilities selection examination for firefighters.
• The physical abilities test involved three events: an overhead lift using barbells, a fire scene set-up, and a tower climb and dummy drag.
Job Analysis – Criterion-Related Validity: Case Study
Zamlen v. City of Cleveland (cont’d)
Plaintiffs’ Position
The physical abilities test did not test for attributes identified in the City’s job analysis as important to an effective firefighter.
The test measured attributes in which men traditionally excel, such as speed and strength (anaerobic traits), and ignored attributes in which women traditionally excel, such as stamina and endurance (aerobic traits).
The City’s Position
The test was created by a psychologist with significant experience developing tests for municipalities.
The physical abilities test measured attributes related to specific job skills.
Job Analysis – Criterion-Related Validity: Case Study
Zamlen v. City of Cleveland (cont’d)
Holding
The physical abilities test was valid since it was based on a criterion-related study.
Reasoning
• Referred to an earlier case, Berkman v. City of New York, 812 F.2d 52 (2d Cir. 1987), in which the court held that although aerobic attributes are an important component of firefighting, the City’s failure to include physical ability events that tested for such attributes did not invalidate the examination.
• Given the extensive job analysis performed, "although a simulated firefighting examination that does not test for stamina in addition to anaerobic capacity may be a less effective barometer of firefighting abilities than one that does include an aerobic component, the deficiencies of this examination are not of the magnitude to render it defective, and vulnerable to a Title VII challenge." Zamlen, 906 F.2d 209, 219 (6th Cir. 1990).
Physical Ability and Practical Exams: Cut Score
The setting of an examination cut score is perhaps the most controversial step within the test development process, as it is this step that has the most obvious impact on the candidate population.
The Uniform Guidelines state the following in regard to the determination of the cut score:
“Where cutoff scores are used, they should normally be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the work force.”
Cut Score Case Study
Lanning v. SEPTA
Facts
• Title VII class action challenging SEPTA’s requirement that applicants for the job of transit police officer be able to run 1.5 miles in 12 minutes.
• In prior related cases, it was established that the running requirement was job related. The sole issue before the court was whether the cutoff was valid.
Cut Score Case Study
Lanning v. SEPTA (cont’d)
Holding
The cutoff established by SEPTA was valid.
Reasoning
• The court looked at whether the cutoff measured the minimum qualifications necessary for the successful performance of a transit police officer.
• Studies introduced by SEPTA showed a statistical link between success on the run test and performance of identified job standards - Individuals who passed the run test had a success rate of 70% to 90%, and individuals who failed the run test had a success rate of 5% to 20%.
• The court emphasized that the cutoff does not need to reflect a 100% rate of success, but there should be a showing of why the cutoff is an objective measure of the minimum qualifications for successful performance.
A Good Defense
Many organizations spend a considerable amount of time and money on the valid and defensible development of a practical exam or a physical ability test.
Surprisingly, after this sizable investment in developing the exam, some organizations fail to establish appropriate training for the raters involved in administering it.
A Good Defense
When using practical exams or physical ability tests, there are two aspects of the testing program that, when well established, can reduce the likelihood of a challenge:
1. Rater Training
2. Candidate Appeal Process
Rater Training
Proper rater training is key in minimizing challenges to a practical exam/physical ability test.
• Standardized training materials and sessions
• Inter-Rater and Intra-Rater Reliability Studies
• Follow-up training
Rater Training – Standardized Materials
Practical exams and physical ability tests rely on examination raters to identify whether or not a candidate performed the activity or event appropriately.
One way to reduce challenges to this type of exam is to have a robust training program that is required of all raters on a regular basis.
Rater Training – Standardized Materials
Standardized materials can include the following components:
1. Train the Trainer Manual/Materials
2. Examination Rater Manual
3. Examination Rater Video
4. Rater Checklist
Rater Training – Rater Reliability
“When subjective judgment enters into test scoring, evidence should be provided on both inter-rater consistency in scoring and within-examinee consistency over repeated measurements.”
Standard 2.13
1. Does an individual rater apply the testing standards consistently across multiple candidates?
2. Do groups of raters rate the same candidate consistently?
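The second question (between-rater consistency) is often summarized, for a pass/fail practical exam, with an agreement statistic such as Cohen's kappa, which corrects raw agreement for chance. A minimal sketch with hypothetical ratings:

```python
# Cohen's kappa for two raters scoring the same ten candidates
# pass (1) / fail (0). Ratings below are hypothetical.
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]

n = len(rater_1)
observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n

# Chance agreement from each rater's marginal pass/fail rates
p1_pass, p2_pass = sum(rater_1) / n, sum(rater_2) / n
expected = p1_pass * p2_pass + (1 - p1_pass) * (1 - p2_pass)

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement {observed:.2f}, kappa {kappa:.2f}")
```

The first question (within-rater consistency) uses the same arithmetic, comparing one rater's scores for the same performance at two points in time.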
Rater Training – Rater Reliability
Rater Reliability during the training process:
• Part of the rater training process should involve groups of raters rating the same performance, to evaluate whether or not a consistent testing standard is being applied.
• This process should include an opportunity for all raters to discuss outliers and reach consensus about the appropriate standards.
Rater Training – Rater Reliability
Rater Reliability after the training process:
Trends for individual raters should be evaluated to monitor their consistency over time. Although raters can be expected to evaluate candidates somewhat differently, reviewing rating data makes it possible to identify raters whose standards are consistently shifting over time.
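One simple way to monitor this kind of drift is to compare each rater's pass rate across successive administration windows against a baseline and flag shifts beyond a tolerance. A minimal sketch (the decisions and the 20-point tolerance are hypothetical choices):

```python
from statistics import mean

# Hypothetical pass/fail decisions (1 = pass) for one rater, grouped by quarter
quarterly_decisions = {
    "Q1": [1, 1, 0, 1, 1, 0, 1, 1],
    "Q2": [1, 0, 1, 1, 0, 1, 1, 0],
    "Q3": [0, 0, 1, 0, 0, 1, 0, 0],
}

baseline = mean(quarterly_decisions["Q1"])  # first window as the reference
for quarter, decisions in quarterly_decisions.items():
    rate = mean(decisions)
    drifted = abs(rate - baseline) > 0.20  # assumed tolerance threshold
    print(f"{quarter}: pass rate {rate:.2f}, drift flagged: {drifted}")
```

A flagged rater is a candidate for the follow-up training discussed next, not automatic evidence of error: the candidate pool itself may have changed between windows.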
Rater Training – Follow Up Training
There are instances when organizations have developed a valid exam, appropriately trained their raters, and then experienced problems due to a lack of consistent follow-up training sessions for examination raters.

Like any other aspect of a testing program, raters should be evaluated on a regular basis. In addition, raters should be required to undergo re-training on a periodic basis.
Candidate Appeal Process
One aspect of a testing program that should always be considered during inception is the avenue for candidate feedback and (if necessary) appeals.

Often, allowing an avenue for candidates to request feedback or an investigation into an exam administration will reduce the likelihood that a challenge will progress to a legal one.
Candidate Appeal Process
Important aspects of a candidate feedback and appeal process:
• Public documentation of the feedback and appeal process
• Clear candidate instructions on the information that should be included in feedback and/or appeal
• Specific timeframes for responses to feedback or appeals
• Designated group of resources to address feedback and appeal issues
Candidate Appeal Process
Developing an avenue for candidate feedback at the inception of a program is viewed much more positively by courts than one that is set up after a challenge to the exam.

Processes developed post-challenge tend to be viewed with an air of suspicion.
Recommendations: Case Study
Firefighters United for Fairness v. City of Memphis
Facts
• Class action challenging the practical portion of fire department promotional test.
• Practical portion consisted of a videotaped response to a factual situation presenting problems commonly encountered by fire department lieutenants and battalion chiefs.
• Plaintiffs claimed the practical test violated their due process and equal protection rights under the Fourteenth Amendment.
Holding
The practical test did not violate Plaintiffs’ rights under the Fourteenth Amendment.
Recommendations: Case Study
Firefighters United for Fairness v. City of Memphis (cont’d)
Reasoning

Fairness in grading
Court upheld the use of two raters to grade transcripts of the practical video components of the test using an answer key developed by subject matter experts.
According to the court, this system "ensured that the capricious whim of individual assessors would not contribute to any alleged incorrect scores." Firefighters United for Fairness v. City of Memphis, 362 F. Supp. 2d 963, 972 (W.D. Tenn. 2005).

Fairness in review
City established a multi-level review process:
– Candidates were permitted to review the practical video, the transcript of the practical video, and the raters' answer key, and to submit "redlines" citing specific concerns with their tests
– Subject matter experts reviewed the redlines and changed scores to reflect problems inherent in the form, content or grading of the test, where appropriate
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
Job Analysis
Does the job analysis define the knowledge, skills, and abilities that compose the important and/or critical aspects of the job in question?
Was the job analysis conducted specifically for the job in question?
Is the job analysis current and based on a relevant candidate population?
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
Criterion-Related Validity
If possible, were criterion-related validity studies conducted?
Concurrent Study?
Predictive Study?
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
Cut Score
Was a cut score study conducted with a representative sample of subject-matter experts (e.g., Modified Angoff Technique)?
Has the cut score process been documented?
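In the Modified Angoff technique mentioned above, each subject-matter expert estimates the probability that a minimally competent candidate would perform each task correctly, and the averaged estimates yield the cut score. A minimal sketch with hypothetical ratings:

```python
from statistics import mean

# Hypothetical Modified Angoff ratings: each row is one SME's estimated
# probability that a minimally competent candidate performs each of 5 tasks.
sme_ratings = [
    [0.80, 0.70, 0.90, 0.60, 0.75],
    [0.85, 0.65, 0.95, 0.55, 0.70],
    [0.75, 0.70, 0.85, 0.65, 0.80],
]

# Average across SMEs for each task, then sum the task means to get the
# expected raw score of a minimally competent candidate (the cut score).
task_means = [mean(col) for col in zip(*sme_ratings)]
cut_score = sum(task_means)
print(f"cut score: {cut_score:.2f} out of {len(task_means)} tasks")
```

Documenting the SME panel's composition and these ratings is what answers the second checklist question if the cut score is later challenged.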
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
Rater Training
Has a standardized rater training program been established?
Does the rater training include opportunities to ensure rater reliability?
Is follow-up training provided on a regular basis?
Is rater data reviewed on a regular basis to identify changes in rating trends?
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
Candidate Appeal Process
Is there an avenue for candidates to provide feedback or submit an appeal regarding an examination administration?
Is that avenue well documented and publicly available?
Are there designated resources available for addressing feedback and appeals?
Physical Ability and Practical Exams
Questions?
Speaker Contact Information
Nikki Shepherd Eatchel, M.A.
Vice President, Test Development
Thomson Prometric
1260 Energy Lane
St. Paul, MN 55108
651-603-3396
nikki.eatchel@thomson.com
Robin Rome, Esq.
Vice President, Legal and Contracts
Thomson Prometric
2000 Lenox Drive
Lawrenceville, NJ 08648
609-895-5160
robin.rome@thomson.com