Physical Ability Testing and Practical Examinations:
They Fought the Law and the Law
Won
Nikki Shepherd Eatchel, M.A.
Vice President, Test Development
Thomson Prometric

Robin Rome, Esq.
Vice President, Legal and Contracts
Thomson Prometric
2006 Annual Conference
Alexandria, Virginia
Council on Licensure, Enforcement and Regulation
Expect the Unexpected: Are We Clearly Prepared?
Presented at the 2006 CLEAR Annual Conference, September 14-16, Alexandria, Virginia
Physical Ability Testing and Practical Exams
Goals for today’s presentation:
• Outline the major risk factors for physical ability and practical examinations
• Recommend specific developmental activities and other measures that will help withstand a legal challenge
• Provide recommendations for evaluating exams developed by you or for you
Physical Ability and Practical Exams: Challenges to Validity
Although all employment, certification, and
licensure testing is certainly open to challenge,
exams designed to physically assess a candidate’s
performance on specific job skills and tasks are often
more vulnerable to challenge than objective written
exams.
Physical Ability and Practical Exams: Challenges to Validity
Examples of physical ability and practical exams:
• Firefighter certification
• Police officer pre-employment
• Nursing practical for licensure
• Corporate product certification
• Food safety practical for licensure
Physical Ability and Practical Exams: Challenges to Validity
Why are physical ability and practical exams more vulnerable to challenge?
• Reliance on exam rater judgments regarding how a task was performed introduces the possibility of error in the assessment of the skill or task (human error)
• When only one rater is used to assess a candidate, the likelihood of disagreement between the rater and the candidate increases
• Physical ability exams typically have greater adverse impact upon protected groups than the written exams involved in an employment, certification, or licensure process (though practical exams do not tend to show the same pattern)
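The adverse-impact concern in the last bullet is commonly screened with the EEOC's four-fifths rule: if a group's selection rate is less than 80% of the rate for the highest-scoring group, adverse impact is generally inferred. A minimal sketch of that screen (the applicant and pass counts are hypothetical illustration data):

```python
# Four-fifths (80%) rule screen for adverse impact, per the Uniform Guidelines.
# Applicant and pass counts below are hypothetical.
rates = {
    "group_a": 80 / 100,   # 100 tested, 80 passed
    "group_b": 48 / 100,   # 100 tested, 48 passed
}

highest = max(rates.values())  # selection rate of the most successful group
for group, rate in rates.items():
    impact_ratio = rate / highest
    flagged = impact_ratio < 0.8  # below four-fifths of the top rate
    print(f"{group}: selection rate {rate:.2f}, "
          f"impact ratio {impact_ratio:.2f}, adverse impact flagged: {flagged}")
```

A flagged ratio is only a screening signal, not a legal conclusion; it is the point at which validity evidence for the exam becomes critical.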
Standards Used For Exam Evaluation
There are two sets of standards that are often used to guide the development and evaluation of exams:
Standards for Educational and Psychological Testing, 1999
Developed jointly by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME)

Uniform Guidelines on Employee Selection Procedures, 1978
Developed by the Equal Employment Opportunity Commission (EEOC)
Standards Used For Exam Evaluation
Although both sets of standards contain valuable information regarding the development process (and both should be considered when developing a testing program), courts more frequently refer to the Uniform Guidelines as the resource for evaluating exams.
The Uniform Guidelines are “entitled to great deference” by courts deciding whether selection devices such as physical ability or practical tests comply with Title VII.
Griggs v. Duke Power Co., 401 U.S. 424, 434 (1971)
Physical Ability and Practical Exams: Challenges to Validity
What are the aspects of an examination that are most likely to be scrutinized if the validity of a physical ability or practical exam is challenged?
Physical Ability and Practical Exams: Challenges to Validity
Job Analysis
Criterion-Related Validity
Cut Score
Rater Training
Candidate Appeal Process
Physical Ability and Practical Exams: Job Analysis
A job analysis is crucial in establishing that the content of the physical ability or practical exam is valid. Key components of the job analysis include:
Content Validity
Validity Generalization
Adequate and Diverse Sample Sizes
Job Analysis - Content Validity
Although there are multiple validity methods that can be used during the test development process, the foundation for acceptable development practice continues to reside with traditional content validity methods.
Supplemental validity methods are typically seen as beneficial, yet not sufficient, when courts evaluate testing processes.
Job Analysis - Content Validity
When evidence of validity based on test content is presented, the rationale for defining and describing a specific job content domain in a particular way (e.g., in terms of tasks to be performed or knowledge, skills, abilities, or other personal characteristics) should be stated clearly.
Standard 14.9
A job analysis is necessary to identify the knowledge, skills and abilities necessary for successful job performance.
A selection procedure can be supported by a content validity strategy to the extent that it is a representative sample of the content of the job.
Guidelines, 29 CFR 1607.14(C)(1)
Job Analysis - Content Validity: Case Study
Williams v. Ford
Facts
• Class action claiming that pre-employment test for unskilled hourly production workers, Hourly Selection System Test Battery (HSSTB), discriminated against African Americans.
• Physical/practical parts of the HSSTB measured parts assembly, visual speed and accuracy, and precision/manual dexterity.
Job Analysis - Content Validity: Case Study
Williams v. Ford (cont’d)
Plaintiff’s Position
Disparate impact discrimination, i.e., African Americans failed or scored lower on the test in disproportionately high numbers when compared to whites.
The HSSTB was not content valid because the job analysis failed to demonstrate a clear linkage between the test and specific job requirements.
Ford’s Position
HSSTB was content valid as supported by a job analysis.
Job analysis consisted of:
– Supervisor identification of job inventories
– Supervisor rating of importance of job requirements and job abilities identified in the inventories
– Analysis of reliability ratings and data to identify key job requirements
– Development of test to measure skills needed to perform the job requirements rated as "important"
Job Analysis - Content Validity: Case Study
Williams v. Ford (cont’d)
Holding
Ford demonstrated that the HSSTB was content valid.
Reasoning
• Ford had the burden of showing that the HSSTB was job related:
“[Must show] by professionally acceptable methods, [that the test is] predictive or significantly correlated with important elements of work behavior that comprise or are relevant to the job or jobs for which the candidates are being evaluated.” Williams v. Ford, 187 F.3d 533, 539 (6th Cir. 1999).
• Ford met this burden by showing that the HSSTB was content valid – It used a professional test developer to conduct a job analysis that complied with the EEOC Guidelines.
Job Analysis - Validity Generalization
An issue often referred to in test development is validity generalization. Validity generalization is defined as:
“Applying validity evidence obtained in one or more situations to other similar situations on the basis of simultaneous estimation, meta-analysis, or synthetic validation arguments.”
Standards, 1999, p. 184
Job Analysis - Validity Generalization
Transfer of validity work from one demographic and/or geographic area to another, while certainly possible when based on good initial validity work and a clear delineation of the original and secondary populations, has not been well received by courts as a defensible practice.

This has typically been due to lack of appropriate documentation regarding the similarity of both the populations involved with the generalization and the interpretations resulting from the instrument.
Job Analysis - Validity Generalization: Case Study
Legault v. aRusso
Facts
• Challenge to physical abilities tests used to select fire department recruits.
• Selection process included a four-part pass/fail physical abilities test involving climbing a ladder, moving a ladder from a fire engine, running 1.5 miles in 12 minutes, and carrying and pulling a fire hose. It also included a separate physical abilities test focusing on a balance beam, second hose pull and obstacle course.
Job Analysis - Validity Generalization: Case Study
Legault v. aRusso (cont’d)
Holding
Fire department failed to show the physical abilities tests were job related.
Reasoning
• The job analysis relied on by the fire department was neither current nor specific:
- Validity was not supported by a “several-year-old job specification that describe[d] the firefighter’s general duties.” Legault v. aRusso, 842 F. Supp. 1479, 1488 (D.N.H. 1994)
- Validity was not supported by a specification identifying only general tasks (e.g., "strenuous physical exertion," "operating equipment and appurtenances of heavy apparatus," etc.). The specification also failed to break these tasks into component skills, assess their relative importance or indicate the level of proficiency required.
• The physical abilities tests were not valid simply because they were similar to those used by other cities - There was no evidence these similar tests were validated and “follow the leader is not an acceptable means of test validation.” Legault, 842 F. Supp. at 1488.
Job Analysis – Adequate and Diverse Sample Sizes
Adequate and diverse sample sizes are a necessity for ensuring validity and increasing the defensibility of an exam.
“A description of how the research sample compares with the relevant labor market or work force, . . ., and a discussion of the likely effects on validity of differences between the sample and the relevant labor market or work force, are also desirable. Descriptions of educational levels, length of service, and age are also desirable.”
“Whether the study is predictive or concurrent, the sample subjects should insofar as feasible be representative of the candidates normally available in the relevant labor market for the job or group of jobs in question . . .”
Uniform Guidelines
Job Analysis – Adequate and Diverse Sample Sizes: Case Study
Blake v. City of Los Angeles
Facts
• Female applicants challenged the police department’s height requirement and physical abilities test.
• Applicants were required to be 5’6’’ and to pass a physical abilities test including scaling a wall, hanging, weight dragging and endurance within specific parameters.
Job Analysis – Adequate and Diverse Sample Sizes: Case Study
Blake v. City of Los Angeles (cont’d)
Plaintiffs’ Position
Challenged the methodology and findings of validation studies presented by the City.
The validation studies relating to the height requirement did not include the individuals whom the police department was seeking to reject, i.e., those under 5’6.”
The validation studies relating to the physical abilities test did not include those who failed the test and tested only success during academy training, not success on the job.
The City’s Position
The height requirement was job related – Offered validation studies correlating height to performance:
– Questionnaire showing that taller officers tend to use more force and experience less suspect resistance
– Simulations demonstrating that taller officers performed bar-arm control better than shorter officers
The physical abilities test was job related – Offered validation studies correlating skills tested to measures of success during academy training and on the job requirements (e.g., foot pursuit, field shooting and emergency rescue).
Job Analysis – Adequate and Diverse Sample Sizes: Case Study
Blake v. City of Los Angeles (cont’d)
Holding
The validation studies did not demonstrate that the height requirement and physical abilities tests were job related.
Reasoning
• The validation studies did not reflect an adequate and diverse sampling.
• The City failed to demonstrate the height requirement was job related because persons shorter than 5'6" were not included in the validation study (the study included individuals from 5'8" to 6'2").
• The City failed to demonstrate the physical abilities test was job related because the validation study relied on measures of training success without showing that those measures were significantly related to job performance.
Criterion-Related Validity
When possible, the collection of criterion-related validity evidence is extremely helpful in the defense of a physical ability test or practical exam.
A criterion-related study "should consist of empirical data demonstrating that the selection procedure is predictive of or significantly correlated with important elements of job performance."
Guidelines, 29 CFR 1607.5(B)
Criterion-Related Validity
The goal of criterion-related validity is to show a significant relationship between how candidates perform on an exam and how they subsequently perform on the job (with higher scores resulting in better performance).
This can be accomplished through the use of concurrent or predictive criterion-related validity.
– Job Ratings
– Promotional Exams
– Etc.
Job Analysis – Criterion-Related Validity: Case Study
Zamlen v. City of Cleveland
Facts
• Female plaintiffs challenged the rank-order and physical abilities selection examination for firefighters.
• The physical abilities test involved three events: an overhead lift using barbells, a fire scene set-up, and a tower climb and dummy drag.
Job Analysis – Criterion-Related Validity: Case Study
Zamlen v. City of Cleveland (cont’d)
Plaintiffs’ Position
The physical abilities test did not test for attributes identified in the City’s job analysis as important to an effective firefighter.
The test measured attributes in which men traditionally excel, such as speed and strength (anaerobic traits), and ignored attributes in which women traditionally excel, such as stamina and endurance (aerobic traits).
The City’s Position
The test was created by a psychologist with significant experience developing tests for municipalities.
The physical abilities test measured attributes related to specific job skills.
Job Analysis – Criterion-Related Validity: Case Study
Zamlen v. City of Cleveland (cont’d)
Holding
The physical abilities test was valid since it was based on a criterion-related study.
Reasoning
• Referred to an earlier case, Berkman v. City of New York, 812 F.2d 52 (2d Cir. 1987), in which the court held that although aerobic attributes are an important component of firefighting, the City’s failure to include physical ability events that tested for such attributes did not invalidate the examination.
• Given the extensive job analysis performed, "although a simulated firefighting examination that does not test for stamina in addition to anaerobic capacity may be a less effective barometer of firefighting abilities than one that does include an aerobic component, the deficiencies of this examination are not of the magnitude to render it defective, and vulnerable to a Title VII challenge." Zamlen, 906 F.2d 209, 219 (6th Cir. 1990).
Physical Ability and Practical Exams: Cut Score
The setting of an examination cut score is perhaps the most controversial step within the test development process, as it is this step that has the most obvious impact on the candidate population.
The Uniform Guidelines state the following in regard to the determination of the cut score:
“Where cutoff scores are used, they should normally be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the work force.”
Cut Score Case Study
Lanning v. SEPTA
Facts
• Title VII class action challenging SEPTA’s requirement that applicants for the job of transit police officer be able to run 1.5 miles in 12 minutes.
• In prior related cases, it was established that the running requirement was job related. The sole issue before the court was whether the cutoff was valid.
Cut Score Case Study
Lanning v. SEPTA (cont’d)
Holding
The cutoff established by SEPTA was valid.
Reasoning
• The court looked at whether the cutoff measured the minimum qualifications necessary for the successful performance of a transit police officer.
• Studies introduced by SEPTA showed a statistical link between success on the run test and performance of identified job standards - Individuals who passed the run test had a success rate of 70% to 90%, and individuals who failed the run test had a success rate of 5% to 20%.
• The court emphasized that the cutoff does not need to reflect a 100% rate of success, but there should be a showing of why the cutoff is an objective measure of the minimum qualifications for successful performance.
A Good Defense
Many organizations spend a considerable amount of time and money on the valid and defensible development of a practical exam or a physical ability test.
Surprisingly, after this sizable investment in developing the exam, some organizations fail to establish appropriate training for the raters involved in administering it.
A Good Defense
When using practical exams or physical ability tests, there are two aspects of the testing program that, when well established, can reduce the likelihood of a challenge:
1. Rater Training
2. Candidate Appeal Process
Rater Training
Proper rater training is key in minimizing challenges to a practical exam/physical ability test.
• Standardized training materials and sessions
• Inter-Rater and Intra-Rater Reliability Studies
• Follow-up training
Rater Training – Standardized Materials
Practical exams and physical ability tests rely on examination raters to identify whether or not a candidate performed the activity or event appropriately.
One way to reduce challenges to this type of exam is to have a robust training program that is required of all raters on a regular basis.
Rater Training – Standardized Materials
Standardized materials can include the following components:
1. Train the Trainer Manual/Materials
2. Examination Rater Manual
3. Examination Rater Video
4. Rater Checklist
Rater Training – Rater Reliability
“When subjective judgment enters into test scoring, evidence should be provided on both inter-rater consistency in scoring and within-examinee consistency over repeated measurements.”
Standard 2.13
1. Does an individual rater apply the testing standards consistently across multiple candidates?
2. Do groups of raters rate the same candidate consistently?
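The second question (between-rater consistency) is often summarized, for a pass/fail practical exam, with an agreement statistic such as Cohen's kappa, which corrects raw agreement for chance. A minimal sketch with hypothetical ratings:

```python
# Cohen's kappa for two raters scoring the same ten candidates
# pass (1) / fail (0). Ratings below are hypothetical.
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 1, 1, 1, 0, 0, 1]

n = len(rater_1)
observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n

# Chance agreement from each rater's marginal pass/fail rates
p1_pass, p2_pass = sum(rater_1) / n, sum(rater_2) / n
expected = p1_pass * p2_pass + (1 - p1_pass) * (1 - p2_pass)

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement {observed:.2f}, kappa {kappa:.2f}")
```

The first question (within-rater consistency) uses the same arithmetic, comparing one rater's scores for the same performance at two points in time.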
Rater Training – Rater Reliability
Rater Reliability during the training process:
• Part of the rater training process should involve groups of raters rating the same performance, to evaluate whether or not a consistent testing standard is being applied.
• This process should include an opportunity for all raters to discuss outliers and reach consensus about the appropriate standards.
Rater Training – Rater Reliability
Rater Reliability after the training process:
Trends for individual raters should be evaluated to monitor their consistency over time. Although raters can be expected to evaluate candidates somewhat differently, reviewing rating data makes it possible to identify raters whose standards are consistently shifting over time.
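One simple way to monitor this kind of drift is to compare each rater's pass rate across successive administration windows against a baseline and flag shifts beyond a tolerance. A minimal sketch (the decisions and the 20-point tolerance are hypothetical choices):

```python
from statistics import mean

# Hypothetical pass/fail decisions (1 = pass) for one rater, grouped by quarter
quarterly_decisions = {
    "Q1": [1, 1, 0, 1, 1, 0, 1, 1],
    "Q2": [1, 0, 1, 1, 0, 1, 1, 0],
    "Q3": [0, 0, 1, 0, 0, 1, 0, 0],
}

baseline = mean(quarterly_decisions["Q1"])  # first window as the reference
for quarter, decisions in quarterly_decisions.items():
    rate = mean(decisions)
    drifted = abs(rate - baseline) > 0.20  # assumed tolerance threshold
    print(f"{quarter}: pass rate {rate:.2f}, drift flagged: {drifted}")
```

A flagged rater is a candidate for the follow-up training discussed next, not automatic evidence of error: the candidate pool itself may have changed between windows.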
Rater Training – Follow Up Training
There are instances when organizations have developed a valid exam, appropriately trained their raters, and then experienced problems due to a lack of consistent follow-up training sessions for examination raters.

Like any other aspect of a testing program, raters should be evaluated on a regular basis. In addition, raters should be required to undergo re-training on a periodic basis.
Candidate Appeal Process
One aspect of a testing program that should always be considered during inception is the avenue for candidate feedback and (if necessary) appeals.

Often, allowing an avenue for candidates to request feedback or an investigation into an exam administration will reduce the likelihood that a challenge will progress to a legal one.
Candidate Appeal Process
Important aspects of a candidate feedback and appeal process:
• Public documentation of the feedback and appeal process
• Clear candidate instructions on the information that should be included in feedback and/or appeal
• Specific timeframes for responses to feedback or appeals
• Designated group of resources to address feedback and appeal issues
Candidate Appeal Process
Developing an avenue for candidate feedback at the inception of a program is viewed much more positively by courts than one that is set up after a challenge to the exam.

Processes developed post-challenge tend to be viewed with an air of suspicion.
Recommendations: Case Study
Firefighters United for Fairness v. City of Memphis
Facts
• Class action challenging the practical portion of fire department promotional test.
• Practical portion consisted of a videotaped response to a factual situation presenting problems commonly encountered by fire department lieutenants and battalion chiefs.
• Plaintiffs claimed the practical test violated their due process and equal protection rights under the Fourteenth Amendment.
Holding
The practical test did not violate Plaintiffs’ rights under the Fourteenth Amendment.
Recommendations: Case Study
Firefighters United for Fairness v. City of Memphis (cont’d)
Reasoning

Fairness in grading
Court upheld the use of two raters to grade transcripts of the practical video components of the test using an answer key developed by subject matter experts.
According to the court, this system "ensured that the capricious whim of individual assessors would not contribute to any alleged incorrect scores." Firefighters United for Fairness v. City of Memphis, 362 F. Supp. 2d 963, 972 (W.D. Tenn. 2005).

Fairness in review
City established a multi-level review process:
– Candidates were permitted to review the practical video, the transcript of the practical video, and the raters' answer key, and to submit "redlines" citing specific concerns with their tests
– Subject matter experts reviewed the redlines and changed scores to reflect problems inherent in the form, content or grading of the test, where appropriate
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
Job Analysis
Does the job analysis define the knowledge, skills, and abilities that compose the important and/or critical aspects of the job in question?
Was the job analysis conducted specifically for the job in question?
Is the job analysis current and based on a relevant candidate population?
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
Criterion-Related Validity
If possible, were criterion-related validity studies conducted?
Concurrent Study?
Predictive Study?
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
Cut Score
Was a cut score study conducted with a representative sample of subject-matter experts (e.g., Modified Angoff Technique)?
Has the cut score process been documented?
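In the Modified Angoff technique mentioned above, each subject-matter expert estimates the probability that a minimally competent candidate would perform each task correctly, and the averaged estimates yield the cut score. A minimal sketch with hypothetical ratings:

```python
from statistics import mean

# Hypothetical Modified Angoff ratings: each row is one SME's estimated
# probability that a minimally competent candidate performs each of 5 tasks.
sme_ratings = [
    [0.80, 0.70, 0.90, 0.60, 0.75],
    [0.85, 0.65, 0.95, 0.55, 0.70],
    [0.75, 0.70, 0.85, 0.65, 0.80],
]

# Average across SMEs for each task, then sum the task means to get the
# expected raw score of a minimally competent candidate (the cut score).
task_means = [mean(col) for col in zip(*sme_ratings)]
cut_score = sum(task_means)
print(f"cut score: {cut_score:.2f} out of {len(task_means)} tasks")
```

Documenting the SME panel's composition and these ratings is what answers the second checklist question if the cut score is later challenged.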
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
Rater Training
Has a standardized rater training program been established?
Does the rater training include opportunities to ensure rater reliability?
Is follow-up training provided on a regular basis?
Is rater data reviewed on a regular basis to identify changes in rating trends?
Physical Ability and Practical Exams: Recommendations and Evaluation Checklists
Candidate Appeal Process
Is there an avenue for candidates to provide feedback or submit an appeal regarding an examination administration?
Is that avenue well documented and publicly available?
Are there designated resources available for addressing feedback and appeals?
Physical Ability and Practical Exams
Questions?
Speaker Contact Information
Nikki Shepherd Eatchel, M.A.
Vice President, Test Development
Thomson Prometric
1260 Energy Lane
St. Paul, MN 55108
651-603-3396
nikki.eatchel@thomson.com
Robin Rome, Esq.
Vice President, Legal and Contracts
Thomson Prometric
2000 Lenox Drive
Lawrenceville, NJ 08648
609-895-5160
robin.rome@thomson.com