

Sara Tsahakis, Psy.D.

Testing Center Coordinator

How to Design Reliable and Valid Classroom Exams

Objectives

After this workshop, you will know how to

Use basic testing terminology

Build a test blueprint

Choose the appropriate item format

Develop a test

Minimize discrepancies in grading

Avoid bias in assessment

Testing Terminology

Test Blueprint: Identifies the objectives/skills to be assessed by the test and their relative weight.

Item Development: An item is a test question or prompt. It can only be developed once the blueprint is established, and it should measure only a single objective.

Item Format: The form the item takes and the type of answer it requires: objective (multiple-choice) or free response (short-answer, essay or show-your-work). The format depends on the blueprint, which states what needs to be assessed.

Testing Terminology (continued)

Multiple-choice specific terminology:

Stem: The question or prompt/statement. This is the part that states the problem.

Options/Alternatives: All the answer choices given to the examinee, including the correct one.

Distractors: The incorrect answer choices.

Key: The correct answer choice.

Test Blueprint

Write down course objectives (or objectives specific to this exam).

Describe each objective in a measurable way (use verbs that signify an observable behavior, rather than “know” or “understand”).

For example: Explain, Analyze, Define, Discuss, Interpret, Justify, Recognize

“Explain the difference between borderline personality disorder and bipolar disorder”

Decide on relative weight (%) for each objective

Test Blueprint (continued)

3 multiple-choice questions
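Once each objective has a relative weight, the number of items per objective follows from simple arithmetic. The sketch below is only an illustration: the objectives, the weights and the 20-item exam length are hypothetical, and the rounding rule (largest remainder) is one reasonable choice among several.

```python
def items_per_objective(weights, total_items):
    """Allocate items to objectives in proportion to their weights (in %).

    Uses the largest-remainder method so the counts always add up
    to total_items, even when the exact shares are not whole numbers.
    """
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100%")
    # Exact (possibly fractional) share of items for each objective.
    exact = {obj: w * total_items / 100 for obj, w in weights.items()}
    # Start from the whole-number part of each share.
    counts = {obj: int(share) for obj, share in exact.items()}
    leftover = total_items - sum(counts.values())
    # Hand out the remaining items to the largest fractional parts.
    by_remainder = sorted(exact, key=lambda o: exact[o] - counts[o], reverse=True)
    for obj in by_remainder[:leftover]:
        counts[obj] += 1
    return counts


# Hypothetical blueprint: objective -> relative weight (%).
blueprint = {
    "Define key terms": 15,
    "Explain the difference between two disorders": 50,
    "Interpret a case description": 35,
}
allocation = items_per_objective(blueprint, total_items=20)
for objective, n in allocation.items():
    print(f"{objective}: {n} items")
```

Changing `total_items` or the weights shows how quickly a low-weight objective drops to one or two items, which is worth checking before writing any questions.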

Choosing an item format

Essay

Takes little time to write a question

Offers no opportunity for guessing

Can measure ability to analyze, synthesize or evaluate information

Measures language skills (great for language courses)

Takes longer to score

Choosing an item format (continued)

Multiple choice

Can be scored quickly/automatically

Produces consistent scores if scored by several people

Takes longer to write

Both types of items can measure knowledge and understanding of course content if formulated properly.

Both can measure ability to apply information.

When choosing an item format, focus on the type of skill to be measured, not on personal preference. Most skills can be measured by multiple-choice items, but skills that require writing or the application of a process are best measured with essay/free-response items.

Show-your-work questions also fall into that category (math or science).

Multiple-choice item development

The stem:

Should contain the problem/question and all necessary information

Should precede the alternatives

Should be as short and uncomplicated as possible: you do not want to make this a question about reading ability (unless you do).

Avoid negative items. If you have to use them, underline the negative part or put ALL negative questions in a separate section.

Avoid words like “always” and “never” in true/false questions. Most people assume statements containing them are false, regardless of the content.

Multiple-choice item development (continued)

The Alternatives

Choose between 3 and 5 alternatives

More alternatives means less opportunity for guessing:

3 alternatives = 33% chance of guessing correctly

5 alternatives = 20% chance of guessing correctly

If more than 1 alternative is partially correct, ask for the best answer

If 2 alternatives are very similar (for example, in spelling), put them next to one another to facilitate comparison.

Make all incorrect alternatives attractive. Common misconceptions make good alternatives.

Make alternatives as short as possible. If they start the same way, put that section in the stem.
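The guessing figures above generalize: with k equally attractive alternatives, a blind guess is right with probability 1/k. The sketch below also estimates how likely a student is to reach a given score by guessing alone. It assumes purely random, independent guesses, which real test takers rarely make, and the 10-item quiz with a pass mark of 6 is a hypothetical example.

```python
from math import comb


def chance_per_item(n_alternatives):
    """Probability of guessing one item correctly at random."""
    return 1 / n_alternatives


def prob_at_least(n_items, n_alternatives, min_correct):
    """Probability of getting at least min_correct items right by
    blind guessing (binomial distribution, independent items)."""
    p = chance_per_item(n_alternatives)
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# Hypothetical 10-item quiz with a pass mark of 6 correct answers.
for k in (3, 4, 5):
    per_item = chance_per_item(k)
    passing = prob_at_least(10, k, 6)
    print(f"{k} alternatives: {per_item:.0%} per item, "
          f"{passing:.2%} chance of guessing 6 or more out of 10")
```

The per-item chance falls slowly from 3 to 5 alternatives, but the chance of passing a whole test by guessing falls much faster, which is why test length matters as much as the number of alternatives.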

Multiple-choice item development (continued)

Example Question

Which of the following is a disadvantage of relying on external rewards to motivate behavior?

A. There is potential to reduce extrinsic motivation.

B. There is potential to reduce intrinsic motivation.

C. It increases fear of failure.

D. It increases fear of success.

E. It decreases competency.

Annotations on this example: similar alternatives placed next to one another (A and B), attractive distractors, and the key (B).

Multiple-choice item development (continued)

Useful tips

Use a logical sequence for the order of alternatives

From the shortest to the longest

If alternatives are numbers, from lowest to highest

Vary the position of the key among the alternatives from one item to the next

In foreign languages, only provide alternatives with correct spelling and words that exist

If the alternatives are meant to complete the sentence in the stem, make sure they all have the same grammatical form

Avoid repeating words in stem and key. Words can be repeated in distractors to make them more attractive

Avoid “all of the above” as an alternative. It makes the answer easier to guess.
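A quick tally of the answer key shows whether the key really does move around from item to item. This is a minimal sketch: the answer key is made up, and the rule of flagging a position that holds more than half of the keys is an arbitrary assumption, not a standard.

```python
from collections import Counter


def key_position_counts(answer_key, positions="ABCDE"):
    """Count how often each position holds the key across a test."""
    counts = Counter(answer_key)
    return {pos: counts.get(pos, 0) for pos in positions}


# Hypothetical answer key for an 8-item test with 4 alternatives each.
answer_key = ["B", "B", "C", "B", "A", "B", "D", "B"]
counts = key_position_counts(answer_key, positions="ABCD")

# Assumed rule of thumb: flag a position holding over half of the keys.
overused = [pos for pos, n in counts.items() if n > len(answer_key) / 2]

print(counts)
print("Overused positions:", overused)
```

If a position is flagged, reordering the alternatives in a few items (while keeping a logical sequence) removes the pattern that test-wise students could exploit.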

Sample items: Multiple-choice

Measuring application of knowledge

Which of the following demonstrates negative reinforcement of a behavior?

A. A teenager stays out past curfew and loses a week of TV time.

B. An adult takes medication and the headache disappears.

C. A child rakes the leaves in his yard and receives money.

D. A toddler bites a playmate and is spanked.

Measuring analysis of information

If 2x is an even number, then x is:

A. An even number.

B. An odd number.

C. An even or an odd number.

Short answer questions

Free-response items minimize guessing risks

Short answer items are easier to score than essay items, especially when they are 1-word answers (they are still subjective)

Easier to write than multiple-choice items, and still less subjective to score than essays

Can be fill-in-the-blank: useful for language exams (sentence completion)

Must ensure all the correct answers, if more than one, are listed for scorers to use (try to anticipate other correct answers students can think of)

Focus on one precise area of content

Use words like: complete, list, identify, name.

Examples:

List 3 of the 10 symptoms for borderline personality disorder.

Complete the following sentence by conjugating the verb in parentheses:

Anita ____________ (to cook) when Roger came home from school.
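Listing every accepted answer in advance is what keeps short-answer scoring consistent across scorers. The sketch below shows one way to apply such a list. The accepted answers for the sentence-completion example are hypothetical (the instructor decides which forms count), and ignoring case and extra spaces is an assumption that may not suit a spelling-focused exam.

```python
def normalize(answer):
    """Lower-case the answer and collapse repeated whitespace."""
    return " ".join(answer.lower().split())


def score_short_answer(response, accepted_answers):
    """Return 1 if the response matches any accepted answer, else 0."""
    accepted = {normalize(a) for a in accepted_answers}
    return 1 if normalize(response) in accepted else 0


# Hypothetical list of accepted answers for the "Anita ... (to cook)" item.
accepted_answers = ["was cooking", "had been cooking"]

for response in ["was cooking", "  Was  Cooking ", "cooks"]:
    print(repr(response), "->", score_short_answer(response, accepted_answers))
```

When a student gives an unanticipated answer that turns out to be correct, adding it to the list and rescoring everyone keeps the scoring uniform.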

Essay Item Development

Do not require the answer to be too long

The longer the answer, the more likely you are to be distracted by fluency, spelling, handwriting, etc. when you score.

Help test takers stay on point

Give them a starting sentence if necessary

Focus on a single issue, limit freedom

Make the question as clear as possible

Avoid making test takers guess what exactly you are looking for

Essay Item Development (continued)

Keep the objective in mind when writing the question

If objective is to explain, ask for an explanation.

Words to use: explain, illustrate, contrast.

Avoid: discuss, describe (these can send test takers in different directions).

When in doubt, refer to your blueprint for the objective and use the same wording in your question.

Avoid letting students choose among several essay questions: the questions are unlikely to be equivalent

Scoring Essay exams

Prepare a scoring table

List the elements that need to be present for students to obtain all the points allotted for the question

Decide on a scale:

5 elements = 3 points

3-4 elements = 2 points

1-2 elements = 1 point

0 elements = 0 points

Stick to it

If several people are involved in scoring, have them compare scores several times throughout the process and adjust
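The scale above can be written down as a small function so that every scorer applies the same cut-offs. The sketch below uses the scale from this slide. The second helper, which lists the essays two scorers disagree on, is an illustrative assumption about how a score comparison might be run, and the student identifiers are made up.

```python
def essay_points(elements_present):
    """Convert the number of required elements found in an answer
    (0 to 5) into points, using the scale from the scoring table."""
    if not 0 <= elements_present <= 5:
        raise ValueError("elements_present must be between 0 and 5")
    if elements_present == 5:
        return 3
    if elements_present >= 3:
        return 2
    if elements_present >= 1:
        return 1
    return 0


def disagreements(scores_a, scores_b, tolerance=0):
    """List the students whose scores from two scorers differ by more
    than tolerance points, so those essays can be discussed."""
    return [
        student
        for student in scores_a
        if student in scores_b
        and abs(scores_a[student] - scores_b[student]) > tolerance
    ]


# Hypothetical element counts recorded by two scorers for three essays.
scorer_a = {"S01": essay_points(5), "S02": essay_points(3), "S03": essay_points(1)}
scorer_b = {"S01": essay_points(4), "S02": essay_points(4), "S03": essay_points(2)}

print(disagreements(scorer_a, scorer_b))
```

In this example the scorers counted different numbers of elements for every essay, yet only one essay ends up with different points, because counts of 3 and 4 (or 1 and 2) fall in the same band.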

Advice for all types of exams

Avoid using humor or sarcasm in questions. It can have these unintended effects on students:

Not taking the exam seriously

Making them anxious or confused

Introducing cultural bias (some students may not understand a culture-specific joke)

Minimizing bias

Avoid climate-related contexts in questions when not part of the construct you want to measure

For example: a question about driving on snow given to a student from Florida

Avoid questions with a context that can be gender-related

For example: Who won the golf game? (You need to know golf rules and know that the lower score wins)

Avoid using pop culture in context

Avoid asking about religion if not part of the course

Vary gender roles in questions

Minimizing bias (continued)

Have your test reviewed by coworkers from different backgrounds (gender, ethnicity, etc.)

Avoid assuming general knowledge (stick to what was taught in class).

Do not mention ethnic, political or religious groups/beliefs in test, unless part of course content.

Socioeconomic bias: avoid using contexts that some socioeconomic groups may have been more exposed to than others (e.g., expensive objects, language typical of higher education, etc.)

Using Assessment Data

Assessment data can be used in the classroom:

Pinpoint questions with lower success rates to review content in classroom

Review each test after a year:

Increase or decrease difficulty by modifying distractors in multiple-choice items

Clarify essay questions or modify scoring criteria
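Success rates per question are easy to compute from a table of scored responses. The sketch below assumes each student's answers are already recorded as 1 (correct) or 0 (incorrect). The response data are invented, and the 50% threshold for flagging a question is an arbitrary assumption to adjust to the course.

```python
def item_success_rates(responses):
    """Return the share of students who answered each item correctly.

    responses: one list per student, with 1 for a correct answer and
    0 for an incorrect one, in the same item order for everyone.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(student[i] for student in responses) / n_students
        for i in range(n_items)
    ]


def items_to_review(rates, threshold=0.5):
    """Return the 1-based numbers of items whose success rate is
    below the threshold."""
    return [i + 1 for i, rate in enumerate(rates) if rate < threshold]


# Hypothetical scored responses: 4 students, 3 items.
responses = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
rates = item_success_rates(responses)
print(rates)
print("Items to review:", items_to_review(rates))
```

A flagged item may point to content that needs reviewing in class, or to a flawed question (an ambiguous stem or an overly attractive distractor), so the item itself deserves a second look before the content is retaught.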

Analyze success rates of various groups to check for bias

Compare male and female students

Compare ethnic groups or socioeconomic status if data available

Compare full-time vs part-time students

Adjust assessment and course delivery to include all groups
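Comparing groups comes down to computing the same success rate separately for each group. The sketch below does this for a single question, using invented records for full-time and part-time students. A gap between groups is only a signal to review the question: it does not by itself show that the question is biased, and small groups produce unstable rates.

```python
from collections import defaultdict


def success_rate_by_group(records):
    """Compute the success rate on one item for each group.

    records: iterable of (group, correct) pairs, where correct is
    1 or 0 (or True/False).
    """
    totals = defaultdict(lambda: [0, 0])  # group -> [n_correct, n_answers]
    for group, correct in records:
        totals[group][0] += int(correct)
        totals[group][1] += 1
    return {group: c / n for group, (c, n) in totals.items()}


def rate_gap(rates):
    """Difference between the highest and lowest group success rates."""
    return max(rates.values()) - min(rates.values())


# Hypothetical results on one question.
records = [
    ("full-time", 1), ("full-time", 1), ("full-time", 0), ("full-time", 1),
    ("part-time", 0), ("part-time", 1), ("part-time", 0), ("part-time", 0),
]
group_rates = success_rate_by_group(records)
print(group_rates)
print("Gap:", rate_gap(group_rates))
```

Running the same comparison on every item, rather than on the total score only, helps separate a question with a problematic context from a general difference in preparation between groups.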

Process summary

Test blueprint

Item format selection

Item development

Scoring decisions (free-response)

Review/Bias evaluation

Questions?

Thank you!

Feel free to contact me for assessment support/reviews/questions!

[email protected], Ext. 7431
