how to create reliable and valid classroom tests
Sara Tsahakis, Psy.D.
Testing Center Coordinator
How to Design Reliable and Valid Classroom Exams
Objectives
After this workshop, you will know how to
Use basic testing terminology
Build a test blueprint
Choose the appropriate item format
Develop a test
Minimize discrepancies in grading
Avoid bias in assessment
Testing Terminology
Test Blueprint: Identifies the objectives/skills to be assessed by the test and their relative weight.
Item Development: An item is a test question or prompt. It can only be developed once the blueprint is established, and it should measure only a single objective.
Item Format: The form the item takes and the type of answer it requires: objective (multiple-choice) or free response (short-answer, essay or show-your-work). Depends on the blueprint (which states what needs to be assessed).
Multiple-choice specific terminology:
Stem: The question or prompt/statement. This is the part that states the problem.
Options/Alternatives: All the answer choices given to the examinee, including the correct one.
Distractors: The incorrect answer choices.
Key: The correct answer choice.
Test Blueprint
Write down course objectives (or objectives specific to this exam).
Describe each objective in a measurable way (use verbs that signify an observable behavior, rather than “know” or “understand”).
For example: Explain, Analyze, Define, Discuss, Interpret, Justify, Recognize
“Explain the difference between borderline personality disorder and bipolar disorder”
Decide on relative weight (%) for each objective
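The weighting step can be sketched in code. This is only an illustration, assuming a hypothetical 40-item exam and made-up objectives and weights; the function name allocate_items is not from the workshop. It converts each objective's relative weight into a number of items and rounds so that the counts add up to the test length.

```python
def allocate_items(weights, total_items):
    """Convert blueprint weights (whole percentages summing to 100) into item counts."""
    if sum(weights.values()) != 100:
        raise ValueError("Blueprint weights must sum to 100%")
    exact = {obj: total_items * w / 100 for obj, w in weights.items()}
    counts = {obj: int(x) for obj, x in exact.items()}
    leftover = total_items - sum(counts.values())
    # Hand any remaining items to the objectives with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda obj: exact[obj] - counts[obj], reverse=True)
    for obj in by_remainder[:leftover]:
        counts[obj] += 1
    return counts


# Hypothetical blueprint: objective -> relative weight (%)
blueprint = {
    "Define key terms": 20,
    "Explain differences between disorders": 45,
    "Interpret case examples": 35,
}
print(allocate_items(blueprint, 40))
```

With these assumed weights, a 40-item test would carry 8, 18 and 14 items for the three objectives.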
Choosing an item format
Essay
Takes little time to write a question
Offers no opportunity for guessing
Can measure ability to analyze, synthesize or evaluate information
Measures language skills (great for language courses)
Takes longer to score
Multiple choice
Can be scored quickly/automatically
Produces consistent scores if scored by several people
Takes longer to write
Both types of items can measure knowledge and understanding of course content if formulated properly.
Both can measure ability to apply information.
When choosing an item format, focus on the type of skill to be measured, not on personal preference. Most skills can be measured by multiple-choice items, but skills that require writing or the application of a process are best measured with essay/free-response items.
Show-your-work questions also fall into that category (math or science).
Multiple-choice item development
The stem:
Should contain the problem/question and all necessary information
Should precede the alternatives
Should be as short and uncomplicated as possible: you do not want to make this a question about reading ability (unless you do).
Avoid negative items. If you have to use them, underline the negative part or put ALL negative questions in a separate section.
Avoid words like “always” and “never” in true/false questions. Most people assume these statements are untrue, regardless of their content.
The alternatives:
Choose between 3 and 5 alternatives
More alternatives means less opportunity for guessing:
3 alternatives = 33% chance of guessing correctly
5 alternatives = 20% chance of guessing correctly
If more than 1 alternative is partially correct, ask for the best answer
If 2 alternatives are very similar (for example, in spelling), put them next to one another to facilitate comparison.
Make all incorrect alternatives attractive. Common misconceptions make good alternatives.
Make alternatives as short as possible. If they start the same way, put that section in the stem.
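The guessing percentages above follow from 1 divided by the number of alternatives. The sketch below reproduces them and, under the added assumption of independent blind guessing on every item, estimates how likely a test taker is to reach a given score by chance alone. The 10-item quiz and the 6-correct cutoff are hypothetical.

```python
from math import comb


def chance_of_guessing(num_alternatives):
    """Probability of picking the key by blind guessing on one item."""
    return 1 / num_alternatives


def prob_at_least(num_items, num_alternatives, min_correct):
    """Probability of getting at least min_correct items right by blind guessing alone."""
    p = chance_of_guessing(num_alternatives)
    return sum(
        comb(num_items, k) * p**k * (1 - p) ** (num_items - k)
        for k in range(min_correct, num_items + 1)
    )


for n_alt in (3, 4, 5):
    print(
        f"{n_alt} alternatives: {chance_of_guessing(n_alt):.0%} chance per item, "
        f"P(at least 6 of 10 by guessing) = {prob_at_least(10, n_alt, 6):.4f}"
    )
```

Real test takers rarely guess blindly, since they can often rule out some distractors, so these figures are a floor rather than a prediction.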
Example Question
Which of the following is a disadvantage of relying on external rewards to motivate behavior?
A. There is potential to reduce extrinsic motivation.
B. There is potential to reduce intrinsic motivation.
C. It increases fear of failure.
D. It increases fear of success.
E. It decreases competency.
Similar alternatives: A and B (placed next to one another)
Key: B
Attractive distractors: the remaining incorrect options
Useful tips
Use a logical sequence for the order of alternatives
From the shortest to the longest
If alternatives are numbers, from lowest to highest
Move the position of the key in the list of distractors from one item to another
In foreign languages, only provide alternatives with correct spelling and words that exist
If the alternatives are meant to complete the sentence in the stem, make sure they all have the same grammatical form
Avoid repeating words in stem and key. Words can be repeated in distractors to make them more attractive
Avoid “all of the above.” It makes the key easier to guess.
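One way to check that the key moves around from one item to another is to count how often each position holds the correct answer. This is a rough sketch with a hypothetical 10-item answer key; the function names and the tolerance of twice the even share are arbitrary assumptions, not a standard.

```python
from collections import Counter


def key_position_counts(answer_key, options="ABCDE"):
    """Count how often each option letter is the key across a test."""
    counts = Counter(answer_key)
    return {letter: counts.get(letter, 0) for letter in options}


def overused_positions(answer_key, options="ABCDE", tolerance=2.0):
    """Return letters used as the key more than `tolerance` times the even share."""
    even_share = len(answer_key) / len(options)
    counts = key_position_counts(answer_key, options)
    return [letter for letter, n in counts.items() if n > tolerance * even_share]


# Hypothetical answer key for a 10-item test
answer_key = ["B", "C", "C", "A", "C", "D", "C", "C", "E", "C"]
print(key_position_counts(answer_key))
print(overused_positions(answer_key))
```

Here option C is the key for 6 of 10 items, so it is flagged; some of those items could be reordered while still following a logical sequence.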
Sample items: Multiple-choice
Measuring application of knowledge
Which of the following demonstrates negative reinforcement of a behavior?
A. A teenager stays out past curfew and loses a week of TV time.
B. An adult takes medication and a headache disappears.
C. A child rakes the leaves in his yard and receives money.
D. A toddler bites a playmate and is spanked.
Measuring analysis of information
If 2x is an even number, then x is:
A. An even number.
B. An odd number.
C. An even or an odd number.
Short answer questions
Free-response items minimize guessing risks
Short answer items are easier to score than essay items, especially when they are 1-word answers (they are still subjective)
Easier to write than multiple-choice items and still less subjective to score than essays
Can be fill-in-the-blank: useful for language exams (sentence completion)
Must ensure all the correct answers, if more than one, are listed for scorers to use (try to anticipate other correct answers students can think of)
Focus on one precise area of content
Use words like: complete, list, identify, name.
Examples:
List 3 of the 10 symptoms for borderline personality disorder.
Complete the following sentence by conjugating the verb in parentheses:
Anita ____________ (to cook) when Roger came home from school.
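The advice to list every correct answer for scorers can be illustrated with a small scoring helper. This is a minimal sketch: the accepted-answer list is only an example for the sentence-completion item, and which verb forms to accept remains the instructor's decision.

```python
def normalize(answer):
    """Lowercase and collapse whitespace so trivial differences are not penalized."""
    return " ".join(answer.lower().split())


def score_short_answer(response, accepted_answers):
    """Return 1 if the response matches any accepted answer, else 0."""
    accepted = {normalize(a) for a in accepted_answers}
    return 1 if normalize(response) in accepted else 0


# Illustrative accepted-answer list (an assumption, not an official answer key)
accepted = ["was cooking", "had been cooking"]

print(score_short_answer("  Was  cooking ", accepted))
print(score_short_answer("cooks", accepted))
```

Writing the list down before scoring starts means every scorer applies the same criteria, and an unanticipated correct answer can be added once for everyone.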
Essay Item Development
Do not require the answer to be too long
The longer the answer, the greater the chance that you will be distracted by fluency, spelling, handwriting, etc. when you score.
Help testers stay on point
Give them a starting sentence if necessary
Focus on a single issue, limit freedom
Make the question as clear as possible
Avoid making testers guess what exactly you are looking for
Keep the objective in mind when writing the question
If objective is to explain, ask for an explanation.
Words to use: explain, illustrate, contrast.
Avoid: discuss, describe (can send test takers in different directions).
When in doubt, refer to your blueprint for the objective and use the same wording in your question.
Avoid giving choices among essay questions: unlikely to be equivalent
Scoring Essay exams
Prepare a scoring table
List the elements that need to be present for students to obtain all the points allotted for the question
Decide on a scale:
5 elements = 3 points
3-4 elements = 2 points
1-2 elements = 1 point
0 elements = 0 points
Stick to it
If several people are involved in scoring, have them compare scores several times throughout the process and adjust
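The example scale above can be written down as a small function so that every scorer applies it the same way. The sketch also includes a hypothetical helper for comparing two scorers; the function names and the sample data are assumptions for illustration only.

```python
def points_for_elements(elements_present):
    """Map the number of required elements found in an answer to points."""
    if not 0 <= elements_present <= 5:
        raise ValueError("This scale assumes 0 to 5 required elements")
    if elements_present == 5:
        return 3
    if elements_present >= 3:
        return 2
    if elements_present >= 1:
        return 1
    return 0


def flag_disagreements(scores_a, scores_b, max_gap=0):
    """Return indexes of answers where two scorers differ by more than max_gap points."""
    return [i for i, (a, b) in enumerate(zip(scores_a, scores_b)) if abs(a - b) > max_gap]


# Hypothetical element counts recorded by two scorers for the same four answers
scorer_a = [points_for_elements(n) for n in (5, 3, 1, 0)]
scorer_b = [points_for_elements(n) for n in (5, 2, 1, 0)]
print(scorer_a, scorer_b, flag_disagreements(scorer_a, scorer_b))
```

The flagged answers are the ones the scorers would discuss when they compare scores during the process.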
Advice for all types of exams
Avoid using humor or sarcasm in questions. It can have these unintended effects on students:
Not taking the exam seriously
Making them anxious or confused
Being culturally biased (students may not understand a joke that depends on cultural knowledge)
Minimizing bias
Avoid climate-related contexts in questions when they are not part of the construct you want to measure
For example: a question about driving on snow given to a student from Florida
Avoid questions with a context that can be gender-related
For example: Who won the golf game? (You need to know golf rules and know that the lower score wins)
Avoid using pop culture in context
Avoid asking about religion if not part of the course
Vary gender roles in questions
Have your test reviewed by coworkers from different backgrounds (gender, ethnicity, etc.)
Avoid assuming general knowledge (stick to what was taught in class).
Do not mention ethnic, political or religious groups/beliefs in test, unless part of course content.
Socioeconomic bias: avoid using contexts that some socioeconomic groups may have been more exposed to than others (e.g., expensive objects, higher education language, etc.)
Using Assessment Data
Assessment data can be used in the classroom:
Pinpoint questions with lower success rates to review content in classroom
Review each test after a year:
Increase or decrease difficulty by modifying distractors in multiple-choice items
Clarify essay questions or modify scoring criteria
Analyze success rates of various groups to check for bias
Compare male and female students
Compare ethnic groups or socioeconomic status if data available
Compare full-time vs part-time students
Adjust assessment and course delivery to include all groups
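The checks described above can be sketched with a few lines of code. This is only an illustration, assuming responses are recorded as 1 (correct) or 0 (incorrect) per item; the data, the group labels and the 50% review threshold are all hypothetical.

```python
def item_success_rates(responses):
    """responses: one list per student of 1 (correct) / 0 (incorrect), one entry per item."""
    num_students = len(responses)
    num_items = len(responses[0])
    return [sum(student[i] for student in responses) / num_students for i in range(num_items)]


def items_to_review(rates, threshold=0.5):
    """Return 1-based item numbers whose success rate falls below the threshold."""
    return [i + 1 for i, rate in enumerate(rates) if rate < threshold]


def success_rate_by_group(responses, groups):
    """Average proportion correct for each group label (e.g. full-time vs part-time)."""
    totals = {}
    for student, group in zip(responses, groups):
        correct, answered = totals.get(group, (0, 0))
        totals[group] = (correct + sum(student), answered + len(student))
    return {g: correct / answered for g, (correct, answered) in totals.items()}


# Hypothetical data: 4 students x 3 items
responses = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [0, 0, 1]]
groups = ["full-time", "part-time", "full-time", "part-time"]
rates = item_success_rates(responses)
print(rates, items_to_review(rates))
print(success_rate_by_group(responses, groups))
```

A gap between groups does not prove bias on its own, especially with small classes; it marks items and contexts worth a closer look.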
Process summary
Test blueprint
Item format selection
Item development
Scoring decisions (free-response)
Review/Bias evaluation
Questions?
Thank you!
Feel free to contact me for assessment support/reviews/questions!
[email protected], Ext. 7431
References
Cohen, A. S., & Wollack, J. A. (n.d.). Handbook on Test Development: Helpful Tips for Creating Reliable and Valid Classroom Tests. Testing & Evaluation Services, University of Wisconsin-Madison.
Friedenberg, L. (1995). Psychological Testing: Design, Analysis and Use. Needham Heights: Allyn and Bacon.
Popham, W. J. (2012). Assessment Bias: How to Banish It, 2nd Ed. Pearson.
Reiner, C. M., Bothell, T. W., Sudweeks, R. R., & Wood, B. (2002). Preparing Effective Essay Questions: A Self-Directed Workbook for Educators. Testing Center, Brigham Young University.