Constructing an Assessment Scale
Assessing the performance of…Chocolate…
Special thanks to the UCLA Medical Education Fellowship (original source: Deb Simpson)
See https://creativecommons.org/licenses/by-nc-sa/3.0/ for full license.
Session Objectives
Develop an appropriate assessment instrument
Determine evidence to support ratings from the instrument
Describe sources of error that threaten the validity of the results
List specific strategies to address the threats
Lesson Plan (2 hours)
Goal: Develop an instrument for measuring chocolate performance whose resulting ratings are supported by evidence of validity.
1. Develop rating criteria (20 min)
2. Develop scale (35 min)
3. Train raters (10 min)
4. Rate chocolates (10 min)
5. Identify sources of error (10 min)
6. Calculating (10 min)
7. “Beyond chocolate” and wrap up (15 min)
STEP 1: Develop quality indicators of chocolate
4 min Individually: List on worksheet all key quality indicators associated with “best” (or worst) chocolate
8 min Three teams: Agree on 5 indicators. List author for each indicator.
8 min Everyone: Compare the lists and agree on final 5 indicators.
STEP 2: Developing Rating Scale
35 min Each team develops a 7-point scale. The scale may be either:
Behaviorally anchored: Silky - Gravel
Likert-like: Excellent - Horrible
Write questions/statements related to criteria and anchors
Write a descriptor for at least 3 scale anchors (two extremes and midpoint)
Transfer to large sheet for posting
Overall rating?
Everyone: Agree on anchors, create final form on large sheet, and copy on individual sheet (each indicator has an “owner”)
Wisconsin humor
How does the appearance of the chocolate affect your desire to eat it?
1 “I’m totally repulsed”
2 “Only if I am starving”
3 “Only if it’s free”
4 “Indifference”
5 “I would eat it if I wasn’t on a diet”
6 “I can’t stop drooling”
7 “Take my child, but give me the chocolate”
UCSF Humor (?)
The appearance of the chocolate is sexy and sensual
1 Strongly disagree
2 Disagree
3 Disagree somewhat
4 Neutral
5 Agree somewhat
6 Agree
7 Strongly agree
October 2009
Criterion | Question | 1 (low end) | 4 (midpoint) | 7 (high end)
texture | What is the texture on the tongue after the first bite? | Coarse/gritty | Somewhat smooth; consistent | Smooth and silky
balance | Does it achieve a balance between bitter and sweet? | Only one flavor present | Both flavors present, not fully balanced | Complete marriage
melt | How quickly does it melt? | Like rocks | After mastication | Instantly
comp | How complex is the flavor/aroma? | Bores me | Thinking about it | Stimulates all senses
yearn | How much do you yearn for another? | Couldn’t pay me | If no one else wants it | I am calling in sick since I am eating chocolate
overall | How satisfied? | Not at all | Moderately | Highly satisfied
Behaviorally Anchored Scale
February 2011
Indicator for dark chocolate (1 = SD, strongly disagree; 4 = N, neutral; 7 = SA, strongly agree)
The balance between sweetness/bitterness is ideal.
The flavor evolves pleasingly from start to finish.
There is a pleasurable balance between brittleness and smoothness when bitten.
I absolutely have to have a second piece.
Smelling the chocolate makes me want to eat it.
I would recommend this chocolate to a friend.
Likert Scale
November 2011
Criterion | Question | Scale 1 2 3 4 5 6 7 (1 = SD, 4 = N, 7 = SA)
Stem: The chocolate is pleasing because of...
meltability | The way it melts in my mouth
texture | The texture
flavor | The flavor
appearance | The appearance
scent | The scent
overall | I would serve this to my best friend
Likert Scale
December 2014
Criterion Question 1 2 3 4 5 6 7
Step 3: Train Your Raters
10 min Each “owner” has 2 min to train the group for his/her indicator.
Goal is to eliminate bias.
You may use samples.
How is the final score determined?
Step 4: Sampling
Step 1 Pick up a sample piece from each plate. Remember the number for each sample.
Step 2 Return to your seat and prepare scoring sheets.
Step 3 At the signal, sample each piece and complete the rating form for that sample. Follow training instructions.
Entering Data for Analysis
Excel or SPSS data collection
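One possible layout for the data sheet, sketched in Python rather than Excel or SPSS. It assumes a "long" format with one row per rater, sample, and indicator; the column names (rater, sample, indicator, score) and all values are hypothetical and only illustrate the structure.

```python
import csv
import io
from collections import defaultdict

# Hypothetical long-format sheet: one row per rater, sample, and indicator.
RAW = """rater,sample,indicator,score
R1,1,texture,6
R1,1,balance,5
R2,1,texture,7
R2,1,balance,4
R1,2,texture,3
R1,2,balance,2
R2,2,texture,2
R2,2,balance,3
"""

def load_ratings(text):
    """Return {sample: {rater: {indicator: score}}} parsed from CSV text."""
    table = defaultdict(lambda: defaultdict(dict))
    for row in csv.DictReader(io.StringIO(text)):
        score = int(row["score"])
        if not 1 <= score <= 7:  # the rating scale in this session runs 1 to 7
            raise ValueError(f"score out of range: {row}")
        table[row["sample"]][row["rater"]][row["indicator"]] = score
    return table

def mean_score(table, sample):
    """Average every rating given to one sample, across raters and indicators."""
    scores = [s for by_item in table[sample].values() for s in by_item.values()]
    return sum(scores) / len(scores)

ratings = load_ratings(RAW)
for sample in sorted(ratings):
    print(sample, mean_score(ratings, sample))
```

Keeping one rating per row makes it easy to catch entry errors (such as a score outside 1 to 7) before any analysis.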
Validity
One element is reliability
How consistent are scores?
From one administration of an instrument to another (test-retest)
From one rater to another (inter-rater)
Within a set of items (internal consistency, Cronbach’s alpha)
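As an illustration of internal consistency, here is a minimal sketch of Cronbach’s alpha using the standard formula: alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the ratings are made up for illustration; each row is assumed to hold one complete rating form.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: one list of item scores per completed rating form."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 rating forms")
    # Sample variance of each item (column) across all forms.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the total score per form.
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical forms: three indicators rated on the 1-7 scale.
forms = [
    [6, 5, 7],
    [3, 2, 2],
    [5, 5, 4],
    [2, 3, 1],
]
print(round(cronbach_alpha(forms), 2))
```

Alpha rises when items move together across forms; it says nothing about whether the items measure the right thing.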
No instrument is perfect
An individual’s observed performance/score is a combination of “true score” and error.
Two types of error: random and controllable
All measurements seek to control errors so that the observed score approaches the true score
Observed Score = True Score + Errors of Measurement
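The equation above can be explored with a small simulation. This is a sketch under stated assumptions, not a model of real raters: true scores and random errors are drawn from normal distributions, a systematic error is modeled as a constant shift (bias), and reliability is taken as the share of observed-score variance that is true-score variance. All names and parameter values are hypothetical.

```python
import random
from statistics import mean, variance

def simulate(n, error_sd, bias=0.0, seed=0):
    """Return (true, observed) score lists with observed = true + bias + random error."""
    rng = random.Random(seed)
    true = [rng.gauss(4.0, 1.0) for _ in range(n)]            # assumed true scores
    observed = [t + bias + rng.gauss(0.0, error_sd) for t in true]
    return true, observed

def reliability(true, observed):
    """Share of observed-score variance that is true-score variance."""
    return variance(true) / variance(observed)

true_a, obs_small = simulate(2000, error_sd=0.5)
true_b, obs_large = simulate(2000, error_sd=2.0)
true_c, obs_biased = simulate(2000, error_sd=0.5, bias=1.0)

print("small random error:", round(reliability(true_a, obs_small), 2))
print("large random error:", round(reliability(true_b, obs_large), 2))
# A constant bias shifts every score but leaves consistency untouched.
print("mean shift from bias:", round(mean(obs_biased) - mean(obs_small), 2))
```

In this sketch, larger random error lowers reliability, while a constant bias leaves reliability unchanged and instead moves every score away from its true value.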
Step 5: Sources of Error
Note on the Sources of Error Worksheet any factors that would affect the reliability (consistency) of ratings
Indicate the source of each error: rater, administration/procedure, instrument, or chocolate
Sources of error
Random variability in person tested: mood changes, misinterpretation of question
Random variability in procedure: interruptions
Random variability of the rater: effects of rater’s sex, accuracy of rating
Random variability due to instrument: bad formatting
Instrumentation error
Respondents’ Task in Responding to a Question
“How many times have you consulted MD Consult during the last month?”
1. Never
2. Once or twice
3. 3-5 times
4. More than 5 times
Step 1: Understanding the question
Step 2: Recalling relevant behavior
Step 3: Inference and estimation
Step 4: Mapping the answer onto the response format
Step 5: “Editing” the answer for reasons of social desirability
“How many times have you consulted MD Consult during the last month?”
1. 0-5 times
2. 6-10 times
3. 11-30 times
4. More than 30 times
Systematic Error vs. Random Error
Reliability & Measurement Error
Reasonably Reliable vs. Less Reliable
Sources of error in chocolate ratings
Raters:
Did they influence each other’s ratings?
Contamination? Drink juice, coffee?
Rating strategy: Taste them all then rate? Rate each independently?
Recognize brands? Bias? Halo effect?
Rater fatigue?
Instrument:
Scale descriptors clear?
Sufficient number of items?
Chocolate Shop Tips On
Step 6: Review of Scores
Sources of Error: Instrument, Raters, Design/Admin, Subjects (chocolate)
How reliable was our measure?
Is there evidence of validity of our score?
Is the score from the instrument really indicating what it is intended to measure?
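One rough way to approach the question “How reliable was our measure?” is to check how similarly the raters ordered the samples. The sketch below averages the Pearson correlation over every pair of raters. This is a simplification chosen for illustration: it captures consistency of ordering, not absolute agreement, and an intraclass correlation is a more common choice in practice. Rater labels and scores are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_pairwise_correlation(scores_by_rater):
    """Average Pearson correlation over every pair of raters."""
    pairs = list(combinations(scores_by_rater.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Hypothetical overall scores: each list holds one rater's scores for samples 1-4.
overall = {
    "R1": [6, 3, 5, 2],
    "R2": [7, 2, 4, 1],
    "R3": [5, 2, 5, 3],
}
print(round(mean_pairwise_correlation(overall), 2))
```

A rater who gives every sample the same score has no variance, so the correlation is undefined for that rater and the function will fail with a division by zero.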
Your instrument = What you intend to measure
Glycated haemoglobin = Diabetes control?
USMLE Step 2 Score = Knowledgeable Resident?
HPE Chocolate score = Best Chocolate?
What could you do to obtain evidence that supports the validity of the score from the chocolate ratings?
What evidence is there for validity of the score?
Internal Structure/Content:
Do the questions represent the domain of interest?
Appropriate content?
Items have the right level of difficulty
Scale is reliable
Relationship to other variables:
How does the score compare with a “gold standard”?
Can the score discriminate between different classes of subjects?
Downing SM. Validity: on the meaningful interpretation of assessment data. Med Ed 2003;37:830-837.
Evidence Data Analysis
Obtain rank order based on our rating scale
Consumer Reports chocolate review
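To compare the rank order from the rating scale with an external ranking, one option is Spearman’s rank correlation. The sketch below ranks the samples by mean score (ties share the average rank) and correlates those ranks with an external ranking. All numbers are hypothetical placeholders, not actual review data.

```python
from math import sqrt

def average_ranks(values, highest_first=True):
    """Rank values (1 = best); tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=highest_first)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks

def spearman(our_scores, external_ranks):
    """Pearson correlation between our ranks and an external ranking (1 = best)."""
    ours = average_ranks(our_scores)
    mx, my = sum(ours) / len(ours), sum(external_ranks) / len(external_ranks)
    sxy = sum((a - mx) * (b - my) for a, b in zip(ours, external_ranks))
    sxx = sum((a - mx) ** 2 for a in ours)
    syy = sum((b - my) ** 2 for b in external_ranks)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: mean workshop score per sample and an external rank per sample.
mean_scores = [5.5, 2.5, 4.0, 6.1, 3.2]
external = [1, 5, 3, 2, 4]
print(round(spearman(mean_scores, external), 2))
```

A value near 1 would suggest the instrument orders the chocolates much as the external review does, which is one piece of “relationship to other variables” evidence.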
Beyond Chocolate…
Apply Principles to Assessment of Learner Performance
What are common errors / problems in learner assessment?
The Ideal Instrument
Evidence of validity for the score:
Results are consistent (reliable)
Instrument measures what it is supposed to measure
It is relevant: its purpose matches the research question/curriculum objectives
Is appropriate for your target population
Feasible:
How long does it take to administer?
How much training is involved?
What are the costs (to individual, institution & society)?
Do I have the resources?
Useful:
Aids decision making
Impacts future learning and practice
Acceptable to learners and faculty
(Morrison, 2003; Amin & Eng, 2003)