Test Development
Stages
• Test conceptualization
– Defining the test
• Test construction
– Selecting a measurement scale
– Developing items
• Test tryout
• Item analysis
• Revising the test
1. Test conceptualization
• Defining the scope, purpose, and limits of the test.
Initial questions in test construction
• Should the item content be similar or varied?
• Should the range of difficulty be narrow or broad?
– Ceiling effect vs. floor effect
• How many items should be created?
• Which domains should be tapped?
– The test developer may specify content domains and cognitive skills that must be included on the test.
• What kind of test item should be used?
2. Test construction
Selecting a scaling method
Levels of measurement
• Nominal
• Ordinal
• Interval
• Ratio
Scaling methods
• Most are rating scales that are summative
• May be unidimensional or multi-dimensional
Method of paired comparisons
• Aka forced choice
• Test taker is forced to pick one of two items paired together
Comparative scaling
• Test takers sort cards or rank items from “least” to “most”
Categorical scaling
• Test takers sort cards into one of 2 or more categories.
• Stimuli are thought to differ quantitatively not qualitatively
Likert-type scales
• Response choices are ordered on a continuum from one extreme to the other (e.g., strongly agree to strongly disagree).
• Likert scoring assumes an interval scale, although the response categories are strictly ordinal, so this assumption may not hold in practice.
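A minimal sketch of summative scoring for a Likert-type scale. The 5-point coding (1 = strongly disagree, 5 = strongly agree) and the function name are assumptions for illustration; the slides do not specify either.

```python
def likert_total(responses, points=5):
    """Sum item responses coded 1..points into one summative score."""
    for r in responses:
        if not 1 <= r <= points:
            raise ValueError(f"response {r} is outside 1..{points}")
    return sum(responses)


# One test taker answering four items on an assumed 5-point scale.
print(likert_total([4, 5, 3, 4]))  # 16
```

Adding the codes together is exactly where the interval assumption enters: the step from 1 to 2 is treated as equal to the step from 4 to 5.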
Guttman scales
• Response choices for each item are various statements that lie on a continuum.
• Endorsing the most extreme statement reflects endorsement of milder statements as well.
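The cumulative property described above can be checked mechanically. This is a sketch under the assumption that each response pattern is a list of booleans ordered from the mildest to the most extreme statement; the function name is hypothetical.

```python
def is_guttman_consistent(endorsed):
    """Return True when no statement is endorsed after a milder one was rejected.

    endorsed: booleans ordered from the mildest to the most extreme statement.
    """
    seen_rejection = False
    for e in endorsed:
        if e and seen_rejection:
            return False  # endorsed an extreme statement but rejected a milder one
        if not e:
            seen_rejection = True
    return True


print(is_guttman_consistent([True, True, False, False]))  # True
print(is_guttman_consistent([True, False, True, False]))  # False
```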
Method of equal-appearing intervals
• Presumed to be interval
• For knowledge scale:
– Obtain T/F statements
– Experts rate each item
• For attitude scale:
– Judges rate each item on a Likert scale, assuming equal intervals
• For both:
– Total test score for the test taker is based on “weighted” items (determined by averaging the experts’ ratings)
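A rough sketch of the weighting step: each item's weight is the average of the judges' ratings. How the weights combine into a total is not spelled out in the slides, so summing the weights of endorsed items is an assumption here, as are the item names and ratings.

```python
from statistics import mean


def item_weights(judge_ratings):
    """Average the judges' ratings for each item to get that item's weight."""
    return {item: mean(ratings) for item, ratings in judge_ratings.items()}


def weighted_score(endorsed_items, weights):
    """Sum the weights of the items a test taker endorsed (assumed scoring rule)."""
    return sum(weights[item] for item in endorsed_items)


# Hypothetical ratings from three judges on three items.
judge_ratings = {"item_a": [2, 3, 4], "item_b": [8, 9, 10], "item_c": [5, 5, 5]}
weights = item_weights(judge_ratings)
print(weights)
print(weighted_score(["item_a", "item_b"], weights))
```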
Method of absolute scaling
• Way to determine the difficulty level of items.
– Give items to several age groups, with one age group acting as the anchor.
• Item difficulty is assessed by noting the performance of each age group on each item as compared to the anchor group.
Method of empirical keying
• Based entirely on empirical findings.
• The test developer comes up with several items and then gives these to a group of people who are known to possess the construct and a group known not to possess it.
• Items are selected based on how well they distinguish one group from the other.
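One way to picture the selection step, assuming items are scored 0/1 (not endorsed/endorsed) and that "distinguishing well" is judged by the gap in endorsement rates between the two groups. The 0.3 threshold, the function names, and the data are hypothetical.

```python
def endorsement_rate(responses):
    """Proportion of a group endorsing an item (responses are 0 or 1)."""
    return sum(responses) / len(responses)


def select_items(criterion_group, comparison_group, min_gap=0.3):
    """Keep items whose endorsement rates differ by at least min_gap between groups."""
    kept = []
    for item in criterion_group:
        gap = abs(endorsement_rate(criterion_group[item])
                  - endorsement_rate(comparison_group[item]))
        if gap >= min_gap:
            kept.append(item)
    return kept


# Toy data: each list holds one 0/1 response per person in the group.
has_construct = {"q1": [1, 1, 1, 0], "q2": [1, 0, 1, 0]}
lacks_construct = {"q1": [0, 0, 1, 0], "q2": [1, 0, 0, 1]}
print(select_items(has_construct, lacks_construct))  # ['q1']
```

Note that nothing in the selection looks at item content: an item is kept purely because the two groups answer it differently.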
Writing the items
Item format
• Selected response
• Constructed response
Multiple choice
• Pros----
• Cons----
Matching
• Pros----
• Cons----
True/False
• Pros----
• Cons----
• Forced-choice methodology.
Fill in
• Pros----
• Cons----
Short answer objective item
• Pros---
• Cons---
Essay
• Pros----
• Cons----
Scoring items
• Cumulative model
• Class/category
• Ipsative
• Correction for guessing
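The slides do not give a formula for the correction for guessing. One commonly used version subtracts a fraction of the wrong answers, R − W/(k − 1), where k is the number of options per item; the sketch below assumes that version and assumes omitted items are simply not counted.

```python
def corrected_score(num_right, num_wrong, options_per_item):
    """Apply R - W / (k - 1); omitted items count as neither right nor wrong."""
    return num_right - num_wrong / (options_per_item - 1)


# 30 right and 8 wrong on five-option multiple-choice items.
print(corrected_score(30, 8, 5))  # 28.0
```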
3. Test tryout
• Should be conducted on a group that represents the ultimate group of test takers (those for whom the test is intended)
• Good items
– Reliable
– Valid
– Discriminate well
• Before item analysis, look at the variability of scores within the test
– Floor effect?
– Ceiling effect?
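A quick screen for floor and ceiling effects might simply count how many tryout scores sit at the minimum or maximum possible score. This is only a sketch; the function name and data are hypothetical, and how large a proportion counts as a problem is left to the test developer.

```python
def floor_ceiling_rates(total_scores, min_score, max_score):
    """Return the proportions of test takers at the lowest and highest possible score."""
    n = len(total_scores)
    at_floor = sum(s == min_score for s in total_scores) / n
    at_ceiling = sum(s == max_score for s in total_scores) / n
    return at_floor, at_ceiling


# Toy tryout totals on a 10-point test: most scores pile up at the top.
scores = [10, 10, 9, 10, 8, 10, 7, 10]
print(floor_ceiling_rates(scores, min_score=0, max_score=10))  # (0.0, 0.625)
```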
4. Item analysis
• Helps determine which items should be kept, revised, or deleted.
Item-difficulty index
• Proportion of examinees who get the item correct.
• Averaging this index across all items gives the mean item difficulty for the test.
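As a small illustration, assuming items are scored 0 (wrong) or 1 (correct) and the tryout data are stored with one row per examinee; the function names and data are made up for the example.

```python
def item_difficulty(item_scores):
    """Proportion of examinees who answered the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)


def mean_item_difficulty(score_matrix):
    """Average the item-difficulty index over all items.

    score_matrix: one row per examinee, one column per item.
    """
    n_items = len(score_matrix[0])
    difficulties = [item_difficulty([row[j] for row in score_matrix])
                    for j in range(n_items)]
    return sum(difficulties) / n_items


# Four examinees, three items.
scores = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 1, 0]]
print(item_difficulty([row[0] for row in scores]))  # 0.75
print(mean_item_difficulty(scores))
```

A higher value means an easier item, since more examinees answered it correctly.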
Ideal item difficulty
• When using multiple-choice items, try to account for the probability of answering correctly by chance.
– Optimal item difficulty = (1 + g)/2, where g is the probability of guessing the item correctly
– An exception to choosing item difficulty around mid-range involves tests of extreme groups.
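A worked example, reading the formula as (1 + g)/2, the midpoint between the chance success rate g and 1.0. Taking g as 1 divided by the number of options assumes blind guessing among equally attractive options.

```python
def optimal_difficulty(num_options):
    """Midpoint between the chance success rate and a perfect score of 1.0."""
    g = 1 / num_options  # assumed probability of a correct blind guess
    return (1 + g) / 2


print(optimal_difficulty(4))  # 0.625 for four-option multiple choice
print(optimal_difficulty(2))  # 0.75 for true/false
```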
Item endorsement
• proportion of examinees who endorsed the item.
Item reliability index
• Indication of internal consistency
• Product of the item SD and the correlation between the item and total scale
• Items with low reliability can be eliminated
Item validity index
• Correlate the item with the criterion
– (helps identify predictively useful test items)
• Multiply the correlation between the item score and the criterion score by the SD of the item.
– The usefulness of an item also depends on its dispersion or ability to discriminate
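The item reliability index and the item validity index share one form: the item's SD times a correlation, with the total scale score in one case and the criterion score in the other. The sketch below assumes that reading, uses the population SD, and runs on made-up data; the function names are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; assumes both lists have nonzero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def item_index(item_scores, other_scores):
    """Item SD times the correlation between the item and another score.

    Pass total scale scores for the reliability index,
    or criterion scores for the validity index.
    """
    return pstdev(item_scores) * pearson(item_scores, other_scores)


item = [1, 1, 0, 0]               # 0/1 scores on one item
totals = [9, 7, 4, 2]             # total test scores for the same people
criterion = [3.5, 3.0, 2.0, 2.5]  # hypothetical criterion measure

print(round(item_index(item, totals), 3))     # item reliability index
print(round(item_index(item, criterion), 3))  # item validity index
```

Because the SD is a factor, an item that nearly everyone passes or fails gets a small index even when its correlation is high.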
Item discrimination index
• How well the item discriminates between high scorers and low scorers on the test.
• For each item, compare the performance of those in the upper vs. lower performance ranges.
• Formula: d = (U − L)/N
– U = number of people in the upper range who got it right
– L = number of people in the lower range who got it right
– N = total number of people in the upper OR lower range.
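A direct translation of the formula, with made-up counts. It assumes the upper and lower groups are the same size, as the definition of N implies.

```python
def discrimination_index(upper_correct, lower_correct, group_size):
    """d = (U - L) / N, where N is the size of the upper (or lower) group."""
    return (upper_correct - lower_correct) / group_size


# 25 people in each group; 20 of the upper and 8 of the lower group got the item right.
print(discrimination_index(20, 8, 25))   # 0.48
# An item that more low scorers than high scorers answered correctly.
print(discrimination_index(5, 15, 25))   # -0.4
```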
Interpreting the IDI
• Can vary from –1 to +1.
• A (–) number =
• A 0 indicates =
• The closer the IDI is to +1
• Can also use the IDI approach to examine the pattern of incorrect responses.
Item characteristic curves
• “Graphic representation of item difficulty and discrimination”
• Horizontal axis = ability
• Vertical axis = probability of a correct response
• Plots the probability of a correct response relative to the test taker’s standing on the entire test.
• If the curve is an incline slope or like an S, the item is doing a good job of separating low and high scorers.
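To see what an S-shaped curve looks like numerically, the sketch below uses a two-parameter logistic function. The slides do not name a model, so the logistic form and the parameter names `difficulty` and `discrimination` are assumptions made for illustration.

```python
import math


def icc(ability, difficulty=0.0, discrimination=1.0):
    """Probability of a correct response under an assumed logistic curve.

    difficulty shifts the curve along the ability axis;
    discrimination controls how steep the curve is.
    """
    return 1 / (1 + math.exp(-discrimination * (ability - difficulty)))


# Probabilities rise from low to high ability; a larger discrimination
# value makes the rise steeper around the difficulty point.
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(icc(theta, discrimination=1.5), 2))
```

A nearly flat curve would mean that low and high scorers have about the same chance of answering correctly, so the item does little to separate them.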
Item fairness
– Items should measure the same thing across groups
– Items should have similar ICC across groups
– Items should have similar predictive validity across groups
Speed tests
• Items are easy and similar to one another; nearly everyone answers the items they attempt correctly.
• Measuring response time
• Traditional analyses of items do not apply
Qualitative item analysis
• Test takers’ descriptions of the test
• Think-aloud administrations
• Expert panels
5. Revising the test
• Based on the information obtained from the item analysis. New items and additional testing of these items may be required.
Cross validation
• Once you have your revised test, you need to seek new, independent confirmation of the test’s validity.
• The researcher uses a new sample to determine if the test predicts the criterion as well as it did in the original sample.
Validity shrinkage
• Typically, with cross validation, you will find that the test is less accurate in predicting the criterion with this new sample.
Co-validation
• Validating two or more tests at the same time
• Co-norming
• Saves money
• Beneficial for tests that are used together
6. Publishing the test
• Final step, which involves development of a test manual.
Production of testing materials
• Testing materials that are user-friendly will be more readily accepted. The layout of the materials should allow for smooth administration.
Technical manual
• Summarizes the technical data and references. Item analyses, scale reliabilities, validation evidence, etc. can be found here.
User’s manual
• Provides instructions for administration, scoring, and interpretation.
• The Standards for Educational and Psychological Testing recommend that manuals meet several goals (p 135).
• Two of the most important:
– 1. Describe the rationale and recommended uses of the test
– 2. Provide data on reliability and validity.
Testing is big business