![Page 1: Item Analysis. Purpose of Item Analysis –Evaluates the quality of each item –Rationale: the quality of items determines the quality of test (i.e., reliability](https://reader035.vdocuments.mx/reader035/viewer/2022062312/5518c26455034638098b49b4/html5/thumbnails/1.jpg)
Item Analysis
Purpose of Item Analysis
– Evaluates the quality of each item
– Rationale: the quality of items determines the quality of test (i.e., reliability & validity)
– May suggest ways of improving the measurement of a test
– Can help with understanding why certain tests predict some criteria but not others
Item Analysis
When analyzing the test items, we have several questions about the performance of each item. Some of these questions include:
– Are the items congruent with the test objectives?
– Are the items valid? Do they measure what they're supposed to measure?
– Are the items reliable? Do they measure consistently?
– How long does it take an examinee to complete each item?
– What items are most difficult to answer correctly? What items are easy?
– Are there any poorly performing items that need to be discarded?
Types of Item Analyses for CTT
Three major types:
1. Assess quality of the distractors
2. Assess difficulty of the items
3. Assess how well an item differentiates between high and low performers
DISTRACTOR ANALYSIS
A. Multiple-Choke
B. Multiply-Choice
C. Multiple-Choice
D. Multi-Choice
Distractor Analysis
First question of item analysis: How many people choose each response?
If there is only one best response, then all other response options are distractors.
Example from in-class assignment (N = 35):
Which method has the best internal consistency?
Response               # choosing
a) projective test     1
b) peer ratings        1
c) forced choice       21
d) differences n.s.    12
Distractor Analysis (cont’d)
A perfect test item would have 2 characteristics:
1. Everyone who knows the item gets it right
2. People who do not know the item will have responses equally distributed across the wrong answers.
It is not desirable to have one of the distractors chosen more often than the correct answer.
This result indicates a potential problem with the question. This distractor may be too similar to the correct answer and/or there may be something in either the stem or the alternatives that is misleading.
Distractor Analysis (cont’d)
Calculate the number of people expected to choose each of the distractors. If wrong answers are chosen at random, the expected number is the same for each wrong response (Figure 10-1).
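N answering incorrectly = 14
Number of distractors = 3
$$\text{Number of persons expected to choose each distractor} = \frac{N \text{ answering incorrectly}}{\text{Number of distractors}} = \frac{14}{3} \approx 4.7$$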
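The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name is made up here, and treating option c as the keyed answer is an assumption (it is the only choice consistent with 14 incorrect responses out of 35).

```python
# Observed response counts from the in-class example (N = 35).
# Assumption: option "c" is the keyed answer, because 35 - 21 = 14
# matches the "N answering incorrectly" figure on the slide.
responses = {"a": 1, "b": 1, "c": 21, "d": 12}
keyed_answer = "c"


def expected_per_distractor(counts, key):
    """Expected count per distractor if wrong answers were chosen at random."""
    distractors = {opt: n for opt, n in counts.items() if opt != key}
    n_incorrect = sum(distractors.values())
    return n_incorrect / len(distractors)


expected = expected_per_distractor(responses, keyed_answer)
print(f"expected per distractor: {expected:.1f}")

# Compare each distractor's observed count with the expected count.
for option, observed in responses.items():
    if option != keyed_answer:
        print(f"{option}: observed {observed}, difference {observed - expected:+.1f}")
```

In this example option d is chosen far more often than expected, while a and b are chosen far less often.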
Distractor Analysis (cont’d)
When the number of persons choosing a distractor significantly exceeds the number expected, there are 2 possibilities:
1. It is possible that the choice reflects partial knowledge
2. The item is a poorly worded trick question
An unpopular distractor may lower item and test difficulty because it is easily eliminated
An extremely popular distractor is likely to lower the reliability and validity of the test
Item Difficulty Analysis Description and How to Compute
ex: a) (6 X 3) + 4 = ?
b) 9[ln(-3.68) X (1 – ln(+3.68))] = ?
It is often difficult to explain or define difficulty in terms of some intrinsic characteristic of the item
The only common thread of difficult items is that individuals did not know the answer
Item Difficulty
Proportion (percentage) of test takers who respond correctly, reported as the item's p value
What if p = .00?
What if p = 1.00?
Item Difficulty
– An item with a p value of .0 or 1.0 does not contribute to measuring individual differences and thus is certain to be useless
– When comparing 2 test scores, we are interested in who had the higher score or the differences in scores
– Items with a p value of .5 have the most variation, so seek items in this range and remove those with extreme values
– p values can also be examined to determine the proportion answering in a particular way for items that don’t have a “correct” answer
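The point about variation can be made concrete with the variance of a 0/1-scored item, which is p(1 − p). That formula is a standard result rather than something stated on the slide, and the function name below is only illustrative.

```python
def item_variance(p):
    """Variance of a dichotomously scored (0/1) item with proportion correct p."""
    return p * (1 - p)


# Variance is zero at the extremes and largest at p = .5.
for p in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(f"p = {p:.1f}  variance = {item_variance(p):.2f}")
```

An item that everyone passes or everyone fails has zero variance, so it cannot separate test takers.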
Item Difficulty (cont.)
What is the best p-value?
– optimal p-value = .50
– maximum discrimination between good and poor performers
Should we only choose items of .50?
When shouldn’t we?
Should we only choose items of .50?
Not necessarily ...
When wanting to screen the very top group of applicants (e.g., admission to university or medical school).
Cutoffs may be much higher
Other institutions want a minimum level (e.g., minimum reading level)
Cutoffs may be much lower
Item Difficulty (cont.)
Interpreting the p-value...
example:
100 people take a test
15 got question 1 right
What is the p-value?
Is this an easy or hard item?
Item Difficulty (cont.)
Interpreting the p-value...
example:
100 people take a test
70 got question 1 right
What is the p-value?
Is this an easy or hard item?
Item Difficulty (cont’d)
General Rules of Item Difficulty…
p low (< .20)            difficult test item
p moderate (.20 - .80)   moderately difficult item
p high (> .80)           easy item
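The p value and the general rules above can be sketched as follows. The function names are illustrative, and the cutoffs are simply the .20 and .80 rules of thumb from the slide.

```python
def item_difficulty(n_correct, n_total):
    """p value: proportion of test takers who answered the item correctly."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_correct / n_total


def classify_difficulty(p):
    """Apply the general rules: < .20 difficult, > .80 easy, otherwise moderate."""
    if p < 0.20:
        return "difficult"
    if p > 0.80:
        return "easy"
    return "moderately difficult"


# The two worked examples: 15 and 70 correct out of 100 test takers.
for n_correct in (15, 70):
    p = item_difficulty(n_correct, 100)
    print(f"{n_correct}/100 correct: p = {p:.2f} ({classify_difficulty(p)})")
```

Note that a low p value means a hard item: p counts the people who got the item right.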
ITEM DISCRIMINATION
... The extent to which an item differentiates people on the behavior that the test is designed to assess.
Computed as the difference between the percentage of high achievers and the percentage of low achievers who got the item right.
Item Discrimination (cont.)
Compares the performance of the upper group (high test scores) and the lower group (low test scores) on each item: the % of test takers in each group who answered correctly
Item Discrimination (cont’d): Discrimination Index (D)
Divide sample into TOP half and BOTTOM half (or TOP and BOTTOM third)
Compute Discrimination Index (D)
Item Discrimination
$$D = U - L$$
$$U = \frac{\text{Number in the upper group with a correct response}}{\text{Total number in upper group}}$$
$$L = \frac{\text{Number in the lower group with a correct response}}{\text{Total number in lower group}}$$
The higher the value of D, the more adequately the item discriminates (The highest value is 1.0)
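A minimal sketch of D = U − L, assuming 0/1 item scores and a split by total test score. The data are hypothetical and the function name is illustrative; `fraction` selects the top and bottom half (0.5) or third (1/3).

```python
def discrimination_index(item_scores, total_scores, fraction=0.5):
    """D = U - L, with upper and lower groups formed from total test scores."""
    n = len(total_scores)
    group_size = int(n * fraction)
    if group_size == 0:
        raise ValueError("not enough examinees for the requested split")
    # Examinee indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]
    u = sum(item_scores[i] for i in upper) / group_size
    l = sum(item_scores[i] for i in lower) / group_size
    return u - l


# Hypothetical data: 10 examinees, total test scores and 0/1 scores on one item.
total_scores = [2, 3, 4, 5, 5, 6, 7, 8, 9, 10]
item_scores = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

print(f"D (halves) = {discrimination_index(item_scores, total_scores):.2f}")
print(f"D (thirds) = {discrimination_index(item_scores, total_scores, 1 / 3):.2f}")
```

With the half split, 4 of 5 upper-group examinees and 1 of 5 lower-group examinees answer correctly, so D = .80 − .20 = .60. Ties at the group boundary are broken by list order here, which a real analysis would need to handle deliberately.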
Item Discrimination
Seek items with high positive values (those who do well on the test tend to get the item correct)
Items with negative values (lower scorers on the test are more likely to get the item correct) or low positive values (about the same proportion of low and high scorers get the item correct) don’t discriminate well and are discarded
Item Discrimination (cont’d): Item-Total Correlation
Correlation between each item (a correct response usually receives a score of 1 and an incorrect a score of zero) and the total test score.
To what degree do the item and the test measure the same thing?
Positive - item discriminates between high and low scores
Near 0 - item does not discriminate between high & low
Negative - scores on item and scores on test disagree
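As an illustration, the item-total correlation can be computed as a plain Pearson correlation between each 0/1 item column and the total score. The data and function names below are hypothetical. The total here includes the item itself, as in the description above; some analyses instead remove the item from the total first.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable has no variance
    return sxy / sqrt(sxx * syy)


def item_total_correlations(score_matrix):
    """Correlate each item (column) with the total score (row sum)."""
    totals = [sum(row) for row in score_matrix]
    n_items = len(score_matrix[0])
    return [
        pearson_r([row[j] for row in score_matrix], totals)
        for j in range(n_items)
    ]


# Hypothetical data: 6 examinees (rows) by 4 items (columns), scored 1/0.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]

correlations = item_total_correlations(scores)
for number, r in enumerate(correlations, start=1):
    print(f"item {number}: r = {r:+.2f}")
```

In this made-up data set the first three items correlate positively with the total, while item 4 correlates negatively: it tends to be answered correctly by the lower scorers.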
Item Discrimination (cont’d): Item-Total Correlation
Item-total correlations are directly related to reliability.
Why?
Because the more each item correlates with the test as a whole, the more highly all items correlate with each other
( = higher alpha, internal consistency)
Quantitative Item Analysis
– The inter-item correlation matrix displays the correlation of each item with every other item
– It provides important information for increasing the test’s internal consistency
– Each item should be highly correlated with every other item measuring the same construct and not correlated with items measuring a different construct
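A small sketch of an inter-item correlation matrix, using hypothetical 0/1 scores and illustrative function names. Each cell is the Pearson correlation between two item columns.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either item has no variance
    return sxy / sqrt(sxx * syy)


def inter_item_matrix(score_matrix):
    """Square matrix of correlations between every pair of items (columns)."""
    n_items = len(score_matrix[0])
    columns = [[row[j] for row in score_matrix] for j in range(n_items)]
    return [
        [pearson_r(columns[i], columns[j]) for j in range(n_items)]
        for i in range(n_items)
    ]


# Hypothetical data: 6 examinees (rows) by 4 items (columns), scored 1/0.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]

matrix = inter_item_matrix(scores)
for row in matrix:
    print("  ".join(f"{r:+.2f}" for r in row))
```

Reading across a row shows which items an item agrees with. In this made-up data, item 4 correlates negatively with items 1 and 2, which would flag it for review.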
Quantitative Item Analysis
Items that are not highly correlated with other items measuring the same construct can and should be dropped to increase internal consistency
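To illustrate the effect of dropping an item, the sketch below computes coefficient alpha before and after removing one item. The formula used, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), is the standard one but is not given on the slides. The data and function names are hypothetical.

```python
def population_variance(values):
    """Variance using N in the denominator."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def cronbach_alpha(score_matrix):
    """Coefficient alpha for an examinee-by-item matrix of scores."""
    k = len(score_matrix[0])
    item_variances = [
        population_variance([row[j] for row in score_matrix]) for j in range(k)
    ]
    total_variance = population_variance([sum(row) for row in score_matrix])
    if k < 2 or total_variance == 0:
        raise ValueError("alpha needs at least 2 items and non-constant totals")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def drop_item(score_matrix, index):
    """Return a copy of the matrix without the item at the given column index."""
    return [row[:index] + row[index + 1:] for row in score_matrix]


# Hypothetical data: 6 examinees by 4 items; item 4 runs against the others.
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]

alpha_all = cronbach_alpha(scores)
alpha_without_item4 = cronbach_alpha(drop_item(scores, 3))
print(f"alpha, all 4 items:     {alpha_all:.2f}")
print(f"alpha, item 4 dropped:  {alpha_without_item4:.2f}")
```

In this made-up data, alpha is negative with all four items and rises to roughly .61 once item 4 is removed. The data were built to exaggerate the effect.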
Item Discrimination (cont’d): Interitem Correlation
Possible causes for low inter-item correlation:
a. Item badly written (revise)
b. Item measures a different attribute than the rest of the test (discard)
c. Item correlated with some items, but not with others: test measures 2 distinct attributes (subtests or subscales)