rel&item.docx
TRANSCRIPT
7/29/2019 rel&item.docx
Reliability and Item Analysis
o Precise measurement of hypothesized processes or variables
o Construction of reliable measurement scales
o Precision of measurement in applied research whenever variables are difficult to observe (e.g., employee performance)
o Design and evaluation of sum scales (made up of multiple individual measurements)
Basic Ideas
Example: A questionnaire to measure people's prejudices against foreign-made cars (based on the slogan "Real Americans buy American cars!")
Items:
1. Foreign-made cars lack personality.     1 2 3 4 ... 9
2. Foreign-made cars look the same.        1 2 3 4 ... 9
(1 = disagree, 9 = agree)
True Scores and Error
Example: Foreign-made Cars
Two aspects in the response:
o True score: the person's prejudice against foreign-made cars
o Error: other, esoteric aspects (e.g., a friend has just bought a foreign-made car)
Classical Model
$$X = \tau + \text{error}$$
where $X$ is the actual measurement (the subject's response to the item), $\tau$ is the true score (prejudice), and error is the random error (esoteric aspects).
Reliability
A measurement is reliable if it reflects mostly true score, relative to the error.
Ex: The item "Red foreign-made cars are particularly ugly" is unreliable. Why?
- It will capture not only a person's prejudice but also his or her color preference, so the proportion of true score in the response will be small.
Measure of Reliability
Reliability coefficient:
$$\text{Reliability} = \frac{\sigma^2_{\text{true score}}}{\sigma^2_{\text{total observed}}}$$
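A small Python simulation can make this ratio concrete. It is a sketch under stated assumptions: true scores and errors are drawn from normal distributions, and the mean, standard deviations, sample size and seed are hypothetical values, not from the notes. It generates responses X = tau + error and divides the true-score variance by the observed variance.

```python
import random
import statistics

def simulate_reliability(n=10_000, true_sd=2.0, error_sd=1.0, seed=1):
    """Simulate X = tau + error and return var(tau) / var(X).

    All parameter values are hypothetical. Because the errors are independent
    of tau, the result should be close to true_sd**2 / (true_sd**2 + error_sd**2).
    """
    rng = random.Random(seed)
    tau = [rng.gauss(5.0, true_sd) for _ in range(n)]      # true scores (e.g. prejudice)
    error = [rng.gauss(0.0, error_sd) for _ in range(n)]   # random error with mean 0
    x = [t + e for t, e in zip(tau, error)]                # observed responses
    return statistics.variance(tau) / statistics.variance(x)

print(round(simulate_reliability(), 3))              # close to 4 / (4 + 1) = 0.8
print(round(simulate_reliability(error_sd=2.0), 3))  # close to 4 / (4 + 4) = 0.5
```

Increasing error_sd, i.e., giving the "esoteric" aspects more weight as with the red-car item, pushes the ratio toward 0.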
Sum Scales
o A sum of several (reliable) items
o Expected value of the error: E(error) = 0 (what does this mean?)
o The more items, the more reliable the sum scale
Ex: Measuring the height of ten persons with a meter stick.
- Measure each person only once: not reliable.
- Measure each person 100 times and take the average: you will be able to distinguish reliably between individuals in terms of their height.
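The meter-stick example can be simulated the same way. This Python sketch assumes (hypothetically) that true heights have an SD of 10 cm and that each single reading is off by an independent random error with an SD of 10 cm; it uses many simulated people rather than ten so that the variance estimates are stable.

```python
import random
import statistics

def height_reliability(n_repeats, n_people=5000, true_sd=10.0, error_sd=10.0, seed=7):
    """Reliability of each person's average over n_repeats meter-stick readings.

    Hypothetical setup: true heights (cm) vary with SD true_sd and each reading
    adds independent error with SD error_sd. Returns var(true) / var(average).
    """
    rng = random.Random(seed)
    true_heights, averages = [], []
    for _ in range(n_people):
        h = rng.gauss(170.0, true_sd)                                   # true height
        readings = [h + rng.gauss(0.0, error_sd) for _ in range(n_repeats)]
        true_heights.append(h)
        averages.append(statistics.fmean(readings))                     # averaged measurement
    return statistics.variance(true_heights) / statistics.variance(averages)

print(round(height_reliability(1), 2))     # expected around 100 / (100 + 100) = 0.5
print(round(height_reliability(100), 2))   # expected around 100 / (100 + 1), about 0.99
```

Averaging k independent readings divides the error variance by k, which is the same mechanism that makes a sum scale of many items more reliable than a single item.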
Cronbach's Alpha
Responses to several items enable one to compute:
o The variance of each item
o The variance of the sum scale
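Theory = Height (each item plays the role of one repeated height measurement)
Ex: every respondent answers every item.

    Respondent 1   Respondent 2   Respondent 3   ...   Respondent n
    Item 1         Item 1         Item 1               Item 1
    Item 2         Item 2         Item 2               Item 2
    Item 3         Item 3         Item 3               Item 3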
$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^2}{s_{\text{sum}}^2}\right)$$
where
$s_i^2$ = the variance of item $i$, for each of the $k$ individual items
$s_{\text{sum}}^2$ = the variance of the sum of all items
o If there is no true score but only random error in the items (uncorrelated across items), then $\sum_{i=1}^{k} s_i^2 = s_{\text{sum}}^2$ and $\alpha = 0$.
o If all items are perfectly reliable and measure the same thing (true score), then $\alpha = 1$.
o Nunnally (1978) suggests $\alpha > 0.7$.
o For binary items (e.g., yes/no), this coefficient is called the Kuder-Richardson-20 (KR-20).
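The alpha formula translates directly into code. This is a minimal Python sketch using only the standard library; it assumes complete data (no missing responses), one list per item with rows aligned across items, and sample variances throughout (the n - 1 denominator cancels in the ratio, so population variances would give the same alpha).

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha from a list of k item columns.

    items: k lists, each holding one item's responses from the same n
    respondents in the same order (no missing values assumed).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [statistics.variance(col) for col in items]   # s_i^2 for each item
    sum_scale = [sum(row) for row in zip(*items)]             # each respondent's sum score
    sum_var = statistics.variance(sum_scale)                  # s_sum^2
    return k / (k - 1) * (1 - sum(item_vars) / sum_var)

# Two identical items: the sum variance is 4, the item variances add to 2, alpha = 1.
print(cronbach_alpha([[1, 2, 3], [1, 2, 3]]))
# Two uncorrelated items: the item variances add up to the sum variance, alpha = 0.
print(cronbach_alpha([[3, 3, 1, 1], [3, 1, 3, 1]]))
```

The two calls reproduce the two limiting cases listed above. Applied to items coded 0/1, the same computation corresponds to KR-20.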
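Split-half Reliability
o Divide the items of the sum scale randomly into two halves.
o The scale is perfectly reliable if the two halves are perfectly correlated (r = 1).
o Spearman-Brown split-half reliability:
$$r_{sb} = \frac{2\,r_{xy}}{1 + r_{xy}}$$
where $r_{xy}$ = the correlation between the two halves.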
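A sketch of the split-half procedure in Python, assuming complete data and that a single random split (seeded for repeatability) is acceptable; the helper names are hypothetical. It sums each half, correlates the two half-scores, and steps the correlation up with the Spearman-Brown formula.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_brown(r_xy):
    """Split-half reliability r_sb = 2 r_xy / (1 + r_xy)."""
    return 2 * r_xy / (1 + r_xy)

def split_half_reliability(items, seed=0):
    """Randomly split k >= 2 item columns into two halves and apply Spearman-Brown.

    items: list of item columns with rows aligned across items. The seed only
    makes the random split repeatable; other splits give other estimates.
    """
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    half_a, half_b = order[: len(order) // 2], order[len(order) // 2 :]
    rows = list(zip(*items))                                  # one tuple per respondent
    score_a = [sum(row[i] for i in half_a) for row in rows]   # first half-scale score
    score_b = [sum(row[i] for i in half_b) for row in rows]   # second half-scale score
    return spearman_brown(pearson_r(score_a, score_b))

print(spearman_brown(0.5))   # 2 * 0.5 / 1.5, i.e. about 0.667
```

Because different random splits can give somewhat different estimates, a single split-half coefficient is best treated as a rough check.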
Designing a Reliable Scale
Step 1. Generating Items.
o Write as many items as possible (essentially a creative process!).
Ex: Ask a small group of highly committed car buyers to express their general thoughts and feelings about foreign-made cars.
Step 2. Choosing items of optimum difficulty.
o Items that almost all respondents agree with (or disagree with) do not help to discriminate between respondents (they are useless).
o This property is known as item difficulty.
o Look at the item means and standard deviations, and eliminate items that show extreme means and zero or nearly zero variances.
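Step 2 can be automated as a first screen. The Python sketch below assumes the 1-9 rating scale from the car example and uses two hypothetical cut-offs (a mean within 1 point of either end of the scale counts as extreme; a standard deviation below 0.5 counts as nearly zero). The notes give no numeric thresholds, so these would need tuning to the data.

```python
import statistics

def screen_items(items, scale_min=1, scale_max=9, extreme_margin=1.0, min_sd=0.5):
    """Return {item name: [reasons]} for items with extreme means or tiny SDs.

    items maps an item name to its list of responses. extreme_margin and min_sd
    are hypothetical cut-offs, not values from the notes.
    """
    flagged = {}
    for name, responses in items.items():
        mean = statistics.fmean(responses)
        sd = statistics.stdev(responses)   # sample standard deviation
        reasons = []
        if mean <= scale_min + extreme_margin or mean >= scale_max - extreme_margin:
            reasons.append(f"extreme mean ({mean:.2f})")
        if sd < min_sd:
            reasons.append(f"(nearly) zero variance (SD {sd:.2f})")
        if reasons:
            flagged[name] = reasons
    return flagged

# Hypothetical responses of six people on the 1-9 scale.
responses = {
    "item1": [2, 7, 5, 8, 3, 6],   # spread out: discriminates between people
    "item2": [9, 9, 9, 9, 9, 9],   # everyone agrees: useless
    "item3": [1, 1, 2, 1, 1, 1],   # nearly everyone disagrees
}
for name, reasons in screen_items(responses).items():
    print(name, "->", "; ".join(reasons))
```

Flagged items are candidates for a closer look rather than automatic deletion.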
Step 3. Choosing internally consistent items (Cronbach's alpha).
o Keep items that reflect mostly true score and few esoteric aspects (random errors).
o Check for items that have small correlations with the sum scale, a high alpha when the item is deleted, and a small squared multiple correlation with the other items (Statistica).
o See also other examples using SPSS and SAS (check our website!)
Example:
STATISTICA RELIABL. ANALYSIS
Summary for scale: Mean=46.1100  Std.Dv.=8.26444  Valid n:100
Cronbach alpha: .794313  Standardized alpha: .800491
Average inter-item corr.: .297818

variable  Mean if    Var. if    StDv. if   Itm-Totl   Squared    Alpha if
          deleted    deleted    deleted    Correl.    Multp. R   deleted
ITEM1     41.61000   51.93790   7.206795   .656298    .507160    .752243
ITEM2     41.37000   53.79310   7.334378   .666111    .533015    .754692
ITEM3     41.41000   54.86190   7.406882   .549226    .363895    .766778
ITEM4     41.63000   56.57310   7.521509   .470852    .305573    .776015
ITEM5     41.52000   64.16961   8.010593   .054609    .057399    .824907
ITEM6     41.56000   62.68640   7.917474   .118561    .045653    .817907
ITEM7     41.46000   54.02840   7.350401   .587637    .443563    .762033
ITEM8     41.33000   53.32110   7.302130   .609204    .446298    .758992
ITEM9     41.44000   55.06640   7.420674   .502529    .328149    .772013
ITEM10    41.66000   53.78440   7.333785   .572875    .410561    .763314
Shown above are the results for 10 items. Of most interest to us are the three right-most columns. They show the correlation between the respective item and the total sum score (without the respective item), the squared multiple correlation between the respective item and all others, and the internal consistency of the scale (coefficient alpha) if the respective item were deleted.
Clearly, items 5 and 6 "stick out," in that they are not consistent with the rest of the scale. Their correlations with the sum scale are .05 and .12, respectively, while all other items correlate at .45 or better.
In the right-most column, we can see that the reliability of the scale would be about .82 if either of the two items were deleted. Thus, we would probably delete the two items from this scale.
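Two of the three right-most columns (the corrected item-total correlation and alpha if deleted) can be sketched in Python as follows; the squared multiple R, which requires regressing each item on all the others, is left out. The data are a hypothetical toy set: items A, B and C move together, while D is built to be uncorrelated with them, playing the role of items 5 and 6. The helper names are illustrative and the code is not taken from Statistica.

```python
import statistics

def cronbach_alpha(columns):
    """Cronbach's alpha for a list of item columns (rows aligned, no missing data)."""
    k = len(columns)
    item_var_sum = sum(statistics.variance(col) for col in columns)
    total_var = statistics.variance([sum(row) for row in zip(*columns)])
    return k / (k - 1) * (1 - item_var_sum / total_var)

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def item_analysis(items):
    """Return {name: (corrected item-total r, alpha if deleted)}.

    The item-total correlation uses the sum of the *other* items (the total
    without the respective item). Needs at least 3 items so that alpha is
    still defined after one item is deleted.
    """
    report = {}
    for name in items:
        others = [col for other, col in items.items() if other != name]
        rest_total = [sum(row) for row in zip(*others)]
        report[name] = (pearson_r(items[name], rest_total), cronbach_alpha(others))
    return report

# Hypothetical toy data: A, B, C rise together; D is uncorrelated with them.
data = {
    "A": [1, 2, 3, 4, 5],
    "B": [1, 2, 3, 4, 5],
    "C": [1, 2, 3, 4, 5],
    "D": [5, 2, 1, 2, 5],
}
print(f"alpha (all items): {cronbach_alpha(list(data.values())):.3f}")
report = item_analysis(data)
for name, (r_it, alpha_del) in report.items():
    print(f"{name}: item-total r = {r_it:.3f}, alpha if deleted = {alpha_del:.3f}")
```

For D the item-total correlation is 0 and alpha if deleted rises to 1.0 (from 10/13, about .77, for all four items), mirroring the pattern of items 5 and 6 in the output above.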
Step 4. Returning to Step 1. After deleting all items that are not consistent with the scale, we may not be left with enough items to make up an overall reliable scale (remember: the fewer the items, the less reliable the scale). In practice, one often goes through several rounds of generating and eliminating items until one arrives at a final set that makes up a reliable scale.
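The point that fewer items give a less reliable scale can be quantified with the standardized form of alpha, k r / (1 + (k - 1) r), where r is the average inter-item correlation; this is the general Spearman-Brown formula. The Python sketch below assumes parallel items and a hypothetical r = 0.30; real items only approximate these assumptions.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with average inter-item correlation mean_r.

    Equivalent to the general Spearman-Brown formula; assumes parallel items.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * mean_r / (1 + (k - 1) * mean_r)

MEAN_R = 0.30  # hypothetical average inter-item correlation
for k in (2, 5, 10, 20):
    print(f"{k:2d} items: alpha = {standardized_alpha(k, MEAN_R):.3f}")
```

With r = 0.30, dropping from 10 to 5 items lowers the predicted alpha from about .81 to about .68, which is why a heavily pruned scale may need new items.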
A Few Commands:
SAS:
    PROC CORR ALPHA NOMISS;
      VAR VAR1-VARn;
    RUN;
SPSS:
    RELIABILITY
      /VARIABLES=q1 q2 q3 q4.
  CORRELATIONS command:
    CORRELATIONS VARIABLES=q1 q2 q3 q4.
STATA:
    alpha var1-varn
STATISTICA: (Assignment!)
Exercise: Download the files samplealpha.sd2 and samplealpha.sav from our website and try to do some reliability analysis in SAS and SPSS.
Guide to Interpretation
Reliability      Interpretation
.90 and above    Excellent reliability; at the level of the best standardized tests.
.80 - .90        Very good for a classroom test.
.70 - .80        Good for a classroom test; in the range of most. There are probably a few items which could be improved.
.60 - .70        Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
.50 - .60        Suggests need for revision of test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
.50 or below     Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
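To apply the guide programmatically, a small lookup is enough. The bands in the guide share their endpoints (.80 - .90, .70 - .80, ...), so this Python sketch assumes that each boundary value belongs to the higher band; the labels are shortened from the table.

```python
def interpret_reliability(coef):
    """Return the guide's interpretation band for a reliability coefficient.

    Assumption: a value on a shared boundary (e.g. exactly .80) is placed in
    the higher band; the labels are shortened versions of the table entries.
    """
    bands = [
        (0.90, "Excellent reliability"),
        (0.80, "Very good for a classroom test"),
        (0.70, "Good for a classroom test"),
        (0.60, "Somewhat low"),
        (0.50, "Suggests need for revision of test"),
    ]
    for lower_bound, label in bands:
        if coef >= lower_bound:
            return label
    return "Questionable reliability"

print(interpret_reliability(0.794313))  # Cronbach alpha from the Statistica example
```

For example, the alpha of .794313 from the Statistica example falls in the "Good" band, with the caveat that the guide was written for classroom tests.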
http://www.arts.auckland.ac.nz/edu/staff/