rel&item.docx

7/29/2019 rel&item.docx


Reliability and Item Analysis

o Precise measurement of hypothesized processes or variables
o Constructing reliable measurement scales
o Precision of measurement (applied research) whenever variables are difficult to observe (e.g. employee performance)
o Design and evaluation of sum scales (made of multiple individual measurements)

Basic Ideas

Questionnaire to measure people's prejudices against foreign-made cars

Example: (Based on the slogan "Real Americans buy American cars!")

Items

1. Foreign-made cars lack personality    1 2 3 4 ... 9
2. Foreign-made cars look the same       1 2 3 4 ... 9

1 = disagree, 9 = agree

True Scores and Error

Example: Foreign-made Cars

Two Aspects in the Response

True score: prejudice

Error: some esoteric or other aspects (e.g. a friend has just bought a foreign-made car)

Classical Model

X = tau + error

where X is the actual measurement (the subject's response to the item),
tau is the true score (prejudice), and error is the random error (esoteric aspects)

Reliability

A measurement is reliable if it reflects mostly true score, relative to the error.

Ex: The item "Red foreign-made cars are particularly ugly" is unreliable. Why?

- It will capture not only a person's prejudice but also his or her color preference, so that the proportion of true score in the response would be small.



Measure of Reliability

$$\text{Index of Reliability} = \frac{\sigma^2_{\text{true score}}}{\sigma^2_{\text{total observed}}}$$
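The index of reliability can be illustrated with a small simulation of the classical model. This is only a sketch: the sample size, the true-score variance (4) and the error variance (1) are assumed values chosen for illustration, not figures from these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 100_000                       # hypothetical number of respondents
tau = rng.normal(5.0, 2.0, n)     # true scores, variance 4 (assumed)
error = rng.normal(0.0, 1.0, n)   # random error, mean 0, variance 1 (assumed)
x = tau + error                   # observed response: X = tau + error

# Index of reliability: true-score variance over total observed variance
reliability = tau.var(ddof=1) / x.var(ddof=1)
print(round(reliability, 2))
```

With these assumed variances the printed value should be close to 4 / (4 + 1) = 0.8. In real data tau is not observable, which is why reliability has to be estimated from the items themselves.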

Sum Scales

o Sum of several (reliable) items
o Expected (error) = 0 (what does this mean?)
o More items, more reliable (sum scale)

Ex: Measuring the height of ten persons using a meter stick

Measure only once: not reliable

Measure each person 100 times and take the average: you will be able to distinguish reliably between individuals in terms of their height
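The height example can be sketched in code. The true heights and the size of the measurement error below are invented for illustration; the point is only that the average of many readings sits much closer to the true value than a single reading does, because the random errors have expected value 0.

```python
import numpy as np

rng = np.random.default_rng(1)

true_height = rng.normal(170.0, 8.0, 10)  # ten persons, in cm (assumed values)
error_sd = 5.0                            # assumed error of a single reading

def measure(times):
    """Return the average of `times` noisy readings for each person."""
    noise = rng.normal(0.0, error_sd, (times, true_height.size))
    return (true_height + noise).mean(axis=0)

once = measure(1)        # one reading per person
averaged = measure(100)  # mean of 100 readings per person

err_once = np.abs(once - true_height).mean()
err_avg = np.abs(averaged - true_height).mean()
print(f"mean absolute error, 1 reading:    {err_once:.2f} cm")
print(f"mean absolute error, 100 readings: {err_avg:.2f} cm")
```

The same logic applies to a sum scale: each additional item is another "reading" of the same true score.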

Cronbach's Alpha

Several responses to items enable one to compute

o Variance for each item
o Variance for the sum scale

Theory = Height

Ex:

Respondent 1   Respondent 2   Respondent 3   ...   Respondent n
Item 1         Item 1         Item 1               Item 1
Item 2         Item 2         Item 2               Item 2
Item 3         Item 3         Item 3               Item 3

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} S_i^2}{S_{sum}^2}\right)$$

where



$S_i^2$ = variance of item i (one of the k individual items)

$S_{sum}^2$ = variance for the sum of all items

o If there is no true score but only random errors in the items (uncorrelated across items), then $\sum S_i^2 = S_{sum}^2$ and $\alpha = 0$
o If all items measure the same thing (true score), then $\alpha = 1$
o Nunnally (1978) suggests $\alpha > 0.7$
o For binary items (e.g. yes/no) this is called the Kuder-Richardson-20
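A minimal sketch of computing Cronbach's alpha from a respondents-by-items score matrix, with the two limiting cases checked on made-up data. The function name `cronbach_alpha` and both data sets are assumptions for illustration only.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha; rows are respondents, columns are items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)    # variance of each item
    sum_var = scores.sum(axis=1).var(ddof=1)  # variance of the sum scale
    return k / (k - 1) * (1 - item_vars.sum() / sum_var)

# Case 1: every item carries exactly the same score (hypothetical data)
identical = np.array([[1, 1, 1], [3, 3, 3], [5, 5, 5], [7, 7, 7]])
alpha_same = cronbach_alpha(identical)

# Case 2: items are pure, mutually uncorrelated random error
rng = np.random.default_rng(0)
noise_only = rng.normal(0.0, 1.0, (2000, 5))
alpha_noise = cronbach_alpha(noise_only)

print(alpha_same, alpha_noise)
```

The first value should be 1 (up to floating-point rounding) and the second should be near 0; in a finite sample of pure noise alpha can even come out slightly negative.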

Split-half

o Divide the sum scale into two halves randomly
o Reliable if the two halves are perfectly correlated! (r = 1)

$$r_{sb} = \frac{2 r_{xy}}{1 + r_{xy}}$$

where $r_{xy}$ = correlation between the two halves
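The split-half procedure with the Spearman-Brown correction can be sketched as follows. The simulated data (500 respondents, six items, each item equal to a common true score plus independent error) and the helper name `split_half_reliability` are assumptions for illustration.

```python
import numpy as np

def split_half_reliability(scores, rng):
    """Randomly split the items into two halves; return (r_xy, r_sb)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    order = rng.permutation(k)                       # random item assignment
    half_a = scores[:, order[: k // 2]].sum(axis=1)  # sum score of first half
    half_b = scores[:, order[k // 2:]].sum(axis=1)   # sum score of second half
    r_xy = np.corrcoef(half_a, half_b)[0, 1]         # correlation between halves
    r_sb = 2 * r_xy / (1 + r_xy)                     # Spearman-Brown correction
    return r_xy, r_sb

rng = np.random.default_rng(2)
tau = rng.normal(0.0, 1.0, (500, 1))            # common true score (assumed)
items = tau + rng.normal(0.0, 1.0, (500, 6))    # six noisy items (assumed)

r_xy, r_sb = split_half_reliability(items, rng)
print(f"r_xy = {r_xy:.3f}, r_sb = {r_sb:.3f}")
```

The correction raises the estimate because each half is only half as long as the full scale, and a shorter scale is less reliable.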

Designing a Reliable Scale

Step 1. Generating Items.

o Write as many items as possible (essentially a creative process!)
  Ex: Can ask a small group of highly committed car buyers to express their general thoughts and feelings about foreign-made cars.

Step 2. Choosing items of optimum difficulty.

o Items that most respondents either agree or disagree with do not help to discriminate between respondents (useless)
o Known as item difficulty
o Look at item means and standard deviations and eliminate those that show extreme means, and zero or nearly zero variances
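The screening in Step 2 might look like the sketch below. The thresholds (`margin`, `min_sd`), the function name and the tiny data set are all hypothetical; the notes do not prescribe cut-offs, so suitable values are a judgment call.

```python
import numpy as np

def flag_extreme_items(scores, low=1, high=9, margin=1.0, min_sd=0.5):
    """Return column indices of items with an extreme mean or almost no spread.

    low/high are the ends of the response scale; margin and min_sd are
    assumed cut-offs, not values taken from the notes.
    """
    scores = np.asarray(scores, dtype=float)
    means = scores.mean(axis=0)
    sds = scores.std(axis=0, ddof=1)
    extreme_mean = (means < low + margin) | (means > high - margin)
    low_spread = sds < min_sd
    return np.flatnonzero(extreme_mean | low_spread)

# Hypothetical responses on a 1-9 scale: rows = respondents, columns = items
scores = np.array([
    [5, 9, 1],
    [3, 9, 2],
    [7, 9, 1],
    [4, 8, 1],
    [6, 9, 1],
])
flagged = flag_extreme_items(scores)
print(flagged)
```

Here the second and third items (indices 1 and 2) are flagged: nearly everyone agrees with one and disagrees with the other, so neither discriminates between respondents.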

Step 3. Choosing internally consistent items (Cronbach's alpha).



o More true score, fewer esoteric aspects (random errors)
o Check for items that have small correlations with the sum scale, a higher alpha when the item is deleted, and a small squared multiple correlation (Statistica)
o See also other examples using SPSS and SAS (check our web!)

Example:

STATISTICA RELIABL. ANALYSIS

Summary for scale: Mean=46.1100 Std.Dv.=8.26444 Valid n:100
Cronbach alpha: .794313 Standardized alpha: .800491
Average inter-item corr.: .297818

variable   Mean if    Var. if    StDv. if   Itm-Totl   Squared    Alpha if
           deleted    deleted    deleted    Correl.    Multp. R   deleted
ITEM1      41.61000   51.93790   7.206795   .656298    .507160    .752243
ITEM2      41.37000   53.79310   7.334378   .666111    .533015    .754692
ITEM3      41.41000   54.86190   7.406882   .549226    .363895    .766778
ITEM4      41.63000   56.57310   7.521509   .470852    .305573    .776015
ITEM5      41.52000   64.16961   8.010593   .054609    .057399    .824907
ITEM6      41.56000   62.68640   7.917474   .118561    .045653    .817907
ITEM7      41.46000   54.02840   7.350401   .587637    .443563    .762033
ITEM8      41.33000   53.32110   7.302130   .609204    .446298    .758992
ITEM9      41.44000   55.06640   7.420674   .502529    .328149    .772013
ITEM10     41.66000   53.78440   7.333785   .572875    .410561    .763314

Shown above are the results for 10 items. Of most interest to us are the three right-most columns. They show us the correlation between the respective item and the total sum score (without the respective item), the squared multiple correlation between the respective item and all others, and the internal consistency of the scale (coefficient alpha) if the respective item would be deleted.

Clearly, items 5 and 6 "stick out," in that they are not consistent with the rest of the scale. Their correlations with the sum scale are .05 and .12, respectively, while all other items correlate at .45 or better.

In the right-most column, we can see that the reliability of the scale would be about .82 if either of the two items were to be deleted. Thus, we would probably delete the two items from this scale.
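Two of the columns discussed above, the item-total correlation (computed without the respective item) and alpha if the item is deleted, can be reproduced with a short sketch. The data are simulated under assumed conditions (four items sharing a true score plus one pure-noise item) and are not the data behind the Statistica output; the squared multiple correlation column is not computed here.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha; rows are respondents, columns are items."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    sum_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / sum_var)

def item_analysis(scores):
    """Per item: (index, corrected item-total correlation, alpha if deleted)."""
    scores = np.asarray(scores, dtype=float)
    rows = []
    for i in range(scores.shape[1]):
        rest = np.delete(scores, i, axis=1)  # all items except item i
        r = np.corrcoef(scores[:, i], rest.sum(axis=1))[0, 1]
        rows.append((i, r, cronbach_alpha(rest)))
    return rows

rng = np.random.default_rng(3)
tau = rng.normal(0.0, 1.0, (300, 1))            # shared true score (assumed)
good = tau + rng.normal(0.0, 1.0, (300, 4))     # four consistent items
noise = rng.normal(0.0, 1.0, (300, 1))          # one item that is pure error
scores = np.hstack([good, noise])

rows = item_analysis(scores)
print(f"alpha, all items: {cronbach_alpha(scores):.2f}")
for i, r, a in rows:
    print(f"item {i}: item-total r = {r:.2f}, alpha if deleted = {a:.2f}")
```

The noise item (index 4) should show the pattern of items 5 and 6 above: an item-total correlation near zero and an alpha-if-deleted higher than the alpha of the full scale.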



Step 4. Returning to Step 1. After deleting all items that are not consistent with the scale, we may not be left with enough items to make up an overall reliable scale (remember that the fewer the items, the less reliable the scale). In practice, one often goes through several rounds of generating items and eliminating items, until one arrives at a final set that makes up a reliable scale.

A Few Commands:

SAS:
    PROC CORR ALPHA NOMISS;
        VAR VAR1-VARn;
    RUN;

SPSS:
    RELIABILITY
        /VARIABLES=q1 q2 q3 q4.

    CORRELATIONS command:
    CORRELATIONS VARIABLES=q1 q2 q3 q4.

STATA:
    alpha var1-varn

STATISTICA: (Assignment!)

Exercise: Download the files samplealpha.sd2 and samplealpha.sav from our website and try a reliability analysis in SAS and SPSS.

Guide to Interpretation

Reliability     Interpretation

.90 and above   Excellent reliability; at the level of the best standardized tests
.80 - .90       Very good for a classroom test
.70 - .80       Good for a classroom test; in the range of most. There are probably a few
                items which could be improved.
.60 - .70       Somewhat low. This test needs to be supplemented by other measures (e.g., more
                tests) to determine grades. There are probably some items which could be
                improved.
.50 - .60       Suggests need for revision of test, unless it is quite short (ten or fewer
                items). The test definitely needs to be supplemented by other measures (e.g.,
                more tests) for grading.
.50 or below    Questionable reliability. This test should not contribute heavily to the
                course grade, and it needs revision.
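The guide can be turned into a small lookup, sketched below. The bands share their endpoints (for example .80 appears in two rows), so this sketch assumes each band includes its lower bound; that choice, the function name and the shortened labels are assumptions, not part of the guide.

```python
def interpret_reliability(alpha):
    """Map a reliability coefficient to the (shortened) guide label.

    Assumes each band includes its lower bound, e.g. .80 counts as
    'Very good' rather than 'Good'.
    """
    if alpha >= 0.90:
        return "Excellent reliability"
    if alpha >= 0.80:
        return "Very good for a classroom test"
    if alpha >= 0.70:
        return "Good for a classroom test"
    if alpha >= 0.60:
        return "Somewhat low"
    if alpha >= 0.50:
        return "Suggests need for revision of test"
    return "Questionable reliability"

# The Cronbach alpha reported in the Statistica example
label = interpret_reliability(0.794313)
print(label)
```

Note that the guide is worded for classroom tests, so applying its labels to an attitude scale such as the foreign-made cars questionnaire is only a rough orientation.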



http://www.arts.auckland.ac.nz/edu/staff/
