rel&item.docx

Upload: cart11

Post on 03-Apr-2018


  • 7/29/2019 rel&item.docx


    Reliability and Item Analysis

    o precise measurement of hypothesized processes or variables
    o construct reliable measurement scales
    o precision of measurement (applied research) whenever variables are difficult to observe (e.g. employee performance)
    o design and evaluation of sum scales (made of multiple individual measurements)

    Basic Ideas

    Questionnaire to measure people's prejudices against foreign-made cars

    Example: (Based on the slogan "Real Americans buy American cars!")

    Items

    1. Foreign-made cars lack personality   1 2 3 4 ... 9
    2. Foreign-made cars look the same      1 2 3 4 ... 9

    1 = disagree, 9 = agree

    True Scores and Error

    Example: Foreign-made Cars

    Two Aspects in the Response

    o True score: the prejudice itself
    o Error: some esoteric, other aspects (e.g. a friend has just bought a foreign-made car)

    Classical Model

    X = tau + error

    where X is the actual measurement (the subject's response to the item), tau is the true score (prejudice) and error is the random error (esoteric).

    Reliability

    Measurement is reliable if it reflects mostly true score, relative to the error.

    Ex: The item "Red foreign-made cars are particularly ugly" is unreliable. Why?

    - It will capture not only a person's prejudice but also his or her color preference, so that the proportion of true score would be small.


    Measure of Reliability

    $$\text{Index of Reliability} = \frac{\sigma^2_{\text{true score}}}{\sigma^2_{\text{total observed}}}$$
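    A small worked example may make the index concrete. The sketch below uses made-up true scores and errors for five respondents (both lists are illustrative assumptions, chosen so that the errors have mean 0 and are uncorrelated with the true scores) and computes the ratio of true-score variance to observed variance.

```python
from statistics import variance

# Hypothetical data for five respondents (not taken from the notes).
tau = [1.0, 2.0, 3.0, 4.0, 5.0]        # true scores
error = [1.0, -2.0, 2.0, -2.0, 1.0]    # random errors: mean 0, uncorrelated with tau

# Classical model: observed score = true score + error
observed = [t + e for t, e in zip(tau, error)]


def reliability_index(true_scores, observed_scores):
    """Ratio of true-score variance to total observed variance."""
    return variance(true_scores) / variance(observed_scores)


index = reliability_index(tau, observed)
print(round(index, 4))  # variance(tau) = 2.5, variance(observed) = 6.0
```

    In real data the true scores are never observed directly, which is why the notes go on to estimate reliability from the items themselves.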

    Sum Scales

    o sum of several (reliable) items
    o Expected (error) = 0 (what does this mean?)
    o More items, more reliable (sum scale)

    Ex: height of ten persons using a meter stick

    Measure only once: not reliable

    Measure each person 100 times and take the average: you will be able to distinguish reliably between individuals in terms of their height
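    The height example can be put in numbers. Assuming the measurement errors are independent with mean 0, averaging n repeated measurements divides the error variance by n and leaves the true-score variance unchanged, so the reliability of the average is var_true / (var_true + var_error / n). The function name and the variances below are illustrative assumptions, not values from the notes.

```python
def reliability_of_mean(var_true, var_error, n):
    """Reliability of the average of n independent measurements.

    Averaging leaves the true-score variance unchanged and shrinks the
    error variance to var_error / n (errors assumed independent, mean 0).
    """
    return var_true / (var_true + var_error / n)


# Hypothetical case: true heights and stick errors are equally variable.
once = reliability_of_mean(1.0, 1.0, 1)        # a single measurement
hundred = reliability_of_mean(1.0, 1.0, 100)   # average of 100 measurements
print(once, round(hundred, 4))
```

    The same reasoning is why a sum scale with more items tends to be more reliable than a single item.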

    Cronbach's Alpha

    Several responses to items enable one to compute

    o Variance for each item
    o Variance for the sum scale

    Theory = Height

    Ex:

    Respondent 1   Respondent 2   Respondent 3   Respondent n
    Item 1         Item 1         Item 1         Item 1
    Item 2         Item 2         Item 2         Item 2
    Item 3         Item 3         Item 3         Item 3

    $$\alpha = \frac{k}{k-1}\left[1 - \frac{\sum_{i=1}^{k} S_i^2}{S^2_{sum}}\right]$$

    where

    $S_i^2$ = variance of item i (one of the k individual items)

    $S^2_{sum}$ = variance for the sum of all items

    o If there is no true score but only random errors in the items (uncorrelated across items), then the sum of the item variances equals the variance of the sum, $\sum S_i^2 = S^2_{sum}$, and alpha = 0
    o If all items measure the same thing (true score), then alpha = 1
    o Nunnally (1978) suggests an alpha > 0.7
    o For binary items (e.g. yes/no) this is called the Kuder-Richardson-20
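    The two limiting cases can be checked numerically. The following is a minimal sketch of Cronbach's alpha in plain Python, assuming the data are stored as one list of item scores per respondent; the function name and both tiny data sets are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    items = list(zip(*rows))                              # one tuple of scores per item
    sum_item_var = sum(variance(item) for item in items)  # sum of the S_i^2
    sum_scale_var = variance([sum(row) for row in rows])  # S^2_sum
    return (k / (k - 1)) * (1 - sum_item_var / sum_scale_var)


# Hypothetical data: rows are respondents, columns are items.
same_thing = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [4.0, 4.0, 4.0]]   # identical items
unrelated = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]       # uncorrelated items

print(round(cronbach_alpha(same_thing), 6))   # all items measure the same thing
print(round(cronbach_alpha(unrelated), 6))    # item variances add up to the sum variance
```

    The sketch uses sample variances (divisor n - 1) throughout; because the same divisor appears in numerator and denominator, the choice does not change alpha.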

    Split-half

    o Divide the sum scale into two halves randomly
    o Reliable if the two halves are perfectly correlated! (r=1)

    $$r_{sb} = \frac{2\,r_{xy}}{1 + r_{xy}}$$

    where $r_{xy}$ = correlation between the two halves
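    As a sketch of how the split-half coefficient might be computed, the code below correlates two half-scale sums and applies the formula above. The helper names and the five pairs of half-scale sums are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_xy):
    """Split-half reliability from the correlation between the two halves."""
    return 2 * r_xy / (1 + r_xy)


# Hypothetical half-scale sums for five respondents.
half_a = [10.0, 12.0, 15.0, 11.0, 17.0]
half_b = [11.0, 11.0, 16.0, 12.0, 15.0]

r_xy = pearson_r(half_a, half_b)
r_sb = spearman_brown(r_xy)
print(round(r_xy, 3), round(r_sb, 3))
```

    For any positive correlation below 1 the corrected coefficient is larger than the raw correlation, reflecting that the full scale has twice as many items as each half.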

    Designing a Reliable Scale

    Step 1. Generating items.

    o Write as many items as possible (essentially a creative process!)

    Ex: One can ask a small group of highly committed car buyers to express their general thoughts and feelings about foreign-made cars.

    Step 2. Choosing items of optimum difficulty.

    o Items with which most respondents agree or disagree do not help to discriminate between respondents (useless)
    o Known as item difficulty
    o Look at item means and standard deviations and eliminate those that show extreme means, and zero or nearly zero variances
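    The screening in Step 2 could be sketched as follows. The cut-offs (`margin`, `min_sd`) are arbitrary illustrative assumptions, since the notes give no numeric thresholds, and the responses are made up.

```python
from statistics import mean, stdev


def flag_items(rows, low=1, high=9, margin=1.0, min_sd=0.5):
    """Return 0-based indices of items with extreme means or near-zero spread.

    rows: list of respondents, each a list of item scores on a low..high scale.
    margin and min_sd are illustrative cut-offs, not values from the notes.
    """
    flagged = []
    for i, item in enumerate(zip(*rows)):
        m, sd = mean(item), stdev(item)
        if sd < min_sd or m < low + margin or m > high - margin:
            flagged.append(i)
    return flagged


# Hypothetical responses of four people to three items on the 1-9 scale.
responses = [
    [2, 9, 5],
    [7, 9, 5],
    [4, 8, 5],
    [6, 9, 5],
]
print(flag_items(responses))  # second item: extreme mean; third item: zero variance
```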

    Step 3. Choosing internally consistent items (Cronbach's alpha).


    o More true score, fewer esoteric aspects (random errors)
    o Check for items that have a small correlation with the sum scale, a high alpha when the item is deleted, and a small multiple correlation (Statistica)
    o See also other examples using SPSS and SAS (check our web!)

    Example:

    STATISTICA RELIABL. ANALYSIS

    Summary for scale: Mean=46.1100 Std.Dv.=8.26444 Valid n:100
    Cronbach alpha: .794313 Standardized alpha: .800491
    Average inter-item corr.: .297818

    variable   Mean if     Var. if     StDv. if    Itm-Totl    Squared     Alpha if
               deleted     deleted     deleted     Correl.     Multp. R    deleted
    ITEM1      41.61000    51.93790    7.206795    .656298     .507160     .752243
    ITEM2      41.37000    53.79310    7.334378    .666111     .533015     .754692
    ITEM3      41.41000    54.86190    7.406882    .549226     .363895     .766778
    ITEM4      41.63000    56.57310    7.521509    .470852     .305573     .776015
    ITEM5      41.52000    64.16961    8.010593    .054609     .057399     .824907
    ITEM6      41.56000    62.68640    7.917474    .118561     .045653     .817907
    ITEM7      41.46000    54.02840    7.350401    .587637     .443563     .762033
    ITEM8      41.33000    53.32110    7.302130    .609204     .446298     .758992
    ITEM9      41.44000    55.06640    7.420674    .502529     .328149     .772013
    ITEM10     41.66000    53.78440    7.333785    .572875     .410561     .763314

    Shown above are the results for 10 items. Of most interest to us are the three right-most columns. They show us the correlation between the respective item and the total sum score (without the respective item), the squared multiple correlation between the respective item and all others, and the internal consistency of the scale (coefficient alpha) if the respective item were deleted.

    Clearly, items 5 and 6 "stick out," in that they are not consistent with the rest of the scale. Their correlations with the sum scale are .05 and .12, respectively, while all other items correlate at .45 or better.

    In the right-most column, we can see that the reliability of the scale would be about .82 if either of the two items were to be deleted. Thus, we would probably delete the two items from this scale.
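    Two of the columns discussed above (the item-total correlation without the respective item, and alpha if the item is deleted) can be reproduced with a short sketch. The helper names and the five-respondent data set are hypothetical; the fourth item was constructed to be unrelated to the other three, so it plays the role of items 5 and 6 in the output above.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    item_var = sum(variance(item) for item in zip(*rows))
    return (k / (k - 1)) * (1 - item_var / variance([sum(r) for r in rows]))


def item_analysis(rows):
    """Per item: (item-total correlation without the item, alpha if deleted)."""
    out = []
    for i in range(len(rows[0])):
        rest = [row[:i] + row[i + 1:] for row in rows]   # scale without item i
        item = [row[i] for row in rows]
        rest_sum = [sum(r) for r in rest]
        out.append((pearson_r(item, rest_sum), cronbach_alpha(rest)))
    return out


# Hypothetical data: rows are respondents, columns are four items.
data = [
    [1.0, 1.0, 1.0, 2.0],
    [2.0, 2.0, 2.0, 1.0],
    [3.0, 3.0, 3.0, 3.0],
    [4.0, 4.0, 4.0, 1.0],
    [5.0, 5.0, 5.0, 2.0],
]

overall = cronbach_alpha(data)
stats = item_analysis(data)
for i, (r, a) in enumerate(stats, start=1):
    print(f"ITEM{i}  item-total r = {r:.3f}  alpha if deleted = {a:.3f}")
print(f"overall alpha = {overall:.3f}")
```

    In this toy data, deleting the unrelated fourth item raises alpha above the overall value, while deleting any of the first three lowers it, which is the same pattern used above to single out items 5 and 6.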


    Step 4: Returning to Step 1. After deleting all items that are not consistent with the scale, we may not be left with enough items to make up an overall reliable scale (remember that the fewer the items, the less reliable the scale). In practice, one often goes through several rounds of generating items and eliminating items, until one arrives at a final set that makes up a reliable scale.

    A Few Commands:

    SAS: PROC CORR ALPHA NOMISS;

    VAR VAR1-VARn;

    RUN;

    SPSS: RELIABILITY

    /VARIABLES=q1 q2 q3 q4.

    CORRELATIONS command:

    CORRELATIONS VARIABLES=q1 q2 q3 q4.

    STATA: alpha var1-varn

    STATISTICA: (Assignment!)

    Exercise: Download the files samplealpha.sd2 and samplealpha.sav from our website and try some reliability analysis in SAS and SPSS.

    Guide to Interpretation

    Reliability      Interpretation

    .90 and above    Excellent reliability; at the level of the best standardized tests

    .80 - .90        Very good for a classroom test

    .70 - .80        Good for a classroom test; in the range of most. There are probably a few items which could be improved.

    .60 - .70        Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.

    .50 - .60        Suggests need for revision of test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.

    .50 or below     Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
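    The guide can be applied mechanically with a small lookup. This is only a sketch: the guide's bands share their end points, so the code assumes each band includes its lower bound, and the function name is made up.

```python
def interpret_reliability(alpha):
    """Map a reliability coefficient to the guide's verbal label.

    Assumption: each band includes its lower bound, so .80 counts as
    'Very good for a classroom test' and .50 falls in the '.50 - .60' band.
    """
    bands = [
        (0.90, "Excellent reliability"),
        (0.80, "Very good for a classroom test"),
        (0.70, "Good for a classroom test"),
        (0.60, "Somewhat low"),
        (0.50, "Suggests need for revision of test"),
    ]
    for lower, label in bands:
        if alpha >= lower:
            return label
    return "Questionable reliability"


# Cronbach alpha reported for the 10-item example scale.
print(interpret_reliability(0.794313))
```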


    http://www.arts.auckland.ac.nz/edu/staff/