


HANDOUT ON RELIABILITY
Dr. Robert Gebotys 2003

Reliability refers to the consistency and stability of the results of a test or scale. A test is said to be reliable if it yields similar results in repeated administrations when the attribute being measured is believed not to have changed in the interval between measurements, even though the test may be administered by different people and alternative forms of the test may be used. For example, if you weighed yourself twice in a row and the scale read 130 lbs. the first time and 140 lbs. the second, we would say that the scale was an unreliable measure of weight.

In addition, to be reliable, an instrument or test must be confined to measuring a single construct and only one dimension. For example, if a questionnaire designed to measure anxiety simultaneously measured depression, the instrument would not be a reliable measure of anxiety. A reliable instrument or test must therefore meet two conditions: it must have small random error, and it must measure a single dimension.

One major source of inconsistency in test results is random measurement error. A primary concern of test developers and test users is therefore to determine the extent to which random measurement errors influence test performance. The classical true score model provides a useful theoretical framework for defining reliability and for the development of practical reliability investigations. In the classical true score model, an examinee's or subject's observed score on a particular test is viewed as a random sample of one of the many possible test scores that the person could have earned under repeated administrations of the same test, and the observed score (X) is envisioned as the composite of two hypothetical components: a true score (T) and a random error component (E). T is defined as the expected value of the examinee's test scores over many repeated testings with the same test, and E is the discrepancy between the examinee's observed score and his/her true score. The following equation summarizes the relationship between X, T and E:

X = T + E

An important question which follows from the above is: how closely related are the examinees' true and observed scores on a particular test or instrument? Based on the classical true score model [1], two indices are derived to measure the relationship between true and observed scores.

1. Reliability coefficient - defined as the correlation between parallel measures [2].

[1] "X = T + E" is only one of the assumptions of the classical true score theory. Please consult texts on measurement/test theory for other assumptions in the model, as well as for how the reliability coefficient and the reliability index are derived from the model.

[2] According to classical true score theory, two measures/tests are defined as parallel when 1) each examinee or subject has the same true score on both measures/tests, and 2) the error variances of the two measures/tests are equal. Based on this definition, it is sensible to assume that parallel tests are matched in content.


This coefficient (ρXX') can be shown to equal the ratio σ²T / σ²X, the proportion of observed score variance that is due to true score variance.

2. Reliability index - defined as the correlation between true and observed scores on a single measure (i.e. ρXT), which can be shown to equal σT / σX.

However, in reality we rarely know the true scores. Moreover, the reliability coefficient defined above is a purely theoretical concept, because it is not possible to verify that two tests are truly parallel. The reliability of a test therefore has to be estimated using other methods.
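The decomposition X = T + E can be illustrated with a small simulation. The sketch below is not part of the original handout: it is a minimal Python illustration (the sample size and the score means and variances are arbitrary choices) showing that the correlation between two parallel measures approximates σ²T / σ²X.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000                              # number of simulated examinees

    true_score = rng.normal(50, 10, n)       # T: true scores, variance 100
    error_1 = rng.normal(0, 5, n)            # E: random error, variance 25
    error_2 = rng.normal(0, 5, n)

    x1 = true_score + error_1                # two parallel measures: same T,
    x2 = true_score + error_2                # equal error variances

    # Reliability coefficient: correlation between the parallel measures
    print(np.corrcoef(x1, x2)[0, 1])         # approximately 100/(100 + 25) = .80
    # Proportion of observed score variance due to true score variance
    print(true_score.var() / x1.var())       # also approximately .80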

Methods of Estimating Reliability: The methods of estimating reliability can be roughly categorized into two groups: one group includes methods that require two separate test administrations, and the other includes methods that use a single test administration.

1. Methods Requiring Two Separate Test Administrations:

a. Test-Retest Method - The test-retest method yields a reliability estimate, r12, based on testing the same examinees/subjects twice with the same test/scale and then correlating the results. If each examinee/subject receives exactly the same observed score on the second testing as he/she did on the first, and if there is some variance in the observed scores among examinees/subjects, then the correlation is 1.0, indicating perfect reliability. The correlation coefficient obtained from this test-retest procedure is called the coefficient of stability, which measures how consistently examinees/subjects respond to the test/scale at different times.

b. Alternate-Forms Method - This method involves constructing two similar forms of a test/scale (i.e. both forms have the same content) and administering both forms to the same group of examinees within a very short time period. The correlation between observed scores on the alternate test/scale forms (i.e. rXY, computed using the Pearson product-moment formula) is an estimate of the reliability of either one of the alternate forms. This correlation coefficient is known as the coefficient of equivalence.

c. Test-Retest with Alternate Forms Method - This method is a combination of the test-retest and alternate-forms methods. In this case, the procedure is to administer form 1 of the test/scale, wait, and then administer form 2. The correlation coefficient between the two sets of observed scores is an estimate of the reliability of either one of the alternate forms and is known as the coefficient of stability and equivalence.
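All three of the two-administration methods above come down to computing a Pearson correlation between two sets of observed scores. The snippet below is a minimal sketch, not part of the original handout, and the score vectors in it are hypothetical.

    import numpy as np

    # Hypothetical scores of 8 subjects on two administrations (or two forms)
    first_administration = np.array([12, 15, 9, 20, 14, 11, 17, 13])
    second_administration = np.array([13, 14, 10, 19, 15, 10, 18, 12])

    # Coefficient of stability (test-retest) or of equivalence (alternate forms)
    r = np.corrcoef(first_administration, second_administration)[0, 1]
    print(round(r, 4))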

2. Methods Using One Test Administration: There are many situations when a single form of a test/scale will be administered

only once to a group of examinees/subjects. The following methods estimate reliability from the scores of a single test administration. They focus mainly on how consistently the examinees/subjects performed or scored across items, or subsets of items, on this single test/scale form. The reliability estimates generated by these methods are usually called coefficients of internal consistency.

These methods of estimating reliability are based on the argument that if the

scores of the subjects/examinees are consistent across items or subsets of items on the single test/scale form, then it is reasonable to think that these items or subsets of items came from the same content domain and were constructed according to the same specifications. In addition, if the examinees/subjects’ performance is consistent across subsets of items within a test/scale, the test/scale administrator can also have some confidence that this performance would generalize to other possible items in the content domain.

a. Reliability Estimates Based on Item Variances: Calculation of Cronbach's Alpha - This is the most widely used method of estimating reliability from a single test administration. Cronbach's alpha (α) is calculated with the following formula:

α = [k / (k - 1)] × [1 - (Σσ²i) / σ²X]

where k is the number of items on the test/scale, σ²i is the variance of item i, and σ²X is the total test variance. Cronbach's α can be conceived as the average of all the possible split-half reliabilities (the calculation of split-half reliabilities is discussed in a following section) that could be estimated on the single test/scale. However, unlike the split-half methods, Cronbach's α is not affected by how the items are arranged in the test/scale.
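As an illustration of the formula, the sketch below (not part of the original handout) implements it in Python; the function name is my own and the small data matrix in the usage example is hypothetical.

    import numpy as np

    def cronbach_alpha(scores):
        """scores: 2-D array with rows = examinees and columns = items."""
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]
        sum_item_variances = scores.var(axis=0, ddof=1).sum()
        total_variance = scores.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

    # Hypothetical 0/1 responses of 5 examinees to 3 items
    demo = [[1, 0, 1],
            [0, 0, 0],
            [1, 1, 1],
            [0, 1, 0],
            [1, 1, 1]]
    print(round(cronbach_alpha(demo), 4))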

b. Split-Half Method - Under this method, test/scale developers divide the scale/test into two halves, so

that the first half forms the first part of the entire test/scale and the second half forms the remaining part of the test/scale. Both halves are normally of equal lengths and they are designed in such a way that each is an alternate form of the other. Estimation of reliability is based on correlating the results of the two halves of the same test/scale. If


the two halves of the test/scale are parallel forms of one another, the Spearman-Brown prophecy formula is used to estimate the reliability coefficient of the entire test/scale. The Spearman-Brown prophecy formula is:

ρXX' = 2 ρYY' / (1 + ρYY')

where ρXX' is the reliability projected for the full-length test/scale, and ρYY' is the correlation between the half-tests. ρYY' is also an estimate of the reliability of a test/scale containing the same number of items as the half-test.
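As a worked illustration (not in the original handout): with the half-test correlation of .7328 obtained for the six-item scale in Example 1 later in this handout, the projected full-length reliability is ρXX' = 2(.7328) / (1 + .7328) = 1.4656 / 1.7328 ≈ .8458, the value SPSS reports there as the equal-length Spearman-Brown coefficient.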

If the two halves of the test/scale are not parallel, the reliability of the full-length test/scale is calculated using the formula for coefficient α for split halves:

α = 2 [ σ²X - ( σ²Y1 + σ²Y2 ) ] / σ²X

where σ²Y1 and σ²Y2 are the variances of scores on the two halves of the test, and σ²X is the variance of the scores on the whole test, with X = Y1 + Y2.

In the SPSS program, the "SPLIT-HALF" model for reliability analysis is conducted on the assumption that the two halves of the test/scale are parallel forms. Hence, this coefficient α has to be obtained by hand calculation.

It must also be noted that the split-half reliability estimate is contingent upon how the items in the test/scale are arranged. Reordering and/or regrouping the items in the test/scale can result in different reliability estimates under the split-half method. Hence, the reliability estimate obtained from the even/odd method (a similar method, described below) for the same test/scale will most likely differ from the estimate obtained using the split-half method.

c. Even/Odd Method -

The even/odd method is similar to the split-half method, except that the reliability of the entire test/scale is estimated by correlating the even items with the odd items rather than the first half of the test/scale with the second half.
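Either way of splitting the scale can be computed by hand. The sketch below is not part of the original handout; it is a minimal Python illustration (the data matrix is hypothetical) in which the half-test correlation is stepped up with the Spearman-Brown formula given above.

    import numpy as np

    def split_half_reliability(scores, first_half, second_half):
        """Correlate two half-test totals and project to full length with Spearman-Brown."""
        scores = np.asarray(scores, dtype=float)
        y1 = scores[:, first_half].sum(axis=1)
        y2 = scores[:, second_half].sum(axis=1)
        r_halves = np.corrcoef(y1, y2)[0, 1]        # reliability estimate for a half-length test
        return 2 * r_halves / (1 + r_halves)        # Spearman-Brown projection

    # Hypothetical 0/1 responses of 6 subjects to 6 items (columns 0-5 = items 1-6)
    data = [[1, 0, 1, 1, 0, 1],
            [0, 0, 1, 0, 0, 0],
            [1, 1, 1, 1, 1, 1],
            [0, 1, 0, 1, 0, 1],
            [1, 1, 1, 0, 1, 1],
            [0, 0, 0, 0, 0, 0]]

    print(split_half_reliability(data, [0, 1, 2], [3, 4, 5]))   # first half vs. second half
    print(split_half_reliability(data, [0, 2, 4], [1, 3, 5]))   # odd items vs. even items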

Determining Reliability Using SPSS: Example 1:

The following illustrative example contains six items extracted from a scale used to measure adolescents' attitudes towards the use of physically aggressive behaviour in their daily lives. Each item in the scale refers to a situation where physically aggressive behaviour is or is not used. Adolescents are asked whether they agree or disagree with each item on the scale. Their responses are converted to scores of either 1 or 0, where 0 represents endorsement of the use of physically aggressive behaviour and 1 represents disapproval of it. Below are the contents of the six items, as well as the scores of 14 adolescents on these six items:

Item 1: When there are conflicts, people won't listen to you unless you get physically aggressive.
Item 2: It is hard for me not to act aggressively if I am angry with someone.
Item 3: Physical aggression does not help to solve problems, it only makes situations worse.
Item 4: There is nothing wrong with a husband hitting his wife if she has an affair.
Item 5: Physical aggression is often needed to keep things under control.
Item 6: When someone makes me mad, I don't have to use physical aggression. I can think of other ways to express my anger.

The following is the data obtained from 14 adolescents:

Person   Item1  Item2  Item3  Item4  Item5  Item6
   1       0      0      0      0      0      0
   2       0      0      0      0      1      0
   3       1      0      1      1      1      0
   4       1      1      1      1      1      1
   5       1      1      1      1      1      1
   6       0      0      1      0      0      0
   7       0      0      1      1      1      0
   8       1      1      1      1      1      0
   9       0      0      0      1      0      0
  10       0      1      0      1      0      1
  11       1      1      1      0      1      1
  12       0      0      1      1      1      1
  13       0      0      0      0      0      0
  14       0      0      0      0      0      0

In the pages that follow, we will first outline the major commands for the different models of reliability analysis and briefly explain the usage of these commands. Then, the whole program for the different reliability analyses will be reproduced in the next section, which will in turn be followed by discussions of the outputs.

SPSS Commands for Reliability Analyses: [3]

1. Calculation of Cronbach's Alpha:

reliability variables=item1 to item6/
  scale (testscore)=item1 to item6/
  model=alpha
statistics all

The subcommand "scale (testscore)=item1 to item6" specifies the items on which

reliability analysis is to be carried out. In this case, item1 to item6 will form the “scale” on which analysis will be done. The subcommand “model =alpha” instructs the computer to perform the “ALPHA” model (i.e. to calculate Cronbach’s Alpha) for reliability analysis.

[3] In the following illustrations and explanations, only a sample of commonly used commands and syntax is shown. Students are advised to consult the SPSS User's Guide for other appropriate commands and syntax for reliability analyses.

The command "statistics all" will instruct the computer to give us the following additional statistics from the reliability analysis: [4]

a. Item means and standard deviations;
b. Inter-item covariance matrix;
c. Inter-item correlation matrix;
d. Scale mean, variance and standard deviation;
e. Summary statistics for item means, item variances, inter-item covariances, inter-item correlations, and item-total statistics (i.e. summary statistics comparing each item to the scale composed of the other items, including alpha (α) if that item is deleted);
f. ANOVA;
g. Hotelling's T-Squared;
h. Other statistics like Friedman's chi-square, Kendall's coefficient of concordance and Cochran's Q, if applicable.

2. Assessing Split-Half Reliability:

reliability variables=item1 to item6/
  statistics=scale/
  summary=means variances covariance correlations/
  scale (testscore)=item1 to item6/
  model=split

The "scale (testscore)=item1 to item6" subcommand specifies the number as well as the order of the items on which the subsequent reliability analysis is to be performed. The subcommand "model=split" instructs the computer to use the "SPLIT-HALF" model for reliability analysis on the scale. A split-half reliability analysis will be performed based on the order in which the items were named on the preceding "scale" subcommand, i.e., the first half of the items (rounding up if the number of items is odd) form the first part/half, and the remaining items form the second part/half. In this case, items 1, 2 and 3 will form the first part and items 4, 5 and 6 will form the second part.

Since the inter-item covariance matrix, inter-item correlation matrix, item means and

standard deviations, as well as the item-total statistics produced from this reliability analysis are the same as those produced in the preceding "ALPHA" model (because the two analyses were performed on the same set of data), we may not want to look at these again at this stage. However, we may be interested in knowing the following:

a. the means and standard deviations of each of the two parts of the scale;
b. the summary statistics (i.e. item means, item variances, inter-item

[4] Only outputs containing statistics categorized under a to e will be reproduced and discussed in subsequent pages, because these are sufficient for the purposes and needs of our present analyses. Statistics under categories f to h will not be reported.

covariances and inter-item correlations) of each of the two parts of the scale.

The insertion of the two subcommands "statistics=scale" and "summary= . . . correlations" into the computer program enables us to obtain the above-mentioned statistics, which were not provided by the previous analysis based on the "ALPHA" model. [5]

3. Estimating Even/Odd Reliability:

reliability variables=item1 to item6/
  scale (testscore)=item1 item3 item5 item2 item4 item6/
  model=split
statistics all

Since an "EVEN/ODD" model for reliability analysis is not an available option in SPSS, the "SPLIT-HALF" model is used for this analysis. However, in order for the "SPLIT-HALF" model to be successfully employed for estimating even/odd reliability, the order of the items listed in the preceding "scale" subcommand must be arranged so that the odd items form the first part of the scale and the even items form the remaining part. Please see the above "scale" subcommand for an illustration.

As already mentioned, the command "statistics all" instructs the computer to produce the eight categories of additional statistics from the reliability analysis. [6] In a later section, it will be shown that the item-total summary statistics, item means and standard deviations, inter-item covariance matrix, and inter-item correlation matrix produced in this analysis are virtually the same as those produced by the "ALPHA" model of reliability analysis, except that the statistics are displayed slightly differently as a result of reordering the six items. Alternatively, the additional statistics that are specific to this model of reliability analysis and of interest to us can be obtained by using the same "statistics=scale" and "summary= . . . correlations" subcommands shown in the computer program for the "SPLIT-HALF" model of reliability analysis.

Conducting All the Above-Mentioned 3 Models of Reliability Analysis on the Set of Scores Obtained from 14 Adolescents for the 6 Items Using SPSS

1. SPSS Computer Program

[5] If you want the full set of additional statistics from the split-half reliability analysis, you have to write the command "statistics all" into the program, in the same manner as shown in the computer program for conducting the "ALPHA" model of reliability analysis.

[6] Again, only statistics under categories a to e will be reported and discussed in subsequent pages.

2. SPSS Outputs and Discussions [7]

a. Reliability Analysis - "ALPHA" Model

The initial part of the output contains descriptive statistics for each of the items (i.e. means and standard deviations), an inter-item covariance [8] matrix and an inter-item correlation matrix. These are followed by descriptive statistics for the scale and the summary statistics.

****** Method 2 (covariance matrix) will be used for this analysis ******

R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (A L P H A)

            Mean     Std Dev   Cases
1. ITEM1    .3571     .4972     14.0
2. ITEM2    .3571     .4972     14.0
3. ITEM3    .5714     .5136     14.0
4. ITEM4    .5714     .5136     14.0
5. ITEM5    .5714     .5136     14.0
6. ITEM6    .3571     .4972     14.0

[7] Discussions and explanations are set in italics in the original handout; they are not part of the computer output.

[8] Covariance (SXY) is defined as the average product of the deviations in X and Y, where a deviation is a distance from the mean. Its relation to the Pearson product-moment correlation coefficient is given by the formula: rXY = SXY / (SX SY).
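As an added illustration of this formula (not in the original handout): the largest inter-item covariance reported in the summary statistics below is .1868, and items 3 and 5 each have a standard deviation of .5136, so r = .1868 / (.5136 × .5136) ≈ .708, consistent with the largest inter-item correlation of .7083, reported below as occurring between items 3 and 5.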

R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (T E S T S C O R)

Correlation Matrix

          ITEM1     ITEM2     ITEM3     ITEM4     ITEM5     ITEM6
ITEM1    1.0000
ITEM2     .6889    1.0000
ITEM3     .6455     .3443    1.0000
ITEM4     .3443     .3443     .4167    1.0000
ITEM5     .6455     .3443     .7083     .4167    1.0000
ITEM6     .3778     .6889     .3443     .3443     .3443    1.0000

It is shown in the above inter-item correlation matrix that the largest correlation coefficient occurs between items 3 and 5 (i.e. r = .7083). Item 2 is also fairly highly correlated with both item 1 and item 6 (r = .6889 in both cases). The lowest correlation coefficient is .3443, which occurs between a number of pairs of items (e.g. between item 1 and item 4).


N of Cases = 14.0

                          Mean     Variance   Std Dev   N of Variables
Statistics for Scale     2.7857     5.1044    2.2593          6

                           Mean    Minimum   Maximum    Range    Max/Min   Variance
Item Means                .4643     .3571     .5714     .2143    1.6000     .0138
Item Variances            .2555     .2473     .2637     .0165    1.0667     .0001
Inter-item Covariances    .1190     .0879     .1868     .0989    2.1250     .0015
Inter-item Correlations   .4665     .3443     .7083     .3641    2.0575     .0234

The section of output reproduced above gives us descriptive statistics for the scale [9] and summary statistics for the items.

From this section, it can be seen that the average score on the scale is 2.7857 and the standard deviation is 2.2593. The average score on an item is .4643, with a range of .2143 (i.e. maximum minus minimum). The average of the item variances is .2555, with a minimum of .2473 and a maximum of .2637. These show that the items in the scale have fairly comparable variances. The average covariance between the items is .1190. The correlations between the items range from .3443 to .7083. The ratio of the largest to the smallest correlation is .7083/.3443, or 2.0575. The average correlation between the items is .4665.

[9] The scale in this case is formed by items 1 to 6. For each individual adolescent (or case), a score on the scale is computed by adding his/her scores on the six items.

The item-total summary statistics form the next section of the output and are reproduced below:

Item-total Statistics

          Scale Mean   Scale Variance   Corrected Item-   Squared Multiple   Alpha if
          if Item      if Item          Total             Correlation        Item
          Deleted      Deleted          Correlation                          Deleted
ITEM1      2.4286        3.4945           .7330               .7511           .7901
ITEM2      2.4286        3.6484           .6364               .7469           .8095
ITEM3      2.2143        3.5659           .6572               .6000           .8051
ITEM4      2.2143        3.8736           .4784               .2533           .8404
ITEM5      2.2143        3.5659           .6572               .6000           .8051
ITEM6      2.4286        3.8022           .5440               .5733           .8273

For each item, the first column of the above set of statistics shows what the average score for

the scale would be if the item were excluded from the scale. For example, if item 1 were deleted from the scale, the mean score of the scale would be 2.4286. The next column in this set of statistics is the scale variance if the item were eliminated. The column labeled “Corrected Item-Total Correlation” is the Pearson correlation coefficient between the score on the individual item and the sum of the scores on the remaining items. For example, the smallest correlation reported is .4784, which occurs between the score on item 4 and the sum of the scores of items 1, 2, 3, 5 and 6. We can say that the relationship between item 4 and the other items is not very strong. Comparatively speaking, the relationship between item 1 and the other items is much stronger, with r = .7330.

Another way of looking at the relationship between an individual item and the rest of the scale is to try to predict a person's score on the item from the scores obtained on the other items. We can do this by calculating a multiple regression equation with the item of interest as the dependent variable and all of the other items as independent variables. The multiple R² from this regression equation is displayed for each of the items in the column labeled "Squared Multiple Correlation". We can see that about 75% of the observed variability in the responses to item 1 can be explained by the other items. As expected, item 4 is less well predicted from the other items; its multiple R² is only .2533.

The final column, "Alpha if Item Deleted", tells us how the reliability of the scale is affected by each of the items. Six Cronbach's α's are reported in this column, each representing the Cronbach's α of the scale when one item is removed. As will be shown later, the Cronbach's α for the entire scale of 6 items is .8396. We can see from this column of statistics that removing item 4 from the scale causes α to increase from .8396 to .8404. On the other hand, eliminating any item other than item 4 will cause α to decrease. If for some reason the scale must be shortened, item 4 will logically be the first one to go. Conversely, it would be most undesirable to remove item 1, because α would decrease to .79 as a result.
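The "Alpha if Item Deleted" column can be reproduced by recomputing α with each item left out in turn. The sketch below is not part of the original handout; it is a minimal Python cross-check that applies the alpha formula to the 14 x 6 data matrix given earlier (the helper function repeats the earlier cronbach_alpha sketch so that it runs on its own).

    import numpy as np

    def cronbach_alpha(scores):
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]
        return (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum()
                                / scores.sum(axis=1).var(ddof=1))

    # Scores of the 14 adolescents on items 1-6, from the data table above
    data = np.array([[0,0,0,0,0,0], [0,0,0,0,1,0], [1,0,1,1,1,0], [1,1,1,1,1,1],
                     [1,1,1,1,1,1], [0,0,1,0,0,0], [0,0,1,1,1,0], [1,1,1,1,1,0],
                     [0,0,0,1,0,0], [0,1,0,1,0,1], [1,1,1,0,1,1], [0,0,1,1,1,1],
                     [0,0,0,0,0,0], [0,0,0,0,0,0]])

    print(round(cronbach_alpha(data), 4))        # approximately .8396, as discussed above
    for item in range(data.shape[1]):
        reduced = np.delete(data, item, axis=1)  # drop one item and recompute alpha
        print("alpha if item", item + 1, "deleted:", round(cronbach_alpha(reduced), 4))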

The final results of the reliability analysis based on the "ALPHA" model are reported in the final section of the output, reproduced below:

Reliability Coefficients    6 items

Alpha = .8396       Standardized item alpha = .8399

Cronbach's alpha is shown in the above output. The value, .8396, can be regarded as quite large, indicating that the 6-item scale is quite reliable. "Standardized item alpha" refers to the α that would be obtained if all of the items were standardized to have a variance of 1. Since there is not much variation among the variances of the 6 items in the scale [10], there is little difference between the two reported α's. If the items in a scale have widely differing variances, the two α's may differ substantially.
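As a worked check (the formula is not stated in the original handout): the standardized item alpha equals k·r / [1 + (k - 1)·r], where r is the average inter-item correlation. With k = 6 and r = .4665 from the summary statistics above, this gives 6(.4665) / [1 + 5(.4665)] = 2.799 / 3.3325 ≈ .8399, matching the reported value.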

b. Reliability Analysis - "SPLIT-HALF" Model

R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (T E S T S C O R)

The subcommand "statistics=scale" instructs the computer to produce the above output, while the subcommand "summary=means . . . correlations" instructs the computer to perform and produce the following:

[10] Please refer to the statistics reported in the section of summary statistics on the items, as well as the discussion in that section. The variances of individual items can be computed by squaring the standard deviations reported for individual items in the initial section of the output.

R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (S P L I T)

N of Cases = 14.0

                       Mean     Variance   Std Dev   N of Variables
Statistics for Part 1  1.2857    1.6044    1.2666         3
               Part 2  1.5000    1.3462    1.1602         3
               Scale   2.7857    5.1044    2.2593         6

Item Means               Mean    Minimum   Maximum    Range    Max/Min   Variance
          Part 1        .4286     .3571     .5714     .2143    1.6000     .0153
          Part 2        .5000     .3571     .5714     .2143    1.6000     .0153
          Scale         .4643     .3571     .5714     .2143    1.6000     .0138

Item Variances           Mean    Minimum   Maximum    Range    Max/Min   Variance
          Part 1        .2527     .2473     .2637     .0165    1.0667     .0001
          Part 2        .2582     .2473     .2637     .0165    1.0667     .0001
          Scale         .2555     .2473     .2637     .0165    1.0667     .0001

Inter-item Covariances   Mean    Minimum   Maximum    Range    Max/Min   Variance
          Part 1        .1410     .0879     .1703     .0824    1.9375     .0017
          Part 2        .0952     .0879     .1099     .0220    1.2500     .0001
          Scale         .1190     .0879     .1868     .0989    2.1250     .0015

Inter-item Correlations  Mean    Minimum   Maximum    Range    Max/Min   Variance
          Part 1        .5596     .3443     .6889     .3446    2.0010     .0282
          Part 2        .3684     .3443     .4167     .0724    1.2103     .0014
          Scale         .4665     .3443     .7083     .3641    2.0575     .0234

Please note that the descriptive statistics for the entire scale and the summary statistics over all items in the entire scale given in these sections of the computer output are identical to those produced in the corresponding sections of the output based on the “ALPHA” model of reliability analyses (check statistics on the “scale” row of corresponding sets of statistics). The significant feature of these sections of the output is that descriptive and summary statistics are given for each of the two parts of the scale, namely, Part 1 which is formed by items 1, 2 and 3, and Part 2 which is composed of items 4, 5 and 6. It is clearly evident that the two Parts have different means and standard deviations, as well as different item means, item variances, inter-item covariances and inter-item correlations.

Reliability Coefficients    6 items

Correlation between forms = .7328        Equal-length Spearman-Brown   = .8458
Guttman Split-half        = .8439        Unequal-length Spearman-Brown = .8458
Alpha for part 1          = .7911        Alpha for part 2              = .6367
3 items in part 1                        3 items in part 2

The above section of the output contains the results of the reliability analysis based on the "SPLIT-HALF" model. The correlation between the two halves (or parts), labeled on the output as "Correlation between forms", is .7328. This is an estimate of the reliability of a three-item half of the scale. The equal-length Spearman-Brown coefficient, which has a value of .8458 in this case, tells us what the reliability of the entire scale would be if it were made up of two equal (or parallel) parts that each have a three-item reliability of .7328. If the numbers of items in the two parts are not equal, the unequal-length Spearman-Brown coefficient can be used to estimate the reliability of the overall scale. In the present example, since the two parts of the scale are of equal length, the two Spearman-Brown coefficients are identical. The Guttman split-half coefficient is another estimate of the reliability of the overall scale. It does not assume that the two parts are equally reliable or have the same variance, and the coefficient produced here is slightly smaller. Finally, separate values of Cronbach's α are shown for each of the two parts of the scale.
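The hand calculation described earlier for non-parallel halves can be carried out directly from this output (this worked example is not part of the original handout): with σ²Y1 = 1.6044, σ²Y2 = 1.3462 and σ²X = 5.1044, the split-half coefficient α = 2[5.1044 - (1.6044 + 1.3462)] / 5.1044 = 2(2.1538) / 5.1044 ≈ .8439, which agrees with the Guttman split-half value reported above.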

c. Reliability Analysis - "EVEN/ODD" Model

****** Method 2 (covariance matrix) will be used for this analysis ******

R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (S P L I T)

            Mean     Std Dev   Cases
1. ITEM1    .3571     .4972     14.0
2. ITEM3    .5714     .5136     14.0
3. ITEM5    .5714     .5136     14.0
4. ITEM2    .3571     .4972     14.0
5. ITEM4    .5714     .5136     14.0
6. ITEM6    .3571     .4972     14.0

Correlation Matrix

          ITEM1     ITEM3     ITEM5     ITEM2     ITEM4     ITEM6
ITEM1    1.0000
ITEM3     .6455    1.0000
ITEM5     .6455     .7083    1.0000
ITEM2     .6889     .3443     .3443    1.0000
ITEM4     .3443     .4167     .4167     .3443    1.0000
ITEM6     .3778     .3443     .3443     .6889     .3443    1.0000

The additional statistics produced in this section of the output are basically similar to those shown in the corresponding sections of the output based on the "ALPHA" model. The only difference is that, as a result of reordering the items in the scale, the statistics are displayed differently here.

N of Cases = 14.0

                       Mean     Variance   Std Dev   N of Variables
Statistics for Part 1  1.5000    1.8077    1.3445         3
               Part 2  1.2857    1.4505    1.2044         3
               Scale   2.7857    5.1044    2.2593         6

Item Means               Mean    Minimum   Maximum    Range    Max/Min   Variance
          Part 1        .5000     .3571     .5714     .2143    1.6000     .0153
          Part 2        .4286     .3571     .5714     .2143    1.6000     .0153
          Scale         .4643     .3571     .5714     .2143    1.6000     .0138

Item Variances           Mean    Minimum   Maximum    Range    Max/Min   Variance
          Part 1        .2582     .2473     .2637     .0165    1.0667     .0001
          Part 2        .2527     .2473     .2637     .0165    1.0667     .0001
          Scale         .2555     .2473     .2637     .0165    1.0667     .0001

Inter-item Covariances   Mean    Minimum   Maximum    Range    Max/Min   Variance
          Part 1        .1722     .1648     .1868     .0220    1.1333     .0001
          Part 2        .1154     .0879     .1703     .0824    1.9375     .0018
          Scale         .1190     .0879     .1868     .0989    2.1250     .0015

Inter-item Correlations  Mean    Minimum   Maximum    Range    Max/Min   Variance
          Part 1        .6664     .6455     .7083     .0628    1.0973     .0011
          Part 2        .4591     .3443     .6889     .3446    2.0010     .0317
          Scale         .4665     .3443     .7083     .3641    2.0575     .0234

The above section of the output looks similar to the corresponding section produced under the "SPLIT-HALF" model. In fact, the descriptive and summary statistics reported in the two outputs for the entire scale are identical. However, the descriptive and summary statistics for the corresponding parts of the scale are not the same. The differences arise from the fact that the compositions of Part 1 and Part 2 are altered in the present analysis, i.e., Part 1 is made up of items 1, 3 and 5, and Part 2 is composed of items 2, 4 and 6.

Item-total Statistics

          Scale Mean   Scale Variance   Corrected Item-   Squared Multiple   Alpha if
          if Item      if Item          Total             Correlation        Item
          Deleted      Deleted          Correlation                          Deleted
ITEM1      2.4286        3.4945           .7330               .7511           .7901
ITEM3      2.2143        3.5659           .6572               .6000           .8051
ITEM5      2.2143        3.5659           .6572               .6000           .8051
ITEM2      2.4286        3.6484           .6364               .7469           .8095
ITEM4      2.2143        3.8736           .4784               .2533           .8404
ITEM6      2.4286        3.8022           .5440               .5733           .8273

The item-total statistics reported in the present analysis are exactly the same as those reported under the “ALPHA” model, with the only exception that the statistics are arranged differently. Again this is a direct result of reordering the items in the scale.

Reliability Coefficients    6 items

Correlation between forms = .5700        Equal-length Spearman-Brown   = .7262
Guttman Split-half        = .7234        Unequal-length Spearman-Brown = .7262
Alpha for part 1          = .8571        Alpha for part 2              = .7159
3 items in part 1                        3 items in part 2

The above are the results of the reliability analysis based on the "EVEN/ODD" model. Please note that the correlation between the parts formed by the odd and even items is smaller than the correlation reported under the "SPLIT-HALF" model (.5700 compared with .7328). As a result, the Spearman-Brown coefficients reported in this analysis are comparatively smaller (.7262 against .8458). This illustrative example shows that "split-half" reliability analyses can produce different reliability estimates for the same scale, depending on the method researchers use to split the items.

Determining Reliability Using SPSS: Example 2:

The following questionnaire was developed by a researcher as part of an effort to collect participants'

feedback on a five-week community-based program designed to teach individuals disease prevention

and to encourage healthier lifestyles. The questionnaire contained six items. Respondents were asked to respond to each item according to the following scale:

Strongly Agree    Agree    No Opinion    Disagree    Strongly Disagree
      1             2           3            4               5

The 6 items in the questionnaire were:

1. The goals of the program are clear.
2. I feel comfortable in discussing my plans, concerns and experiences with the group.
3. The materials covered in the program are helpful.
4. The health contract is useful in assisting me to make healthy lifestyle changes.
5. Overall speaking, the group is supportive.
6. Overall, the program is useful in assisting me develop positive changes towards healthy lifestyles.

The following is the data obtained from 10 participants:

Person   Item1  Item2  Item3  Item4  Item5  Item6
   1       2      3      1      3      4      2
   2       1      2      1      1      3      1
   3       4      3      4      5      3      3
   4       5      3      2      4      3      2
   5       2      1      2      2      1      1
   6       3      3      1      3      3      1
   7       4      5      2      3      4      2
   8       2      1      2      2      1      1
   9       2      2      2      2      2      2
  10       3      4      2      5      4      2

Conducting Cronbach's Alpha, Split-Half Reliability and Even-Odd Reliability Analyses on the Set of Scores Obtained from 10 Respondents for the 6 Items Using SPSS

1. SPSS Computer Program
2. SPSS Outputs and Results [11]

a. Reliability Analysis - "ALPHA" Model

****** Method 2 (covariance matrix) will be used for this analysis ******

R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (A L P H A)

            Mean      Std Dev   Cases
1. ITEM1    2.8000    1.2293    10.0
2. ITEM2    2.7000    1.2517    10.0
3. ITEM3    1.9000     .8756    10.0
4. ITEM4    3.0000    1.3333    10.0
5. ITEM5    2.8000    1.1353    10.0
6. ITEM6    1.7000     .6749    10.0

Correlation Matrix

          ITEM1     ITEM2     ITEM3     ITEM4     ITEM5     ITEM6
ITEM1    1.0000
ITEM2     .6066    1.0000
ITEM3     .4955     .0710    1.0000
ITEM4     .7457     .5992     .5710    1.0000
ITEM5     .3662     .8914    -.1341     .5138    1.0000
ITEM6     .5892     .5392     .6956     .7408     .4930    1.0000

N of Cases = 10.0

                          Mean      Variance   Std Dev   N of Variables
Statistics for Scale     14.9000    25.8778    5.0870          6

                           Mean     Minimum   Maximum    Range    Max/Min   Variance
Item Means                2.4833    1.7000    3.0000     1.3000   1.7647     .2937
Item Variances            1.2278     .4556    1.7778     1.3222   3.9024     .2621

[11] Only sections of the outputs relevant to the purposes and needs of our present analysis will be reproduced below. Brief discussions are in italics and are not part of the original computer outputs.

Inter-item Covariances     Mean     Minimum   Maximum    Range    Max/Min   Variance
                           .6170    -.1333    1.2667     1.4000   -9.5000    .1434

Inter-item Correlations    Mean     Minimum   Maximum    Range    Max/Min   Variance
                           .5189    -.1341     .8914     1.0255   -6.6457    .0651

Item-total Statistics

          Scale Mean   Scale Variance   Corrected Item-   Squared Multiple   Alpha if
          if Item      if Item          Total             Correlation        Item
          Deleted      Deleted          Correlation                          Deleted
ITEM1     12.1000       16.9889           .7281               .7344           .8192
ITEM2     12.2000       16.8444           .7267               .9060           .8196
ITEM3     13.0000       22.0000           .3788               .8634           .8750
ITEM4     11.9000       15.4333           .8273               .7752           .7973
ITEM5     12.1000       18.9889           .5660               .9363           .8499
ITEM6     13.2000       20.6222           .7830               .8531           .8311

Reliability Coefficients    6 items

Alpha = .8584       Standardized item alpha = .8662

The Cronbach's alpha reported in the above analysis is .8584, indicating that the 6-item questionnaire is quite reliable. The last column of the item-total statistics indicates that removing item 4 from the questionnaire would lower Cronbach's α from .8584 to .7973, while removing item 3 would raise it from .8584 to .8750.
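As a hand check (not part of the original handout): the item variances above average 1.2278, so they sum to about 6 × 1.2278 = 7.3667, and the scale variance is 25.8778; the alpha formula then gives α = (6/5) × (1 - 7.3667/25.8778) ≈ .8584, matching the reported value. The same figure is obtained by applying the small cronbach_alpha sketch given earlier to the 10 × 6 data matrix for this example.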

b. Reliability Analysis - "SPLIT-HALF" Model

R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (S P L I T)

N of Cases = 10.0

                       Mean      Variance   Std Dev   N of Variables
Statistics for Part 1   7.4000    6.9333    2.6331         3
               Part 2   7.5000    7.1667    2.6771         3
               Scale   14.9000   25.8778    5.0870         6

Item Means               Mean     Minimum   Maximum    Range    Max/Min   Variance
          Part 1        2.4667    1.9000    2.8000      .9000   1.4737     .2433
          Part 2        2.5000    1.7000    3.0000     1.3000   1.7647     .4900
          Scale         2.4833    1.7000    3.0000     1.3000   1.7647     .2937

Item Variances           Mean     Minimum   Maximum    Range    Max/Min   Variance
          Part 1        1.2815     .7667    1.5667      .8000   2.0435     .1995
          Part 2        1.1741     .4556    1.7778     1.3222   3.9024     .4470
          Scale         1.2278     .4556    1.7778     1.3222   3.9024     .2621

Inter-item Covariances   Mean     Minimum   Maximum    Range    Max/Min   Variance
          Part 1         .5148     .0778     .9333      .8556  12.0000     .1466
          Part 2         .6074     .3778     .7778      .4000   2.0588     .0341
          Scale          .6170    -.1333    1.2667     1.4000  -9.5000     .1434

Inter-item Correlations  Mean     Minimum   Maximum    Range    Max/Min   Variance
          Part 1         .3910     .0710     .6066      .5356   8.5474     .0639
          Part 2         .5825     .4930     .7408      .2478   1.5026     .0151
          Scale          .5189    -.1341     .8914     1.0255  -6.6457     .0651

Reliability Coefficients    6 items

Correlation between forms = .8354        Equal-length Spearman-Brown   = .9103
Guttman Split-half        = .9103        Unequal-length Spearman-Brown = .9103
Alpha for part 1          = .6683        Alpha for part 2              = .7628
3 items in part 1                        3 items in part 2

The Spearman-Brown results reported in the output of the reliability analysis based on the "SPLIT-HALF" model indicate that the reliability of the entire scale/questionnaire is .9103 if it is made up of two equal (or parallel) parts that each have a three-item reliability of .8354. Separate values of Cronbach's α are shown for each of the two parts of the scale/questionnaire: Cronbach's α for the first half is .6683 and for the second half it is .7628.

c. Reliability Analysis - "EVEN/ODD" Model

R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (T E S T S C O R)

N of Cases = 10.0

                       Mean      Variance   Std Dev   N of Variables
Statistics for Part 1   7.5000    5.3889    2.3214         3
               Part 2   7.4000    8.0444    2.8363         3
               Scale   14.9000   25.8778    5.0870         6

Item Means               Mean     Minimum   Maximum    Range    Max/Min   Variance
          Part 1        2.5000    1.9000    2.8000      .9000   1.4737     .2700
          Part 2        2.4667    1.7000    3.0000     1.3000   1.7647     .4633
          Scale         2.4833    1.7000    3.0000     1.3000   1.7647     .2937

Item Variances           Mean     Minimum   Maximum    Range    Max/Min   Variance
          Part 1        1.1889     .7667    1.5111      .7444   1.9710     .1460
          Part 2        1.2667     .4556    1.7778     1.3222   3.9024     .5046
          Scale         1.2278     .4556    1.7778     1.3222   3.9024     .2621

Inter-item Covariances   Mean     Minimum   Maximum    Range    Max/Min   Variance
          Part 1         .3037    -.1333     .5333      .6667  -4.0000     .1147
          Part 2         .7074     .4556    1.0000      .5444   2.1951     .0603
          Scale          .6170    -.1333    1.2667     1.4000  -9.5000     .1434

Inter-item Correlations  Mean     Minimum   Maximum    Range    Max/Min   Variance
          Part 1         .2425    -.1341     .4955      .6296  -3.6942     .0885
          Part 2         .6264     .5392     .7408      .2016   1.3738     .0086
          Scale          .5189    -.1341     .8914     1.0255  -6.6457     .0651

Reliability Coefficients    6 items

Correlation between forms = .9450        Equal-length Spearman-Brown   = .9717
Guttman Split-half        = .9618        Unequal-length Spearman-Brown = .9717
Alpha for part 1          = .5072        Alpha for part 2              = .7914
3 items in part 1                        3 items in part 2

The Spearman-Brown coefficient reported in the results of the reliability analysis based on the "EVEN/ODD" model is .9717, indicating that the 6-item questionnaire is very reliable. This Spearman-Brown coefficient is even higher than that reported under the "SPLIT-HALF" model. The correlation between the parts formed by the odd and even items is also larger than the correlation reported under the "SPLIT-HALF" model (.9450 compared with .8354). The Cronbach's α for the first part of the questionnaire (made up of the odd items) is .5072, while that for the second part is .7914.


Part Five: Using SPSS for Windows to Implement Reliability Analyses

The following section outlines the steps necessary to undertake three different forms of 'single test administration' reliability analysis: (1) Cronbach alpha, (2) Split-half, and (3) Even-odd. For further discussion of these reliability measures, students are encouraged to consult Bob Gebotys' "Handout on Reliability."

5.1 Conducting a Reliability Analysis using the Cronbach Alpha (α) Measure

For this analysis we will use the data regarding adolescent attitudes toward physical aggression, as given in Example 1 of Gebotys' "Handout on Reliability" above. In order to conduct this analysis, the following steps are required.

1. Enter the aforementioned data set into an SPSS Data Editor Window (see Section 2.1 for

instructions, if necessary).

2. Next, click Statistics on the main menu bar, followed by Scale, and then Reliability Analysis… This series of clicks will open a Reliability Analysis dialog box similar to the one shown below.

3. You should note that all of the variables (all Items) are listed in the text box at the left-hand

side of the dialog box. Take your cursor and click on "item1." Keeping the left mouse button depressed, drag downward until all variables (i.e., item1 through item6) are highlighted. Once they are highlighted, click the right arrow button in the centre of the dialog box to move the selected variables into the 'Items:' text box.

4. Next, check that the text in the 'Model:' text box reads "Alpha." If it does not, click on the downward arrow to the right of the text box and select Alpha from the list that appears.

5. Next, click on the Statistics… pushbutton, which will open a 'Reliability Analysis: Statistics' subdialog box similar to the one below.

6. Next, select (i.e., click on) all options under ‘Descriptives for’ (i.e., item, scale, scale if item

deleted), 'Summaries' (i.e., means, variances, covariances, correlations), and 'Inter-item' (i.e., correlations, covariances). These are the primary statistics that you will need to interpret your reliability analyses. If, however, you would like further statistics, such as the 'F test' and 'Hotelling's T-square,' you can make these selections from the options in this subdialog box.

7. Once you have made your selections, click the Continue command pushbutton at the top right-hand corner of the subdialog box. This will return you to the main 'Reliability Analysis' dialog box.

8. You have now completed all the necessary steps in specifying the reliability procedure. If you would like to examine the SPSS syntax for this procedure, please read the note below. If you would like to run this procedure now, without examining the syntax, click the OK command pushbutton at the top right-hand corner of the dialog box.

Note: If you would like to examine the SPSS syntax for this procedure, click on the Paste command pushbutton to open an SPSS Syntax Window. The syntax window should then resemble the one below. In order to run this syntax and complete the reliability analysis, click Run on the menu bar, followed by All.


Once you have run the Cronbach Alpha reliability procedure, the results should appear in an SPSS Viewer window similar to the one shown below.

At this stage it is recommended that you save and print the contents of the SPSS Viewer window. The steps that you take to save and print this reliability analysis are identical to the steps taken to save and print the Scatterplot, as outlined in section 2.4 of this guide. Your output should resemble the information on the following pages.

5.2 Conducting the Split-Half Reliability Analysis

Please note that the steps necessary for conducting the Split-Half reliability analysis are almost identical to the procedures outlined above for the Cronbach Alpha analysis. The only difference when using SPSS for Windows is that you must specify the “Split-Half” model instead of the “Alpha” model in the Reliability Analysis dialog box. Therefore, for the Split-Half model, step #4 should read as follows:

#4. Next, check that the text in the 'Model:' text box reads "Split-half." If it does not, click on the downward arrow to the right of the text box and select Split-half from the list that appears.

The SPSS Syntax window for the Split-half analysis should resemble the syntax listing reproduced in the Even-Odd section below (with /SCALE(SPLIT)=ALL/MODEL=SPLIT).

Again, it is recommended that you save and print the contents of the SPSS Viewer window. Your output should resemble the output on the following pages.


5.3 Conducting the Even-Odd Reliability Analysis

Unlike the Cronbach Alpha and Split-Half models, the Even-Odd method of assessing reliability cannot be accessed using the "point and click" approach in SPSS. In order to utilize the Even-Odd option, one needs to modify the syntax file for the Split-Half model. More specifically, the order of the items examined needs to be changed so that the odd items form the first part of the scale and the even items form the remaining part. Therefore, your first step should be to follow the instructions noted above for undertaking the Split-Half model, but then be sure to click the Paste command pushbutton in the Reliability Analysis dialog box in order to open the SPSS Syntax window. Recall that the syntax for the Split-Half model contains the following information:

RELIABILITY
  /VARIABLES=item1 item2 item3 item4 item5 item6
  /FORMAT=NOLABELS
  /SCALE(SPLIT)=ALL/MODEL=SPLIT
  /STATISTICS=DESCRIPTIVE SCALE HOTELLING CORR COV ANOVA
  /SUMMARY=TOTAL MEANS VARIANCE COV CORR .

For the Even-Odd model, you will need to make the following change to line 4 of the syntax file:

Before:  /SCALE(SPLIT)=ALL/MODEL=SPLIT
After:   /SCALE(SPLIT)=item1 item3 item5 item2 item4 item6/MODEL=SPLIT

The entire syntax file should now read:

RELIABILITY
  /VARIABLES=item1 item2 item3 item4 item5 item6
  /FORMAT=NOLABELS
  /SCALE(SPLIT)=item1 item3 item5 item2 item4 item6/MODEL=SPLIT
  /STATISTICS=DESCRIPTIVE SCALE HOTELLING CORR COV ANOVA
  /SUMMARY=TOTAL MEANS VARIANCE COV CORR .

Once you have made the change noted above, click Run on the menu bar followed by All. The analysis output should then appear in an SPSS Viewer window. You should then proceed to save and print the analysis. Your output should resemble the information provided on the following pages.
