
BRIEF REPORT

Coefficient Alpha and Beyond: Issues and Alternatives for Educational Research

Timothy Teo • Xitao Fan

Published online: 28 March 2013

© De La Salle University 2013

Asia-Pacific Edu Res (2013) 22(2):209–213. DOI 10.1007/s40299-013-0075-z

Abstract  Cronbach's coefficient alpha has been widely known and used in educational research. Many education research practitioners, however, may not be aware of the potential issues that arise when the main assumptions for coefficient alpha are violated in research practice. This paper provides a brief discussion of two assumptions that may make the use and interpretation of coefficient alpha less appropriate in education research: the tau-equivalence model assumption and the error independence assumption. Violation of either or both of these assumptions will have negative effects on the precision of coefficient alpha as a reliability estimate. The paper further presents two alternative reliability estimates that do not require the assumptions of tau-equivalence or error independence. Research practitioners may consider these and other alternatives when measurement data may not satisfy the assumptions for coefficient alpha.

Keywords  Cronbach coefficient alpha · Reliability · Internal consistency · Tau-equivalence · Independent errors · Latent variable modeling · Generalizability theory

Introduction

In educational research, data in the form of a composite score consisting of multiple items or subscale scores are often obtained and used. By doing so, researchers believe that the item/subtest scores have an acceptable level of "inter-connectedness", and that each item contributes to the overall measurement of a construct. Reliability is a key attribute in educational measurement, and it is generally considered a necessary condition for measurement validity. Conceptually, reliability represents the portion of the total score variance that is attributed to the true score variance (Revelle and Zinbarg 2009). True score variance, however, cannot be directly obtained or calculated. As a result, we cannot calculate measurement reliability itself; instead, we can only estimate measurement reliability in a given measurement situation. In educational research, a very widely used estimate of reliability for a multi-item instrument is Cronbach's coefficient alpha (α).

Cronbach Coefficient Alpha (α) and Issues in Its Use

Coefficient α (Cronbach 1951), which was based on a coefficient by Guttman (1945), is by far the most popular estimate of the internal consistency reliability of a scale (Peterson 1994). With boundaries between 0 and 1, α is expressed as follows (Crocker and Algina 2006):

$$\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_x^2}\right)$$

where k is the number of items, Σσᵢ² is the sum of the item variances, and σₓ² is the total variance of the scale.

Although not obvious from the formula, coefficient α represents the estimated ratio of true score variance to total score variance (Lord and Novick 1968). As the other side of the same coin, we may say that coefficient α estimates the amount of measurement error: as the measurement error (error variance) decreases, coefficient α will increase.
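To make the computation concrete, here is a minimal Python sketch of the formula above; the function name, the use of NumPy, and the toy data are our own illustrative choices (not from the original paper), and sample variances (ddof=1) are used by convention:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # per-item sample variances
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical example: 5 respondents answering a 4-item scale
X = np.array([[3, 4, 3, 4],
              [2, 2, 3, 2],
              [4, 5, 4, 5],
              [1, 2, 1, 2],
              [3, 3, 4, 3]])
print(round(cronbach_alpha(X), 3))
```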

Despite the popularity of coefficient α, its proper use and interpretation may not be clearly understood. In research practice, two assumptions of coefficient α are often violated (Gignac et al. 2007). First, coefficient α is grounded on a tau-equivalence model. This means that all items have equal discriminating power, which may be represented by equal factor loadings for all items under the factor analytic model (McDonald 1999). In addition, the tau-equivalence assumption also requires unidimensionality of the item set, as argued by Green and Yang (2009). If the tau-equivalence condition is not satisfied, coefficient α is a lower bound of the reliability estimate, and it tends to underestimate the true reliability to some extent (Raykov and Marcoulides 2011). In situations where a scale is multidimensional in nature (e.g., comprising more than one factor), thus violating the tau-equivalence assumption, coefficient α does not provide an accurate estimate of the reliability of the scale. For example, one might have a multidimensional measure for students' e-learning acceptance in higher education (Teo 2010), where the measure has three dimensions of e-learning acceptance (tutor quality, perceived usefulness, and facilitating conditions). Here, one might be interested in measuring the overall e-learning acceptance as represented by the sum of the scores from all three dimensions, as well as in measuring the e-learning acceptance for the three dimensions separately. Because of the multidimensional nature of the scale, the coefficient α for the total scale scores will tend to underestimate the true reliability.

In practice, the tau-equivalence and unidimensionality assumptions are rarely achieved in research (Kamata et al. 2003). This is because it is rare to have all items on a scale with equal loadings on a single factor (Reuterberg and Gustafsson 1992), and there is little chance for all items to contribute exactly the same amount of variance to a single factor (Cortina 1993). In addition, it has been noted that the scales used in research practice are not likely to be unidimensional; if they were truly unidimensional, they would be either too narrow or trivial, thus limiting their utility in research (Reise et al. 2007).
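The underestimation noted above can be verified directly at the population level. The following sketch (our own illustration, with hypothetical loadings) builds the covariance matrix implied by a one-factor congeneric model and compares population α with the model-based composite reliability, (Σλᵢ)² / [(Σλᵢ)² + Σδᵢᵢ]; with unequal loadings, α comes out smaller:

```python
import numpy as np

def population_alpha(cov):
    """Alpha computed from a population item covariance matrix."""
    k = cov.shape[0]
    return (k / (k - 1)) * (1 - np.trace(cov) / cov.sum())

lam = np.array([0.9, 0.7, 0.5, 0.3])         # unequal (non-tau-equivalent) loadings
errors = 1 - lam**2                          # standardized error variances
cov = np.outer(lam, lam) + np.diag(errors)   # model-implied covariance matrix

true_rel = lam.sum()**2 / (lam.sum()**2 + errors.sum())
print(population_alpha(cov), true_rel)       # alpha (~0.68) < true reliability (~0.71)
```

With equal loadings, the two quantities coincide, which is exactly the tau-equivalence condition.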

The second assumption for coefficient α is that the error terms of the items on a scale are not correlated with each other (i.e., independent errors/residuals). When this assumption is violated, the precision of coefficient α as an estimate of reliability will suffer. When correlated errors occur, they are likely to be positive, and coefficient α then tends to overestimate reliability (MacDougall 2011). Although in some situations this assumption may be empirically verified (e.g., via confirmatory factor analysis, CFA), the violation of this assumption may not be obvious or easy to check. There are many measurement situations where the assumption of error independence could be violated, including using subsets of items on a scale that are associated with the same or similar stimuli (Steinberg 2001), using ordered items, using similarly worded items adjacent to each other (Green and Hershberger 2000), etc. Correlated errors may also occur in research involving self-report data, due to possible response set bias (e.g., acquiescence bias; Paro et al. 2010) or other reasons (MacDougall 2011).
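A parallel population-level sketch (again our own, with hypothetical values) shows the opposite bias: introducing a positive covariance between two items' error terms raises α above the true composite reliability, because the error covariance is absorbed into the item covariances as if it were true-score variance:

```python
import numpy as np

lam = np.array([0.7, 0.7, 0.7, 0.7])   # tau-equivalent loadings, for clarity
err = np.diag(1 - lam**2)              # independent error variances
err[0, 1] = err[1, 0] = 0.30           # hypothetical positive error covariance
cov = np.outer(lam, lam) + err         # model-implied item covariance matrix

k = cov.shape[0]
alpha = (k / (k - 1)) * (1 - np.trace(cov) / cov.sum())
true_rel = lam.sum()**2 / cov.sum()    # only common-factor variance is true score
print(alpha, true_rel)                 # alpha (~0.82) > true reliability (~0.75)
```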

When the assumptions discussed above are not met, coefficient α may be a biased estimate of internal consistency reliability. However, the degree and the direction (negative or positive) of the bias depend on the degree to which the tau-equivalence and uncorrelated errors assumptions are violated in combination (Yang and Green 2011). In light of these issues concerning the proper use and interpretation of coefficient α, there has been a considerable amount of discussion in the measurement literature about viable alternative approaches and methods (e.g., Cortina 1993; Raykov 2001; Revelle and Zinbarg 2009; Yang and Green 2011; Zumbo et al. 2007). As a result, a sizable number of alternatives, which may provide more accurate estimates of reliability for data that may not meet these assumptions, have been proposed, ranging from stand-alone coefficients to those based on more elaborate modeling approaches/methods (e.g., Hancock and Mueller 2001; McDonald 1970; Raju 1970; Revelle and Zinbarg 2009; Sijtsma 2009a; Yang and Green 2011; Zumbo et al. 2007).

For education research practitioners without expertise in this or related areas, it is difficult to sift through the extensive literature on these issues. Even for those with measurement expertise, it is challenging to fully comprehend the extensive literature and the many proposed alternatives, as the literature in this area is both extensive and somewhat fragmented, and the distinctions among many proposed alternatives are not always clear. In this brief paper, for the benefit of education research practitioners, we discuss two alternatives that could be considered when violations of the assumptions of coefficient alpha (α) are suspected. These alternatives were formulated within the contexts of principal component analysis (coefficient theta, θ) and latent variable modeling (coefficient omega, ω).

Coefficient Theta (θ)

Introduced by Armor (1974), coefficient θ was developed to address the issue of multidimensionality in a scale. Based on a principal components model, a composite score is created by using the optimal weights obtained from principal component analysis. Coefficient θ for a single-factor solution is computed as follows:

$$\theta = \frac{k}{k - 1}\left(1 - \frac{1}{\lambda_1}\right)$$

where k is the number of items in the scale, and λ₁ is the first (largest) eigenvalue from a principal components analysis (Carmines and Zeller 1982). Coefficient θ is computationally simple, and it takes into account the largest eigenvalue, which explains the largest amount of variance in the scores. With a relatively long history, coefficient θ enjoys the benefits of being easy to understand and easy to compute. It is not part of the standard statistical analysis software packages (e.g., SPSS, SAS), but it can be easily obtained by running a principal component analysis (or an exploratory factor analysis with the default options) on the correlations of the items, and then applying the formula to the largest eigenvalue. Because coefficient θ does not assume unidimensionality, it is readily applicable in measurement situations where multidimensionality exists or is highly probable.
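Following that computational recipe, a minimal sketch (our own, assuming a respondents-by-items NumPy array) extracts the largest eigenvalue of the item correlation matrix and applies the formula:

```python
import numpy as np

def coefficient_theta(scores):
    """Armor's theta from the largest eigenvalue of the item correlations."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    corr = np.corrcoef(scores, rowvar=False)  # item correlation matrix
    lam1 = np.linalg.eigvalsh(corr)[-1]       # eigenvalues ascend; take the largest
    return (k / (k - 1)) * (1 - 1 / lam1)
```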

Coefficient Omega (ω)

Introduced by McDonald (1970), coefficient ω is based on the parameter estimates of the items (or subscales) in a factor model. Conceptually, coefficient ω represents the ratio of the variance due to the common attribute (e.g., the factor) to the total variance (McDonald 1999). It assumes neither a tau-equivalent model nor independent item residuals. In the analysis of a scale, there are situations where correlated errors are found among the items of some factors but not of others. For this reason, two versions of coefficient ω have been proposed (Gignac 2009), known as coefficient ω_A and coefficient ω_B.

Coefficient ω_A is used in situations where the error terms of the items on a factor are not correlated, and it may be formulated as follows (Hancock and Mueller 2001):

$$\omega_A = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^2}{\left(\sum_{i=1}^{k} \lambda_i\right)^2 + \sum_{i=1}^{k} \delta_{ii}}$$

where λᵢ is the standardized factor loading and δᵢᵢ is the standardized error variance (i.e., δᵢᵢ = 1 − λᵢ²).

The second omega coefficient, ω_B, is used in cases where the error terms of the items on a factor are correlated (Raykov 2001), and it is formulated as follows:

$$\omega_B = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^2}{\left(\sum_{i=1}^{k} \lambda_i\right)^2 + \sum_{i=1}^{k} \delta_{ii} + 2\sum_{1 \le i < j \le k} \delta_{ij}}$$

where λᵢ and δᵢᵢ are defined as above, and δᵢⱼ represents the error covariance between items i and j.
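Once a single-factor CFA has been fitted, both coefficients reduce to simple arithmetic on the estimates. The sketch below is our own illustration; it assumes you have already obtained the standardized loadings and, for ω_B, the estimated error covariance matrix from the SEM software of your choice:

```python
import numpy as np

def omega_a(loadings):
    """Omega_A: one factor, independent item errors (standardized solution)."""
    loadings = np.asarray(loadings, dtype=float)
    common = loadings.sum() ** 2
    error_vars = 1 - loadings ** 2            # delta_ii = 1 - lambda_i^2
    return common / (common + error_vars.sum())

def omega_b(loadings, error_cov):
    """Omega_B: one factor, allowing correlated item errors."""
    loadings = np.asarray(loadings, dtype=float)
    error_cov = np.asarray(error_cov, dtype=float)
    common = loadings.sum() ** 2
    error_vars = 1 - loadings ** 2
    # sum of the off-diagonal error covariances delta_ij, i < j
    off_diag = error_cov[np.triu_indices_from(error_cov, k=1)].sum()
    return common / (common + error_vars.sum() + 2 * off_diag)

# Hypothetical standardized loadings from a fitted one-factor model
print(omega_a([0.8, 0.7, 0.6, 0.5]))
```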

Because of the difficulty of meeting the assumptions of coefficient α in practice, it can be argued that the latent variable approach to the estimation of internal consistency reliability (e.g., coefficient ω) may be more appropriate. Despite the fact that this approach is well established, relatively few empirical studies have used ω_A and ω_B in research practice (Gignac et al. 2007). Given the increasingly pervasive use of structural equation modeling (Matsueda 2012), it could be expected that the use of coefficient ω will increase among researchers.

However, there are various reasons that may limit the use of coefficient ω. First, it is not included in the standard statistical packages (e.g., SPSS, SAS), thus limiting its availability to research practitioners. Second, the use of these coefficients requires additional analytical expertise and skills (i.e., confirmatory factor analysis). Third, because coefficient ω is based on the parameter estimates in a factor model, it is not advisable to use it where poor model-data fit is found in confirmatory factor analysis (McDonald 1999). Finally, there may also be additional analytical difficulties and/or considerations in modeling item-level data (e.g., Bernstein and Teng 1989; Gorsuch 1988; McLeod et al. 2001; Nunnally and Bernstein 1994), which could limit the use of coefficient ω, or other modeling-based reliability coefficients, in some research situations. For example, in using the SEM/CFA approach to model item-level data for reliability analysis, the appropriate interpretation of the modeling results requires that the model-data fit be acceptable (Yang and Green 2011). In practice, in modeling item-level data, good model-data fit may not always be guaranteed.

Readers should be aware that, in addition to the two alternatives discussed above, there exist a sizable number of other alternatives and approaches designed for the same or similar purposes (see, for example, Meyer 2010; Revelle and Zinbarg 2009; Raykov and Shrout 2002; Zinbarg et al. 2005; Zumbo et al. 2007). These different alternatives were developed based on different methodological rationales and assumptions. As we alluded to previously, it would be difficult for research practitioners to fully understand the nuances and differences among all the alternatives. It is beyond the scope and purpose of this paper to present the details and comparisons of the many other alternatives, but interested readers may consult other sources for their technical details (e.g., Gadermann et al. 2012; Meyer 2010; Raykov and Marcoulides 2011; Revelle and Zinbarg 2009; Sijtsma 2009a, b; Zinbarg et al. 2005).

Conclusion

It is important to evaluate data reliability in any research. Although coefficient α is a commonly used reliability estimate, there are potential issues associated with its appropriate use and interpretation. The violation of the assumptions of tau-equivalence and independence of error terms in many research situations may render the use and interpretation of coefficient α less appropriate and less accurate. Despite these potential issues, coefficient α has been widely used in the literature, partially as a result of some research practitioners' insufficient understanding of these potential issues. With the increasing popularity and availability of factor analysis and latent variable modeling approaches, alternative coefficients, such as coefficient θ and coefficient ω discussed above, can be expected to become more viable alternatives to coefficient α, especially as more research practitioners come to realize that the major assumptions of coefficient alpha (i.e., tau-equivalence and independent errors) may not hold in most measurement situations. In such situations, coefficient θ and coefficient ω (or other alternatives not presented in this paper) should serve as viable alternatives to coefficient α.

It is important to point out that different measurement error sources exist in different measurement situations. Coefficient α and the alternatives discussed in this brief paper only estimate the reliability of a composite score, and the measurement error source in question is internal inconsistency. In measurement situations where other measurement error sources are of concern (e.g., consistency across raters, or consistency across time), coefficient α and its alternatives are not appropriate; instead, other reliability estimates are needed, such as inter-rater reliability or test–retest reliability. It is also important to note that, under the reliability framework of classical test theory, each type of reliability estimate (e.g., internal consistency, inter-rater, or test–retest) typically only provides information about the measurement error from one source. Thus, if a measurement requires that a rater rate the object of measurement on multiple items on the same scale, there are potentially at least two measurement error sources: internal consistency across the multiple items, and inter-rater consistency across different raters. In this situation, knowing coefficient α for the multiple items does not inform us about the degree of inconsistency across different raters. For such situations involving multiple measurement error sources, generalizability theory (G-theory) is the only framework under which multiple error sources can be handled simultaneously. Under generalizability theory, with an appropriate design, measurement errors from multiple error sources can be estimated simultaneously (Brennan 2001; Cronbach et al. 1963; Meyer 2010; Shavelson and Webb 1991), thus providing a more comprehensive approach to estimating measurement reliability.
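To make the G-theory idea concrete, the following sketch is our own illustration (not from the paper): for the simplest fully crossed persons × raters design, it recovers the variance components from the usual ANOVA mean squares and combines them into a generalizability coefficient. Real applications would typically use dedicated G-theory or mixed-model software:

```python
import numpy as np

def g_study(scores):
    """Variance components for a fully crossed persons x raters design.

    scores: (n_persons x n_raters) array, one observation per cell.
    """
    scores = np.asarray(scores, dtype=float)
    n_p, n_r = scores.shape
    grand = scores.mean()
    p_means = scores.mean(axis=1)
    r_means = scores.mean(axis=0)

    # ANOVA mean squares for the two main effects and the residual
    ms_p = n_r * ((p_means - grand) ** 2).sum() / (n_p - 1)
    ms_r = n_p * ((r_means - grand) ** 2).sum() / (n_r - 1)
    resid = scores - p_means[:, None] - r_means[None, :] + grand
    ms_pr = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions (negative estimates are set to 0 in practice)
    var_p = (ms_p - ms_pr) / n_r   # person (true-score) component
    var_r = (ms_r - ms_pr) / n_p   # rater component
    var_pr = ms_pr                 # interaction/residual component

    # Generalizability coefficient for relative decisions, averaging over n_r raters
    g_coef = var_p / (var_p + var_pr / n_r)
    return var_p, var_r, var_pr, g_coef
```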

References

Armor, D. J. (1974). Theta reliability and factor scaling. In H. Costner (Ed.), Sociological methodology (pp. 17–50). San Francisco: Jossey-Bass.

Bernstein, I. H., & Teng, G. (1989). Factoring items and factoring scales are different: Spurious evidence for multidimensionality due to item categorization. Psychological Bulletin, 105, 467–477.

Brennan, R. L. (2001). Generalizability theory. New York: Springer-Verlag.

Carmines, E. G., & Zeller, R. A. (1982). Reliability and validity assessment. Beverly Hills: Sage Publications.

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98–104.

Crocker, L., & Algina, J. (2006). Introduction to classical and modern test theory. Fort Worth: Wadsworth.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.

Cronbach, L. J., Nageswari, R., & Gleser, G. C. (1963). Theory of generalizability: A liberation of reliability theory. The British Journal of Statistical Psychology, 16, 137–163.

Gadermann, A. M., Guhn, M., & Zumbo, B. D. (2012). Estimating ordinal reliability for Likert-type and ordinal item response data: A conceptual, empirical, and practical guide. Practical Assessment, Research and Evaluation, 17(3), 1–13.

Gignac, G. E. (2009). Psychometrics and the measurement of emotional intelligence. In C. Stough, D. H. Saklofske, & J. D. A. Parker (Eds.), Assessing emotional intelligence: Theory, research, and applications (pp. 9–40). New York: Springer.

Gignac, G. E., Bates, T. C., & Lang, K. (2007a). Implications relevant to CFA model misfit, reliability, and the five factor model as measured by the NEO-FFI. Personality and Individual Differences, 43, 1051–1062.

Gignac, G. E., Palmer, B., & Stough, C. (2007b). A confirmatory factor analytic investigation of the TAS-20: Corroboration of a five-factor model and suggestions for improvement. Journal of Personality Assessment, 89, 247–257.

Gorsuch, R. L. (1988). Exploratory factor analysis. In J. R. Nesselroade & R. B. Cattell (Eds.), Handbook of multivariate experimental psychology (2nd ed., pp. 231–258). New York: Plenum Press.

Green, S. B., & Hershberger, S. L. (2000). Correlated errors in true score models and their effect on coefficient alpha. Structural Equation Modeling, 7, 251–270.

Green, S. B., & Yang, Y. (2009). Commentary on coefficient alpha: A cautionary tale. Psychometrika, 74, 121–135. doi:10.1007/s11336-008-9098-4.

Guttman, L. A. (1945). A basis for analyzing test–retest reliability. Psychometrika, 10, 255–282.

Hancock, G. R., & Mueller, R. O. (2001). Rethinking construct reliability within latent variable systems. In R. Cudeck, S. du Toit, & D. Sörbom (Eds.), Structural equation modeling: Present and future. A festschrift in honor of Karl Jöreskog (pp. 195–216). Lincolnwood: Scientific Software International.

Kamata, A., Turhan, A., & Darandari, E. (2003). Estimating reliability for multidimensional composite scale scores. Paper presented at the annual meeting of the American Educational Research Association, Chicago, April 2003.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.

MacDougall, M. (2011). Moving beyond the nuts and bolts of score reliability in medical education: Some valuable lessons from measurement theory. Advances and Applications in Statistical Sciences, 6(7), 643–664.

Matsueda, R. L. (2012). Key advances in the history of structural equation modeling. In R. Hoyle (Ed.), Handbook of structural equation modeling. New York: Guilford Press.

McDonald, R. P. (1970). The theoretical foundations of principal factor analysis, canonical factor analysis, and alpha factor analysis. British Journal of Mathematical and Statistical Psychology, 23, 1–21.

McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah: L. Erlbaum Associates.

McLeod, L. D., Swygert, K. A., & Thissen, D. (2001). Factor analysis for items scored in two categories. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 189–216). Mahwah: Lawrence Erlbaum.

Meyer, P. (2010). Reliability. New York: Oxford University Press.

Nunnally, J., & Bernstein, I. (1994). Psychometric theory. New York: McGraw-Hill.

Paro, H. B. M. S., Morales, N. M. O., Silva, C. H. M., Rezende, C. H. A., Pinto, R. M. C., Morales, R. R., et al. (2010). Health-related quality of life of medical students. Medical Education, 44, 227–235.

Peterson, R. A. (1994). A meta-analysis of Cronbach's alpha. Journal of Consumer Research, 21, 381–391.

Raju, N. S. (1970). New formula for estimating total test reliability from parts of unequal lengths. Proceedings of the Annual Convention of the American Psychological Association, 5(Pt. 1), 143–144.

Raykov, T. (2001). Bias in coefficient alpha for fixed congeneric measures with correlated errors. Applied Psychological Measurement, 25(1), 69–76.

Raykov, T., & Marcoulides, G. A. (2011). Introduction to psychometric theory. New York: Routledge.

Raykov, T., & Shrout, P. (2002). Reliability of scales with general structure: Point and interval estimation using a structural equation modeling approach. Structural Equation Modeling, 9, 195–212.

Reise, S. P., Morizot, J., & Hays, R. D. (2007). The role of the bifactor model in resolving dimensionality issues in health outcomes measures. Quality of Life Research, 16, 19–31.

Reuterberg, S. E., & Gustafsson, J.-E. (1992). Confirmatory factor analysis and reliability: Testing measurement model assumptions. Educational and Psychological Measurement, 52, 795–811.

Revelle, W., & Zinbarg, R. (2009). Coefficients alpha, beta, omega, and the glb: Comments on Sijtsma. Psychometrika, 74(1), 145–154.

Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Thousand Oaks: Sage.

Sijtsma, K. (2009a). On the use, the misuse, and the very limited usefulness of Cronbach's alpha. Psychometrika, 74, 107–120.

Sijtsma, K. (2009b). Reliability beyond theory and into practice. Psychometrika, 74, 169–173. doi:10.1007/s11336-008-9103-y.

Steinberg, L. (2001). The consequences of pairing questions: Context effects in personality measurement. Journal of Personality and Social Psychology, 81, 332–342.

Teo, T. (2010). Development and validation of the E-learning Acceptance Measure (ElAM). The Internet and Higher Education, 13, 148–152.

Yang, Y., & Green, S. B. (2011). Coefficient alpha: A reliability coefficient for the 21st century? Journal of Psychoeducational Assessment, 29, 377–392.

Zinbarg, R., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70, 123–133.

Zumbo, B. D., Gadermann, A. M., & Zeisser, C. (2007). Ordinal versions of coefficients alpha and theta for Likert rating scales. Journal of Modern Applied Statistical Methods, 6, 21–29.
