applied psychological measurement - jwalkonline.org classes/spring 08/modmeas/general... · 108...

21
http://apm.sagepub.com Applied Psychological Measurement DOI: 10.1177/0146621604272739 2005; 29; 106 Applied Psychological Measurement Joseph F. Lucke Reliability, and Congeneric Reliability "Rassling the Hog": The Influence of Correlated Item Error on Internal Consistency, Classical http://apm.sagepub.com The online version of this article can be found at: Published by: http://www.sagepublications.com can be found at: Applied Psychological Measurement Additional services and information for http://apm.sagepub.com/cgi/alerts Email Alerts: http://apm.sagepub.com/subscriptions Subscriptions: http://www.sagepub.com/journalsReprints.nav Reprints: http://www.sagepub.com/journalsPermissions.nav Permissions: http://apm.sagepub.com/cgi/content/refs/29/2/106 SAGE Journals Online and HighWire Press platforms): (this article cites 25 articles hosted on the Citations © 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.com Downloaded from

Upload: phamphuc

Post on 09-Aug-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

http://apm.sagepub.com

Applied Psychological Measurement

DOI: 10.1177/0146621604272739 2005; 29; 106 Applied Psychological Measurement

Joseph F. Lucke Reliability, and Congeneric Reliability

"Rassling the Hog": The Influence of Correlated Item Error on Internal Consistency, Classical

http://apm.sagepub.com The online version of this article can be found at:

Published by:

http://www.sagepublications.com

can be found at:Applied Psychological Measurement Additional services and information for

http://apm.sagepub.com/cgi/alerts Email Alerts:

http://apm.sagepub.com/subscriptions Subscriptions:

http://www.sagepub.com/journalsReprints.navReprints:

http://www.sagepub.com/journalsPermissions.navPermissions:

http://apm.sagepub.com/cgi/content/refs/29/2/106SAGE Journals Online and HighWire Press platforms):

(this article cites 25 articles hosted on the Citations

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

“Rassling the Hog”: The Influenceof Correlated Item Error on InternalConsistency, Classical Reliability,and Congeneric Reliability

Joseph F. Lucke, University of Texas Health Science Center at San Antonio

The properties of internal consistency (α),classical reliability (ρ), and congeneric reliability(ω) for a composite test with correlated item errorare analytically investigated. Possible sources ofcorrelated item error are contextual effects, itembundles, and item models that ignore additionalattributes or higher-order attributes. The relationbetween reliability and internal consistency isdetermined by the deviance from true-scoreequivalence. The influence of correlated item erroron α, ρ, and ω is conveyed strictly through thetotal item error covariance. As the total itemerror covariance increases, ρ and ω decrease,

but α increases. The necessary and sufficientcondition for α to be a lower bound to ρ andto ω is that the total item error covariance notexceed the deviance from true-score equivalence.Coefficient α will uniformly exceed ρ or ω

in true-score equivalent tests with positivelycorrelated item error. Index terms: classicalreliability, coefficient alpha, coefficient omega,compound symmetry, congeneric reliability,correlated item error, deviance from true-scoreequivalence, internal consistency, reliability,total item error covariance, true-scoreequivalence.

The purpose of this article is to analytically compare internal consistency to reliability for com-posite tests in which the item errors might be correlated. The psychometric literature variouslypresents internal consistency as identical to reliability, as another kind of reliability (internal consis-tency reliability), as a lower bound to reliability, as an approximator to reliability, or as an estimatorof reliability. This explication, however, begins at first principles and makes no assumptions regard-ing the relationship between internal consistency and reliability. The development focuses entirelyon internal consistency and reliability as parameters; their estimation is only briefly discussed nearthe end.

The notation and model specifications, except for the notation for the true score, are borrowedfreely from a special case (continuous variables, no exogenous variables) of Muthen’s (2002)structural equation model (see also, Muthen & Muthen, 2001, Appendix 2). This model is equivalent,with slight modifications, to the LISREL Model 3B, supplemented with a means structure (Joreskog& Sorbom, 1996; Raykov & Penev, 2001).

All covariance matrices are assumed to be positive semidefinite. The covariance matrices neednot be of full rank but may contain zero variances or linearly dependent covariances. This assumption

Applied Psychological Measurement, Vol. 29 No. 2, March 2005, 106–125DOI: 10.1177/0146621604272739

© 2005 Sage Publications106

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

J. F. LUCKECORRELATED ITEM ERROR 107

simplifies what follows by permitting sets of items that are linear functions of other items, constantitems having zero variances, and items that have zero true-score variances or zero error vari-ances. As no matrix inverses are required in what follows, this assumption entrains no specialproblems.

Internal Consistency, Classical Reliability, and Congeneric Reliability

Internal Consistency

Suppose one is given a p-vector y = [y1, . . . , yp]′ of item random variables with positivesemidefinite covariance

var y = Σ. (1)

Suppose further that a composite test Y is the (unit-weighted) sum of these items, Y = 1′y, where1 is the unit p-vector. Then,

var Y = 1′���1. (2)

The internal consistency of the composite test Y is

α ≡ p

p − 1

(1 − tr ���

1′���1

), (3)

where tr is the trace operator (Cronbach, 1951; Guttman, 1945). The definition of α requires nounderlying statistical model for the item vector y other than its covariance be positive semidefinite.The item random variables may have arbitrary (continuous or discrete) distributions subject onlyto having finite variances.

Classical Reliability

Classical test theory (Lord & Novick, 1968) imposes an additive structure on the items

y = τττ + εεε, (4)

where τττ and εεε are, respectively, p-vectors of true-score and error random variables. The vector τττ

has finite variance ���, εεε has mean 0 with

var εεε = ���, (5)

and τττ and εεε are uncorrelated with each other. No other distributional assumptions are made. Theitems in y are usually assumed to be continuous, but there are discrete random variables that satisfyequation (4) (Lord & Novick, 1968, pp. 32-34). Classical test theory also makes the followingassumption:

ASSUMPTION 1. (Uncorrelated Item Error). ��� is a diagonal matrix.

The composite test is Y = 1′y = 1′τττ + 1′εεε, with composite true score τY = 1′τττ . The varianceof the composite true score is then

var τY = 1′���1, (6)

and the variance of the composite test Y is

var Y = 1′���1 + tr ���. (7)

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

108Volume 29 Number 2 March 2005APPLIED PSYCHOLOGICAL MEASUREMENT

The classical reliability of Y is

ρ ≡ var τy

var Y= 1′���1

1′���1 + tr ���. (8)

Also under classical assumptions, ��� = ��� + ���, and α specializes to

α ≡ p

p − 1

(1 − tr ��� + tr ���

1′���1 + tr ���

). (9)

Furthermore,

ρ = α + p tr ��� − 1′���1(p − 1)(1′���1 + tr ���)

(10)

(Lord & Novick, 1968; Novick & Lewis, 1967).Adapting an idea from Raykov (1997b, p. 333), the quantity

δ ≡ tr ��� − p−11′���1 (11)

is designated as the deviance from true-score equivalence, and the quantity

p

p − 1

1′���1 + tr ���

)

is designated the relative deviance from true-score equivalence. Equation (29) below ensures

δ ≥ 0. (12)

Although perhaps not intuitive, the name for δ is justified by the following:

LEMMA 1. A vector of items is true-score equivalent if and only if δ = 0.

PROOF 1. [Proof that true-score equivalence implies δ = 0]. The item vector y is true-scoreequivalent (essentially τ equivalent) if τττ = ννν + τ1, where ννν is a p-vector of constants (Lord &Novick, 1968, p. 49). Let var τ = φ2, so that ��� = φ211′. Then, from equation (11),

δ = φ2tr 11′ − p−1φ21′11′1 = pφ2 − pφ2 = 0.

The converse is proved in the appendix. �

From equation (10), the reliability of a test is its internal consistency plus its relative deviance fromtrue-score equivalence:

ρ = α + p

p − 1

1′���1 + tr ���

). (13)

Equations (13) and (12) ensure

ρ ≥ α. (14)

Lemma 1 shows that internal consistency equals classical reliability when and only when the itemsare true-score equivalent.

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

J. F. LUCKECORRELATED ITEM ERROR 109

Congeneric Reliability

The generality of classical test theory is achieved at the expense of not specifying how the itemsare connected to the attributes being measured. Congeneric test theory (Bollen, 1989; Carmines &McIver, 1981; Joreskog, 1971; McDonald, 1999) is a special case of classical test theory in whichthe connection between item and attribute is made explicit. Although a special case of classical testtheory, its widespread applicability and simplicity entitle it to its own development.

Congeneric test theory specifies that the true score τττ is a linear function of a latent randomvariable η denoting an unobservable attribute:

τττ = ννν + λλλη,

where ννν is the p-vector of item easiness parameters (−ννν is the item difficulties), and λλλ is thep-vector of item discriminability parameters with respect to η. Thus,

y = ννν + λλλη + εεε. (15)

Congeneric test theory assumes that the attribute η has zero mean and unit variance, that the errorεεε has mean 0 with variance given by equation (5), and that the attribute and error random variablesare uncorrelated. Otherwise, no distributional assumptions are made. Also, var τττ = ��� = λλλλλλ′.Assumption 1 remains in effect. Because the parameter vectors ννν and λλλ may take any real value, theitem vector y must be assumed continuous. This restriction incidentally shows the classical itemmodel, equation (4), to be more general than the congeneric model, equation (15).

The composite test is Y = 1′y = 1′ννν + 1′λλλη + 1′εεε with composite true score τY = 1′ννν + 1′λλλη.Furthermore, the variance of the composite true score is

var τY = 1′λλλλλλ′1, (16)

and the variance of the composite test Y is

var Y = 1′λλλλλλ′1 + tr ���. (17)

Additional insight and useful simplifications can be gained by letting λ = p−11′λλλ and then notingthat equation (11) becomes

δ = λλλ′λλλ − pλ2 = (λλλ − λ1)′(λλλ − λ1) ≥ 0. (18)

Under congeneric test theory, δ is the centered sum of squares of the discriminability coeffi-cients. Similar results have been obtained by Raykov (1997b, p. 332) and McDonald (1999,pp. 92-93).

The proof of Lemma 1 is simpler under congeneric test theory. The items in y are true-scoreequivalent if λλλ = λ1 (McDonald, 1999, p. 85). In this case, λ = λ, and δ = 0. Conversely, if δ = 0,then from equation (18), λi = λ for i = 1, . . . , p, and y is true-score equivalent.

The congeneric reliability of Y is

ω ≡ var τY

var Y= 1′λλλλλλ′1

1′λλλλλλ′1 + tr ���= p2λ2

p2λ2 + tr ���(19)

(McDonald, 1985, 1999). Under congeneric assumptions, ��� = λλλλλλ′ + ���, and

α ≡ p

p − 1

(1 − λλλ′λλλ + tr ���

1′λλλλλλ′1 + tr ���

)= p

p − 1

(1 − δ + pλ2 + tr ���

p2λ2 + tr ���

). (20)

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

110Volume 29 Number 2 March 2005APPLIED PSYCHOLOGICAL MEASUREMENT

Furthermore,

ω = α + p

p − 1

p2λ2 + tr ���

)(21)

(McDonald, 1999, pp. 92-93). Again, the reliability of a test is its internal consistency plus itsrelative deviance from true-score equivalence. From equation (18),

ω ≥ α. (22)

Internal consistency equals congeneric reliability when and only when the items are true-scoreequivalent.

Correlated Item Error

Poorly understood is the role that uncorrelated item error, Assumption 1, plays in securingequations (14) and (22). An examination of the proofs of these equations reveals that uncorrelateditem error is a only sufficient assumption in the derivations (Lord & Novick, 1968, pp. 88-89;McDonald, 1970, pp. 16-20; McDonald, 1999, pp. 92-93; Novick & Lewis, 1967). The proofs aresilent on its necessity. Despite this lacunae in logic, a widespread belief is that correlated item errorhas little effect on the relation between reliability and internal consistency (Zumbo & Rupp, 2004,p. 79) and that, were it present, “α would be inflated somewhat, but perhaps not beyond the truevalue ρ” (Miller, 1995, p. 266).

Sources of Correlated Item Error

One substantive source of correlated item error is contextual effects—“a substantial proportionof the uncontrolled extraneous conditions which are presumably responsible for measurement error[and] will persist from one item to the next” (Rozeboom, 1966, p. 415). However, there appearsto be little, if any, research conducted in this area, as item error is rarely the focus of substantiveinvestigations (Zimmerman & Williams, 1980). Consequently, “how severely internal-consistencyestimates of reliability are inflated by correlated errors is still an empirical unknown” (Rozeboom,1989, p. 283).

A second substantive source of correlated item error is item bundles or testlets, a cluster ofitems that share a common stimulus, contain common content, or possess a common structure(Rosenbaum, 1988; Tuerlinckx & De Boeck, 2004; Wainer & Kiely, 1987; Wilson & Adams,1995). Items within a bundle can be expected to show intra-bundle correlations in addition to thecorrelations within the test. Interest in item bundles and testlets originated in item response theorybecause they violate that theory’s crucial assumption of local independence. An interest in theimpact of item bundles and testlets on reliability and internal consistency has recently emerged inclassical and congeneric test theory (Feldt, 2002).

The primary methodological source of correlated item error would appear to be model misspeci-fication. Correlated item error can arise from ignoring additional attributes that affect the items(Shevlin, Miles, Davies, & Walker, 2000). Let the true (or empirically adequate) item model be

y = ννν + λλλ1η1 + ���2ηηη2 + εεε∗, (23)

where ηηη2 is an m2-vector of additional attributes with zero mean and unit variance, ���2 is a p × m2matrix of discriminabilities, and εεε∗ is a p-vector of item errors, uncorrelated with both η1 and ηηη2,with zero mean and

var εεε∗ = ���∗, (24)

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

J. F. LUCKECORRELATED ITEM ERROR 111

a p × p diagonal covariance matrix. Assume also that

cov(η1,ηηη2) =[

1 00 ���2

], (25)

where ���2 is an m2 × m2 correlation matrix. The true item model is therefore heterogeneous (mul-tidimensional) in that each item reflects more than one attribute, where the second set of attributesis uncorrelated with the first. If the investigator uses a misspecified homogeneous (unidimensional)item model and ignores the additional attributesηηη2, then equation (15) is obtained by setting η = η1,λ = λ1, and εεε = ���2ηηη2 + εεε∗. But Assumption 1 will not be met because

var εεε = ��� = ������2 ���′ + ���∗

will not in general be diagonal.Correlated item error can also arise from a misspecified structural model, such as might occur

with the presence of higher-order attributes. Let the true (or empirically adequate) model be

y = ννν + λλλη1 + εεε∗ with η1 = βββ ′12 ηηη2 + ζ1, (26)

where ηηη2 is an m2-vector of second-order attributes with zero mean and unit variance, βββ12 is am2-vector of regressors of η1 on ηηη2, and ζ1 is a disturbance random variable. Assume also that

cov(ζ1,ηηη2) =[

1 00 ���2

],

where���2 is given in equation (25) and var εεε∗ is given in equation (24). Again, if the investigator usesa misspecified homogeneous item model and ignores the second-order attributes ηηη2, then equation(15) is obtained by setting η = ζ1 and εεε = λβββ ′

12ηηη2 +εεε∗. But Assumption 1 will not be met because

var εεε = ��� = λβββ ′12���2 βββ12λ

′ + ���∗

will not in general be diagonal.

Previous Research

Empirical evidence indicates that correlated item error can render equations (14) and (22) false.Zimmerman, Zumbo, and Lalonde (1993) conducted Monte Carlo comparisons of internal consis-tency to the known classical reliability of a composite test with true-score equivalent items andpositively correlated item error. Upon varying the number of items containing correlated error, theinvestigators found that α exceeded the known ρ in all but one case. Komaroff (1997) conducted aMonte Carlo investigation of item model misspecification in which the internal consistency of con-generic items containing correlated item error was compared to the reliability of a classical modelthat assumed uncorrelated item error. In this case, α exceeded the (misspecified) ρ in many but notall cases.

Green and Herschberger (2000) created a true-score equivalent item model with a first-ordermoving-average item error structure. They then substituted numerical values for the model param-eters, obtained the classical reliability, generated the population covariance matrix, and from itobtained the internal consistency. In this numerically specific model, α exceeded ω. These inves-tigators also created a second, Markovian model such that the items were congeneric and the itemerrors possessed a first-order autoregressive covariance structure. They again substituted numerical

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

112Volume 29 Number 2 March 2005APPLIED PSYCHOLOGICAL MEASUREMENT

values for the model parameters and obtained the congeneric reliability and internal consistency.And again, α exceeded ω.

Intriguing though these results are, their generality is restricted to the parameter space and covari-ance structures chosen for simulations and substitutions. Many questions remain unanswered. Howdoes correlated item error influence the definitions of α, ρ, and ω? What is the functional relation-ship between the amount of correlated item error and reliability? What is the functional relationshipbetween the amount of correlated item error and internal consistency? What role does the struc-ture of the test (parallel, true-score equivalent, congeneric) play? What role does the structure ofthe item error covariance matrix (block diagonal, compound symmetric, autoregressive, or mov-ing average) play? What are the necessary conditions to maintain α’s lower bound property withrespect to ρ and ω? Can correlated item error render internal consistency no longer a lower boundto reliability? An analytic analysis is required to acquire a general understanding of the relationsamong α, ρ, and ω under correlated item error.

Guttman (1953) appears to have been the first to conduct an analytic investigation of the influenceof correlated item error on internal consistency. His results indicated that the sum of the item error(proper) covariances played a fundamental role in determining the properties of internal consistency.Rozeboom (1966, chap. 9) investigated the impact of correlated item error on classical reliability andinternal consistency under an item model very similar to equation (40) in this article (see Rozeboom,1966, p. 431). His results revealed that a positive sum of item error covariances could decrease ρ

and could also render α > ρ. Raykov (1998b, 2001a) also investigated congeneric reliability andinternal consistency under correlated item error. He showed that positively correlated item errorcould render α > ω and that negatively correlated item error could render α too low a boundfor ω. The purpose of this presentation is to consolidate and extend these analytic results.

Internal Consistency, Classical Reliability, and CongenericReliability Under Correlated Item Error

From this point forward, the assumption of uncorrelated item error, Assumption 1, is replacedwith the following assumption:

ASSUMPTION 2. (Correlated Item Error). ��� is a positive semidefinite but otherwise arbitrarycovariance matrix.

Special Matrices for Summations

Two special matrices, when combined with the trace operator, facilitate the summation of theelements of a covariance matrix. Recall that 1 is the p-dimensional unit vector, and let I denote thep × p identity matrix. Define J and H as

J = 11′ and H = J − I. (27)

J is a p × p matrix of ones, and H is a p × p matrix of ones with zeros on the diagonal. Because��� in equation (1) is square, tr J��� is the sum of all the elements of ���, tr H��� is the sum of all theoff-diagonal elements, and

tr J��� = tr ��� + tr H���. (28)

Because ��� is also positive semidefinite,

tr J��� ≤ p tr ���. (29)

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

J. F. LUCKECORRELATED ITEM ERROR 113

In other words, the average of the p2 elements of a positive semidefinite covariance matrix cannotexceed the average of the p variances on its diagonal. Proof is given in the appendix. Equations(28) and (29) yield

−tr ��� < tr H��� ≤ (p − 1)tr ���. (30)

Internal Consistency

The definition of internal consistency remains unchanged under correlated item error. Givenequations (1) and (2),

α ≡ p

p − 1

(1 − tr ���

tr J���

)= p

p − 1

(tr H���

tr J���

). (31)

From equation (31), α is clearly seen as the ratio of the average of the p(p − 1) proper covariancesto the average of the p2 variances and covariances. If tr H��� takes on its upper bound (p − 1)tr ���

in equation (30), then, from equation (28), tr J��� = p tr ���. If tr H��� approaches its lower bound−tr ���, then tr J��� approaches zero. Using these two results with equation (31) yields

−∞ < α ≤ 1.

Classical Reliability

Under classical test theory, the variance of the true score remains as in equation (6), but thevariance of the composite test Y is no longer equation (7) but

var Y = tr J��� + tr ��� + tr H���.

The classical reliability of Y is then

ρ ≡ var τy

var Y= tr J���

tr J��� + tr ��� + tr H���. (32)

The quantity tr H��� will be designated as the total item error covariance. Equation (32) revealsa fundamental result:

PROPOSITION 1. The influence of correlated item error on classical reliability is conveyed onlyby the total item error covariance.

From this result, it immediately follows that the structure of the���, such as having block-diagonal,autocorrelated, moving-average, or compound symmetric errors, is irrelevant. The number of itemswith correlated error influences classical reliability but only if the total covariance increases asa function of that number. It also immediately follows that a necessary condition to remove theinfluence of correlated item error on reliability is tr H��� = 0, which could occur with a mixture ofpositively and negatively correlated item errors. Clearly, from equation (32), 0 ≤ ρ ≤ 1. As tr H���

increases, ρ decreases. Indeed, from equation (30), as tr H��� increases from −tr ��� to (p − 1)tr ���,ρ decreases as

ρ =

1 for tr H��� = −tr ���;tr J���

tr J���+tr ���for tr H��� = 0;

tr J���tr J���+p tr ���

for tr H��� = (p − 1)tr ���.

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

114Volume 29 Number 2 March 2005APPLIED PSYCHOLOGICAL MEASUREMENT

Also under classical test theory assumptions, ��� = ��� + ���, and

α = p

p − 1

(1 − tr ��� + tr ���

tr J��� + tr ��� + tr H���

)= p

p − 1

[tr H (��� + ���)

tr J (��� + ���)

]. (33)

Equation (33) exhibits a fundamental result similar to Proposition 1:

PROPOSITION 2. The influence of correlated item error on internal consistency under classicaltest theory is conveyed only by the total item error covariance.

Contrary to ρ, as total item error covariance increases, α increases. From equation (30), as tr H���

increases from −tr ��� to (p − 1)tr ���, α increases as

α = p

p − 1×

(1 − tr ���+tr ���

tr J���

)for tr H��� = −tr ���;(

1 − tr ���+tr ���tr J���+tr ���

)for tr H��� = 0;(

1 − tr ���+tr ���tr J���+p tr ���

)for tr H��� = (p − 1)tr ���.

Comparing ρ to α,

ρ = α + p

p − 1

[δ − tr H���

tr J (��� + ���)

]. (34)

The necessary and sufficient condition for ρ ≥ α is for δ ≥ tr H���. If the items are true-scoreequivalent, δ = 0 and

α = ρ + p

p − 1

[tr H���

tr J (��� + ���)

].

For internal consistency to remain a lower bound to reliability, the total item error covariance cannotexceed the deviance from true-score equivalence. Equation (34) explains why, as reported in somesimulation studies (Komaroff, 1997, p. 347), α remains a lower bound to ρ more often when testsare not true-score equivalent than when they are. The conditions δ ≥ tr H��� and therefore α ≤ ρ

always hold true for items that have nonpositive correlated item error but need not be true itemswith positively correlated item error. Internal consistency uniformly exceeds classical reliability fortrue-score equivalent items with positive total item error covariance.

Congeneric Reliability

Under congeneric test theory, the variance of the test’s true score remains as in equation (16),but the variance of the composite test Y is no longer equation (17) but

var Y = tr Jλλλλλλ′ + tr ��� + tr H���.

The congeneric reliability of Y is

ω ≡ var τy

var Y= p2λ2

p2λ2 + tr ��� + tr H���. (35)

Clearly, 0 ≤ ω ≤ 1. Equation (35) reveals a result virtually identical to Proposition 1:

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

J. F. LUCKECORRELATED ITEM ERROR 115

PROPOSITION 3. The influence of correlated item error on congeneric reliability is conveyedonly by the total item error covariance.

Indeed, from equation (30), as tr H��� increases from −tr ��� to (p − 1)tr ���, ω decreases as

ω =

1 for tr H��� = −tr ���;p2λ2

p2λ2+tr ���for tr H��� = 0;

pλ2

pλ2+tr ���for tr H��� = (p − 1)tr ���.

(36)

Also under congeneric test theory, ��� = λλλλλλ′ + ���, so that

α = p

p − 1

(1 − δ + pλ2 + tr ���

p2λ2 + tr ��� + tr H���

). (37)

Equation (37) yields a result parallel to Proposition 2:

PROPOSITION 4. The influence of correlated item error on internal consistency under congenerictest theory is conveyed only by the total item error covariance.

From equation (30), as tr H��� increases from −tr ��� to (p − 1)tr ���, α increases as

α = p

p − 1×

(1 − δ+pλ2+tr ���

p2λ2

)for tr H��� = −tr ���;(

1 − δ+pλ2+tr ���

p2λ2+tr ���

)for tr H��� = 0;(

1 − δ+pλ2+tr ���

p2λ2+p tr ���

)for tr H��� = (p − 1)tr ���.

(38)

Comparing ω to α and using equation (11),

ω = α + p tr λλλλλλ′ − tr Jλλλλλλ′ − p tr H���

(p − 1) tr J (λλλλλλ′ + ���)

= α + p

p − 1

(δ − tr H���

p2λ2 + tr J���

).

(39)

Equation (39) is similar to equation (34). Raykov (2001a, pp. 71-72) has obtained similar results.The necessary and sufficient condition for ω ≥ α is δ ≥ tr H���. If the items are true-score equivalent,then λ = λ, δ = 0, and

α = ω + p

p − 1

(tr H���

p2λ2 + tr J���

).

For internal consistency to remain a lower bound to reliability under congeneric test theory, thetotal item error covariance cannot exceed the deviance from true-score equivalence. This conditionalways holds true for nonpositive correlated item error but need not be true for positively correlateditem error. Internal consistency uniformly exceeds congeneric reliability for true-score equivalentitems with positive total item error covariance. These conclusions are identical to those for classicaltest theory.

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

116Volume 29 Number 2 March 2005APPLIED PSYCHOLOGICAL MEASUREMENT

Application: The Factor-Analytic Item Model

There need not be any relation between ρ and ω for classical test theory does not specify thefunction linking the unobservable attribute random variable to the observed item random variable.However, there is one model in which the relation between the two reliabilities is specified. Thefactor-analytic item model is similar to the congeneric model but introduces for each item a specificfactor that is separate from the item error term. Thus, the item model is

y = ννν + λλλη + ζ + εεε, (40)

where ζ is a p-dimensional vector of specific factors that has mean 0 and unknown variance ���

and is uncorrelated with both η and εεε. The specific factors are usually assumed to be uncorrelatedwith one another—that is, the p × p matrix ��� is usually assumed to be diagonal (Bollen, 1989,pp. 218-221; Lord & Novick, 1968, pp. 535-537; McDonald, 1970, pp. 2-3)—but a reviewerrequested an analysis allowing them to be correlated. Issues regarding identifiability of the para-meters will temporarily be ignored.

An ambiguity in equation (40) turns on whether ζ is considered part of the true score or the itemerror (Bollen, 1989, pp. 218-221). Classical test theory considers ζ to be part of the true score, so theclassical true score is ννν +λλλη + ζ with variance λλλλλλ′ +���, and the error is εεε with variance ��� (Bollen,1989, Equation 6.44). Bollen (1989) argued that reliability should assess only the systematic partof the model, with the remaining random sources of variation, ζ and εεε, being error. Congenerictest theory then considers ζ part of the item error, so that the congeneric true score is ννν + λλλη withvariance λλλλλλ′, and the error is ζ+εεε with variance ��� +��� (Bollen, 1989, Equation 6.46). The originalmotivation for ω as an alternative to ρ was based on this distinction (McDonald, 1970, pp. 16-20,Equation A.17).

Referring to equation (40) with Assumption 2,

α = p

p − 1

[1 − tr

(λλλλλλ′ + ��� + ���

)tr J (λλλλλλ′ + ��� + ���)

],

ρ = tr J(λλλλλλ′ + ���

)tr J (λλλλλλ′ + ��� + ���)

, and

ω = tr Jλλλλλλ′

tr J (λλλλλλ′ + ��� + ���).

Propositions 1 through 4 still hold true: The effect of the correlated item is conveyed strictly bytr H��� (as part of tr J���). As total item error covariance increases, α increases, but ρ and ω bothdecrease.

Comparisons among the three parameters yields

ρ = ω + tr J���tr J (λλλλλλ′ + ��� + ���)

,

ρ = α + p

p − 1

[δ − tr H��� + tr ��� − p−1tr J���

tr J (λλλλλλ′ + ��� + ���)

], and

ω = α + p

p − 1

[δ − tr H (��� + ���)

tr J (λλλλλλ′ + ��� + ���)

].

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

J. F. LUCKECORRELATED ITEM ERROR 117

Under the standard assumptions that the specific factors are uncorrelated and the item errors areuncorrelated, α ≤ ω ≤ ρ (McDonald, 1970). Under the assumptions that the specific factors remainuncorrelated but the errors are correlated, only ω ≤ ρ remains true. For sufficiently large totalitem error covariance, α can exceed ρ or ω. Coefficient α exceeds ω if tr H��� > δ, the same as inequation (39). Coefficient α exceeds ρ (and ω) if tr H��� > δ+ (p−1)tr ���/p. Unlike in the classicalmodel, α need not exceed ρ in true-score equivalent tests with correlated error in the factor-analyticmodel.

Correlated specific factors have different effects on ρ and ω. For α to exceed ρ, the presence ofspecific factors (tr J��� > 0) requires that the total item error covariance be larger than would berequired in the absence of specific factors (tr J��� = 0). For α to exceed ω, negatively correlatedspecific factors (tr H��� < 0) require the total item error covariance to be larger than would berequired for uncorrelated specific factors (tr H��� = 0), which in turn require it to be larger thanwould be required for positively correlated factors (tr H��� > 0).

Although the factor-analytic item model specifies both classical and congeneric reliabilities,its merits remain theoretical as no method has yet been devised to identify ζ separately fromεεε (McDonald, 1985, pp. 214-215). If the specific factors were assumed to be correlated, thenalternative representations might be ζ = ���2ηηη2 in equation (23) or ζ = βββ ′

12ηηη2 in equation (26). Inpractice, to date, ζ has invariably been assumed to be identically 0, leaving the congeneric modelspecified by equation (15) and ρ = ω.

Application: Compound Symmetric Item Error Covariance

Equations (31), (32), (34), (35), and (39) are too general to reveal the influence of correlateditem error on the relations between ρ and α and between ω and α. Because only the total itemerror covariance influences the behavior of α, ρ, and ω, a more perspicuous approach would beto develop a special covariance matrix in which the item error structure could be controlled by asingle parameter rather than the p(p − 1)/2 item covariances. One such structure is the compoundsymmetric or equicorrelated covariance structure, which stipulates that error is equally correlatedamong items. The compound symmetric item error covariance is

��� = [γ 11′ + (1 − γ )I]θ2. (41)

The item error variances are θ2, the item error covariances are γ θ2, and the item error correlationsare γ . Partitioning the sum of the item error variances and covariances according to equation (28)yields

tr ��� = pθ2,

tr H��� = p(p − 1)γ θ2, and

tr J��� = p[1 + (p − 1)γ ]θ2.

(42)

Application of equation (30) reveals that for ��� to be positive semidefinite,

− 1

p − 1≤ γ ≤ 1.

Despite its relative simplicity, the compound symmetric matrix is a completely general matrixfor investigating the influence of item error correlations on α, ρ, and ω. From equations (31), (32),and (35), any two covariance matrices ���∗ and ��� will have the same influence if

tr ���∗ = tr ��� and tr H���∗ = tr H���. (43)

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

118Volume 29 Number 2 March 2005APPLIED PSYCHOLOGICAL MEASUREMENT

Thus, if ���∗ can be transformed into ��� such that equation (43) obtains, then the influence of ���∗can be examined through ���. In particular, if ���∗ can be transformed into a compound symmetricmatrix ���, then the influence of tr H���∗ can be investigated by varying ���’s item error correlation γ .Any covariance matrix can be transformed to a compound symmetric that satisfies the equalities inequation (43). Let ���∗ be a p × p item error covariance. Define the following quantities as

θ2 = tr ���∗

pand γ = tr H���∗

(p − 1)tr ���∗ = tr H���∗

p(p − 1)θ2.

Define the compound symmetric matrix��� using θ2, γ , and the unit p-vector 1 according to equation(41). From equation (42), tr ��� = tr ���∗ and tr H��� = tr H���∗. The influence of tr H���∗ can now beinvestigated using the γ of ���.

From equations (19) and (42), the congeneric reliability of a composite test is

ω = pλ2

pλ2 + [1 + (p − 1) γ

]θ2

. (44)

Using equation (36) with equation (42) or using equation (44), as γ increases from −(p − 1)−1

to 1, ω decreases as

ω =

1 for γ = −1(p−1)

;pλ2

pλ2+θ2 . for γ = 0;λ2

λ2+θ2 . for γ = 1.

Negative item error correlation reduces the effect of item error variance, thereby increasing reliabil-ity. At the minimum value of the correlation, item error variance is effectively eliminated, yieldingperfect reliability. Increasing item error correlation increases the effect of item error variance andeffectively reduces the number of items from p to one.

From equations (3) and (42), internal consistency is

α = p(p − 1)(λ2 + γ θ2) − δ

(p − 1){pλ2 + [1 + (p − 1)γ ]θ2} . (45)

Using equation (38) with equation (42) or using equation (45), as γ increases from −(p − 1)−1

to 1, α increases as

α =

1 − pθ2+δ

p(p−1)λ2 for γ = −1(p−1)

;p(p−1)λ2−δ

(p−1)(pλ2+θ2)for γ = 0;

1 − δ

p(p−1)(λ2+θ2)for γ = 1.

Comparing ω to α using equations (39) and (42) or equations (44) and (45) yields

ω = α + δ − p(p − 1)γ θ2

(p − 1){pλ2 + [1 + (p − 1)γ ]θ2} .

The necessary and sufficient condition for ω ≥ α is

γ ≤ min

p(p − 1)θ2, 1

]. (46)

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

J. F. LUCKECORRELATED ITEM ERROR 119

If the items are true-score equivalent, then δ = 0, and

α = ω + pγ θ2

pλ2 + [1 + (p − 1) γ

]θ2

.

For true-score equivalent items, internal consistency will uniformly exceed congeneric reliabilitywhen the item error correlation is positive. Furthermore, as the correlation increases to 1, internalconsistency will also increase to 1, regardless of the value of the reliability.

A Numerical Example

For purposes of demonstration, consider a set of four hypothetical composite tests—Y1, Y2, Y3, Y4—each with p = 6 items, λ = 3, and θ2 = 9 but with four different vectors ofdiscriminabilities:

1. Y1 with λλλ1 = [3, 3, 3, 3, 3, 3]′ so that δ1 = 0,

2. Y2 with λλλ2 = [1, 1, 1, 5, 5, 5]′ so that δ2 = 24,

3. Y3 with λλλ3 = [−1, −1, −1, 7, 7, 7]′ so that δ3 = 96, and

4. Y4 with λλλ4 = [−4, −4, −4, 10, 10, 10]′ so that δ4 = 294.

All four tests have the same congeneric reliability

ω = 6 × 32

6 × 32 + 9 (1 + 5γ ).

Figure 1 displays the influence of correlated item error on congeneric reliability and internalconsistency for the four tests, and Table 1 displays the values of ω and the αs for select valuesof γ . As the item error correlation, γ , increases, the congeneric reliability ω decreases from 1 atγ = −1/5 through 6/7 ≈ .86 at γ = 0 to 1/2 at γ = 1. The test Y1 is a true-score equivalent test.Its internal consistency, denoted α1 in the figure, is less than ω for negative γ , equals ω at γ = 0,and exceeds ω for all positive values of γ , with an upper asymptote of 1. The tests Y2 and Y3 showdeviance from true-score equivalence. Their respective α2 and α3 remain lower bounds to ω until

γ > δ2/[p(p − 1)θ2] = 24/270 ≈ .09

for Y2 and

γ > δ3/[p(p − 1)θ2] = 96/270 ≈ .36

for Y3, at which points they each exceed ω. The test Y4 is perhaps an extreme case but showsthat with sufficiently large deviance from true-score equivalence, internal consistency can remaina lower bound. In this case,

δ4/[p(p − 1)θ2] = 294/270 ≈ 1.09

so that equation (46) is always true.

Discussion

Given a p-vector of observable random variables possessing second moments, internal consis-tency, α (equation (3)), is the ratio of the average of the p(p − 1) proper covariances to the average

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

120Volume 29 Number 2 March 2005APPLIED PSYCHOLOGICAL MEASUREMENT

Table 1

Influence of Correlated Item Error (γ ) on Reliability (ω)and Internal Consistency (α) for Four Congeneric Tests

α

γ ω Y1 Y2 Y3 Y4

−0.2 1.00 0.80 0.71 0.44 −0.290.0 0.86 0.86 0.78 0.55 −0.080.2 0.75 0.90 0.83 0.63 0.080.4 0.67 0.93 0.87 0.70 0.210.6 0.60 0.96 0.91 0.75 0.310.8 0.55 0.98 0.93 0.79 0.391.0 0.50 1.00 0.96 0.82 0.46

Figure 1

Influence of Correlated Item Error (γ ) on Reliability (ω) and Four Valuesof Internal Consistency (α1, α2, α3, and α4)

-0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

Item Error Correlation

-0.50

-0.25

0.00

0.25

0.50

0.75

1.00

1

2

3

4

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

J. F. LUCKECORRELATED ITEM ERROR 121

of the p2 covariances and variances. This generality is α’s greatest advantage and its greatestdisadvantage. The advantage is that α can be derived for the item random variables representingalmost any psychometric model, be they continuous or discrete, and regardless of their latentstructure. The disadvantage is that α itself is derived neither from a psychometric model nor fromany psychometric concept. Internal consistency has no intrinsic psychometric interpretation, andits only purpose is to serve as an approximation to reliability (McDonald, 1981, p. 111).

Classical reliability, ρ (equation (8)), is derived from the psychometric model of classical testtheory (Lord & Novick, 1968). The model assumes that each item can be additively decom-posed into a mutually uncorrelated latent true score and a latent item error. Congeneric relia-bility, ω (equation (19)), is derived from the psychometric model of congeneric test theory, animportant special case of classical test theory (Carmines & McIver, 1981). The model assumesthat each item is linearly related to an unobservable attribute complemented with an addi-tive, uncorrelated, latent item error. Because of its generality, α can be specialized both toclassical test theory (equation (9)) and congeneric test theory (equation (20)). Under the addi-tional assumption that item errors are uncorrelated, α is known to be a lower bound to ρ

(Lord & Novick, 1968) and ω (McDonald, 1999), with equality if the items are true-scoreequivalent.

Raykov’s (1997a) and McDonald’s (1999) independent proposals to employ the congeneric itemmodel to elucidate the relation between ω and α constituted a significant advance. McDonald’sderivation of the difference between ω and α as the centered sum of squares of the discriminabilitycoefficients (equations (18) and (21)) yielded additional simplifications. The results for the con-generic item model, in turn, gave impetus for investigating the relation between α and ρ in the moregeneral, classical item model. The new concepts of deviance from true-score equivalence and therelative deviance from true-score equivalence emerged as fundamental to the relation between reli-ability (classical or congeneric) and internal consistency. Both quantities are nonnegative. A testis true-score equivalent if and only if its deviance from true-score equivalence is zero. Reliability(classical or congeneric) is internal consistency plus the relative deviance from true-score equiva-lence. The relative deviance from true-score equivalence shows why internal consistency is a lowerbound to reliability.

The assumption of uncorrelated item error may be unrealistic. Correlated item error can arisefrom contextual effects or from misspecified item models. The proof of the lower bound propertyshows that uncorrelated error is sufficient but is silent on its necessity (Lord & Novick, 1968; Novick& Lewis, 1967). Without the assumption of uncorrelated item error, the possibility remained thatthe lower bound property reliability need not hold. From the beginning, analytic investigations ofcorrelated item error have focused the sum of the item error (proper) covariances as the primesource of influence on ρ (Guttman, 1953; Rozeboom, 1966). In this investigation, the concept oftotal item error covariance emerged as fundamental to assessing the effect of correlated item erroron α, ρ, and ω. The influence of correlated item error on ρ (Proposition 1), ω (Proposition 3), and α

(Propositions 2 and 4) is conveyed strictly through the total item error covariance. The structure ofthe item error covariance matrix (e.g., block diagonal, autocorrelated, moving average, compoundsymmetric) is irrelevant. Likewise, the structure of the test (e.g., parallel, true-score equivalent,congeneric) is irrelevant.

Correlated item error influences reliability and internal consistency in opposite directions. As thetotal item error covariance decreases from zero to its lower bound, reliability increases to an upperasymptote of unity, but internal consistency decreases with no lower asymptote. As the total itemerror covariance increases from zero to its upper bound, reliability decreases to a lower, nonzeroasymptote that depends on the average item error variance, but internal consistency increases to an

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

122Volume 29 Number 2 March 2005APPLIED PSYCHOLOGICAL MEASUREMENT

upper asymptote that depends on the relative deviance from true-score equivalence. If the items aretrue-score equivalent, α’s upper asymptote is unity.

Uncorrelated item error is a sufficient but not a necessary condition to establish the lower boundproperty of internal consistency. A weaker sufficient condition is that the total item error covariancenot exceed zero. Surprisingly, the concepts of deviance from true-score equivalence and total itemerror covariance were both required to establish necessity. The necessary and sufficient conditionfor internal consistency to be a lower bound to reliability (classical or congeneric) is that the totalitem error covariance not exceed the deviance from true-score equivalence. True-score equivalenttests have zero deviance and therefore require a nonpositive total item error covariance for the lowerbound property to hold. Positively correlated item error will always cause internal consistency toexceed reliability in true-score equivalent tests.

The factor-analytic item model relates ρ to ω and both of them to α. Correlated item error in thefactor-analytic item model can cause α to exceed both ρ and ω. Larger total item error covarianceis required for α to exceed ρ than it is for α to exceed ω. In a true-score equivalent test withpositively correlated item error, α will necessarily exceed ω but not necessarily ρ. The presenceof positively correlated rather than uncorrelated specific factors will reduce the total item errorcovariance required for α to exceed ω.

The compound symmetric matrix is sufficient to study effects of correlated item error. Theinfluence of correlated item error from any item error covariance matrix can be examined by usinga corresponding compound symmetric with the same total item error variance and total item errorcovariance. The influence of total item error can then be investigated by varying the correlationparameter of the compound symmetric matrix.

Correlated item error poses substantial problems for the application of estimated α to testdevelopment. Current psychometric practice maintains that α should be estimated in all new tests(Nunnally & Bernstein, 1994, p. 235). Implicit in this standard is the assumption that the test’sreliability will be at least as large as the estimated α. But this conclusion is assured only if thetotal item error covariance is at most zero. Under positively correlated item error, the estimatedα may exceed the reliability and falsely indicate an adequate test. Items for a composite test areoften selected with the goal of increasing the estimated α. Again, the assumption is that an increas-ing α indicates increasing reliability. Unfortunately, an increasing α could instead be indicatingincreasing item error covariance with decreasing reliability as a side effect. Coefficient α cannotdistinguish reliability from correlated item error. “If internal-consistency measures are to [retain]stature in reliability theory . . . , explicit provision must be made for the effects of correlated mea-surement errors.” Other-wise, “the apparent power of internal-consistency . . . estimates is largelyillusory . . . [and] for practical applications like putting on a clean shirt to rassle a hog” (Rozeboom,1966, pp. 438, 415).

Methods now exist for estimating ω directly. Assuming the joint multivariate normality of η andεεε, equation (15) becomes a normal-theory factor analysis model. The maximum likelihood estimator(MLE) ω can be constructed from equation (19) using the MLEs of λλλ, ���, and ���. A first-orderapproximation to the variance of ω has been derived (Raykov, 2002). For nonnormal continuousη and εεε, consistent estimates of λλλ, ���, and ��� may be available. The required point and intervalestimates can be obtained from software for structural equation modeling (Raykov, 1997a, 1998a,2001b, 2002; Raykov & Shrout, 2002; Reuterberg & Gustafsson, 1992). In the case of multivariatenormality, the structural equation approach provides the added benefit of testing the measurementmodel’s goodness of fit. Thus, α is no longer needed for approximating ω. Perhaps by estimatingω and eschewing α, at least in congeneric tests, one can pen the hog while keeping one’s shirtclean.

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

J. F. LUCKECORRELATED ITEM ERROR 123

Appendix

p tr ��� ≥≥≥ tr J��� (Equation (29))

PROOF 2. This proof is a matrix version of Lord and Novick’s (1968) proof of their equation(4.4.6). Let y1, . . . , yp be a vector of random variables. Let ��� be their covariance matrix, withvariances σ 2

i and covariances σij for i, j = 1, . . . , p. Let v be the p-dimensional vector of variancesof ���. And finally, let V = v1′ + 1v′ so that for each vij ∈ V, vij = σ 2

i + σ 2j . From equation (28),

tr JV = tr V + tr HV. But

tr JV = tr Jv1′ + tr J1v′ = 2tr Jv1′ = 2p tr 1′v = 2p1′v = 2p tr ���,

and

tr V = tr v1′ + tr 1v′ = 2tr v1′ = 2tr 1′v = 21′v = 2 tr ���. (47)

Thus,

2p tr ��� = 2 tr ��� + tr HV. (48)

But for each i = j ,

vij = σ 2i + σ 2

j ≥ 2σiσj ≥ 2σij (49)

by the Cauchy-Schwarz inequality. This implies

tr HV ≥ 2 tr H���. (50)

Therefore,

2p tr ��� ≥ 2 tr ��� + 2 tr H��� = 2 tr J���,

which is equation (29). �

If δ = 0, then the vector of items is true-score equivalent.

PROOF 3. To use the notation in the previous proof, let ��� = ��� in equation (11). From equations(48) and (50), 2(p−1)tr ��� = tr HV ≥ 2tr H���. Assume δ = 0. From equation (11), p tr ��� = tr J���,which yields, via equation (28), (p−1)tr ��� = tr H���. Therefore, 2(p−1)tr ��� = tr HV = 2 tr H���.

Using equation (47),

tr JV = tr V + tr HV = 2 tr ��� + 2 tr H��� = 2 tr J���,

so that tr J (V − 2���) = 0. By construction, for every σij ∈ ���, there is a vij ∈ V such thatvij = σ 2

i + σ 2j , where i, j = 1, . . . , p and σii = σ 2

i . Each vij ∈ V can be paired with a σij ∈ ���

without altering the value of the trace function. But this implies

σ 2i + σ 2

j = 2σij (51)

for all i and j . Otherwise, any nonzero difference would, by the Cauchy-Schwarz inequality,necessarily be positive and render tr J (V − 2���) > 0. Therefore, V = 2���. Combiningequations (51) and (49) yields σ 2

i + σ 2j = 2σiσj for all i and j . Therefore, σi = σj for all i and j .

Reverting to the original notation, ��� = ��� = φ211′, from which it follows that y is true-scoreequivalent. �

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

124Volume 29 Number 2 March 2005APPLIED PSYCHOLOGICAL MEASUREMENT

References

Bollen, K. A. (1989). Structural equations with latentvariables. New York: John Wiley.

Carmines, E. G., & McIver, J. P. (1981). Analyz-ing models with unobservable variables: Analysisof covariance structures. In G. W. Bohrnstedt &E. F. Borgatta (Eds.), Social measurement(pp. 65-115). Beverly Hills, CA: Sage.

Cronbach, L. J. (1951). Coefficient alpha and theinternal structure of tests. Psychometrika, 16(3),297-334.

Feldt, L. S. (2002). Estimating the internal consis-tency reliability of tests composed of testlets vary-ing in length. Applied Measurement in Education,15(1), 33-48.

Green, S. B., & Herschberger, S. L. (2000). Corre-lated errors in true score models and their effect oncoefficient alpha. Structural Equation Modeling,7(2), 251-270.

Guttman, L. (1945). A basis for analyzing test-retestreliability. Psychometrika, 10(4), 255-282.

Guttman, L. (1953). Reliability formulas that do notassume experimental independence. Psychome-trika, 18(3), 225-239.

Joreskog, K. G. (1971). Statistical analysis of sets ofcongeneric tests. Psychometrika, 36, 109-133.

Joreskog, K. G., & Sorbom, D. (1996). LISREL8 user’s guide. Chicago: Scientific SoftwareInternational.

Komaroff, E. (1997). Effect of simultaneous vio-lations of essential τ -equivalence and uncorre-lated error on coefficient α. Applied PsychologicalMeasurement, 21(4), 337-348.

Lord, F. M., & Novick, M. R. (1968). Statistical the-ories of mental test scores with contributions byAllan Birnbaum. Reading, MA: Addison-Wesley.

McDonald, R. P. (1970). The theoretical founda-tions of common factor analysis, principal factoranalysis, and alpha factor analysis. British Jour-nal of Mathematical and Statistical Psychology,23, 1-21.

McDonald, R. P. (1981). The dimensionality of testsand items. British Journal of Mathematical andStatistical Psychology, 34, 100-117.

McDonald, R. P. (1985). Factor analysis and relatedmethods. Hillsdale, NJ: Lawrence Erlbaum.

McDonald, R. P. (1999). Test theory: A unifiedapproach. Mahwah, NJ: Lawrence Erlbaum.

Miller, M. B. (1995). Coefficient alpha: A basic intro-duction from the perspectives of classical testtheory and structural equation modeling. Struc-tural Equation Modeling, 2(3), 255-273.

Muthen, B. O. (2002). Beyond SEM: Generallatent variable modeling. Behaviormetrika, 29(1),81-117.

Muthen, L. K., & Muthen, B. O. (2001). Mplus user’sguide: Statistical analysis with latent variables(2nd ed.). Los Angeles: Author.

Novick, M. R., & Lewis, C. (1967). Coefficient alphaand the reliability of composite measurements.Psychometrika, 32(1), 1-13.

Nunnally, J. C., & Bernstein, I. H. (1994). Psy-chometric theory (3rd ed.). New York: McGraw-Hill.

Raykov, T. (1997a). Estimation of composite reliabil-ity for congeneric measures. Applied Psychologi-cal Measurement, 21(2), 173-184.

Raykov, T. (1997b). Scale reliability, Cronbach’scoefficient alpha, and violations of essentialtau-equivalence with fixed congeneric compo-nents. Multivariate Behavioral Research, 32(4),329-353.

Raykov, T. (1998a). A method for obtaining standarderrors and confidence intervals of composite relia-bility for congeneric items. Applied PsychologicalMeasurement, 22(4), 369-374.

Raykov, T. (1998b). Coefficient alpha and com-posite reliability with interrelated nonhomoge-neous items. Applied Psychological Measurement,22(4), 375-385.

Raykov, T. (2001a). Bias of coefficientα for fixed con-generic measures with correlated errors. AppliedPsychological Measurement, 25(1), 69-76.

Raykov, T. (2001b). Estimation of congeneric scalereliability using covariance structure analysis withnonlinear constraints. British Journal of Mathe-matical and Statistical Psychology, 54, 315-323.

Raykov, T. (2002). Analytic estimation of standarderror and confidence interval for scale reliability.Multivariate Behavioral Research, 37(1), 89-103.

Raykov, T., & Penev, S. (2001). The problemof equivalent structural models: An individualresidual perspective. In G. A. Marcoulides &R. E. Schumacker (Eds.), New developments andtechniques in structural equation modeling(pp. 297-321). Mahwah, NJ: Lawrence Erlbaum.

Raykov, T., & Shrout, P. E. (2002). Reliability ofscales with general structure: Point and intervalestimation using a structural equation modelingapproach. Structural Equation Modeling, 9(2),195-212.

Reuterberg, S.-E., & Gustafsson, J.-E. (1992). Con-firmatory factor analysis and reliability: Testing

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from

J. F. LUCKECORRELATED ITEM ERROR 125

measurement model assumptions. Educationaland Psychological Measurement, 52, 795–811.

Rosenbaum, P. (1988). Item bundles. Psychometrika,53, 349-359.

Rozeboom, W. W. (1966). Foundations of the theoryof prediction. Homewood, IL: Dorsey.

Rozeboom, W. W. (1989). The reliability of a lin-ear composite of nonequivalent subtests. AppliedPsychological Measurement, 13(3), 277-283.

Shevlin, M., Miles, J. N. V., Davies, M. N. O., &Walker, S. (2000). Coefficient alpha: A usefulindicator of reliability? Personality and IndividualDifferences, 28, 229-237.

Tuerlinckx, F., & De Boeck, P. (2004). Modelsfor residual dependencies. In P. De Boeck &M. Wilson (Eds.), Explanatory item responsemodels: A generalized linear and nonlinearapproach. New York: Springer.

Wainer, H., & Kiely, G. (1987). Item clustersand computerized adaptive testing: A case fortestlets. Journal of Educational Measurement, 24,185-201.

Wilson, M., & Adams, R. J. (1995). Raschmodels for item bundles. Psychometrika, 60(2),181-198.

Zimmerman, D. W., & Williams, R. H. (1980). Isclassical test theory ‘robust’ under violations ofthe assumption of uncorrelated errors? CanadianJournal of Psychology, 34(3), 227-237.

Zimmerman, D. W., Zumbo, B. D., & Lalonde, C.(1993). Coefficient alpha as an estimate of test

reliability under violations of two assumptions.Educational and Psychological Measurement,53, 33-49.

Zumbo, B. D., & Rupp, A. (2004). Responsiblemodeling of measurement data for appropriateinferences: Important advances in reliability andvalidity theory. In D. Kaplan (Ed.), The Sage hand-book of quantitative methodology for the socialsciences. Thousand Oaks, CA: Sage.

Acknowledgments

I thank Mark R. Reiser, Mark D. Reckase, threeanonymous reviewers, and two anonymous review-ers of a related manuscript for suggesting numerousimprovements. Portions of this work were presentedat the 2003 Conference of Texas Statisticians at TexasA & M University in College Station, Texas; the 2003Joint Statistical Meetings in San Francisco, Califor-nia; and a 2004 meeting of the San Antonio Chapterof the American Statistical Association in San Anto-nio, Texas. This work was partially supported byNIH-NINR grant R01 NR07891.

Author’s Address

Address correspondence to Joseph F. Lucke, Univer-sity of Texas Health Science Center at San Antonio,Mail Code 7951, 7703 Floyd Curl Drive, San Anto-nio, TX 78229-3900; e-mail: [email protected].

© 2005 SAGE Publications. All rights reserved. Not for commercial use or unauthorized distribution. by on March 6, 2008 http://apm.sagepub.comDownloaded from