Psychological Assessment

How Well Is Psychometric g Indexed by Global Composites? Evidence From Three Popular Intelligence Tests
Matthew R. Reynolds, Randy G. Floyd, and Christopher R. Niileksela
Online First Publication, August 12, 2013. doi: 10.1037/a0034102

Citation: Reynolds, M. R., Floyd, R. G., & Niileksela, C. R. (2013, August 12). How well is psychometric g indexed by global composites? Evidence from three popular intelligence tests. Psychological Assessment. Advance online publication. doi: 10.1037/a0034102


How Well Is Psychometric g Indexed by Global Composites? Evidence From Three Popular Intelligence Tests

Matthew R. Reynolds, University of Kansas
Randy G. Floyd, University of Memphis
Christopher R. Niileksela, University of Kansas

Global composites (e.g., IQs) calculated in intelligence tests are interpreted as indexes of the general factor of intelligence, or psychometric g. It is therefore important to understand the proportion of variance in those global composites that is explained by g. In this study, we calculated this value, referred to as hierarchical omega, using large-scale, nationally representative norming sample data from 3 popular individually administered tests of intelligence for children and adolescents. We also calculated the proportion of variance explained in the global composites by g and the group factors, referred to as omega total, or composite reliability, for comparison purposes. Within each battery, g was measured equally well. Using total sample data, we found that 82%–83% of the total test score variance was explained by g. The group factors were also measured in the global composites, with both g and group factors explaining 89%–91% of the total test score variance for the total samples. Global composites are primarily indexes of g, but the group factors, as a whole, also explain a meaningful amount of variance.

Keywords: intelligence, composite reliability, omega, psychometric g, IQ

Intelligence tests are among the most popular and useful of all measurement devices in psychology (Flanagan & Harrison, 2012; Kaufman, 2009). The factor structure of the subtest scores of an intelligence test relative to the measurement structure of the test is the focus of much factor-analytic research. Multiple latent sources of variance in those subtest scores are described in that research via factor loadings (e.g., Canivez & Watkins, 2010; Reynolds, Keith, Fine, Fisher, & Low, 2007). The general factor (psychometric g, or simply g) is often interpreted as one latent source of variance in all subtest scores, with all subtests having positive nonzero loadings on g (Jensen, 1998).

A global composite (e.g., an IQ), calculated as the sum of subtest scores rather than a score from an individual subtest, is intended to index g. Subtest loadings on the g factor (i.e., g loadings) provide estimates of how strongly g is measured in the individual subtests, and, averaged across subtests, these g loadings provide hints about g measurement in a global composite; it remains unclear, however, how well global composites themselves measure g. Such estimates are important if these composites are interpreted as indexes of g. In this study, the proportion of variance in global composite scores that is accounted for by the g factor was calculated for three popular individually administered intelligence tests designed for children and adolescents.

Psychometric intelligence is described in multifactored hierarchical models (Gustafsson, 1984). One such model, based on Cattell–Horn–Carroll (CHC) theory, is a general framework that currently includes g at the apex; about eight to 10 broad cognitive ability factors (e.g., Verbal Comprehension–Knowledge [Gc], Novel Reasoning [Gf], Visual–Spatial Ability [Gv], Short-Term Memory [Gsm], and Processing Speed [Gs]); and numerous narrow abilities (Schneider & McGrew, 2012). CHC theory is a work in progress (indeed, Carroll, 1993, and Horn, in Horn & Blankson, 2005, disagreed about the evidence for a theory of g), but it is a useful framework for research and applied practice because it describes intelligence at three levels of generality, with scores from intelligence batteries often reflecting each of these levels (Keith & Reynolds, 2010). The global composite (IQ) is considered an indicator of g, broad composite scores are considered indicators of CHC broad abilities (otherwise known as group factors), and individual subtest scores are considered indicators of narrow abilities. In this study, the focus was on the indicator of g within a test battery: the global composite.

More than 85 years ago, Spearman (1927, Appendix A) proposed an estimate that was supposed to represent the g loading of the global composite (see also Jensen, 1998, p. 104). More recently, model-based estimates calculated from structural equation models have been proposed (e.g., McDonald, 1999). McDonald has referred to these estimates as coefficient omega. In terms of classical test theory, omega is both a reliability and a validity index because it is the true-score variance in the composite that is attributed to a common factor (Bollen, 1989; Gustafsson & Åberg-Bengtsson, 2010). Omega is a general estimate and may be used to determine the saturation of one factor (e.g., g) in a composite even when there are multiple factors (e.g., g and group factors) contributing to the composite, and it may also be used to calculate the saturation of all of the factors contributing to the composite. The former estimate has been referred to as hierarchical omega (ωh; Zinbarg, Yovel, Revelle, & McDonald, 2006), whereas the latter has subsequently been referred to as omega total (ωT; Revelle & Zinbarg, 2009). That is, ωh may be used to quantify the saturation of g in a global composite, whereas ωT may be used to quantify the saturation of all of the common factors in a global composite (e.g., g and group factors) and is therefore also described as composite reliability. The distinction is important because, despite the multifactorial nature of intelligence, in practice global composites are mostly interpreted as indexes of g, not of intelligence in general (g + group factors). In this study, we compared these two estimates because although global composites are intended to index g, it is also important to understand how well they index all of the common factors.

Matthew R. Reynolds, Department of Psychology and Research in Education, University of Kansas; Randy G. Floyd, Department of Psychology, University of Memphis; Christopher R. Niileksela, Department of Psychology and Research in Education, University of Kansas.

Correspondence concerning this article should be addressed to Matthew R. Reynolds, Psychology and Research in Education, Joseph R. Pearson Hall, 1122 West Campus Rd., University of Kansas, Lawrence, KS 66045. E-mail: [email protected]

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Psychological Assessment © 2013 American Psychological Association, 2013, Vol. 25, No. 4, 000. 1040-3590/13/$12.00 DOI: 10.1037/a0034102
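In symbols, and assuming standardized subtest scores and mutually orthogonal factors (a notational sketch consistent with the verbal definitions above; λ denotes a factor loading, θ a residual variance, i indexes the subtests in the composite, and f indexes the group factors):

```latex
\omega_h = \frac{\left(\sum_{i} \lambda_{g,i}\right)^{2}}
                {\left(\sum_{i} \lambda_{g,i}\right)^{2}
                 + \sum_{f}\left(\sum_{i} \lambda_{f,i}\right)^{2}
                 + \sum_{i} \theta_{i}},
\qquad
\omega_T = \frac{\left(\sum_{i} \lambda_{g,i}\right)^{2}
                 + \sum_{f}\left(\sum_{i} \lambda_{f,i}\right)^{2}}
                {\left(\sum_{i} \lambda_{g,i}\right)^{2}
                 + \sum_{f}\left(\sum_{i} \lambda_{f,i}\right)^{2}
                 + \sum_{i} \theta_{i}}
```

The shared denominator is the variance of the unit-weighted composite; ωh keeps only the g term in the numerator, whereas ωT keeps all common-factor terms.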

Purpose

Because global composite scores are interpreted as indexing g, a latent construct that has been subjected to years of study and has a wide range of predictive validity (Jensen, 1998), it is especially important to estimate how well global composite scores measure this construct. The primary purpose of this study was to estimate the g saturation in global composites from three popular individually administered intelligence tests designed for children and adolescents. This g saturation was indexed by ωh. The g saturation (ωh) was also compared with the saturation of all of the factors by calculating ωT, or composite reliability.

Method

Participants

The participants were children and adolescents included in the norming samples for three intelligence tests. These norming samples were representative of the United States (Elliott, 2007b; Kaufman & Kaufman, 2004; Wechsler, 2003). The total samples for each test ranged from 2,175 to 2,200. There were 200 participants per 1-year age group for the second edition of the Differential Ability Scales (DAS–II; Elliott, 2007a), 125 to 200 participants per 1-year age group for the second edition of the Kaufman Assessment Battery for Children (KABC–II; Kaufman & Kaufman, 2004), and 200 participants per 1-year age group for the fourth edition of the Wechsler Intelligence Scale for Children (WISC–IV; Wechsler, 2003).

Measurement Instruments

Differential Ability Scales–Second Edition (DAS–II). The DAS–II (Elliott, 2007a) is an individually administered intelligence test developed for children and adolescents. Fifteen of the school-age battery subtests for children ages 7–17 were used in this study. The DAS–II provides composites representing six group factors: Nonverbal Reasoning, Verbal, Spatial, Verbal Short-Term Memory, Visual–Verbal Memory, and Cognitive Speed. The g factor is operationalized in the test as General Conceptual Ability (GCA), comprising six subtests. Composite score reliability estimates for the GCA ranged from .96 to .97 for the ages used in this study (Elliott, 2007b, pp. 127–128).

Kaufman Assessment Battery for Children–Second Edition (KABC–II). The KABC–II is an individually administered measure of cognitive abilities developed for children and adolescents. Data from 16 subtests used with children ages 7–18 were included in this study. The KABC–II provides composites representing five group factors: Crystallized Knowledge, Fluid Reasoning, Visual–Spatial Ability, Long-Term Retrieval, and Short-Term Memory. The g factor is operationalized in the test as the Fluid–Crystallized Index (FCI), comprising 10 subtests. Composite score reliability estimates for the FCI ranged from .96 to .97 for the ages used in this study (Kaufman & Kaufman, 2004, p. 88).

Wechsler Intelligence Scale for Children–Fourth Edition (WISC–IV). The WISC–IV is an individually administered measure of cognitive abilities developed for children and adolescents ages 6–16. Fifteen subtests were used in this study. The WISC–IV provides composites representing four group factors: Perceptual Reasoning, Verbal Comprehension, Processing Speed, and Working Memory. The g factor is operationalized in the test as the Full Scale IQ (FSIQ), comprising 10 subtests. Composite score reliability estimates for the FSIQ ranged from .96 to .97 for the ages used in this study (Wechsler, 2003, p. 35).

Analytic Approach

Nested factor models. Covariance matrices for the total samples and for each 1-year age group for each test (DAS–II, KABC–II, WISC–IV) were created based on the subtest correlations and standard deviations for the norm samples provided in the test manuals (standard deviations for the KABC–II had to be obtained from available raw data files because they are not provided in the test manual). These matrices were used as input data. Mplus (Muthén & Muthén, 1998–2010) was used for all analyses.

Nested factor models were estimated for each intelligence test with the total sample data and with data from each 1-year age group (see Gustafsson & Balke, 1993; Reynolds & Keith, 2013). In a nested factor model, the g factor is modeled as a first-order factor with direct effects on every subtest. More specific first-order factors (or group factors) are also modeled for subsets of interrelated subtests; thus, there are direct effects from two latent factors (i.e., g and a group factor) on most subtests. All of these factors are independent of one another. We used nested factor models, rather than higher order models, to account for the hierarchical and multidimensional nature of intelligence. Although higher order models, in which g has direct effects on first-order factors and indirect effects on subtests through the first-order factors, are more consistent with intelligence theory (Jensen, 1998), nested factor models are more consistent with the scoring structure of intelligence tests (Keith & Reynolds, 2012). Regardless, nested factor models and higher order models typically produce similar ωh values because the average g loadings from the higher order model (calculated as total g effects, normally only indirect effects through the first-order factors) are similar to the average g loadings in the nested factor model (Zinbarg et al., 2006; see Niileksela, Reynolds, & Kaufman, 2013, for an example).
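The structure of such a nested factor model can be illustrated numerically. The loadings below are hypothetical placeholders (not estimates from any of the three tests): six standardized subtests all load on g, each of two orthogonal group factors loads on a disjoint triple of subtests, and the model-implied covariance matrix follows from the loadings and residual variances.

```python
import numpy as np

# Columns: g, group factor 1, group factor 2 (all orthogonal).
# Rows: six hypothetical standardized subtests.
Lambda = np.array([
    [0.7, 0.4, 0.0],
    [0.6, 0.5, 0.0],
    [0.8, 0.3, 0.0],
    [0.7, 0.0, 0.4],
    [0.6, 0.0, 0.5],
    [0.8, 0.0, 0.3],
])

# Residual variances chosen so each subtest has unit total variance.
theta = 1.0 - (Lambda ** 2).sum(axis=1)

# Model-implied covariance matrix: Sigma = Lambda Lambda^T + diag(theta).
Sigma = Lambda @ Lambda.T + np.diag(theta)
print(np.round(Sigma, 2))
```

Note that two subtests sharing a group factor covary through both g and that factor (e.g., rows 0 and 1: 0.7 × 0.6 + 0.4 × 0.5), whereas subtests on different group factors covary through g alone.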


All subtests included in each test battery were included in the nested factor models so that the best estimates of g loadings were obtained and because, in many instances, doing so allowed the group factors to be empirically identified without having to include equality constraints on the loadings. Group factors are often indexed by composites comprising two subtests. A factor that is not correlated with other factors and has only two subtest indicators is empirically underidentified. In these circumstances, constraints must be included in the model (e.g., fixing the two factor loadings to be equal) for proper model identification. However, test batteries often include additional measures of these group factors; therefore, this problem of underidentification was mostly circumvented in this study by including those additional subtests in the nested factor models. The estimates associated with these additional subtests, however, were not included in the calculations of the omegas; only those associated with the global composite were included in those calculations.¹

The DAS–II nested factor model included 13 subtests that loaded directly on the first-order g factor. In children ages 12 and younger, g was also indicated by Phonological Processing, although this subtest was not associated with an additional first-order factor. In addition to g, six first-order group factors associated with subsets of those subtests (i.e., two to three subtests each) were included. These group factors were independent of each other and of g. Only one group factor was indicated by more than two subtests, so equality constraints were applied to the loadings of group factors with two indicators to properly identify the model. For purposes of calculating the omegas, however, only factor loadings and residual variances from the six subtests contributing to the GCA were used: Pattern Construction, Recall of Designs, Matrices, Sequential and Quantitative Reasoning, Word Definitions, and Verbal Similarities.

The KABC–II model included 16 subtests with direct loadings on g. The model included four first-order group factors because no reliable Fluid Reasoning variance remained once g was accounted for (Gustafsson, 1984; Reynolds & Keith, 2007). Factor loadings and residual variances from the 10 subtests contributing to the FCI were used in the omega calculations; the Gestalt Closure, Expressive Vocabulary, Hand Movements, Atlantis Delayed, and Rebus Delayed subtests were omitted.

The WISC–IV model included 15 subtests loading directly on g. There were four first-order group factors. Factor loadings and residual variances from the 10 subtests contributing to the FSIQ were used in the omega calculations; the Picture Completion, Arithmetic, Information, Word Reasoning, and Cancellation subtests were omitted.

Model evaluation. We evaluated the models globally by examining fit indexes. Global indexes of stand-alone fit included the chi-square test statistic, the root-mean-square error of approximation (RMSEA), the comparative fit index (CFI), and the standardized root-mean-square residual (SRMR). Models with "good" global fit typically have RMSEA values close to or less than .05, CFI values close to or greater than .95, and SRMR values close to or less than .05 (Schermelleh-Engel, Moosbrugger, & Müller, 2003). In addition, the quality of the parameter estimates was evaluated by checking for implausible values. Omega calculations were based only on models that had acceptable global fit and parameter estimates.

Omega calculations. Both ωh and ωT values were calculated from the nested factor model-based estimates. The estimates were calculated in the total sample and in each 1-year age group for each test. First, ωh was calculated as the square of the sum of subtest g loadings divided by the total variance in the subtest scores included in the composite (Gustafsson, 2002). This estimate represented the proportion of variance in the global composite that was accounted for by g. The square root of ωh represented the correlation between the composite score and latent g, which is the g loading for the composite (McDonald, 1999). Second, ωT estimates were calculated; these estimates were based on all of the common factors in the nested factor model, including g and the first-order group factors. The squared sum of subtest g loadings and the squared sum of each group factor's subtest loadings were added together and divided by the total variance in the subtest scores.
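The calculations just described can be sketched as a small function. The loadings and residual variances in the example are hypothetical placeholders, not estimates from any of the three batteries; standardized subtests and orthogonal factors are assumed, as in the nested factor model.

```python
import numpy as np

def omegas(g_loadings, group_loadings, residual_vars):
    """Hierarchical omega and omega total for a unit-weighted composite.

    g_loadings: g loading for each subtest in the composite.
    group_loadings: one sequence per group factor, with zeros for
        subtests that do not load on that factor.
    residual_vars: residual variance for each subtest.
    Assumes standardized subtests and orthogonal factors.
    """
    g_var = np.sum(g_loadings) ** 2                       # squared sum of g loadings
    group_var = sum(np.sum(f) ** 2 for f in group_loadings)
    total_var = g_var + group_var + np.sum(residual_vars)  # composite variance
    return g_var / total_var, (g_var + group_var) / total_var

# Hypothetical six-subtest composite with two group factors.
omega_h, omega_t = omegas(
    g_loadings=[0.7, 0.6, 0.8, 0.7, 0.6, 0.8],
    group_loadings=[[0.4, 0.5, 0.3, 0, 0, 0], [0, 0, 0, 0.4, 0.5, 0.3]],
    residual_vars=[0.35, 0.39, 0.27, 0.35, 0.39, 0.27],
)
print(round(omega_h, 2), round(omega_t, 2))  # 0.78 0.91
```

For these placeholder values, the square root of ωh (about .88) would be the composite's g loading.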

Results and Discussion

Models for each test using the total samples and their respective fit indexes are shown in Figures 1, 2, and 3. Only the fit indexes for the total samples are reported for each test because the fit statistics for the age-specific models were similar (except that the model chi-square value was larger in the total samples because of their large sample sizes). All of the models fit very well. Some slight adjustments, noted in Table 1, were made to models for convergence purposes in some age groups. The bolded g loadings in Figures 1–3 were the estimates included in the ωh calculations; the bolded g and group factor loadings were included in the ωT calculations.

As shown in Table 1, the global composites from each test (the GCA, Elliott, 2007a; the FCI, Kaufman & Kaufman, 2004; and the FSIQ, Wechsler, 2003) measure their respective g factors similarly. In the total sample for the DAS–II, the g saturation (ωh) was .83, and across age groups, values ranged from .79 for 17-year-olds to .87 for 8- and 9-year-olds (SD = .03). In the total sample for the KABC–II, the g saturation was .82, and values ranged from .78 for 7- and 16-year-olds to .87 for 15-year-olds (SD = .03). Last, in the total sample for the WISC–IV, the g saturation was .83, and values ranged from .79 for 6-year-olds to .85 for 15-year-olds (SD = .02). Therefore, using total sample data within each test, 82%–83% of the variance in the global composite score was explained by g. About 82%–84% of the variance in the global composite scores within each test was explained by g when the estimates were averaged across the age groups (see last row in Table 1).

As shown in the third column for each test in Table 1, although g was predominantly measured by the global composites, the group factors were also measured. In the total sample for the DAS–II, the combined g and group factor saturation (ωT) was .89, and values ranged from .87 for 17-year-olds to .93 for 8- and 9-year-olds (SD = .02) across age groups. In the total sample for the KABC–II, the combined g and group factor saturation was .89, and values ranged from .87 for 8-year-olds to .92 for 15-year-olds (SD = .02). Last, in the total sample for the WISC–IV, the combined g and group factor saturation was .91, and values ranged from .88 for 6- and 7-year-olds to .93 for 15-year-olds (SD = .02). Therefore, using total sample data within each test, 89%–91% of the variance in the global composite was explained by g and the group factors together. About 90%–91% of the global composite variance was explained by both g and the group factors when the estimates were averaged across the age ranges (see last row in Table 1). The variance explained by the group factors was much less than that explained by g, but the amount was not trivial: these group factors explained an additional 5%–12% of the variance. Global composites are primarily indexes of g, but they are also indexes of intelligence in general.

¹ Models were estimated using only the subtests included in the global composites (i.e., other subtests were not included), and the findings were similar. Moreover, models were estimated that calculated omega values using all available subtests, creating a composite from those subtests. As per the principle of aggregation, the omega values were larger when all subtests were used to create a composite score. Interested readers should contact the first author to obtain those results.

One practical implication of these findings relates to the calculation of confidence intervals around global composites. A simple "true-score" confidence interval for a global composite score, using a reliability estimate based on internal consistency, will be narrower than the construct-, or validity-, based confidence interval for a global composite as an estimate of g. If clinicians interpret global composites as estimates of g, then it seems to make more sense to use a validity-based confidence interval. For example, confidence intervals are often calculated from Cronbach's alpha, which for the global composites from the tests used in this study ranged from .96 to .97. Using these estimates, along with the typical 15-point standard deviation, an average standard error of measurement can be calculated as 15√(1 − .97), which equals 2.59. A 95% confidence interval approximately equals ±5.00 (i.e., 1.96 × 2.59) and for a score of 100 would range from 95 to 105. Alternatively, the validity-based confidence interval for the global composite as an estimate of g is based on ωh (i.e., .82–.84 using the total samples, or .78–.87 in various age groups). Therefore, an average standard error of measurement, for example, is equal to 15√(1 − .83), or 6.18. A 95% confidence interval around the global composite as an estimate of g is approximately ±12 and for a score of 100 would range from 88 to 112. The confidence interval based on the construct of interest, therefore, is much larger when the valid variance related to g is considered.² Hence, a reliability estimate based on internal consistency should not be confused with a validity estimate (see Schneider, 2013).

Figure 1. Differential Ability Scales–Second Edition nested factor model using the total sample data. Bolded g and bolded g and group factor estimates were used in the calculations of hierarchical omega and omega total, respectively. Residual variances have been removed from the figure. CFI = comparative fit index; RMSEA = root-mean-square error of approximation; SRMR = standardized root-mean-square residual.
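The two intervals contrasted above can be reproduced in a few lines. The function is a sketch assuming the conventional IQ metric (M = 100, SD = 15) and a normal-theory interval, with either an internal-consistency reliability or a g saturation supplied as the coefficient.

```python
import math

def score_ci(score, r, sd=15.0, z=1.96):
    """Confidence interval for a composite score given a reliability-
    or saturation-type coefficient r: score +/- z * sd * sqrt(1 - r)."""
    sem = sd * math.sqrt(1.0 - r)
    return score - z * sem, score + z * sem

# Internal-consistency-based interval (alpha = .97) vs. the
# validity-based interval using the g saturation (omega-h = .83).
lo_a, hi_a = score_ci(100, 0.97)   # roughly 95 to 105
lo_g, hi_g = score_ci(100, 0.83)   # roughly 88 to 112
print(round(lo_a), round(hi_a), round(lo_g), round(hi_g))  # 95 105 88 112
```

The widening from about ±5 to about ±12 points comes entirely from swapping the coefficient, which is the practical point of the comparison.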

Our findings can be compared with those of another study in which similar estimates were calculated with a different method. As noted, ωh is the saturation of g in a global composite; said differently, it is the squared g loading of that composite. In a study by Keith, Kranzler, and Flanagan (2001), children completed two intelligence tests, the Woodcock–Johnson III Tests of Cognitive Abilities (WJ III; Woodcock, McGrew, & Mather, 2001) and the Cognitive Assessment System (CAS; Naglieri & Das, 1997). Second-order g factors from those batteries were found to be perfectly correlated. Thus, the authors also correlated the g factor from the WJ III with the global composite from the CAS. The resulting g loading (the correlation between g and the global composite) was .79, which is a g saturation of 62%, less than all of the ωh values in this study. The CAS subtests are not as g saturated as the subtests in the tests we included in this study (Canivez, 2011), which may account for this difference. Nevertheless, although ωh may seem abstract, it can be conceptualized as the square of the g loading of a global composite.

One area for future research is related to g measurement as a function of g. Recent research, including research with the intelligence tests used in this study, has shown that g is measured less well in intelligence tests as g increases (Detterman & Daniel, 1989; Reynolds, 2013; Reynolds, Keith, & Beretvas, 2010; Tucker-Drob, 2010). For example, in one study using estimates from a one-factor g model of DAS–II group factor-based composites, omegas were shown to decrease substantially across the IQ range (Reynolds, 2013). In that study, the omega values, calculated from models with DAS–II group factor-based composites, at IQs of 100 were similar if not identical to the ωh values calculated from the models with the DAS–II subtests in this study. Thus, the findings may generalize; however, future research is needed for a definitive answer. If they do generalize, the implication is that the confidence intervals about global composites, used as indicators of g, become wider as g increases (and "confidence" in g measurement decreases).

² We thank a reviewer for suggesting that we report this information.

Figure 2. Kaufman Assessment Battery for Children–Second Edition nested factor model using the total sample data. Bolded g and bolded g and group factor estimates were used in the calculations of hierarchical omega and omega total, respectively. Residual variances have been removed from the figure. CFI = comparative fit index; RMSEA = root-mean-square error of approximation; SRMR = standardized root-mean-square residual.

Limitations

We did not investigate whether g exists or whether g is a meaningful psychological construct. Rather, we investigated how much of g, if such a construct exists, is measured in the global composites of three popular intelligence tests. In general, this approach regards g as a reflective latent variable and not as a formative composite or an aggregate of multiple abilities. It is important to recognize, however, that the assumption that g is a reflective construct is debatable (van der Maas et al., 2006). All models require assumptions, and some researchers may not regard the existence of g as a valid assumption for a model. Thus, it should be noted that omega values may also be calculated for each of the group factor indexes, without the assumption of g or with g partialed. Such decisions should be based on the theoretical model of the researcher. Estimating all of these values was beyond the purpose of this study.

Similarly, it should be noted that the intelligence tests used in this study were either fully or to some extent based on theory, most notably CHC theory, with which they have been studied. We modeled the data, however, so that they were more closely aligned with the intended factor structure of the tests. We could have used CHC theory-based predictions to improve model fit, and in some cases those were used when the models did not converge. Nevertheless, the focus of this study was not on the group factors but on the g factor. Future studies may investigate how our omission of theory influenced the results (including the use of a more theory-based g factor model, the higher order factor model, rather than the nested factor model used in this study).

Figure 3. Wechsler Intelligence Scale for Children–Fourth Edition nested factor model using the total sample data. Bolded g and bolded g and group factor estimates were used in the calculations of hierarchical omega and omega total, respectively. Residual variances have been removed from the figure. CFI = comparative fit index; RMSEA = root-mean-square error of approximation; SRMR = standardized root-mean-square residual.


Conclusion

The g factor saturation was estimated in three popular individually administered intelligence tests using large-scale, nationally representative norming sample data. The g factor was measured equally well in the global composites within each test, but a combination of group factors was also measured in those scores. Previous research has demonstrated that, for the most part, the same g factors are measured across intelligence batteries (e.g., Floyd, Reynolds, Farmer, & Kranzler, in press; Johnson, Bouchard, Krueger, McGue, & Gottesman, 2004; Keith et al., 2001).³ Given those findings, it would appear that an insight offered over 85 years ago by Spearman (1927, pp. 197–198) is supported: Different intelligence test composites show high correlations with each other and with g itself, and thus measure g similarly. Spearman called this the theorem of the indifference of the indicator. In this study, we have shown how well g is measured by those indicators.

3 These findings are restricted to intelligence measures. Although g factors from achievement and intelligence measures are highly correlated, they are less than perfectly correlated (Kaufman, Reynolds, Liu, Kaufman, & McGrew, 2012).

References

Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: Wiley.

Canivez, G. L. (2011). Hierarchical structure of the Cognitive Assessment System: Variance partitions from the Schmid–Leiman (1957) procedure. School Psychology Quarterly, 26, 305–317. doi:10.1037/a0025973

Canivez, G. L., & Watkins, M. W. (2010). Exploratory and higher order factor analyses of the Wechsler Adult Intelligence Scale–Fourth Edition (WAIS–IV) adolescent subsample. School Psychology Quarterly, 25, 223–235. doi:10.1037/a0022046

Carroll, J. B. (1993). Human cognitive abilities. Cambridge, England: Cambridge University Press.

Detterman, D. K., & Daniel, M. H. (1989). Correlations of mental tests with each other and with cognitive variables are highest for low-IQ groups. Intelligence, 13, 349–359. doi:10.1016/S0160-2896(89)80007-8

Elliott, C. D. (2007a). Differential Ability Scales–Second Edition. San Antonio, TX: Pearson.

Elliott, C. D. (2007b). Differential Ability Scales–Second Edition: Introductory and technical handbook. San Antonio, TX: Pearson.

Flanagan, D. P., & Harrison, P. L. (Eds.). (2012). Contemporary intellectual assessment, third edition: Theories, tests, and issues. New York, NY: Guilford Press.

Floyd, R. G., Reynolds, M. R., Farmer, R. L., & Kranzler, J. H. (in press). Are the general factors from different child and adolescent intelligence tests the same? Results from a five-sample, six-test analysis. School Psychology Review.

Gustafsson, J.-E. (1984). A unifying model for the structure of intellectual abilities. Intelligence, 8, 179–203. doi:10.1016/0160-2896(84)90008-4

Gustafsson, J.-E. (2002). Measurement from a hierarchical point of view. In H. I. Braun, D. N. Jackson, & D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 73–95). Mahwah, NJ: Erlbaum.

Gustafsson, J.-E., & Åberg-Bengtsson, L. (2010). Unidimensionality and interpretability of psychological instruments. In S. Embretson (Ed.), Measuring psychological constructs: Advances in model-based approaches (pp. 97–121). Washington, DC: American Psychological Association. doi:10.1037/12074-005

Gustafsson, J.-E., & Balke, G. (1993). General and specific abilities as predictors of school achievement. Multivariate Behavioral Research, 28, 407–434. doi:10.1207/s15327906mbr2804_2

Horn, J. L., & Blankson, A. (2005). Foundations for better understanding of cognitive abilities. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment, third edition: Theories, tests, and issues (pp. 73–98). New York, NY: Guilford Press.

Table 1
Hierarchical Omega and Omega Total Estimates for the Differential Ability Scales–II, Kaufman Assessment Battery for Children–II, and Wechsler Intelligence Scale for Children–IV Global Composites for the Total Samples and by Age

Sample    | DAS-II GCA            | KABC-II FCI (a)       | WISC-IV FSIQ
          | N or n    ωh    ωt    | N or n    ωh    ωt    | N or n    ωh    ωt
Total     | 2,200     .83   .89   | 2,175     .82   .89   | 2,200     .83   .91
Age 6     | —         —     —     | —         —     —     | 200       .79   .88
Age 7     | 200       .85   .92   | 200       .78   .88   | 200 (d)   .82   .88
Age 8     | 200       .87   .93   | 200       .81   .87   | 200 (e)   .82   .90
Age 9     | 200       .87   .93   | 200       .81   .88   | 200       .84   .91
Age 10    | 200 (b)   .86   .91   | 200       .79   .88   | 200       .83   .92
Age 11    | 200       .84   .91   | 200       .83   .89   | 200       .83   .91
Age 12    | 200       .84   .90   | 200       .82   .91   | 200 (d,e) .83   .92
Age 13    | 200       .81   .88   | 200       .82   .91   | 200       .82   .91
Age 14    | 200       .84   .90   | 200       .83   .90   | 200 (e,f) .80   .91
Age 15    | 200       .83   .90   | 150       .87   .92   | 200 (g)   .85   .93
Age 16    | 200       .86   .91   | 150       .78   .88   | 200       .81   .92
Age 17    | 200       .79   .87   | 150 (c)   .85   .91   | —         —     —
Age 18    | —         —     —     | 125       .83   .91   | —         —     —
M (h)     |           .84   .91   |           .82   .90   |           .82   .91

Note. ωh = hierarchical omega; ωt = omega total; GCA = General Conceptual Ability; FCI = Fluid–Crystallized Index; FSIQ = Full Scale IQ. Dashes indicate ages not included in that battery's analyses. (a) Rebus and Atlantis were constrained equal for all ages. (b) Digits Forward and Digits Backward were constrained to be equal. (c) No Gestalt Closure loading on broad factors. (d) Matrix Reasoning and Picture Concepts were constrained to be equal. (e) Letter–Number Sequencing and Digit Span were constrained to be equal. (f) Block Design, Matrix Reasoning, and Picture Concepts were constrained to be equal. (g) Perceptual Reasoning factor not included. (h) Fisher's z formula used to calculate the mean.
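Footnote h of Table 1 states that the column means were obtained with Fisher's z formula rather than a plain arithmetic average. A quick sketch of that computation, applied here to the DAS-II hierarchical omega estimates for ages 7–17 as reported in Table 1:

```python
import math

# DAS-II hierarchical omega estimates for ages 7-17, read from Table 1
omegas = [.85, .87, .87, .86, .84, .84, .81, .84, .83, .86, .79]

# Fisher's r-to-z transform, average in z space, then back-transform
z_vals = [math.atanh(w) for w in omegas]
mean_omega = math.tanh(sum(z_vals) / len(z_vals))

print(round(mean_omega, 2))  # prints 0.84, matching the table's M row
```

Averaging in the z metric reduces the downward bias that an arithmetic mean of bounded correlation-like coefficients would introduce.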

Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.

Johnson, W., Bouchard, T. J., Krueger, R. F., McGue, M., & Gottesman, I. I. (2004). Just one g: Consistent results from three test batteries. Intelligence, 32, 95–107. doi:10.1016/S0160-2896(03)00062-X

Kaufman, A. S. (2009). IQ testing 101. New York, NY: Springer.

Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman Assessment Battery for Children–Second Edition. Circle Pines, MN: American Guidance Service.

Kaufman, S. B., Reynolds, M. R., Liu, X., Kaufman, A. S., & McGrew, K. S. (2012). Are cognitive g and academic g one and the same g? An exploration on the Woodcock–Johnson and Kaufman tests. Intelligence, 40, 123–138. doi:10.1016/j.intell.2012.01.009

Keith, T. Z., Kranzler, J. H., & Flanagan, D. P. (2001). What does the Cognitive Assessment System (CAS) measure? Joint confirmatory factor analysis of the CAS and the Woodcock–Johnson Tests of Cognitive Ability (3rd edition). School Psychology Review, 30, 89–119.

Keith, T. Z., & Reynolds, M. R. (2010). CHC theory and cognitive abilities: What we've learned from 20 years of research. Psychology in the Schools, 47, 635–650.

Keith, T. Z., & Reynolds, M. R. (2012). Using confirmatory factor analysis to aid in understanding the constructs measured by intelligence tests. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment, third edition: Theories, tests, and issues (pp. 758–799). New York, NY: Guilford Press.

McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.

Muthén, L. K., & Muthén, B. O. (1998–2010). Mplus user's guide (6th ed.). Los Angeles, CA: Author.

Naglieri, J. A., & Das, J. P. (1997). Das–Naglieri Cognitive Assessment System. Itasca, IL: Riverside.

Niileksela, C. R., Reynolds, M. R., & Kaufman, A. S. (2013). An alternative Cattell–Horn–Carroll (CHC) factor structure of the WAIS–IV: Age invariance of an alternative model for ages 70–90. Psychological Assessment, 25, 391–404. doi:10.1037/a0031175

Revelle, W., & Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: Comments on Sijtsma. Psychometrika, 74, 145–154. doi:10.1007/s11336-008-9102-z

Reynolds, M. R. (2013). Interpreting intelligence test composite scores in light of Spearman's law of diminishing returns. School Psychology Quarterly, 28, 63–76. doi:10.1037/spq0000013

Reynolds, M. R., & Keith, T. Z. (2007). A test of Spearman's law of diminishing returns in hierarchical models of intelligence. Intelligence, 35, 267–281. doi:10.1016/j.intell.2006.08.002

Reynolds, M. R., & Keith, T. Z. (2013). Measurement and statistical issues in child assessment research. In D. H. Saklofske, C. R. Reynolds, & V. Schwean (Eds.), The Oxford handbook of child and adolescent assessment (pp. 48–83). New York, NY: Oxford University Press. doi:10.1093/oxfordhb/9780199796304.013.0003

Reynolds, M. R., Keith, T. Z., & Beretvas, S. N. (2010). Use of factor mixture modeling to capture Spearman's law of diminishing returns. Intelligence, 38, 231–241. doi:10.1016/j.intell.2010.01.002

Reynolds, M. R., Keith, T. Z., Fine, J. G., Fisher, M. E., & Low, J. (2007). Confirmatory factor structure of the Kaufman Assessment Battery for Children–Second Edition: Consistency with Cattell–Horn–Carroll theory. School Psychology Quarterly, 22, 511–539. doi:10.1037/1045-3830.22.4.511

Schermelleh-Engel, K., Moosbrugger, H., & Müller, H. (2003). Evaluating the fit of structural equation models: Tests of significance and descriptive goodness-of-fit measures. Methods of Psychological Research Online, 8, 23–74.

Schneider, W. J. (2013). What if we took our models seriously? Estimating latent scores in individuals. Journal of Psychoeducational Assessment, 31, 186–201. doi:10.1177/0734282913478046

Schneider, W. J., & McGrew, K. S. (2012). The Cattell–Horn–Carroll model of intelligence. In D. P. Flanagan & P. L. Harrison (Eds.), Contemporary intellectual assessment, third edition: Theories, tests, and issues (pp. 91–144). New York, NY: Guilford Press.

Spearman, C. (1927). The abilities of man: Their nature and measurement. New York, NY: Macmillan.

Tucker-Drob, E. M. (2009). Differentiation of cognitive abilities across the life span. Developmental Psychology, 45, 1097–1118. doi:10.1037/a0015864

van der Maas, H. L. J., Dolan, C. V., Grasman, R. P. P. P., Wicherts, J. M., Huizenga, H. M., & Raijmakers, M. E. J. (2006). A dynamical model of general intelligence: The positive manifold of intelligence by mutualism. Psychological Review, 113, 842–861.

Wechsler, D. (2003). The Wechsler Intelligence Scale for Children–Fourth Edition. San Antonio, TX: The Psychological Corporation.

Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock–Johnson III Tests of Cognitive Abilities. Itasca, IL: Riverside.

Zinbarg, R. E., Yovel, I., Revelle, W., & McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30, 121–144. doi:10.1177/0146621605278814

Received April 8, 2013
Revision received July 8, 2013
Accepted July 9, 2013

