Confirmatory Factor Analysis for Applied Research





The American Statistician. Publication details, including instructions for authors and subscription information: http://amstat.tandfonline.com/loi/utas20

Confirmatory Factor Analysis for Applied Research. Phil Wood, University of Missouri-Columbia.

Available online: 01 Jan 2012

To cite this article: Phil Wood (2008): Confirmatory Factor Analysis for Applied Research, The American Statistician, 62:1, 91-92.

To link to this article: http://dx.doi.org/10.1198/tas.2008.s98



bold as to switch to these more unusual approaches. We owe all of the authors a thank you from the profession for following their instincts rather than following the crowd.

Richard J. CLEARY
Bentley College

Data Mining Methods and Models. Daniel T. LAROSE. Hoboken, NJ: Wiley, 2006, xvi + 322 pp., $95.50 (H), ISBN: 0-471-66656-4.

Data Mining Methods and Models is the second volume of a three-book series on data mining authored by Larose. The following review was performed independently of Larose's other two books. Paraphrasing from the Preface, the goal of this book is to “explore the process of data mining from the point of view of model building.” Nevertheless, the reader will soon be aware that this book is not intended to provide systematic or comprehensive coverage of various data mining algorithms. Instead, it considers supervised learning or predictive modeling only, and it walks the reader through the data mining process with only a few selected modeling methods, such as (generalized) linear modeling and the Bayesian approach.

The book has seven chapters. Chapter 1 introduces dimension reduction, with a focus on principal components analysis (PCA) types of techniques. Chapters 2, 3, and 4 provide detailed coverage of simple linear regression, multiple linear regression, and logistic regression, respectively. Chapter 5 introduces naive Bayes estimation and Bayesian networks. In Chapter 6, the basic idea of genetic algorithms is discussed. Finally, Chapter 7 presents a case study example of modeling response to direct mail marketing within the CRISP (cross-industry standard process) framework.

This book is very easy to read, and this is absolutely the strength that many readers, especially the nonstatistically oriented ones, will greatly appreciate. Predictive modeling is perhaps the most technical part of a data mining process. The author has done an excellent job in making this difficult topic accessible to a broad audience. For example, I like the way in which Bayesian networks are introduced in Chapter 5. After the reader goes through a churn example on naive Bayes estimation in a step-by-step manner, Bayesian belief networks become easily understood as natural extensions. The overall style of the book is clear and patient.
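To make concrete why Bayesian belief networks read as natural extensions of naive Bayes, here is a sketch added for illustration (the churn data and feature names below are invented, not the book's example): naive Bayes treats every feature as conditionally independent given the class, so the joint probability factorizes as P(class) times a product of P(feature | class) terms, and a Bayesian network simply relaxes that independence by allowing additional edges among the features.

```python
# Minimal naive Bayes classifier for a toy churn problem.
# Hypothetical data and feature names; not taken from the book's example.
from collections import Counter, defaultdict

# (plan, support_calls) -> churned? (1 = yes, 0 = no)
train = [
    (("basic", "many"), 1), (("basic", "few"), 0), (("premium", "few"), 0),
    (("premium", "many"), 1), (("basic", "many"), 1), (("premium", "few"), 0),
]

class_counts = Counter(label for _, label in train)
value_counts = defaultdict(Counter)   # (feature index, label) -> counts of values
vocab = defaultdict(set)              # feature index -> values seen in training
for features, label in train:
    for i, value in enumerate(features):
        value_counts[(i, label)][value] += 1
        vocab[i].add(value)

def posterior(features):
    """Unnormalized P(label) * prod_i P(feature_i | label), Laplace smoothed."""
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(train)
        for i, value in enumerate(features):
            score *= (value_counts[(i, label)][value] + 1) / (n_label + len(vocab[i]))
        scores[label] = score
    return scores

print(posterior(("basic", "many")))   # the churn class (1) should score higher
```

Replacing the independent per-feature factors with conditional probability tables whose parents may include other features is exactly the step from naive Bayes to a general Bayesian belief network.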

The main limitation of the book is its limited coverage. An inspired reader would expect to see a much more extended list of topics. Hastie, Tibshirani, and Friedman (2001) gave a fuller and more technical account of various data mining algorithms. The inclusion of genetic algorithms in Chapter 6 seems novel when compared to Hastie, Tibshirani, and Friedman (2001), but at the same time a little unexpected as a separate chapter, since a genetic algorithm involves a stochastic search scheme, which is somewhat involved given the elementary nature of this text. Another noteworthy issue is that the author makes no attempt to distinguish between conventional statistical analysis and data mining.

I found a few errors. On Page 25, for example, it should be a_i = 1 instead of a_i = 1/4. Also, in the frame at the top of Page 211, it might have been “Posterior Odds” instead of “Posterior Odds Ratio.”
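For readers who have not seen the odds form of Bayes' rule, the distinction is as follows (the notation here is generic, not the book's): for a binary class $C$ with complement $\bar{C}$ and evidence $x$,

\[
\frac{P(C \mid x)}{P(\bar{C} \mid x)} \;=\; \frac{P(x \mid C)}{P(x \mid \bar{C})} \times \frac{P(C)}{P(\bar{C})},
\]

that is, posterior odds equal the likelihood ratio times the prior odds. An odds ratio compares two such odds, which is presumably why “Posterior Odds” is the more accurate label for a single quantity of this form.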

The book uses three different software packages to implement the ideas, including SPSS with Clementine, Minitab, and WEKA, which might not be appealing. On the other hand, this choice is justifiable because it allows one to perform data mining at affordable cost.

In summary, I recommend this fairly readable book for adoption in a graduate-level introductory course on data mining, especially when the students come from varied backgrounds.

Xiaogang SU
University of Central Florida

REFERENCES

Hastie, T., Tibshirani, R., and Friedman, J. (2001), The Elements of Statistical Learning, New York: Springer.

Confirmatory Factor Analysis for Applied Research. Timothy A. BROWN. New York, NY: Guilford Publications, 2006, xviii + 475 pp., $74.00 (H), ISBN: 1-59385-275-4; $48.00 (P), ISBN: 1-59385-274-6.

I was excited to see a book on the topic of confirmatory factor analysis (CFA). As the author and editor note, this topic often receives brief treatment in discussions of structural equation models (SEM). However, it is foundational for construction of the often-elegant yet complicated SEMs advanced in the literature.

The book has strengths as an applied text, as evidenced by the computer programs and path diagrams which accompany several examples. The examples are, at least for a psychological audience, well chosen; they illustrate the interpretation of the applied examples in addition to the statistical mechanics, and they draw on the author’s longtime experience in the field of psychology. Often, it is precisely these interpretational issues which get lost in the transition between classroom instruction and students’ (or faculty’s) application of statistics to a particular research situation. The text has a well-written index and a companion Web site with software programs and the datasets presented in the text.

The text is better than many structural equation texts in its inclusion of chapters which survey models for the analysis of multitrait-multimethod matrices [see Eid and Diener (2006) for an equally readable and arguably more thorough treatment of current thinking in multitrait-multimethod models], factorial invariance, Horn’s parallel analysis, Raykov’s structural models for the assessment of internal consistency, approaches to the assessment of statistical power in CFA, categorical models for ordered polytomies, and bootstrap estimates of standard errors as an alternative to their normal-theory counterparts.

Despite these strengths, my evaluation of the text is that it is largely problematic. I would not recommend its use either as a text in an applied course or as an adjunct to more rigorous texts. Although I have several reservations about topics in the text, I have taken more than the usual review space to detail writing I find particularly problematic.

In terms of content, although the new topics are useful additions, the book would profit from a brief introduction to the fact that the best SEM and CFA practice results from the evaluation of competing models to explain the data. Students (and other researchers) frequently do not understand this. Like many texts, this text relies primarily on the analysis of a single model considered in isolation.

Although the text contains good examples of reasoned interpretation of factor and CFA models, I would hesitate to have students cite the text as an authoritative source. Much of the text appears to be taken from classroom lectures, resulting in an informal tone, and it uses undefined colloquial terms where well-defined terms would have been preferable. Despite an attempt to summarize some major points in table form, the typed-lecture format of the exposition results in many explanations, cautions, and guides to best practice being scattered throughout the text, with sometimes conflicting results.

In some places, the language is vague or misleading. There appears to be a repeated (and unfounded) concern with the fact that iteration is involved in fitting SEM models under various loss functions: on page 175, in a discussion of a loading associated with a misspecified model, we read “This inflated estimate reflects the attempts of the iterations to produce the observed correlation”; the author notes “Factor loadings are inflated in the iterative process” on page 182; and “Factor loadings are lowered in the iterations to better approximate” (p. 186). I am fairly sure the author understands that model misspecification (or possibly the loss function used) results in the incorrect parameter estimate (which can, depending on the situation, result in deflation as much as inflation), and not the fact that iterations were used to arrive at the answer. A naïve reader could well conclude from this language that the way to fix an incorrect parameter estimate is to change the default iteration procedures. Many statements are simply wrong or outdated, particularly those regarding the capabilities of specific software (e.g., Mplus).

Other sections on common patterns are poorly stated or argued. Page 88: “When measurement error is systematic (e.g., method effects) it can also lead to positively biased parameter estimates.” Actually, I would argue that seriously biased parameter estimates are more likely to occur when differential patterns of measurement error in the independent variables are present. If uniform systematic method effects are present across all independent variables, no positively biased parameter estimates will result, except for the regression intercept. Page 50: “A key limitation of ordinary least squares approaches such as correlational and multiple regression analysis is the assumption that variables have been measured without error.” Actually, it is only the independent variables which are assumed to be measured without error. Measurement error in the usual classical-test sense may be present in the dependent variable and has no effect on unstandardized regression coefficients (though it does affect standardized coefficients and the confidence intervals of the unstandardized parameters).
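A small simulation, added here for concreteness (the setup is hypothetical and not taken from the book or the passages above), illustrates the last point: classical error added to the dependent variable leaves the unstandardized OLS slope unbiased and only widens its confidence interval, whereas classical error added to the independent variable attenuates the slope by the reliability of that variable.

```python
# Classical measurement error and the unstandardized OLS slope.
# Hypothetical simulation; variable names are illustrative only.
import numpy as np

rng = np.random.default_rng(seed=1)
n, true_slope = 200_000, 2.0

x_true = rng.normal(size=n)                        # error-free predictor, variance 1
y_true = true_slope * x_true + rng.normal(size=n)  # structural relation plus disturbance

def ols_slope(x, y):
    """Unstandardized slope from regressing y on x."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

y_obs = y_true + rng.normal(size=n)  # classical error in the dependent variable
x_obs = x_true + rng.normal(size=n)  # classical error in the independent variable

print(round(ols_slope(x_true, y_true), 2))  # ~2.0: the true slope
print(round(ols_slope(x_true, y_obs), 2))   # ~2.0: still unbiased, just noisier
print(round(ols_slope(x_obs, y_true), 2))   # ~1.0: attenuated by reliability 1/(1+1)
```

The third estimate is biased toward zero even though the fitting procedure is identical in all three cases, which is the sense in which the bias belongs to the measurement model rather than to the estimation routine.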

Finally, even many of the recommendations for best practice are somewhat questionable. Here are a few short examples. On page 182 we find a recommendation to drop nonsignificant correlated errors. It has been my experience that dropping correlated errors can at times have a large impact on other parameters in the model, and the researcher would be better advised to fit models for several scenarios as a sensitivity analysis. On page 202: “Because latent variable software programs are capable of evaluating whether a given model is identified, it is often most practical to simply try to estimate the solution and let the computer determine the model’s identification status.” From page 301: “Given the number of parameters that are constrained in invariance evaluations . . . it is possible that some parameters will differ by chance.” (Actually, that is what the model comparison procedures in invariance analysis are designed to control for.)
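On the page 202 quotation, one piece of standard background, added here for context (it is not part of the review or the book's text): a necessary, though not sufficient, condition for identification is that the number of freely estimated parameters $t$ not exceed the number of unique elements of the observed covariance matrix,

\[
t \;\le\; \frac{p(p+1)}{2},
\]

where $p$ is the number of observed indicators. This counting rule can be checked by hand before any software is run; empirical checks performed by the program can catch subtler problems, but they do not replace thinking about identification at the specification stage.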


In summary, although the intended scope and the attempt to didactically explain good practice by example were aspects that initially attracted me to this text, I find it too flawed to be used even as an adjunct to a more serious treatment of CFA. Many better discussions of CFA are available elsewhere in published articles and texts.

Phil WOOD

University of Missouri-Columbia

REFERENCES

Eid, M., and Diener, E. (2006), Handbook of Multimethod Measurement in Psychology, Washington, DC: American Psychological Association.

Multivariable Analysis: A Practical Guide for Clinicians (2nd ed.). Mitchell H. KATZ. New York, NY: Cambridge University Press, 2006, xv + 203 pp., $120.00 (H), ISBN: 0-521-84051-1; $52.00 (P), ISBN: 0-521-54985-X.

Study Design and Statistical Analysis: A Practical Guide for Clinicians. Mitchell H. KATZ. New York, NY: Cambridge University Press, 2006, xii + 188 pp., $95.00 (H), ISBN: 0-521-82675-6; $45.00 (P), ISBN: 0-521-53407-0.

Both of these books are subtitled A Practical Guide for Clinicians, and the intended audience is researchers who have limited access to biostatistical support. The author assumes a basic familiarity with statistics and acknowledges a debt to more comprehensive works in each of these respective areas, including Glantz (2002), Rosner (2000), Everitt (2003), and Friedman, Furberg, and DeMets (1999). References to further details on statistical methodology are provided in footnotes sprinkled liberally throughout. The books eschew any discussion of mathematical derivations, equations, and symbols, and instead focus on statistical concepts and practical advice.

The basic layout of both books is a question-and-answer format that novice readers will find helpful. Although investigators would benefit from reading either book from cover to cover, they also have the option to turn quickly to the topic(s) of interest. In both books, topics are introduced in chronological order: the development of a research question, choosing an appropriate experimental design, the outcome of interest, choosing the appropriate statistical methodology and/or model, the interpretation of the results, and the writing of a research paper. The exposition in both books is greatly enhanced by the author’s choice of excellent examples demonstrating many of the methodologies, including relevant tables and figures, culled and referenced from the clinical literature. Important definitions and tips are also provided in boxes in the margins of the texts for quick reference. Although outside the scope of both books, I believe that the practical advice provided should extend to the presentation of annotated output from various statistical software packages. Another aspect these books share is their relatively brief length (approximately 200 pages each). This length is effective for the focused discussion in Multivariable Analysis, but it is much to the detriment of the broader coverage in Study Design and Statistical Analysis.

Chapters 1–4 of Multivariable Analysis motivate the discussion and indicate for which types of research questions and outcomes multivariate methods are useful. The text focuses on the three most common procedures: multiple linear regression, multiple logistic regression, and proportional hazards analysis. Chapters 5–10 are the heart of the book, and they cover the underlying assumptions of each method, how to set up the models, perform the analysis, interpret the results, and check assumptions. Statisticians will be gratified to read the careful explanations of the assumptions necessary for each analysis, as well as how to check the assumptions after the analysis is complete.

Although many statisticians may be put off by the idea of the statistician as a “specialist” rather than a collaborator, there is a useful section (14.4) entitled “How can I get best use of my biostatistician?” Readers would be well served to take the author’s advice in this section about consulting with a statistician on sample size, because the earlier discussion of this topic (Section 7.4) is wanting. Since mathematical calculations are outside the realm of this book, the reader is referred to sample size software packages which, in my experience, are not particularly user friendly. Chapters 11–15 review special topics. Curiously, model validation is not covered until Chapter 13, in the context of prognostic models of disease.

This second edition is little changed from the first. Topics which now receive more attention include multivariate analysis of variance, generalized estimating equations, mixed effects models, and propensity scores for matching in clustered samples. In addition, the author urges much more caution in the use of stepwise variable selection algorithms in computer programs.

In contrast to Multivariable Analysis, the author of Study Design and Statistical Analysis tries to bite off much more than one can chew in 200 pages.

Chapter 1 provides some useful advice on how to choose a research question. However, Chapter 2 covers a litany of topics in designing a study in only 30 pages. This chapter starts by discussing the different types and appropriate uses of randomized versus observational studies. The author then discusses three different types of randomized studies, including “randomization to two or more groups” (although curiously he never uses the term “parallel design”), cross-over designs, and factorial designs. Different types of observational studies are also covered. The chapter then goes on to a number of important concepts such as bias, confounding, different aspects of randomization (e.g., equipoise, stratification, blocking, equal and unequal allocations), the construction and wording of hypothesis statements, types of outcome variables (e.g., continuous, dichotomous, ordinal, nominal), sample size, study costs, and institutional review boards. Unfortunately, this last topic is the only mention of ethical issues in clinical research. For example, there is no mention of the Declaration of Helsinki.

The third chapter, on Data Management, introduces many important aspects of this critically important function of actually running a clinical trial. However, many important issues are omitted, including the need to blind data management personnel to treatment allocation; the coding of medications and adverse experience information using standard terminology such as that provided by MedDRA and WHO; and regulatory issues related to source documents and electronic data systems.

Chapters 4–6 are the heart of this book, introducing univariate, bivariate, and multivariate statistics as implemented in clinical trials. As in Multivariable Analysis, the author shines in describing statistical methods and their application to research questions. Basic statistical methods are covered in detail along with important concepts such as underlying distributional assumptions, measures of central tendency and variability, confidence intervals, and measures of association, as well as definitions of important statistics such as odds ratios and relative risk. Also included are discussions of the analysis of variance (ANOVA) and degrees of freedom, and their relation to linear regression; nonparametric statistics; multiplicity of testing; survival analysis; censoring; and repeated measures. Chapter 6 provides a brief (seven-page) overview of multivariate statistics. As in the first book, there are many good examples from the literature, with tables and figures included. As in earlier chapters, however, there are a number of important omissions. Statisticians may be dismayed to see the discussion of the central limit theorem relegated to a footnote.

Chapter 7 covers sample size calculations fairly thoroughly; however, this discussion suffers from the same lack of practical advice on how to do the calculations or how to obtain results from statistical software programs. Chapters 8–11 discuss special topics such as the development of diagnostic and prognostic tests of disease, statistics and causality, and publishing research.

In conclusion, I believe either of these books to be a useful introduction for a novice investigator, especially one with little access to a biostatistician. Multivariable Analysis is more successful in its coverage of its targeted subject matter. Study Design and Statistical Analysis is less successful due to too broad a reach. This is not surprising given that entire books have been devoted to most of the individual topics covered in Study Design and Statistical Analysis. I would not recommend either of these books as a teaching tool due to their lack of exercises and their lack of ties to statistical software.

Charles L. LISS
Merck Research Labs, UG1D-38

REFERENCES

Everitt, B. (2003), Medical Statistics from A to Z, New York: Cambridge University Press.

Friedman, L., Furberg, C., and DeMets, D. (1999), Fundamentals of Clinical Trials (3rd ed.), New York: Springer.

Glantz, S. (2001), Primer of Biostatistics, New York: McGraw-Hill.

Rosner, B. (2000), Fundamentals of Biostatistics (5th ed.), Pacific Grove, CA: Duxbury.

A Pocket Guide to Epidemiology. David G. KLEINBAUM, Kevin M. SULLIVAN, and Nancy D. BARKER. New York, NY: Springer, 2007, 281 pp., $39.95 (P), ISBN: 978-0-387-45964-6.

A Pocket Guide to Epidemiology is a useful addition to the ready-reference genre of epidemiological texts. However, before purchasing it, make sure you have big pockets (not deep pockets), as this pocket guide measures 6 1/8 inches by 9 1/4 inches, calling into question its “pocket” designation. The text offers a comprehensive look at the salient fundamentals of epidemiology. It begins with a discussion of epidemiological studies, touching on important issues such as study design, measures of effect, and, in later chapters, bias, confounding, and matching. Organized logically and replete with classic examples, the text offers

