Bayesian Item Response Modeling: Theory and Applications, by Jean-Paul Fox



Journal of Educational Measurement, Fall 2011, Vol. 48, No. 3, pp. 357–359

Book Review

Fox, Jean-Paul (2010). Bayesian Item Response Modeling: Theory and Applications. New York, NY: Springer.

Reviewed by Peter Baldwin

National Board of Medical Examiners

I have a great weakness for reference books. To have at my fingertips thoughtful and sage advice on whatever may be running through my mind is a great comfort (and distraction). Moreover, I show no signs of outgrowing this fondness; on the contrary, the more I learn, the more I rely on such resources. Regrettably, this positive relationship probably is not causal; the relevant causal relationship more likely is along the lines of “the more I learn, the more I forget.” In any case, for these reasons I was delighted to add Jean-Paul Fox’s excellent new book, Bayesian Item Response Modeling, to my library.

Interest in Bayesian item response models has grown steadily since Swaminathan and Gifford (1982, 1985, 1986), Tsutakawa and Lin (1986), Mislevy (1986), and others popularized the notion that so-called empirical (parametric) Bayesian item response models, and Bayesian thinking more generally, had the potential to enhance modern test theory. The development of more sophisticated estimation strategies, including Markov chain Monte Carlo (MCMC) methods, Gibbs sampling, and Metropolis-Hastings algorithms, broadened item response theory (IRT) applications to include increasingly complex fully Bayesian item response modeling. These technical advances in turn made possible software packages such as WinBUGS (Lunn, Thomas, Best, & Spiegelhalter, 2000), R (R Development Core Team, 2010), and S+ (TIBCO Software, 2009) that allowed advanced learners as well as experts to estimate such models.

This brings us to the present day, where we find the democratization of Bayesian item response models nearly complete, excepting of course the wisdom to use such models judiciously. Such wisdom is unlikely to arise spontaneously or, for that matter, under any circumstances; thus, the importance of Fox’s book is considerable. It is to the author’s credit that he so lucidly explains the technical features and (although perhaps to a lesser extent) the theoretical rationales of the models and strategies presented in the book rather than merely providing instructions for their use. The author’s commitment to explication is all the more notable given that choosing to be descriptive rather than prescriptive runs the risk (at least one can hope) of alienating those readers who merely want to follow a set of defined procedures.

The book is loosely organized into three sections: (i) an introduction and background to IRT and Bayesian methods, (ii) Bayesian modeling of standard item response models including the evaluation and comparison of such models, and (iii) Bayesian modeling of extended item response models. Of course, Bayesian methodology is not required to estimate many of these models including, even, some of the IRT models with minor extensions, e.g., the testlet model (Wainer, Bradlow, & Wang, 2007; for a discussion, see Rijmen, 2010). For this reason, some readers might have found it enlightening had the author devoted additional space to discussing the rationale, both theoretical and practical, of the Bayesian approach to IRT modeling more generally. Even so, and in the author’s defense, it is quite conceivable that while readers of this book may find such a discussion enriching, they may not need convincing on this point. Moreover, the emphasis in the later chapters on extended item response models (multilevel item response models in general and random item effects models, response time item response models, and randomized item response models in particular) probably reflects the principal motivation driving readers’ interest in Bayesian methods: the opportunity to extend the range of problems that can benefit from IRT by permitting the use of models that are impractical to estimate within the frequentist paradigm. Nonetheless, there are passages (e.g., Fox’s discussion of priors) that demonstrate that the Bayesian paradigm does more than merely simplify estimation problems; I would have enjoyed reading more on this theme.

Copyright © 2011 by the National Council on Measurement in Education

A marvelous feature of this book is that nearly every chapter is replete with examples illustrating key ideas or, more often, demonstrating various models. Some of these models have little direct relevance to the types of measurement problems that arise most often in educational measurement (e.g., there is an entire chapter devoted to randomized item response models), but all illustrate different ways item response models can be extended to meet the demands of real-life measurement problems. The effect is not to tell the reader which model to use, but to demonstrate how Bayesian methods allow flexibility in model choice. Or, to put it another way, the author makes a compelling case that Bayesian response modeling creates the opportunity for problems to guide the choice of model rather than for models to guide the choice of problem. This is progress.

It now seems to be standard practice that reference books or textbooks include some type of supplementary material. These supplements, whether included in the books themselves or made available electronically, generally give readers the opportunity to apply, assess, and extend their newfound knowledge and skills by answering exercises, running software or source code, or reanalyzing data sets introduced in the main text. Bayesian Item Response Modeling meets this expectation handily.

Each chapter closes with a large number of thoughtful exercises that tie in neatly with the text. I do wonder, however, about the number of readers who will be able to successfully complete the exercises based solely on the material introduced in the book. Some questions, while directly related to content from the book, require substantially more background than is presented. Further, in some cases important information may be inadequately conveyed to readers who elect not to do the exercises. To this end, future editions may wish to avoid presenting new material in the exercises and, moreover, consider narrowing the scope of the exercises somewhat such that they conform to the sometimes-limited background of novice readers. In the meantime, I suspect that many readers would find annotated responses to the exercises invaluable, and I urge the author to make them available.


Further, source code is available on the author’s website, although at the time of this writing, these resources seem focused on WinBUGS whereas those for R and S+ are less developed. Perhaps these resources will be expanded in the coming months.

The author notes that among other goals, the book is intended to provide an introduction to Bayesian methodology for item response modeling. The first four chapters are extremely successful on this point despite some sections (and exercises) that may not be suitable for beginners. Novices will find that these challenges become both more numerous and increasingly difficult as they progress through the book. If these obstacles prevent unprepared readers from acting imprudently, they are perhaps better characterized as a public service than a shortcoming of the book. Still, out of fairness, beginners should be forewarned that they may find parts of this book demanding (particularly Chapters 5–9). Thus, while Fox’s Bayesian Item Response Modeling has much to offer students and practitioners at all levels, advanced learners may benefit most from this book. Given this caveat, I recommend this book to all who seek a thorough, albeit somewhat technical, introduction to the increasingly important subarea of Bayesian IRT.

References

Lunn, D. J., Thomas, A., Best, N., & Spiegelhalter, D. (2000). WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing, 10, 325–337.

Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195.

R Development Core Team (2010). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Rijmen, F. (2010). Formal relations and an empirical comparison among the bi-factor, the testlet, and a second-order multidimensional IRT model. Journal of Educational Measurement, 47, 361–372.

Swaminathan, H., & Gifford, J. A. (1982). Bayesian estimation in the Rasch model. Journal of Educational Statistics, 7, 175–192.

Swaminathan, H., & Gifford, J. A. (1985). Bayesian estimation in the two-parameter logistic model. Psychometrika, 50, 349–364.

Swaminathan, H., & Gifford, J. A. (1986). Bayesian estimation in the three-parameter logistic model. Psychometrika, 51, 589–601.

TIBCO Software (2009). TIBCO Spotfire S+ 8.1: Programmer’s Guide and Computer Program [Computer software]. Somerville, MA: TIBCO Software Inc.

Tsutakawa, R. K., & Lin, H. Y. (1986). Bayesian estimation of item response curves. Psychometrika, 51, 251–267.

Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. Cambridge, UK: Cambridge University Press.

Author

PETER BALDWIN is a Senior Measurement Scientist, National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA 19104-3102; [email protected]. His primary research interests include psychometric methods.
