

Book Review

Modelling Longitudinal Data. R. Weiss (2005). New York: Springer. ISBN 0387402713

Biometrical Journal 49 (2007) 3, 484–485. DOI: 10.1002/bimj.200710329
© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Longitudinal data analysis has been a major topic of biostatistical research for at least twenty to thirty years. Over this period, the subject has grown from being treated primarily as a special topic within the analysis of variance to a largely model-based methodology. Key developments in this transition have included general linear models with correlated error structure (Laird and Ware, 1982); generalized estimating equations for discrete longitudinal data (Liang and Zeger, 1986); and generalized linear mixed models for longitudinal and other kinds of dependent data (Breslow and Clayton, 1993). Longitudinal data analysis is now a core subject in most postgraduate biostatistics programmes, and a natural consequence of this has been the appearance of a succession of books devoted to the subject. A number of these, such as Crowder and Hand (1990), Diggle, Heagerty, Liang and Zeger (2002), Verbeke and Molenberghs (2000) and Fitzmaurice, Laird and Ware (2004), offer a rather general treatment, whilst others, such as Jones and Kenward (1989), Jones (1993), Davidian and Giltinan (1995), Fahrmeir and Tutz (2001), Senn (2002) and Goldstein (1995), focus on particular, but important, topics which the more general books treat only briefly, if at all.

Weiss’s book is the latest in the “general” vein. Its distinctive feature, as stated explicitly by the author in the preface, is that it is “a text-book, not a monograph.” Accordingly, each chapter ends with a set of problems for the reader to tackle, many of these involving the analysis of the data-sets which are linked to the book through its associated web-page. The text-book flavour has several other consequences which will make the book more or less attractive than the books cited above, according to the reader’s motivation.

• Almost all of the book is devoted, explicitly or implicitly, to the general linear model with correlated Gaussian residuals.

  I use this strategy myself in teaching at Masters’ level. From a pedagogical perspective, the Gaussian linear model gives the teacher enough flexibility to convey the essence of longitudinal data analysis and why it is so different from classical, cross-sectional analysis, without introducing unnecessary technical complications. From a practical perspective it will cope with many of the applied problems which a novice biostatistician is likely to encounter early in their career. However, there is a price to be paid for this choice. The one chapter in this book which covers “Discrete Longitudinal Data” discusses only two very specific models in any detail: the logistic linear for binary responses and the Poisson log-linear for count responses, both with a random intercept in the linear predictor. This is liable to leave the student with the misleading impression that the only way to deal with discrete longitudinal data is to fit a random effects model.

• The book includes chapter-length treatments of several basic topics which other “general” books typically mention only in passing.

  This is partly a consequence of the author’s pitching the book at a lower level than its competitors, with regard to both the technical mathematical skills and the general level of statistical sophistication expected of the reader. Two examples where I think this works well are chapter 2 on “Plots” and chapter 6 on “Tools and Concepts.”

  In “Plots” the author gives a detailed discussion, and many illustrations, of why the graphical display of longitudinal data needs careful attention to detail; indeed, I would have welcomed even stronger warnings against the uncritical use of the default plotting options which are supplied in high-quality statistical graphics environments such as R (which the author uses very much in non-default mode), never mind the horrors produced by Excel and the like.

  In “Tools and Concepts” the author describes general inferential tools such as likelihood ratio tests and model selection criteria, together with computational issues and design. The treatment of the inferential issues is a little too algorithmic for my taste; for example, maximum likelihood and restricted maximum likelihood (REML) estimation criteria for the Gaussian linear model are given as explicit formulae, but the motivation for the latter is weak: the author simply states that REML “is done because it can be done easily and ... produces better estimates of θ,” which is unlikely to satisfy an intellectually curious student. The computational issues are, in my opinion, done rather better, with clear and much-needed warnings about not trusting the first answer you get from a numerical optimisation procedure, and about the fragility of a numerically estimated Hessian matrix.

  The chapter on “Specifying Covariates” is also well done on its own terms, but I do question whether the space taken to show in detail how to construct design matrices for different parameterisations of relatively simple linear regression models is justified in a book for which a course in linear regression is a stated pre-requisite.

• The book gives no references at all until a final, short chapter on “Further Reading” which is more or less a list of references grouped under several broad headings.

  This must be a deliberate strategy on the author’s part. My concern with it is that it reinforces the modern tendency, which I recognise but very much regret, for students to expect all of the required material for a course to be in “the course text” or, worse, in a printed set of lecture notes. It also makes the material read very oddly in places. For example, data-sets are introduced without any discussion of their source or, more seriously, the scientific purpose for which they were collected (the former omission is rectified in the Appendix). Another example: in the preface the author makes specific, critical comments about other books (“Many current texts are unbalanced ...”, “One book spends more than 25% of its space on missing data modelling ...”) without telling us which books these are, so that we can judge for ourselves. Incidentally, none of the “general” books which I have cited above are mentioned in the “Further Reading” chapter and so do not appear in the bibliography. Omission of references from the main text is, in my opinion, particularly unfortunate in the later chapters of the book where, for legitimate pedagogical reasons, the treatment of the material is very incomplete. In this respect, the selected references in the “Further Reading” chapter are at times idiosyncratic. With regard to discrete data, the author expresses a distaste for marginal modelling which borders on blind prejudice, and an enthusiasm for Bayesian methods which sits oddly with the book’s near-universal reliance on estimation and significance testing using standard likelihood asymptotics as the basis for inference. With regard to missing values, key papers such as the review by Little (1995) are not mentioned, whilst the repeated reference to Diggle and Donnelly (1989) in this context must surely be the result of a BibTeX error.
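As an aside on the REML point raised above: the usual motivation is that maximum likelihood ignores the degrees of freedom used in estimating the fixed effects, so its variance estimates are biased downwards, whereas REML corrects for this. The following is a minimal sketch of that fact in Python with NumPy, not taken from Weiss’s book. It assumes the simplest possible case of independent Gaussian errors, where ML gives RSS/n and REML gives RSS/(n − p); the function name `ml_and_reml_variance` and the simulation settings are purely illustrative.

```python
import numpy as np


def ml_and_reml_variance(X, y):
    """Return (ML, REML) estimates of sigma^2 in y = X b + e, e ~ N(0, sigma^2 I).

    Illustrative helper: with independent errors the ML estimate is RSS / n
    and the REML estimate is RSS / (n - p), where p is the number of columns of X.
    """
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares fit
    rss = float(np.sum((y - X @ beta_hat) ** 2))      # residual sum of squares
    return rss / n, rss / (n - p)


# Small simulation (assumed settings): few observations, several fixed effects,
# so that the downward bias of ML is easy to see.
rng = np.random.default_rng(0)
n, p, sigma2 = 12, 4, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.zeros(p)

estimates = np.array([
    ml_and_reml_variance(X, X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n))
    for _ in range(5000)
])
ml_mean, reml_mean = estimates.mean(axis=0)
print(f"true variance:         {sigma2:.3f}")
print(f"average ML estimate:   {ml_mean:.3f}")
print(f"average REML estimate: {reml_mean:.3f}")
```

With n = 12 and p = 4, the ML average should settle near (n − p)/n = 2/3 of the true variance, while the REML average should settle near the true value. The same degrees-of-freedom argument carries over to correlated-error models, although there the estimates are no longer available in closed form.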

In conclusion, this is a welcome addition to the available set of books on longitudinal data analysis because of its distinctive, text-book style. As my comments make clear, I have reservations about some aspects of the book, but would certainly add it to the list of recommended reading for the next course I give at Masters’ level.

References

Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, 9–25.

Crowder, M. J. and Hand, D. J. (1990). Analysis of Repeated Measures. London, Chapman and Hall.

Davidian, M. and Giltinan, D. M. (1995). Nonlinear Models for Repeated Measurement Data. London, Chapman and Hall.

Diggle, P. J. and Donnelly, J. D. (1989). The analysis of repeated measurements: a bibliography. Australian Journal of Statistics 31, 183–93.

Diggle, P. J., Heagerty, P., Liang, K. Y., and Zeger, S. L. (2002). Analysis of Longitudinal Data (second edition). Oxford, Oxford University Press.

Fahrmeir, L. and Tutz, G. (2001). Multivariate Statistical Modelling Based on Generalized Linear Models. New York, Springer.

Fitzmaurice, G. M., Laird, N. M., and Ware, J. H. (2004). Applied Longitudinal Analysis. New Jersey, Wiley.

Goldstein, H. (1995). Multilevel Statistical Models (second edition). London, Edward Arnold.

Jones, B. and Kenward, M. J. (1989). Design and Analysis of Cross-Over Trials. London, Chapman and Hall.

Jones, R. H. (1993). Longitudinal Data with Serial Correlation: a State-Space Approach. London, Chapman and Hall.

Laird, N. M. and Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics 38, 963–74.

Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.

Little, R. J. A. (1995). Modelling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association 90, 1112–21.

Senn, S. (2002). Cross-over Trials in Clinical Research. New York, Wiley.

Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. New York, Springer.

Peter J. Diggle*

Department of Medicine, Lancaster University
Lancaster LA1 4YF
UK


* e-mail: [email protected], Phone: 0044 1524 593957
