[Wiley Series in Probability and Statistics] Multivariable Model-Building || Front Matter

Multivariable Model-Building, Patrick Royston, Willi Sauerbrei © 2008 John Wiley & Sons, Ltd. ISBN: 978-0-470-02842-1


Multivariable Model-Building



WILEY SERIES IN PROBABILITY AND STATISTICS

Established by WALTER A. SHEWHART and SAMUEL S. WILKS

Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Iain M. Johnstone, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay, Sanford Weisberg

Editors Emeriti: Vic Barnett, J. Stuart Hunter, David G. Kendall, Jozef L. Teugels

A complete list of the titles in this series appears at the end of this volume.


Multivariable Model-Building
A pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables

PATRICK ROYSTON
Cancer and Statistical Methodology Groups, MRC Clinical Trials Unit, London, UK

WILLI SAUERBREI
Institute of Medical Biometry and Medical Informatics, University Medical Center, Freiburg, Germany


Copyright © 2008 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England

Telephone (+44) 1243 779777

Email (for orders and customer service enquiries): [email protected]

Visit our Home Page on www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770620.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The Publisher is not associated with any product or vendor mentioned in this book.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Other Wiley Editorial Offices

John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA

Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA

Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany

John Wiley & Sons Australia Ltd, 42 McDougall Street, Milton, Queensland 4064, Australia

John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

John Wiley & Sons Canada Ltd, 6045 Freemont Blvd, Mississauga, ONT, L5R 4J3

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Library of Congress Cataloging in Publication Data

Royston, Patrick.
Multivariable model-building : a pragmatic approach to regression analysis based on fractional polynomials for continuous variables / Patrick Royston, Willi Sauerbrei.

p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-02842-1 (cloth : acid-free paper)
1. Regression analysis. 2. Polynomials. 3. Variables (Mathematics)
I. Sauerbrei, Willi. II. Title.
QA278.2.R696 2008
519.5′36—dc22 2008003757

British Library Cataloguing in Publication Data

A catalogue record for this book is available from the British Library

ISBN 978-0-470-02842-1

Typeset in 10/12pt Times by Integra Software Services Pvt. Ltd, Pondicherry, India
Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire


Contents

Preface

1 Introduction
    1.1 Real-Life Problems as Motivation for Model Building
        1.1.1 Many Candidate Models
        1.1.2 Functional Form for Continuous Predictors
        1.1.3 Example 1: Continuous Response
        1.1.4 Example 2: Multivariable Model for Survival Data
    1.2 Issues in Modelling Continuous Predictors
        1.2.1 Effects of Assumptions
        1.2.2 Global versus Local Influence Models
        1.2.3 Disadvantages of Fractional Polynomial Modelling
        1.2.4 Controlling Model Complexity
    1.3 Types of Regression Model Considered
        1.3.1 Normal-Errors Regression
        1.3.2 Logistic Regression
        1.3.3 Cox Regression
        1.3.4 Generalized Linear Models
        1.3.5 Linear and Additive Predictors
    1.4 Role of Residuals
        1.4.1 Uses of Residuals
        1.4.2 Graphical Analysis of Residuals
    1.5 Role of Subject-Matter Knowledge in Model Development
    1.6 Scope of Model Building in our Book
    1.7 Modelling Preferences
        1.7.1 General Issues
        1.7.2 Criteria for a Good Model
        1.7.3 Personal Preferences
    1.8 General Notation


2 Selection of Variables
    2.1 Introduction
    2.2 Background
    2.3 Preliminaries for a Multivariable Analysis
    2.4 Aims of Multivariable Models
    2.5 Prediction: Summary Statistics and Comparisons
    2.6 Procedures for Selecting Variables
        2.6.1 Strength of Predictors
        2.6.2 Stepwise Procedures
        2.6.3 All-Subsets Model Selection Using Information Criteria
        2.6.4 Further Considerations
    2.7 Comparison of Selection Strategies in Examples
        2.7.1 Myeloma Study
        2.7.2 Educational Body-Fat Data
        2.7.3 Glioma Study
    2.8 Selection and Shrinkage
        2.8.1 Selection Bias
        2.8.2 Simulation Study
        2.8.3 Shrinkage to Correct for Selection Bias
        2.8.4 Post-estimation Shrinkage
        2.8.5 Reducing Selection Bias
        2.8.6 Example
    2.9 Discussion
        2.9.1 Model Building in Small Datasets
        2.9.2 Full, Pre-specified or Selected Model?
        2.9.3 Comparison of Selection Procedures
        2.9.4 Complexity, Stability and Interpretability
        2.9.5 Conclusions and Outlook

3 Handling Categorical and Continuous Predictors
    3.1 Introduction
    3.2 Types of Predictor
        3.2.1 Binary
        3.2.2 Nominal
        3.2.3 Ordinal, Counting, Continuous
        3.2.4 Derived
    3.3 Handling Ordinal Predictors
        3.3.1 Coding Schemes
        3.3.2 Effect of Coding Schemes on Variable Selection
    3.4 Handling Counting and Continuous Predictors: Categorization
        3.4.1 ‘Optimal’ Cutpoints: A Dangerous Analysis
        3.4.2 Other Ways of Choosing a Cutpoint
    3.5 Example: Issues in Model Building with Categorized Variables
        3.5.1 One Ordinal Variable
        3.5.2 Several Ordinal Variables
    3.6 Handling Counting and Continuous Predictors: Functional Form
        3.6.1 Beyond Linearity
        3.6.2 Does Nonlinearity Matter?
        3.6.3 Simple versus Complex Functions
        3.6.4 Interpretability and Transportability
    3.7 Empirical Curve Fitting
        3.7.1 General Approaches to Smoothing
        3.7.2 Critique of Local and Global Influence Models
    3.8 Discussion
        3.8.1 Sparse Categories
        3.8.2 Choice of Coding Scheme
        3.8.3 Categorizing Continuous Variables
        3.8.4 Handling Continuous Variables

4 Fractional Polynomials for One Variable
    4.1 Introduction
    4.2 Background
        4.2.1 Genesis
        4.2.2 Types of Model
        4.2.3 Relation to Box–Tidwell and Exponential Functions
    4.3 Definition and Notation
        4.3.1 Fractional Polynomials
        4.3.2 First Derivative
    4.4 Characteristics
        4.4.1 FP1 and FP2 Functions
        4.4.2 Maximum or Minimum of a FP2 Function
    4.5 Examples of Curve Shapes with FP1 and FP2 Functions
    4.6 Choice of Powers
    4.7 Choice of Origin
    4.8 Model Fitting and Estimation
    4.9 Inference
        4.9.1 Hypothesis Testing
        4.9.2 Interval Estimation
    4.10 Function Selection Procedure
        4.10.1 Choice of Default Function
        4.10.2 Closed Test Procedure for Function Selection
        4.10.3 Example
        4.10.4 Sequential Procedure
        4.10.5 Type I Error and Power of the Function Selection Procedure
    4.11 Scaling and Centering
        4.11.1 Computational Aspects
        4.11.2 Examples
    4.12 FP Powers as Approximations to Continuous Powers
        4.12.1 Box–Tidwell and Fractional Polynomial Models
        4.12.2 Example
    4.13 Presentation of Fractional Polynomial Functions
        4.13.1 Graphical
        4.13.2 Tabular
    4.14 Worked Example
        4.14.1 Details of all Fractional Polynomial Models
        4.14.2 Function Selection
        4.14.3 Details of the Fitted Model
        4.14.4 Standard Error of a Fitted Value
        4.14.5 Fitted Odds Ratio and its Confidence Interval
    4.15 Modelling Covariates with a Spike at Zero
    4.16 Power of Fractional Polynomial Analysis
        4.16.1 Underlying Function Linear
        4.16.2 Underlying Function FP1 or FP2
        4.16.3 Comment
    4.17 Discussion

5 Some Issues with Univariate Fractional Polynomial Models
    5.1 Introduction
    5.2 Susceptibility to Influential Covariate Observations
    5.3 A Diagnostic Plot for Influential Points in FP Models
        5.3.1 Example 1: Educational Body-Fat Data
        5.3.2 Example 2: Primary Biliary Cirrhosis Data
    5.4 Dependence on Choice of Origin
    5.5 Improving Robustness by Preliminary Transformation
        5.5.1 Example 1: Educational Body-Fat Data
        5.5.2 Example 2: PBC Data
        5.5.3 Practical Use of the Pre-transformation gδ(x)
    5.6 Improving Fit by Preliminary Transformation
        5.6.1 Lack of Fit of Fractional Polynomial Models
        5.6.2 Negative Exponential Pre-transformation
    5.7 Higher Order Fractional Polynomials
        5.7.1 Example 1: Nerve Conduction Data
        5.7.2 Example 2: Triceps Skinfold Thickness
    5.8 When Fractional Polynomial Models are Unsuitable
        5.8.1 Not all Curves are Fractional Polynomials
        5.8.2 Example: Kidney Cancer
    5.9 Discussion

6 MFP: Multivariable Model-Building with Fractional Polynomials
    6.1 Introduction
    6.2 Motivation
    6.3 The MFP Algorithm
        6.3.1 Remarks
        6.3.2 Example
    6.4 Presenting the Model
        6.4.1 Parameter Estimates
        6.4.2 Function Plots
        6.4.3 Effect Estimates
    6.5 Model Criticism
        6.5.1 Function Plots
        6.5.2 Graphical Analysis of Residuals
        6.5.3 Assessing Fit by Adding More Complex Functions
        6.5.4 Consistency with Subject-Matter Knowledge
    6.6 Further Topics
        6.6.1 Interval Estimation
        6.6.2 Importance of the Nominal Significance Level
        6.6.3 The Full MFP Model
        6.6.4 A Single Predictor of Interest
        6.6.5 Contribution of Individual Variables to the Model Fit
        6.6.6 Predictive Value of Additional Variables
    6.7 Further Examples
        6.7.1 Example 1: Oral Cancer
        6.7.2 Example 2: Diabetes
        6.7.3 Example 3: Whitehall I
    6.8 Simple Versus Complex Fractional Polynomial Models
        6.8.1 Complexity and Modelling Aims
        6.8.2 Example: GBSG Breast Cancer Data
    6.9 Discussion
        6.9.1 Philosophy of MFP
        6.9.2 Function Complexity, Sample Size and Subject-Matter Knowledge
        6.9.3 Improving Robustness by Preliminary Covariate Transformation
        6.9.4 Conclusion and Future

7 Interactions
    7.1 Introduction
    7.2 Background
    7.3 General Considerations
        7.3.1 Effect of Type of Predictor
        7.3.2 Power
        7.3.3 Randomized Trials and Observational Studies
        7.3.4 Predefined Hypothesis or Hypothesis Generation
        7.3.5 Interactions Caused by Mismodelling Main Effects
        7.3.6 The ‘Treatment–Effect’ Plot
        7.3.7 Graphical Checks, Sensitivity and Stability Analyses
        7.3.8 Cautious Interpretation is Essential
    7.4 The MFPI Procedure
        7.4.1 Model Simplification
        7.4.2 Check of the Results and Sensitivity Analysis
    7.5 Example 1: Advanced Prostate Cancer
        7.5.1 The Fitted Model
        7.5.2 Check of the Interactions
        7.5.3 Final Model
        7.5.4 Further Comments and Interpretation
        7.5.5 FP Model Simplification
    7.6 Example 2: GBSG Breast Cancer Study
        7.6.1 Oestrogen Receptor Positivity as a Predictive Factor
        7.6.2 A Predefined Hypothesis: Tamoxifen–Oestrogen Receptor Interaction
    7.7 Categorization
        7.7.1 Interaction with Categorized Variables
        7.7.2 Example: GBSG Study
    7.8 STEPP
    7.9 Example 3: Comparison of STEPP with MFPI
        7.9.1 Interaction in the Kidney Cancer Data
        7.9.2 Stability Investigation
    7.10 Comment on Type I Error of MFPI
    7.11 Continuous-by-Continuous Interactions
        7.11.1 Mismodelling May Induce Interaction
        7.11.2 MFPIgen: An FP Procedure to Investigate Interactions
        7.11.3 Examples of MFPIgen
        7.11.4 Graphical Presentation of Continuous-by-Continuous Interactions
        7.11.5 Summary
    7.12 Multi-Category Variables
    7.13 Discussion

8 Model Stability
    8.1 Introduction
    8.2 Background
    8.3 Using the Bootstrap to Explore Model Stability
        8.3.1 Selection of Variables within a Bootstrap Sample
        8.3.2 The Bootstrap Inclusion Frequency and the Importance of a Variable
    8.4 Example 1: Glioma Data
    8.5 Example 2: Educational Body-Fat Data
        8.5.1 Effect of Influential Observations on Model Selection
    8.6 Example 3: Breast Cancer Diagnosis
    8.7 Model Stability for Functions
        8.7.1 Summarizing Variation between Curves
        8.7.2 Measures of Curve Instability
    8.8 Example 4: GBSG Breast Cancer Data
        8.8.1 Interdependencies among Selected Variables and Functions in Subsets
        8.8.2 Plots of Functions
        8.8.3 Instability Measures
        8.8.4 Stability of Functions Depending on Other Variables Included
    8.9 Discussion
        8.9.1 Relationship between Inclusion Fractions
        8.9.2 Stability of Functions

9 Some Comparisons of MFP with Splines
    9.1 Introduction
    9.2 Background
    9.3 MVRS: A Procedure for Model Building with Regression Splines
        9.3.1 Restricted Cubic Spline Functions
        9.3.2 Function Selection Procedure for Restricted Cubic Splines
        9.3.3 The MVRS Algorithm
    9.4 MVSS: A Procedure for Model Building with Cubic Smoothing Splines
        9.4.1 Cubic Smoothing Splines
        9.4.2 Function Selection Procedure for Cubic Smoothing Splines
        9.4.3 The MVSS Algorithm
    9.5 Example 1: Boston Housing Data
        9.5.1 Effect of Reducing the Sample Size
        9.5.2 Comparing Predictors
    9.6 Example 2: GBSG Breast Cancer Study
    9.7 Example 3: Pima Indians
    9.8 Example 4: PBC
    9.9 Discussion
        9.9.1 Splines in General
        9.9.2 Complexity of Functions
        9.9.3 Optimal Fit or Transferability?
        9.9.4 Reporting of Selected Models
        9.9.5 Conclusion

10 How To Work with MFP
    10.1 Introduction
    10.2 The Dataset
    10.3 Univariate Analyses
    10.4 MFP Analysis
    10.5 Model Criticism
        10.5.1 Function Plots
        10.5.2 Residuals and Lack of Fit
        10.5.3 Robustness Transformation and Subject-Matter Knowledge
        10.5.4 Diagnostic Plot for Influential Observations
        10.5.5 Refined Model
        10.5.6 Interactions
    10.6 Stability Analysis
    10.7 Final Model
    10.8 Issues to be Aware of
        10.8.1 Selecting the Main-Effects Model
        10.8.2 Further Comments on Stability
        10.8.3 Searching for Interactions
    10.9 Discussion

11 Special Topics Involving Fractional Polynomials
    11.1 Time-Varying Hazard Ratios in the Cox Model
        11.1.1 The Fractional Polynomial Time Procedure
        11.1.2 The MFP Time Procedure
        11.1.3 Prognostic Model with Time-Varying Effects for Patients with Breast Cancer
        11.1.4 Categorization of Survival Time
        11.1.5 Discussion
    11.2 Age-specific Reference Intervals
        11.2.1 Example: Fetal growth
        11.2.2 Using FP Functions as Smoothers
        11.2.3 More Sophisticated Distributional Assumptions
        11.2.4 Discussion
    11.3 Other Topics
        11.3.1 Quantitative Risk Assessment in Developmental Toxicity Studies
        11.3.2 Model Uncertainty for Functions
        11.3.3 Relative Survival
        11.3.4 Approximating Smooth Functions
        11.3.5 Miscellaneous Applications

12 Epilogue
    12.1 Introduction
    12.2 Towards Recommendations for Practice
        12.2.1 Variable Selection Procedure
        12.2.2 Functional Form for Continuous Covariates
        12.2.3 Extreme Values or Influential Points
        12.2.4 Sensitivity Analysis
        12.2.5 Check for Model Stability
        12.2.6 Complexity of a Predictor
        12.2.7 Check for Interactions
    12.3 Omitted Topics and Future Directions
        12.3.1 Measurement Error in Covariates
        12.3.2 Meta-analysis
        12.3.3 Multi-level (Hierarchical) Models
        12.3.4 Missing Covariate Data
        12.3.5 Other Types of Model
    12.4 Conclusion


Appendix A: Data and Software Resources
    A.1 Summaries of Datasets
    A.2 Datasets used more than once
        A.2.1 Research Body Fat
        A.2.2 GBSG Breast Cancer
        A.2.3 Educational Body Fat
        A.2.4 Glioma
        A.2.5 Prostate Cancer
        A.2.6 Whitehall I
        A.2.7 PBC
        A.2.8 Oral Cancer
        A.2.9 Kidney Cancer
    A.3 Software

Appendix B: Glossary of Abbreviations

References

Index


Preface

Multivariable Model-Building: a pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables is principally written for scientists (including statisticians, researchers and graduate students) working with regression models in all branches of application. Our general objective is to provide a readable text giving the rationale of, and practical advice on, a unified approach to multivariable modelling which aims to make such models simpler and more effective. Specifically, we focus on the selection of important variables and the determination of functional form for continuous predictors. Since our own background is in biostatistics and clinical research, inevitably there is a focus on applications in medicine and the health sciences, but the methodology is much more widely useful. The topic of multivariable model-building is very broad; we doubt if it is possible to cover all the relevant topics in a single book. Therefore, we have concentrated on what we see as a few key issues. No multivariable model-building strategy has rigorous theoretical underpinnings. Even those approaches most used in practice have not had their properties studied adequately by simulation. In particular, handling continuous variables in a multivariable context has largely been ignored. Since there is no consensus among researchers on the ‘best’ strategy, a pragmatic approach is required. Our book reflects our views derived from wide experience. The text assumes a basic understanding of multiple regression modelling, but it can be read without detailed mathematical knowledge.

Multivariable regression models are widely used in all areas of science in which empirical data are analysed. We concentrate on normal-errors models for continuous outcomes, logistic regression for binary outcomes and Cox regression for censored time-to-event data. Our methodology is easily transferred to more general regression models. As expressed in a very readable paper by Chatfield (2002), we aim to ‘encourage and guide practitioners, and also to counterbalance a literature that can be overly concerned with theoretical matters far removed from the day-to-day concerns of many working statisticians’. The main focus is the modelling of continuous covariates by using fractional polynomials. The methods are illustrated by the analysis of many datasets, mainly from clinical epidemiology, ranging from prognostic factors in breast cancer and treatment in kidney cancer to risk factors in heart disease.

WHAT IS IN OUR BOOK

Our main concern is how to build a multivariable regression model from several candidate predictors, some of which are continuous. We are more interested in explanatory models (that is, in assessing the effects of individual variables in a multivariable context) than in deriving a ‘good’ predictor without regard to its components. The basic techniques that are dealt with in many textbooks on regression analysis are not repeated.

Chapters 2 and 3 deal mainly with the selection of variables and coping with different types of variables in a modelling context. Relationships with continuous covariates are assumed linear. The importance of the coding chosen for categorical covariates is discussed. Chapters 4 and 5 provide a reasonably comprehensive account of univariate fractional polynomial (FP) models, our preferred method of working with continuous predictors. We introduce the function selection procedure (FSP). Chapter 6, the heart of the book, introduces multivariable FP (MFP) modelling, combining backward elimination with the FSP. In Chapter 7, FP modelling is extended to include interactions between predictors, both categorical-by-continuous and continuous-by-continuous. Chapter 8 looks at techniques for assessing the stability of multivariable models. Bootstrap resampling is the key tool here. Chapter 9 briefly outlines spline models. We introduce two multivariable modelling procedures in which the FSP is adapted for splines, and we compare the results with FP models in several examples. Chapter 10 is a fairly self-contained guide to working with MFPs, taking a problem-oriented approach using an artificial but realistic dataset. A practitioner with some experience in regression modelling should be able to take in the principles and practice of MFP modelling from this chapter. As throughout the book, frequent use is made of model criticism, particularly of plots of fitted functions and of smoothed residuals, and of techniques for assessing the effects of influential observations on the selected model. Chapter 11 is a brief tour of further applications of FP methodology. Chapter 12 gives our recommendations for practice, briefly discusses some topics not dealt with in our book, and points to further research.

We lay stress on deriving parsimonious models that make sense from a subject-matter viewpoint. We are more concerned with getting the ‘big picture’ right than in refining the minor details of a model fit.

HOW TO READ OUR BOOK

The chapters have been organized such that the ideas unfold in a logical sequence, with Chapter 1 providing motivation and a flavour of what is to follow. However, to grasp the core ideas of our book more rapidly, we suggest as a bare minimum reading the following segments:

• Section 1.7 defines our approach to modelling in general terms.
• Section 2.6 discusses stepwise and other procedures for selecting variables.
• Sections 4.2–4.10 introduce FP functions and show how they are used in modelling a single continuous predictor. Section 4.14 contains a worked example. For an experienced modeller, the example may be a sufficient guide to the main principles.
• Sections 6.1, 6.2, 6.3 and 6.5 describe the key parts of the MFP method of multivariable model-building.
• Chapter 10 is particularly recommended to the practitioner who wants an appreciation of how to use MFP. We include material on some of the pitfalls that may be avoided using simple diagnostic techniques. Sections 10.5.6 and 10.8.3 on interactions may be omitted at a first reading.
• Chapter 12 summarizes some recommendations for practice.


SOFTWARE AND DATA

For practical use it is important that the necessary software is generally available. Software for the basic MFP method has been implemented in Stata, SAS and R. Special-purpose programs for Stata are available on our book’s website http://www.imbi.uni-freiburg.de/biom/Royston-Sauerbrei-book for all the extensions we describe. In some of the examples, we show that use of the software is simple if basic principles of the methodology are understood. To assist the reader in developing their own experience in multivariable model-building, some of the datasets used in the book are available on the website.

EDUCATIONAL RESOURCES

Supplementary materials, including datasets, software, exercises and relevant Web links, are available on the website. Many of the issues in multivariable model-building with continuous covariates discussed in our book are explored in Chapter 10, where an artificial dataset (the ‘ART study’) is described and analysed. The dataset and details of the design and Stata programs used to create it are available on the website, allowing the data to be modified for different purposes. Exercises based on the ART study are suggested. We would encourage others to extend and develop the exercises as part of the material used to teach MFP methodology. Slide presentations are available as a starting point for preparing talks and teaching the material.

ACKNOWLEDGEMENTS

We are indebted to our colleagues at the MRC Clinical Trials Unit, London, and at the Institute of Medical Biometry and Medical Informatics, University Medical Center Freiburg for discussion and encouragement. Our research and the book have benefited from constructive comments from many people, including particularly the following: Doug Altman, Harald Binder, Gareth Ambler, Carol Coupland, Christel Faes, David Hosmer, Tony Johnson, Paul Lambert, Rumana Omar, Michael Schemper, Martin Schumacher, Simon Thompson, and Hans van Houwelingen. We thank the following for kind permission to use their valuable datasets in our book: Lyn Chitty (fetal growth), Tim Cole (triceps), John Foekens and Maxime Look (Rotterdam breast cancer), Amy Luke (research body fat), John Matthews and Maeve O’Sullivan (nerve conduction), Alastair Ritchie and Mahesh Parmar (kidney cancer), Philip Rosenberg (oral cancer), Martin Shipley (Whitehall I). We are grateful to Lena Barth, Karina Gitina, Georg Koch and Edith Motschall for technical assistance. Finally, we owe very special thanks to the director and staff of the Mathematisches Forschungsinstitut Oberwolfach, Germany. The excellent atmosphere and working conditions during visits there over several years were conducive to the development of many research ideas and papers which led up to our book.

London and Freiburg
November 2007