Statistical Methods in
Micro-Simulation Modeling:
Calibration and Predictive
Accuracy
by
Stavroula Chrysanthopoulou
B.S., Athens University of Economics and Business, 2003
Sc. M., University of Athens, 2007
A Dissertation submitted in partial fulfillment of the
requirements for the Degree of Doctor of Philosophy
in Biostatistics at Brown University
Providence, Rhode Island
May 2014
© Copyright 2014 by Stavroula Chrysanthopoulou
This dissertation by Stavroula Chrysanthopoulou is accepted in its present form
by the Department of Biostatistics, School of Public Health, as satisfying the
dissertation requirement for the degree of Doctor of Philosophy.
Date
Constantine Gatsonis, PhD (Advisor)
Recommended to the Graduate Council
Date
Carolyn Rutter, PhD (Reader)
Date
Xi Luo, PhD (Reader)
Date
Matthew Harrison, PhD (Reader)
Approved by the Graduate Council
Date
Peter Weber, Dean of the Graduate School
Curriculum Vitæ
Stavroula Chrysanthopoulou was born on May 2, 1980, in Athens, Greece.
She received her BSc degree in Statistics from Athens University of Economics and
Business (AUEB), in September 2003, and her MSc degree in Biostatistics from
University of Athens (UOA), in February 2007.
In September 2008 she was admitted to the PhD program in Biostatistics at Brown
University, from which she received her second MSc degree in Biostatistics in 2010.
She successfully defended her PhD dissertation, entitled “Statistical Methods in
Micro-Simulation Modeling: Calibration and Predictive Accuracy”, on September
13, 2013.
During her five-year career as a PhD candidate, she served as a teaching
assistant in the following courses, offered by the Department of Biostatistics at Brown
University:
• Introduction to Biostatistics (Fall semester, 2008)
• Applied Regression Models (Spring semester, 2009)
• Analysis of Life Time Data (Spring semester, 2012)
She presented a poster entitled “Relationship between breast biopsies and family
history of breast cancer” at the Brown University Public Health Research Day, in
Spring 2010.
She also presented part of her dissertation work as an invited speaker in the “Micro-
simulation Models for Health Policy: Advances and Applications” session, at the
Joint Statistical Meetings (JSM) 2013 conference in Montreal, Canada.
She has several years of working experience as:
⇒ 2003-2005: Consulting Biostatistician, mainly involved in the design and con-
duct of statistical analysis for biomedical papers.
⇒ 2005-2008: Statistical Consultant at Agilis SA-Statistics and Informatics, in-
volved with research on methods for official statistics in projects conducted by
the European Statistical Service (Eurostat).
Her research interests are focused on statistical methods for complex predictive mod-
els, such as Micro-simulation Models (MSMs) used in medical decision making, as
well as on High Performance Computing (HPC) techniques for complex statistical
computations using the open source statistical package R.
Acknowledgements
The five years of my life as a PhD candidate were full of valuable experiences, ex-
ceptional opportunities to improve myself both as a scientist and as a human being,
and of course a lot of challenging moments. In this beautiful “journey” I was blessed
by God to be surrounded by very important people, without whose support I would
never have been able to achieve my goal.
First and foremost I would like to thank my advisor, Professor Constantine Gatsonis,
for his willingness to work with me in this very interesting field, and his continuing
support and guidance that helped me to overcome all the obstacles and conduct this
important research. His intelligence, ethos, and integrity render him the perfect role
model for young scientists. I want to also express my gratitude to Dr Carolyn Rutter
for her valuable feedback as an expert in micro-simulation modeling, as well as for
the exceptional opportunities she provided me with to present my work and exchange
opinions with experts in the field. I would also like to thank Dr Matthew Harrison
for his felicitous comments and insight that helped me to improve the Empirical
calibration method, as well as to better organize and carry out the daunting task
of calibrating a micro-simulation model. Thanks also to Dr Xi Luo for serving as a
reader in my thesis committee.
I am also grateful to people from the Brown Center for Computation and Visual-
ization support group, especially Mark Howison and Aaron Shen for always being
very responsive and effective in helping me with the implementation of exhaustive
parallel processes in R. I also thank Dr Samir Soneji for his assistance in estimating
Cumulative Incidence Functions from the National Health Interview Survey data.
I also thank all the faculty, staff, and students of the Brown School of Public Health.
In particular, I want to thank all my professors from the Biostatistics department, the
staff of the Center of Statistical Sciences (CSS), and my classmates. Special thanks
go to Denise Arver and Elizabeth Clark for always being very responsive and con-
siderate.
Besides the people in the academic environment, I was also blessed to have a beau-
tiful family and some wonderful friends who were always there for me through all the
ups and downs of my career as a PhD candidate. To all these people I owe a great
deal of my achievement.
I have no words to express how blessed I am for growing up in a very loving and
caring family who always believed in and supported me. I want to thank my father
for the first nine, love-filled years of my life, as well as for being my good angel since
the day he passed away. There is no way to thank my wonderful mother enough, for
dedicating her life to my brother and me, and successfully filling both parental
roles for the past twenty-four years of my life. She has been, without exaggeration, the
best mother ever! I owe her all the good (if any) elements of my personality and a
large portion of the success in my life until now. For all these reasons I will always
be very grateful and proud of being her daughter.
I would also like to thank my brother Vassilios, for always being a good example
for me and undertaking a large portion of the burden as the protector of our family
after the loss of our father. I am also grateful to my brother’s family, his wife Ioanna
Andreopoulou, who I consider a true sister, and my two little “Princesses” Katerina
and Antonia, for the positive effect they have on me.
God has indeed been very generous with me by sending invaluable friends in my life.
I would first like to thank Dr Jessica Jalbert, Dr Dhiraj Catoor and Dr Sinan Karaveli
for helping me so much to settle in here in Providence. Special thanks
also go to the Perdikakis family, the parents Ann and Costas, and the children Rhea
and Damon Ray, Giana, and Dean for their support, caring and love. I am very
grateful for meeting and being part of this amazing family.
Last but not least, I would like to express my gratitude to my dear friend Nektaria
for her continuing support, kindness, and thoughtfulness, and most importantly
for the great honor of asking me to baptize her firstborn, Anna.
Unfortunately, due to space constraints, I must close this list by thanking from
the bottom of my heart all the aforementioned people, as well as the many other valuable
friends, relatives, and important persons in my life. Truly and deeply thankful for
their positive effect on my life, I dedicate my accomplishment to them all.
Abstract of “Statistical Methods in Micro-Simulation Modeling:
Calibration and Predictive Accuracy”
by Stavroula Chrysanthopoulou, Ph.D., Brown University, May 2014
This thesis presents research on statistical methods for the development and evalu-
ation of micro-simulation models (MSMs). We developed a streamlined, continuous
time MSM that describes the natural history of lung cancer, and used it as a tool for
the implementation and comparison of methods for calibration and assessment of pre-
dictive accuracy. We performed a comparative analysis of two calibration methods.
The first employs Bayesian reasoning to incorporate prior beliefs on model parame-
ters, and information from various sources about lung cancer, to derive posterior dis-
tributions for the calibrated parameters. The second is an Empirical method, which
combines searches of the multi-dimensional parameter space using Latin Hypercube
Sampling design with Goodness of Fit measures to specify parameter values that pro-
vide good fit to observed data. Furthermore, we studied the ability of the MSMs to
predict times to events, and suggested metrics, based on concordance statistics and
hypothesis tests for survival data. We conducted a simulation study to compare the
performance of MSMs in terms of their predictive accuracy. The entire methodology
was implemented in R 3.0.1. Developing an MSM in open source statistical
software enhances transparency and facilitates research on the statistical properties
of the model. Due to the complexity of MSMs, use of High Performance Computing
techniques in R is essential to their implementation. The analysis of the two
calibration methods showed that they result in extensively overlapping sets of values
for the calibrated MSM parameters and MSM outputs. However, the Bayesian method
performs better in the prediction of rare events, while the Empirical method proved
more efficient in terms of the computational burden. The assessment of predictive
accuracy showed that, among the methods suggested here, hypothesis tests outperform
concordance statistics, since they proved more sensitive in detecting differences
between predictions obtained by the MSM and actual individual-level data.
To my beloved family.
Contents
Abstract ix
1 Introduction 1
1.1 Micro-Simulation Models (MSMs) . . . . . . . . . . . . . 3
1.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Applications in health care research . . . . . . . . . . . 3
1.1.3 Development of an MSM . . . . . . . . . . . . . . . . . . 7
1.2 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Micro-simulation model describing the natural
history of lung cancer 12
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Model description . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 Model components . . . . . . . . . . . . . . . . . . . . . . 17
2.2.2 Simulation Algorithm . . . . . . . . . . . . . . . . . . . . 26
2.2.3 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3.1 Ad-hoc values for model parameters . . . . . . . . . . . 32
2.3.2 MSM output - Examples . . . . . . . . . . . . . . . . . . 37
2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3 Calibration methods in MSMs - a comparative
analysis 54
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.1.1 Calibration vs estimation in statistical theory . . . . 55
3.1.2 Calibration methods for MSMs . . . . . . . . . . . . . . 57
3.1.3 Assessing calibration results . . . . . . . . . . . . . . . . 58
3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2.2 Bayesian Calibration Method . . . . . . . . . . . . . . . 61
3.2.3 Empirical Calibration Method . . . . . . . . . . . . . . 62
3.2.4 Calibration outputs: interpretation and use . . . . . . 69
3.3 High Performance Computing in R . . . . . . . . . . . . 71
3.3.1 Software for MSMs . . . . . . . . . . . . . . . . . . . . . . 71
3.3.2 Example: computational burden of two MSM cali-
bration methods . . . . . . . . . . . . . . . . . . . . . . . . 72
3.3.3 Parallel Computing . . . . . . . . . . . . . . . . . . . . . . 74
3.3.4 Code architecture . . . . . . . . . . . . . . . . . . . . . . . 76
3.3.5 Algorithm efficiency: Bayesian vs Empirical Cali-
bration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.3.6 Concluding remarks . . . . . . . . . . . . . . . . . . . . . 79
3.4 Comparative Analysis . . . . . . . . . . . . . . . . . . . . . . 82
3.4.1 Input Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.4.2 MSM parameters to calibrate . . . . . . . . . . . . . . . 84
3.4.3 Calibration Targets . . . . . . . . . . . . . . . . . . . . . . 85
3.4.4 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . 87
3.4.5 Terms of comparison . . . . . . . . . . . . . . . . . . . . . 96
3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.5.1 Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.5.2 Predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.6 Calibration Methods Refinement . . . . . . . . . . . . . . 118
3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4 Assessing the predictive accuracy of MSMs 133
4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.1.1 Assessment of MSMs . . . . . . . . . . . . . . . . . . . . . 134
4.1.2 Predictive accuracy of MSMs . . . . . . . . . . . . . . . 135
4.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.2.2 Concordance statistics . . . . . . . . . . . . . . . . . . . . 141
4.2.3 Hypothesis testing . . . . . . . . . . . . . . . . . . . . . . 145
4.2.4 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . 148
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.3.1 Single run of the MSM . . . . . . . . . . . . . . . . . . . 150
4.3.2 Multiple runs of the MSM . . . . . . . . . . . . . . . . . 154
4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5 Conclusions 167
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
List of Tables
2.1 MSM simulation algorithm . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2 MSM ad-hoc parameter estimates: Onset of the first malignant cell . 35
2.3 SEER data on lung cancer at diagnosis . . . . . . . . . . . . . . . . . 36
2.4 MSM ad-hoc parameter estimates: Lung cancer progression . . . . . . 37
2.5 Predicted times to events: Males - Non smokers . . . . . . . . . . . . 39
2.6 Predicted times to events: Females - Non smokers . . . . . . . . . . . 39
2.7 Predicted times to events: Males - Current smokers . . . . . . . . . . 40
2.8 Predicted times to events: Females - Current smokers . . . . . . . . . 41
2.9 Predicted times to events: Males - Former smokers, quitting smoking
at age 40 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.10 Predicted times to events: Males - Former smokers, quitting smoking
at age 50 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.11 Predicted times to events: Males - Former smokers, quitting smoking
at age 60 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.12 Predicted times to events: Females - Former smokers, quitting smok-
ing at age 40 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.13 Predicted times to events: Females - Former smokers, quitting smok-
ing at age 50 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.14 Predicted times to events: Females - Former smokers, quitting smok-
ing at age 60 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1 Code efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.2 Reference population age distribution . . . . . . . . . . . . . . . . . . 84
3.3 Observed lung cancer incidence rates . . . . . . . . . . . . . . . . . . 86
3.4 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.5 Number of microsimulations . . . . . . . . . . . . . . . . . . . . . . . 90
3.6 Summary Statistics - parameters . . . . . . . . . . . . . . . . . . . . 105
3.7 Summary statistics - predictions . . . . . . . . . . . . . . . . . . . . . 114
3.8 Assessing MSM predictions . . . . . . . . . . . . . . . . . . . . . . . . 118
3.9 Discrepancy - predictions . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.10 Summary statistics - Box plots . . . . . . . . . . . . . . . . . . . . . . 120
3.11 Summary Statistics - parameters (sub-analysis) . . . . . . . . . . . . 122
3.12 Summary statistics - predictions (sub-analysis) . . . . . . . . . . . . . 127
3.13 Discrepancy - predictions (sub-analysis) . . . . . . . . . . . . . . . . . 128
4.1 Assessment (toy.1, V=1) . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.2 Assessment (toy.2, V=1) . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.3 Assessment (toy.1, V=200, 400, 600, 800, 1000) . . . . . . . . . . . . 159
4.4 Assessment (toy.2, V=200, 400, 600, 800, 1000) . . . . . . . . . . . . 163
List of Figures
2.1 Markov State diagram of the lung cancer MSM . . . . . . . . . . . . 16
2.2 Lung cancer mortality: Non-smokers . . . . . . . . . . . . . . . . . . 39
2.3 Lung cancer mortality: Current smokers . . . . . . . . . . . . . . . . 42
2.4 Lung cancer mortality: Former smokers . . . . . . . . . . . . . . . . . 50
3.1 LHS implementation (N=5) . . . . . . . . . . . . . . . . . . . . . . . 66
3.2 LHS implementation (N=20) . . . . . . . . . . . . . . . . . . . . . . . 66
3.3 Micro-simulation size . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.4 Density Plots - parameters . . . . . . . . . . . . . . . . . . . . . . . . 103
3.5 Mahalanobis distances - parameters . . . . . . . . . . . . . . . . . . . 104
3.6 Bayesian method: Contours of calibrated parameters . . . . . . . . . 106
3.7 Empirical method: Contours of calibrated parameters . . . . . . . . . 107
3.8 Density plots - predictions (internal validation) . . . . . . . . . . . . 112
3.9 Density plots - predictions (external validation) . . . . . . . . . . . . 113
3.10 Mahalanobis distances - predictions . . . . . . . . . . . . . . . . . . . 115
3.11 Calibration plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.12 Box plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.13 Density Plots - parameters (sub-analysis) . . . . . . . . . . . . . . . . 121
3.14 Bayesian method (sub-analysis): Contours of calibrated parameters . 123
3.15 Empirical method (sub-analysis): Contours of calibrated parameters . 124
3.16 Density plots - predictions - sub (internal validation) . . . . . . . . . 125
3.17 Density plots - predictions - sub (external validation) . . . . . . . . . 126
3.18 MH algorithm flow chart . . . . . . . . . . . . . . . . . . . . . . . . . 131
3.19 Bayesian Calibration flow chart . . . . . . . . . . . . . . . . . . . . . 132
4.1 KM curves - Observed vs Predicted survival (toy.1, V=1) . . . . . . . 151
4.2 KM curves - Observed vs Predicted survival (toy.2, V=1) . . . . . . . 153
4.3 KM curves - Observed vs Predicted survival (toy.1, V=200) . . . . . . 156
4.4 KM curves - Observed vs Predicted survival (toy.1, V=400) . . . . . . 156
4.5 KM curves - Observed vs Predicted survival (toy.1, V=600) . . . . . . 157
4.6 KM curves - Observed vs Predicted survival (toy.1, V=800) . . . . . . 157
4.7 KM curves - Observed vs Predicted survival (toy.1, V=1000) . . . . . 158
4.8 KM curves - Observed vs Predicted survival (toy.2, V=200) . . . . . . 160
4.9 KM curves - Observed vs Predicted survival (toy.2, V=400) . . . . . . 160
4.10 KM curves - Observed vs Predicted survival (toy.2, V=600) . . . . . . 161
4.11 KM curves - Observed vs Predicted survival (toy.2, V=800) . . . . . . 161
4.12 KM curves - Observed vs Predicted survival (toy.2, V=1000) . . . . . 162
Chapter 1
Introduction
Comparative Effectiveness Research (CER), a novel research framework aimed at
developing broad-based comparative evidence on the outcomes of diagnostic and
therapeutic procedures, has recently attracted significant scientific attention. An
important component of CER is the development of new methodologies for empir-
ical and modeling studies that generate information appropriate for health policy
decisions. Within this context, a class of predictive models, the micro-simulation
models (MSMs), has attracted considerable attention among researchers. MSMs
use information from various sources of medical research and clinical expertise to
simulate individual disease trajectories, i.e., trajectories that describe events asso-
ciated with the development of the target disease. The summarized results from
these individual trajectories are used to make predictions about long term effects of
a health policy intervention on a given population.
Micro-simulation models have been widely used in several fields. However, the sys-
tematic investigation of their statistical properties is only recently getting under way.
The main objective of this thesis is to address two of the key elements in the devel-
opment and evaluation of an MSM, namely, model calibration and prediction, from a
statistical point of view. To this end we first develop a streamlined micro-simulation
model that describes the natural history of lung cancer, and use it as a tool to explore
the statistical aspects of calibration and prediction for MSMs.
The thesis is divided into five chapters. The first chapter provides an introduction
and overview of the thesis. The second chapter focuses on the development of a
streamlined, continuous time MSM that describes the natural history of lung cancer
in the absence of screening and treatment interventions. This MSM serves as a
tool for the study of the statistical properties of MSMs in subsequent chapters.
In particular, the third chapter provides a comparative analysis of two calibration
methods, a Bayesian and an Empirical one, with application to this MSM for lung
cancer. The fourth chapter discusses the assessment of the predictive accuracy of an
MSM, using the lung cancer model. The dissertation concludes with a fifth chapter
which summarizes the main findings and conclusions, and outlines the plans for future
work on the study of the statistical properties of MSMs.
1.1 Micro-Simulation Models (MSMs)
1.1.1 Overview
Micro-simulation models (MSMs) are complex models designed to simulate individual
level data using Markov Chain Monte Carlo methods. The first applications of
MSMs were in social policy in the late 1950s (Orcutt (1957)). In recent years,
MSMs have come to be used extensively in health policy and medical decision
making. MSMs in health policy problems are used to describe the natural history of
a disease in individual members of a cohort, usually in conjunction with the effect of
some intervention. To this end MSMs use mathematical equations with stochastic
assumptions to describe in detail complex observed and latent characteristics of the
underlying process. The inherent intricacy of MSMs posed serious time and cost
constraints in their development and implementation, especially during the first years
of their use. However, the advances in scientific computing in recent years have
contributed considerably to the improvement and expansion of new methodologies
and applications of MSMs in general, and to medical decision making in particular.
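The basic mechanism just described, stochastic equations generating individual-level event histories that are then summarized, can be sketched in a few lines. (The thesis implements its MSM in R; the sketch below is in Python, and the exponential waiting times, `onset_rate`, and `death_rate` values are purely illustrative stand-ins, not the model or parameters of Chapter 2.)

```python
import random

def simulate_individual(rng, onset_rate=0.02, death_rate=0.1):
    """One simulated disease trajectory in continuous time:
    an exponential waiting time to disease onset, then another
    from onset to death. Rates are illustrative, not calibrated."""
    t_onset = rng.expovariate(onset_rate)            # healthy -> disease
    t_death = t_onset + rng.expovariate(death_rate)  # disease -> death
    return t_onset, t_death

def simulate_cohort(n, seed=1):
    """Run the micro-simulation for n individuals and summarize:
    individual trajectories plus a population-level quantity."""
    rng = random.Random(seed)
    trajectories = [simulate_individual(rng) for _ in range(n)]
    mean_onset = sum(t for t, _ in trajectories) / n
    return trajectories, mean_onset
```

The population-level summary (here, mean age at onset) is the kind of aggregate output that is later compared against observed data, while the individual trajectories themselves are the model's distinctive product.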
1.1.2 Applications in health care research
Rutter et al. (2011) provide a comprehensive review of micro-simulation models
used to predict health outcomes. The review highlights the usefulness of MSMs and
their continuously expanding role in medical decision making. It also indicates the
key steps in the development of a new MSM and discusses the essential checks of the
validity of the model. Finally the review points to the need for additional research on
the statistical properties of MSMs, especially the incorporation and characterization
of the model uncertainty.
Another very important application of MSMs is in the context of the Comparative
Effectiveness Research (CER), a rapidly growing area of research aimed at improving
health outcomes while reducing related costs. CER has recently attracted a great
deal of attention in the medical and scientific community. According to the US
Department of Health and Human Services (HHS) (109), CER is defined as:
“ the conduct and synthesis of systematic research comparing different inter-
ventions and strategies to prevent, diagnose, treat and monitor health conditions.
The purpose of this research is to inform patients, providers and decision-
makers, responding to their expressed needs, about which interventions are
most effective for which patients under specific circumstances. To provide this
information, CER must assess a comprehensive array of health-related out-
comes for diverse patient populations. Defined interventions compared may
include medications, procedures, medical and assistive devices and technologies,
behavioral change strategies, and delivery system interventions. This research
necessitates the development, expansion, and use of a variety of data sources
and methods to assess comparative effectiveness.”
Tunis et al. (112) provide a comprehensive introduction to CER in the context of
the recently enacted USA health care reform, and discuss the statistical challenges
in carrying out this research. The authors highlight the need for sufficient, credible,
relevant and timely evidence in the conduct of CER, and emphasize that “the primary
purpose of CER is to help health-care decision makers make informed decisions at
the level of individual care for patients and clinicians, and at the level of policy
determinations for payers and other policymakers”. The conduct of CER comprises
a great variety of novel and existing methods in medical research, all of which can
be classified into five broad categories: systematic reviews, decision modeling,
retrospective analysis, prospective observational studies, and experimental studies.
A key example of the use of CER in medical decision making, mentioned in both
the Tunis et al. (112) paper as well as the commentary by Gatsonis (27), is the
evaluation of diagnostic modalities for cancer. Both papers indicate the necessity for
individual-level information to assist decisions. However this type of information can
prove very costly, time-consuming or even totally impracticable due to the complex-
ity of the health-care setting. Therefore micro-simulation has risen to prominence as
a promising tool that can make projections about the impact of interventions (such
as screening) when applied to population cohorts, and inform health policies and
medical decision making. A characteristic example of the application of new mod-
eling techniques in Medical Decision Making (MDM) (including micro-simulation
modeling) is the research conducted by the Cancer Intervention and Surveillance
Modeling Network (CISNET) of NCI (http://cisnet.cancer.gov). The CISNET group
is a consortium of NCI-sponsored investigators with research interest focused on the
development and application of advanced statistical modeling. Its main objective
is to use advanced modeling techniques to better understand the effects of cancer
control interventions (prevention, screening, treatment, etc.) on individuals as well
as on population trends (incidence and mortality rates). The CISNET consortium
currently comprises five large groups focusing their research on five different types of
cancer: breast, colorectal, esophagus, lung and prostate cancer. Models developed
to describe each one of these types of cancer, can be used to guide health research
and priorities.
The complexity of an MSM can make its development a daunting task. However,
a valid MSM can be useful to many stakeholders. In particular, it can be used to
inform patients, providers and decision-makers and assist them in deciding on the
most effective and efficient intervention under certain circumstances. Despite their
complexity, MSMs hold some very “attractive” features that have distinguished them
from other useful tools for the conduct of CER. First, MSMs are designed to describe
and evaluate complicated processes when analytical formulas are not available. The
models focus on making predictions about individual patient trajectories rather than
describing the average patient. This, as already mentioned, is a key element of any
statistical tool used for the conduct of CER which is essentially patient-centered.
In addition, MSMs provide an easy way of representing time dependent transition
probabilities between major states of the disease course while, at the same time,
they facilitate the explicit incorporation of different sources of uncertainty intrinsic
to the system (stochastic, parameter, structural, etc). Furthermore they compile
and sometimes even reconcile contradictory facts about the disease process derived
from different sources (e.g. experimental studies, observational studies, expert opin-
ions, etc). MSMs also provide short or long-term predictions about the course of a
disease and the effect of interventions (e.g., screening schedule, treatment, etc) on a
population. In the case of simulating results from longitudinal studies, MSM based
projections can be available well in advance of the actual study conclusion. Finally,
MSMs can be used to produce large pseudo-samples, a very important feature es-
pecially in cases where conducting large, well-designed studies (e.g., large-scale
clinical trials) is precluded by time and/or cost constraints, or even by ethical
considerations.
An example of the application of MSMs in health care is their wide use to evaluate
and compare cancer screening programs. In this setting an MSM is used to describe
the main stages of the natural history of the specific type of cancer and to model
the effect of screening on several aspects of a patient’s lifetime (e.g., survival time,
quality of life, etc). In many instances, the course of cancer can be divided into five
main stages: the disease free state, the onset of the malignancy (local state), the
involvement of detectable lymph nodes metastases (regional state), the involvement
of distant metastases and the death either from cancer or from other causes. Modelers
may be interested in all or only some of these stages. Several papers have studied
each of these disease states separately and have tried to fit complex mathematical
models on real data (41; 40; 43; 72; 75; 15; 61; 26; 33; 58; 59; 70; 102; 103). These
models aim to combine information from the biological process of the disease with
observed outcomes and describe the entire phenomenon in as much detail as possible.
Micro-simulation modeling can be used to combine all the models that describe the
essential parts of a disease process, and use the Monte Carlo method to simulate
individual patients’ trajectories.
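A minimal Monte Carlo walk through the five main stages listed above might look as follows. (This is a discrete-time Python sketch with hypothetical monthly transition probabilities and a competing risk of other-cause death; the lung cancer MSM developed in Chapter 2 is continuous-time and far more detailed.)

```python
import random

STATES = ["disease_free", "local", "regional", "distant", "death"]

# Hypothetical monthly probabilities of progressing to the next stage.
PROGRESS = {"disease_free": 0.001, "local": 0.02, "regional": 0.05, "distant": 0.10}
OTHER_CAUSE_DEATH = 0.0008  # competing risk: death from other causes, per month

def simulate_trajectory(rng, max_months=1200):
    """Monte Carlo walk through the five main stages of the disease course.
    Returns a history of (month, state) events for one individual."""
    state, history = "disease_free", [(0, "disease_free")]
    for month in range(1, max_months + 1):
        if rng.random() < OTHER_CAUSE_DEATH:      # death from other causes
            history.append((month, "death"))
            return history
        if rng.random() < PROGRESS[state]:        # progress to the next stage
            state = STATES[STATES.index(state) + 1]
            history.append((month, state))
            if state == "death":                  # death from cancer
                return history
    return history
```

Summarizing many such trajectories (e.g., the fraction dying of cancer, or the distribution of age at each transition) yields exactly the kind of cohort-level projections described above.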
1.1.3 Development of an MSM
The development of a micro-simulation model is a complex undertaking involving,
as in any other statistical predictive model, three major building blocks, namely
model specification, calibration, and assessment.
Model specification refers to defining the structure of the model that will be used to
describe, analyze, and/or simulate the phenomenon of interest, including the nature
of the model (e.g., regression, Markov, etc), as well as the set of rules and assumptions
imposed. For a new MSM describing the natural history of a disease, in particular,
model specification entails identification of the major distinct states of the disease,
as well as stipulation of the transition rules among them, including the relevant
mathematical equations and distributions that describe the underlying stochastic process.
Calibration is the process of determining values of the model parameters so that the
model provides good fit to available data about the phenomenon of interest. In the
context of micro-simulation modeling, calibration is analogous to the parameter
estimation performed in ordinary statistical models (e.g., GLMs).
Assessment pertains to the model's predictive performance, comprising overall
performance and discrimination ability (105). Overall performance can be expressed
as the percentage of explained variation of the system (R^2 statistics) as well
as the proximity between observed and predicted quantities of interest (GoF statistics).
Discrimination, on the other hand, is the model's ability to correctly classify subjects
(e.g., patients) with different characteristics based on the individual predictions
about the outcome of interest. The goal of this thesis is to explore these building
blocks through the development of a new, streamlined MSM describing the natural
history of lung cancer.
The main purpose of an MSM is to predict individual trajectories for the phenomenon
it describes (in medical decision making, disease trajectories). These individual trajectories can be
point estimates of several quantities of interest (outputs) including time to events
(e.g., time to the development of lung cancer), binary responses (e.g., death from
lung cancer), or even estimates of continuous quantities (e.g., tumor diameter at
diagnosis).
As in any other type of statistical analysis, it is important to accompany point
estimates with measures of variability that convey their precision.
In order to do so in the context of MSMs, it is very important to understand all
possible sources of uncertainty inherent in the model, and find a way to incorporate
them into the model estimates. Rutter et al. (92) identify the following sources of
uncertainty in MSMs:
• population heterogeneity : differences between individuals in the population
of interest, with a significant effect on the observed outcomes
• parameter uncertainty : variability due to the estimation of unknown model
parameters
• selection uncertainty : incorporation of information based solely on a small
portion of studies from the pool of available studies on the specific topic
• sampling variability : variability owing to the fact that the calibration data
are summary statistics estimated from a finite sample from the population of
interest
• stochastic uncertainty : variability due to the random number generation
procedure followed in the Monte Carlo approach for the evaluation and imple-
mentation of the MSM
• structural uncertainty : variability caused by incomplete knowledge of the exact
mechanism of the phenomenon described by the MSM and related to the model
assumptions (uncertainty about the functional form of the model)
All the methods presented in this thesis take into account the problem of identifying
and characterizing an MSM's uncertainty.
1.2 Thesis Outline
The remainder of the thesis is divided into four chapters. Chapter 2 presents the develop-
ment of a streamlined continuous-time micro-simulation model (MSM) that describes
the natural history of lung cancer in the absence of screening and treatment compo-
nents. The chapter begins with an extensive literature review on lung cancer natural
history modeling and surveys the use of MSMs in this area. It continues
with the determination of the major distinct stages of the disease and a
description of the set of rules and assumptions governing the MSM.
We kept the number of covariate classes to a minimum in order to achieve a man-
ageable level of model complexity. The set of covariates in the model therefore
comprises each individual's gender, age, smoking history (ages at starting and
quitting smoking), and smoking habits (smoking intensity, based on the average
number of cigarettes smoked per day). Published results on several stages of the
lung cancer course are used for an ad hoc specification of the model parameters.
The chapter also describes in detail the simulation algorithm used to implement the
model, and illustrates the MSM's performance by running the model under several
characteristic, real-life scenarios and comparing its predictions with established
knowledge in lung cancer research. The main objective in building this MSM is to
serve as a tool for the comparative evaluation of the statistical methodologies for
model calibration, validation, and assessment of predictive accuracy described in
subsequent parts of the thesis.
The third chapter discusses the calibration of an MSM. Here, the literature
review includes references to methods used specifically for the calibration of MSMs
in medical decision making. The main objective of this chapter is to provide a
comparative analysis of two calibration methods for MSMs. To this end a simulation
study is designed and conducted, the results of which form the basis of the
comparative analysis.
The first method is the Bayesian calibration developed by Rutter et al. (90) and
implemented on an MSM for colorectal cancer. The second method is a new empirical
calibration method. The idea underlying this method is to combine some of the best
modeling practices currently applied for the empirical calibration of several types of
MSMs, including search algorithms over the multidimensional parameter space,
GoF statistics to assess overall model performance, convergence criteria, stopping
rules, etc. A key component of the new method is the incorporation of the
widely used Latin Hypercube Sampling (LHS) design into the search algorithm,
allowing a more efficient (compared to simple random sampling) exploration
of the multidimensional parameter space of a (usually rather involved) MSM.
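To illustrate the idea, a Latin Hypercube draw over a hypothetical three-dimensional parameter space can be sketched in a few lines of Python (the thesis's implementation is in R; the bounds below are arbitrary, illustrative choices):

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Latin Hypercube Sample: each parameter range is split into
    n_samples equal strata, one value is drawn per stratum, and the
    strata are randomly permuted independently in each dimension."""
    rng = random.Random(seed)
    dims = len(bounds)
    samples = [[0.0] * dims for _ in range(n_samples)]
    for d, (lo, hi) in enumerate(bounds):
        strata = list(range(n_samples))
        rng.shuffle(strata)                      # decouple the dimensions
        for i, s in enumerate(strata):
            u = (s + rng.random()) / n_samples   # a point inside stratum s
            samples[i][d] = lo + u * (hi - lo)
    return samples

# Hypothetical 3-dimensional parameter space of a toy MSM
pts = latin_hypercube(10, [(0.0, 1.0), (1e-7, 1e-5), (0.5, 2.0)])
```

Unlike a simple random sample of the same size, every coordinate of `pts` contains exactly one value in each of the ten strata of its range, so no region of the parameter space is left unexplored.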
Both the Bayesian and the empirical calibration methods are implemented on the
continuous-time MSM for the natural history of lung cancer described in Chapter 2.
The comparison of the methods uses both qualitative criteria (e.g., efficiency, prac-
ticality, interpretation of calibration results, etc.) and quantitative measures
of overall model performance (GoF statistics), including both internal and
external validation. Internal validation pertains to assessing model performance us-
ing exactly the same data that were used during the calibration procedure, whereas
external validation uses different data. In addition, graphical means of assessing
model performance are also provided. The results from this comparison inform
recommendations regarding the use of these two, as well as similar, approaches
in practice.
Although MSMs are very widely used, to our knowledge no systematic work has yet
been carried out on the assessment of an MSM's predictive accuracy. The fourth chap-
ter is concerned with the assessment of the predictive accuracy of a "well" calibrated
MSM. Micro-simulation models are considered here as a special type of predictive
survival model, since they predict actual survival times, unlike other broadly used
survival models which predict hazard rates or ratios (e.g., Cox Proportional Haz-
ards, Accelerated Failure Time, etc., models). The extensive literature review aims at
identifying measures of predictive accuracy used in the context of survival modeling
that could also be applied to the assessment of an MSM.
Two broadly used methodologies are applied to the two calibrated MSMs resulting
from Chapter 3, namely concordance statistics and methods for comparing
predicted with observed survival curves. These approaches are adapted to the par-
ticularities of MSMs. The chapter compares the two methodologies, summarizes
findings from a simulation study, and concludes with suggestions about useful statis-
tics for the assessment of the predictive accuracy of an MSM.
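As a concrete example of the first methodology, a Harrell-type concordance statistic for predicted survival times can be sketched as follows (a minimal Python illustration on made-up toy data, not the thesis's implementation):

```python
from itertools import combinations

def concordance(pred_times, obs_times, events):
    """Harrell-type C-statistic for predicted survival times.
    A pair is usable when the shorter observed time is an event;
    it is concordant when the predictions are ordered the same way."""
    conc = usable = 0.0
    for i, j in combinations(range(len(obs_times)), 2):
        if obs_times[j] < obs_times[i]:      # orient: i has shorter observed time
            i, j = j, i
        if not events[i] or obs_times[i] == obs_times[j]:
            continue                         # censored first or tied: not usable
        usable += 1
        if pred_times[i] < pred_times[j]:
            conc += 1
        elif pred_times[i] == pred_times[j]:
            conc += 0.5
    return conc / usable

# Toy example: predictions perfectly ordered with the outcomes
c = concordance([2, 4, 6, 8], [1, 3, 5, 7], [1, 1, 1, 0])
```

Here `c` equals 1.0 because every usable pair is concordant; a value of 0.5 would indicate predictions no better than chance.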
In the last chapter of this thesis (Chapter 5) we summarize the main findings and
include future work related to our research.
Chapter 2
Micro-simulation model describing the natural his-
tory of lung cancer
In this chapter we develop a new, streamlined, continuous-time micro-simulation
model (MSM) that describes the natural history of lung cancer in the absence of
any screening or treatment component. This is a predictive model that simulates
individual patient trajectories given a certain set of covariates, namely the age, gender
and smoking history. The model structure is in line with existing methods, and combines
findings from several sources related to lung cancer research. This new MSM predicts
the course of lung cancer for each individual, from the initiation of the first malignant
cell, to the tumor progression to regional and distant stages, until death from lung
cancer (or some other cause), or the end of the prediction period. The main goal is
for the model to serve as a tool to explore, in subsequent chapters, some properties
of the MSMs from a statistical point of view. In particular, the research focus will
be on model calibration and assessment of the model’s predictive accuracy. The
model is developed using the open-source statistical software R (version 3.0.1), in order to
enhance its transparency and explore the potential of this software for
the development of MSMs in general.
The chapter begins with background information regarding MSMs currently used
to describe the natural history of lung cancer. The main part of the chapter is
dedicated to the description of the new, streamlined MSM for lung cancer that we
develop here. The second section describes the main model components, namely the
distinct disease states, the set of transition rules between them, the distributions
and mathematical equations describing the particularities of the process as well as
an account of the model parameters. Thereafter, we present in detail the simulation
algorithm followed to predict individual trajectories. The next section pertains to the
explanation of the process followed for the determination of some ad-hoc values for
the model parameters in conjunction with a brief description of the data used for this
purpose. Model performance is exemplified by running the MSM under hypotheti-
cal scenarios, i.e., for different individual baseline characteristics, including smoking
habits. The chapter concludes with a discussion of the model's overall performance,
advances, and shortcomings, as well as future work on this topic.
2.1 Background
Micro-simulation models (MSMs) are complex models designed to simulate individual-
level data using Monte Carlo methods. Several micro-simulation
models have been developed to describe the natural history of lung cancer.
Two of the most comprehensive and widely used are the Lung Cancer Policy
Model (LCPM) developed by McMahon (70), and the MIcro-simulation SCreening
ANalysis (MISCAN) model by Habbema et al. (38). Other, simplified versions of
MSMs for lung cancer can also be found in the literature (Goldwasser (33), Hazelton
et al. (40), etc.).
The LCPM is a discrete-time epidemiological MSM that combines information related
to multiple stages of lung cancer, based mainly on epidemiological models. The
MISCAN model, on the other hand, is a continuous-time MSM that additionally takes
into account the biology of the tumor cells (a latent process). Notably,
all the MSMs that have been developed to describe the course of lung cancer
take smoking history and smoking habits into account when predicting lung
cancer risk and mortality.
McMahon et al. (71) and Shi et al. (98) present two representative applications of
the aforementioned models in medical decision making. The first paper presents
the application of the Lung Cancer Policy Model (LCPM) to assess the long-term
effectiveness of lung cancer screening in the Mayo CT study, an extended, single-arm
study aiming to evaluate the effect of helical CT screening for lung cancer on current
and former smokers. Here, the LCPM micro-simulation model is used to simulate
the end results of interest for pseudo-individuals of a hypothetical control arm, i.e.
in the absence of any screening program.
The second paper refers to the application of the MISCAN micro-simulation model
for lung cancer to explore a number of hypotheses that could potentially explain the
controversial finding of the Mayo Lung Project (MLP), namely the increase in lung
cancer survival since the time of diagnosis without a corresponding reduction in lung
cancer mortality. In this case, the authors modify the MISCAN model parameters
accordingly so as to simulate pseudo-individuals under different tested scenarios that
could possibly explain that controversial finding, such as over-diagnosis, screening
sensitivity, and population heterogeneity. They subsequently fit each model to real
data from the MLP randomized clinical trial and compare its goodness of fit (GoF)
to that of the simplest model, i.e., the one in which the model parameters related to
the hypotheses of interest are set to their neutral values. For instance, a parameter
for indolent cancers is introduced in the model to account for possible effect of over-
diagnosis. Only a notable improvement in the GoF measure (deviance) would strongly
support the validity of the scenario under consideration. For example, if the model
with the indolent cancer parameter does not decrease the deviance relative to that
of the simpler model, then the micro-simulation result does not support over-
diagnosis as the reason for the controversial finding of the Mayo Lung Project.
In both papers, it is noteworthy that results from the MSM applications are
presented only as point estimates of the quantities of interest, lacking any measure of
precision. This is typical of studies involving micro-simulation modeling.
2.2 Model description
We have developed a new, streamlined, continuous-time MSM that describes the
natural history of lung cancer in the absence of any screening or treatment compo-
nent. This is a Markov model in the sense that it satisfies the Markov property,
i.e., the transition to any subsequent state depends exclusively upon the state in
which the process currently resides.
The Markov state diagram in figure 2.1 depicts the five distinctive states of the
model, i.e. the disease free state (S0), the onset of the first malignant cell (local
state, S1), the beginning of the regional (lymph node involvement, S2), and distant
stage (involvement of distant metastases, S3), and eventually the death (S4) state. In
the same figure hij denotes the hazard rate characterizing the transition from state
i to state j.
Death can be attributed to either lung cancer or other causes. In order to consider
that a lung cancer death occurred, the individual has to move from state S3 to S4.
That is, the model assumes that death from lung cancer can occur only after the
tumor is already in distant stage.
Figure 2.1: Markov State diagram of the lung cancer MSM
The model essentially consists of the absorbing state of death (S4) and four “tunnel”
states. The “tunnel” states are consecutive states stipulating the specific course of
the phenomenon described in the Markov state diagram (101). According to the
Markov state diagram presented in figure 2.1, from a disease free state at some time
point the first malignant cell initiates (local stage), and proliferates up to the point of
lymph nodes involvement (transition to regional stage). The tumor progresses from
this stage to the involvement of distant metastases, and eventually causes death from
lung cancer unless death from some other cause precedes. As already mentioned, a
key model assumption is that it is very unlikely to observe death from lung cancer
without previous involvement of distant metastases.
The development and course of lung cancer in a person’s lifetime according to this
model is stipulated by a set of transition rules described in detail hereafter. Estimates
of the model parameters are obtained from a thorough literature review on the topic
including various sources (e.g. RCTs, case-control and cohort studies, meta-analyses,
expert opinions, etc). These estimates are used in the present chapter as ad-hoc
values for working examples of MSM’s performance, while, in subsequent chapters
they will serve as guidance for the specification of plausible values for the MSM
parameters. Simulations at the individual level are carried out using the Monte
Carlo method. In particular, this approach involves generating a large number
of individual trajectories, resulting in many independent and identically
distributed natural histories in each covariate class. These trajectories are summarized
to give an indication of the predicted quantities of interest, e.g., lung cancer
incidence and mortality rates, overall and by covariate group.
2.2.1 Model components
Onset of the first malignant cell
We model the onset of the first malignant cell using the exact solutions for the
hazard rate and the survival probability of the biological two-stage clonal expansion
(TSCE) model (75). For piecewise constant parameters, the hazard function for the
development of the first malignant cell is (44):

h(t) = νµX·(e^{(γ+2B)t} − 1) / [γ + B·(e^{(γ+2B)t} + 1)]   (2.1)

with γ = α − β − µ and B = (1/2)·(−γ + √(γ² + 4αµ)),

where X is the total number of normal cells, ν is the normal cell initiation rate, α is the
division rate of initiated cells, β is the apoptosis rate (death or differentiation) of
initiated cells, and µ is the malignant conversion rate of initiated cells.
Following equation 2.1, the cumulative hazard function is:

H(t) = (νµX / (γ + B)) · [ −t + (1/B)·log( (γ + B + B·e^{(γ+2B)t}) / (γ + 2B) ) ]   (2.2)
Previous empirical data analyses with the TSCE model, exploring the dose-response
relationship of smoking with lung cancer incidence, indicated that power laws are good
approximations to this relationship (40; 41). The same studies provide X = 10^7
as a plausible figure for the total number of normal stem cells. Furthermore,
the TSCE multistage model allows tests for differences in the initiation, promotion,
and malignant conversion rates of the course of lung cancer between population sub-
groups. Previous analyses of lung cancer incidence data in the Nurses' Health Study
(NHS) and the Health Professionals Follow-up Study (HPFS) revealed a significant
difference in tobacco-induced promotion and malignant conversion rates between
males and females (72).
We incorporate these findings about the effect of smoking on the onset of the first
malignant cell into our model. In particular, if q(t) denotes the smoking intensity at
age t, expressed as the average number of cigarettes smoked per day, its effect on the
α and γ rates is described by the following power-law relationships:

α = α0·(1 + α1·q(t)^{a2}) and γ = γ0·(1 + α1·q(t)^{a2})

where γ0 and α0 are the coefficients for non-smokers. To account for differences
between men and women, as well as between smoking habits, we assume different
hazards (as functions of age t) corresponding to all possible combinations of gender
(male/female) and smoking status (never/former/current smoker). For each individual,
the time period from birth (t = 0) to the onset of the first malignant cell can be split into
k intervals on which the hazard rate is constant and depends on the person's smoking
status (smoking or not) within that interval. For simplicity, we allow at most
two possible change points in a lifetime: the age at starting (τ1) and the age
at quitting (τ2) smoking, where relevant.
The survival function S(t) for the development of lung cancer is:

S(t) = exp{−H(t)} = exp{ −∫_0^t h(x) dx }   (2.3)
Depending on the smoking status of each person we discern the following three
possible scenarios:

• Never smoker:

S(t) = exp{ −∫_0^t h(x) dx }   (2.4)

• Current smoker:

S(t) = exp{ −∫_0^{τ1} h(x) dx − ∫_{τ1}^t h(x) dx }   (2.5)

• Former smoker:

S(t) = exp{ −∫_0^{τ1} h(x) dx − ∫_{τ1}^{τ2} h(x) dx − ∫_{τ2}^t h(x) dx }   (2.6)
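The three smoking scenarios amount to accumulating the TSCE cumulative hazard piecewise over the smoking intervals. As an illustration only (the thesis implements the model in R), the following Python sketch numerically integrates the hazard of equation 2.1 with interval-specific parameters; all parameter values are hypothetical, not fitted, and simply switching the parameters at the change points approximates rather than reproduces the exact piecewise TSCE solution:

```python
import math

def tsce_hazard(t, nu, mu, X, alpha, beta):
    """Hazard of the first malignant cell under the TSCE model (eq. 2.1)."""
    gamma = alpha - beta - mu
    B = 0.5 * (-gamma + math.sqrt(gamma ** 2 + 4 * alpha * mu))
    e = math.exp((gamma + 2 * B) * t)
    return nu * mu * X * (e - 1) / (gamma + B * (e + 1))

def survival(t, change_points, params):
    """S(t) = exp{-H(t)} with the hazard integrated piecewise over the
    smoking intervals defined by change_points (e.g. [tau1, tau2]);
    params[k] holds the TSCE parameters on the k-th interval."""
    cuts = [0.0] + [c for c in change_points if c < t] + [t]
    H = 0.0
    for k in range(len(cuts) - 1):
        a, b = cuts[k], cuts[k + 1]
        p = params[min(k, len(params) - 1)]
        n = 200
        step = (b - a) / n
        for i in range(n):                       # trapezoid rule on [a, b]
            x0, x1 = a + i * step, a + (i + 1) * step
            H += 0.5 * (tsce_hazard(x0, **p) + tsce_hazard(x1, **p)) * step
    return math.exp(-H)

# Hypothetical parameter sets for non-smoking and smoking periods
base = dict(nu=1e-7, mu=1e-7, X=1e7, alpha=1.0, beta=0.96)
smoke = dict(base, alpha=1.5, beta=1.4)
s60 = survival(60.0, [20.0, 40.0], [base, smoke, base])   # former smoker at age 60
```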
Tumor growth
Several studies have shown an inverse correlation between tumor growth rate and
tumor size; that is, the growth rate is usually non-constant and decreases steadily.
According to these studies the Gompertz function provides a good approximation
of tumor growth for most cancer types and describes this process more
efficiently than, e.g., the exponential distribution (15). The Gompertz model suggests
the proliferation of tumor cells by a modified exponential process in which successive
doubling times occur at increasingly longer time intervals (61). Hence the Gompertz
function stipulates a shorter pre-clinical period than the exponential model, and
longer survival after diagnosis.
The model assumes Gompertzian (61) tumor growth, i.e., the tumor volume at age
t is:

V(t)/V0 = e^{(s/m)·(1 − e^{−mt})}   (2.7)

where V0 and V(t) represent the initial tumor volume (volume of the first malignant
cell) and the tumor volume at age t, respectively, and m, s are the location and scale
parameters of the Gompertz function.

The hazard rate of the Gompertz distribution as a function of time t is (26):

r(t) = s · e^{−mt}   (2.8)
The time at which the tumor has reached volume V(t) can be found using the inverse
of the Gompertz function:

t = −(1/m)·log[ 1 − (m/s)·log( V(t)/V0 ) ]   (2.9)
For this equation to be defined, the values of the Gompertz parameters (m, s) should
be chosen so that:

1 − (m/s)·log( V(t)/V0 ) > 0 for all attainable volumes ⇒ s > m·log( Vmax/V0 )   (2.10)

This constraint is very important, especially when specifying the
model parameters, either as ad hoc values or in a regular calibration setting.
Moreover, assuming spherical tumor growth (i.e., symmetric in all directions),
the tumor volume at age t is a function of its diameter d(t) at that age, and is calculated
using the sphere volume formula:

V(t) = (π/6)·[d(t)]³   (2.11)

The tumor volume limits are stipulated by the minimum and maximum possi-
ble diameters. The minimum diameter (diameter of one cancerous cell) is set to
d0 = 0.01 mm (70; 29; 15), while the maximum diameter (tumor diameter that causes
death) is set to dmax = 13 cm (15).
In order to keep the model parameterization to a minimum, so that the model is
more flexible and easily handled for the purposes of subsequent analyses (calibration
and assessment), we assume the same Gompertz distribution for all tumors irrespec-
tive of lung cancer type.
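Equations 2.7-2.11 translate directly into code. The following Python sketch (illustrative only; the model itself is written in R, and the parameter values below are arbitrary choices satisfying constraint 2.10) maps a tumor volume to the time since onset and back:

```python
import math

D0, DMAX = 0.01, 130.0            # min/max diameter in mm (0.01 mm, 13 cm)
V0 = math.pi / 6 * D0 ** 3        # volume of one cancerous cell (eq. 2.11)
VMAX = math.pi / 6 * DMAX ** 3

def gompertz_time(V, m, s):
    """Inverse Gompertz growth curve (eq. 2.9): time since the first
    malignant cell at which the tumor reaches volume V."""
    x = 1 - (m / s) * math.log(V / V0)
    if x <= 0:
        raise ValueError("(m, s) violate constraint 2.10 for this volume")
    return -math.log(x) / m

def gompertz_volume(t, m, s):
    """Forward Gompertz growth curve (eq. 2.7)."""
    return V0 * math.exp((s / m) * (1 - math.exp(-m * t)))

# Arbitrary example parameters; constraint 2.10 requires s > m*log(VMAX/V0)
m = 0.05
s = m * math.log(VMAX / V0) * 1.2
t = gompertz_time(VMAX / 100, m, s)   # years from onset to this volume
```

The round trip `gompertz_volume(gompertz_time(V)) == V` holds by construction, which is exactly how the simulation algorithm later converts simulated volumes into ages.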
Disease progression
Disease progression of an existing lung cancer can occur via nodal involvement and
distant metastases (70). Current MSMs for lung cancer (70; 33) adopt, in their disease
progression components, methodologies developed to describe the progression of breast
cancer (26; 86; 59; 110).
Previous studies (59; 58; 102; 103) have shown that, given Gompertzian tumor
growth, the distribution of tumor volumes at specific time points can be adequately
described by the log-normal distribution. In particular, let (Vreg, Treg), (Vdist,
Tdist), and (Vdiagn, Tdiagn) be the pairs of tumor volume and age at the beginning of
the regional and distant stages, and at diagnosis (clinical detection), respectively.
We use the distributions logNormal(µreg, σ²reg), logNormal(µdist, σ²dist), and
logNormal(µdiagn, σ²diagn) to simulate the tumor volumes Vreg, Vdist, and Vdiagn,
respectively.
In addition, the simulated tumor volumes are subject to the following restrictions:
V0 < Vreg < Vdist < Vmax and V0 < Vdiagn < Vmax (2.12)
Given the tumor volume and its growth rate we can find the time (age) at which
the tumor has reached the specific volume. The tumor progression according to
the MSM for lung cancer proposed here, relies on several key assumptions. First
of all, there is a positive correlation between tumor size and the probability of
symptomatic detection, i.e., the larger the tumor, the higher the probability of
clinical detection. The local stage begins when the first malignant
cell develops. The transition from regional to distant stage is defined to occur at
the moment distant metastatic disease first becomes detectable by usual clinical
care. In addition, the transition to the distant stage presupposes a tumor already
in the regional stage, which in turn develops only after the transition to the local stage.
Finally, another very important assumption implied by this model is that there are
no large differences in growth rates or in the tumor size and stage distributions
across the different covariate classes (age-gender-smoking status groups).
The disease progression model also implies that no symptomatic detection is possible
due to lymph node involvement or benign lesions, whereas patients with symptom-
detected distant metastases are, by assumption, stage M1 (according to the TNM staging
system (76)) with probability 1. Furthermore, the conditional distribution
of the tumor stage given its size at clinical diagnosis is assumed multinomial. When
defining the ad hoc values for the model parameters we use the frequencies of local,
regional, and distant cancers by size at diagnosis observed in the SEER data, presented
in Table 2.3. According to this table there are no large differences between males and
females; hence we assume the same tumor volume distributions for the two genders,
and fit the overall size information.
Survival
Competing risks
In a multi-state model such as the MSM for the natural history of lung cancer presented
here, calculation of survival probabilities is a rather complicated task due to the
presence of competing risks. The competing risks issue arises when individuals are
subject to risk factors that can cause two or more mutually exclusive events (54).
Smoking, for instance, is strongly associated with both lung cancer death and other-
cause death. Hence, when modeling lung cancer mortality while taking into account
risk factors such as age, smoking habits, etc., death from other causes is the competing
risk, since it precludes death from lung cancer.
A significant amount of work has been done on the problem of competing risks, a
concise summary of which can be found in Moeschberger and Klein (1995). The
usual practice is to assume independence among the competing risks and use some
conventional non-parametric (e.g. Kaplan-Meier estimator) or semi-parametric (e.g.
Cox Proportional Hazards model) method to estimate the survival probabilities.
In cases where the independence assumption is not valid, more complicated methods
should be applied. The reason is that simple Kaplan-Meier estimators of the net survival
probabilities by cause of death are not sufficient to describe the mortality rates in this
setting. Crude probabilities, expressing the probability of death from a specific
cause after adjusting for the other causes of death, should be used instead. One way
of expressing these crude probabilities is through a cause-specific sub-distribution
function, i.e., the Cumulative Incidence Function (CIF).
In the natural history model for lung cancer each person faces the risk of dying from
lung cancer (main event of interest) or dying from some other cause (competing
risk). In order to express the lung cancer survival probability accounting for the
competing risk, we employ the CIF techniques described in Gray (1988), and Fine
and Gray (1999) that have also been incorporated in the R statistical package library
“cmprsk”. In particular, let Yi be the number of individuals at risk, li the number
of those who died from lung cancer, and oi the number of those who have died from
some other cause by time ti. Here t1 <t2 <...tk represent all the distinct times at
which a competing risk occurs. In this setting, li+oi is the total number of individuals
experiencing any of the competing risks (here death from any cause) at time ti.
The CIF in this case is defined as:

CIF(t) = Σ_{ti ≤ t} { Π_{j=1}^{i−1} [ 1 − (lj + oj)/Yj ] } · (li / Yi),  if t1 ≤ t,
CIF(t) = 0,  otherwise.   (2.13)
Note here that, for t1 ≤ t, CIF(t) = Σ_{ti ≤ t} S(ti−)·(li/Yi), where S(ti−) is the
Kaplan-Meier estimator of the overall (all-cause) survival, evaluated just before
time ti. Hence, the CIF estimates the probability that
the event of interest (death from lung cancer) will occur before time t, and before
the occurrence of any competing risk (death from other causes).
Note here that, as already mentioned, a very important assumption made here is that
death from lung cancer is unlikely to occur without the previous detection of distant
metastases (symptomatic or not). We compute the CIF using combined information
from the NHIS and the SEER data (section 2.3.1).
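Equation 2.13 can be computed directly from individual event times. The thesis relies on the R package "cmprsk"; the Python sketch below, on made-up data with no censoring, implements the same estimator via its Kaplan-Meier product form:

```python
def cif(times, causes, t, cause_of_interest=1):
    """Nonparametric CIF (eq. 2.13). causes[i] is 1 for death from
    lung cancer and 2 for death from another cause (no censoring here)."""
    distinct = sorted(set(times))
    surv = 1.0          # all-cause Kaplan-Meier survival, S(ti-)
    total = 0.0
    for ti in distinct:
        if ti > t:
            break
        Y = sum(1 for x in times if x >= ti)                 # at risk at ti
        l = sum(1 for x, c in zip(times, causes)
                if x == ti and c == cause_of_interest)       # events of interest
        o = sum(1 for x, c in zip(times, causes)
                if x == ti and c != cause_of_interest)       # competing events
        total += surv * l / Y
        surv *= 1 - (l + o) / Y
    return total

# Made-up death times and cause codes (1 = lung cancer, 2 = other cause)
times = [2, 3, 3, 5, 8, 9]
causes = [1, 2, 1, 1, 2, 1]
p = cif(times, causes, t=6)
```

With no censoring, the two cause-specific CIFs sum to 1 once everyone has died, a useful sanity check on any implementation.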
Other cause mortality
Given the main covariates of interest, namely age, gender, smoking status (current,
former, or never smoker), and smoking intensity (average number of
cigarettes smoked per day), we use the non-parametric CIF estimates
obtained from the observed NHIS data. The MSM simulations depend on the strong
assumption that the death patterns observed in these data do not change dramatically
over time, and hence remain relevant to the prediction period of interest.
Lung cancer mortality
Using the SEER data we obtain non-parametric estimates of the CIF given the individ-
ual's characteristics at the time of clinical (symptomatic) detection of lung cancer.
In particular, the CIF estimates are grouped by age (5-year bins), gender, and tumor
size (diameter ≤ 2 cm, 2-5 cm, > 5 cm) at diagnosis. Given these estimates
we can simulate the time to death from lung cancer after symptomatic detection
using an inverse-CIF search approach.
2.2.2 Simulation Algorithm
In this section we describe in detail the algorithm we follow in order to run a single
micro-simulation, i.e., to predict the lung cancer trajectory of an individual with
certain baseline characteristics.
Simulate baseline characteristics
For each person we either have access to, or simulate, baseline characteristics that
will be used as model input to make predictions. In particular, for each
sample for which predictions regarding the course of lung cancer are to be made, we
either have the individual records or some information regarding the distribution of
the main covariates of interest, i.e., age, gender, and smoking history. The smoking
history includes the age at starting and quitting smoking (where relevant) as well
as the smoking intensity, expressed as the average number of cigarettes smoked per
day. Given the form of the available information (individual records or overall sam-
ple distributions) we simulate the baseline characteristics using the bootstrap method
(randomly drawing with replacement from the available data). The set of baseline
characteristics stipulates the covariate class g to which each individual belongs.
Time to death from other causes
→ Draw uo1 ∼ Unif(0, 1) and uo2 ∼ Unif(0, 1)
→ Compare uo1 to the non-parametric estimate CIFg(t) from the NHIS data and
find the estimate closest to uo1, in order to specify the time interval during
which death from other causes can occur for this person. That is, for the time
t minimizing |uo1 − CIFg(t)|, we assume that death from a cause other than lung cancer
for that person may occur within the interval [t, min{ti : ti > t}].
→ Use uo2 to assign the specific time point (age) at which death occurs within
the pre-specified interval (key assumption: the time at
which death from other causes occurs is uniformly distributed within the pre-
specified interval).
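The two uniform draws above can be sketched as follows; `cif_grid` is a hypothetical table of (age, CIF) pairs standing in for the NHIS-based non-parametric estimate, and the code is illustrative Python rather than the R used in the thesis:

```python
import random

def sample_death_age(cif_grid, rng):
    """Inverse-CIF sampling: locate the grid age whose CIF value is
    closest to u1, then place the death uniformly between that age and
    the next grid age (the algorithm's key uniformity assumption)."""
    u1, u2 = rng.random(), rng.random()
    ages = [a for a, _ in cif_grid]
    # grid index minimizing |u1 - CIF_g(t)|
    i = min(range(len(cif_grid)), key=lambda k: abs(u1 - cif_grid[k][1]))
    lo = ages[i]
    hi = ages[i + 1] if i + 1 < len(ages) else lo
    return lo + u2 * (hi - lo)      # uniform within [lo, hi]

# Hypothetical CIF estimate on a coarse age grid
grid = [(40, 0.05), (50, 0.15), (60, 0.35), (70, 0.65), (80, 0.90)]
age = sample_death_age(grid, random.Random(1))
```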
Time to the onset of the first malignant cell
Given the baseline covariates we simulate the time (age) to the first malignant cell
(Tmal) based on the exact formulas of the hazard function according to the TSCE
model, as described in section 2.2.1. In particular:
→ Draw um1 ∼ Unif(0, 1)
→ Use numerical integration1 to find age t such that S(t) = um1 ⇒ t = S−1(um1)
1Given the S(t) we use the ”uniroot” function in R to solve the expression exp{−∫ t
0h(x)dx} -
S(t)=0 for t, where t is the age at the onset of the first malignant cell in years.
27
where S(t) is the survival function (eq. 2.3) and h(t) the respective hazard rate
(eq. 2.1). Depending on the smoking status, the survival probability is given by
equations (2.4)-(2.6). For each patient we either have the detailed smoking history,
i.e., the exact ages τ1 and τ2 at starting and quitting smoking, or we can estimate
the average ages at starting and quitting smoking from available data (e.g., McMahon (2005)).
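In R the inversion is carried out with uniroot; a minimal pure-Python analog of the same idea (numerical integration of the hazard followed by root finding, here by bisection) is sketched below. The generic hazard argument stands in for the TSCE hazard of eq. 2.1; the integration scheme and tolerances are illustrative assumptions.

```python
import math

def survival(t, hazard, n=2000):
    """S(t) = exp(-integral_0^t h(x) dx), trapezoidal rule with n panels."""
    if t <= 0:
        return 1.0
    step = t / n
    hs = [hazard(i * step) for i in range(n + 1)]
    integral = sum((hs[i] + hs[i + 1]) / 2.0 * step for i in range(n))
    return math.exp(-integral)

def invert_survival(u, hazard, t_max=200.0, tol=1e-6):
    """Solve S(t) = u for t by bisection (the role uniroot plays in R).
    Assumes S is decreasing in t and S(t_max) < u < S(0) = 1."""
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if survival(mid, hazard) > u:   # not enough cumulative hazard yet
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check, for a constant hazard h = 0.05 per year the inversion recovers the closed-form quantile −log(u)/0.05.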
Disease progression
Assuming the same parameters for the tumor growth, volume and stage at diagnosis
across the covariate classes of interest, we simulate the tumor progression as follows:
→ Draw Vreg ∼ logNormal(µreg, σ²reg), and Vdist ∼ logNormal(µdist, σ²dist)
→ Repeat the previous step until drawing the first pair (Vreg, Vdist) with:
V0 < Vreg < Vdist < Vmax
→ Draw Vdiagn ∼ logNormal(µdiagn, σ²diagn) with V0 < Vdiagn < Vmax
→ Calculate the tumor diameters dreg, ddist, and ddiagn using the sphere volume
formula (eq. 2.11).
→ Find the times (ages) treg, tdist, and tdiagn using the inverse Gompertz function
(eq. 2.9).
→ Simulate ages at the beginning of the regional (Treg) and distant stage (Tdist),
as well as age at diagnosis, given age at the onset of the first malignant cell
(Tmal), as:
– Treg = Tmal + treg
– Tdist = Tmal + tdist
– Tdiagn = Tmal + tdiagn
→ Find the tumor stage at diagnosis by comparing Vdiagn to Vreg and Vdist (or,
alternatively, Tdiagn to Treg and Tdist)2
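The progression steps above can be sketched as follows, plugging in the Table 2.4 parameter values. This is an illustrative Python sketch rather than the dissertation's R code: the lognormal parameterisation follows Python's random.lognormvariate (mean and standard deviation of the underlying normal), and the resulting times come out in the units implied by 1/m, which the text leaves implicit.

```python
import math
import random

# Ad-hoc values from Table 2.4 (assumed here for illustration)
M, S = 0.00042, 31 * 0.00042          # Gompertz parameters m and s
D0, DMAX = 0.01, 130.0                # cell and maximum diameters, in mm
V0 = math.pi / 6.0 * D0 ** 3          # sphere volume of diameter d (eq. 2.11)
VMAX = math.pi / 6.0 * DMAX ** 3

def inv_gompertz(v):
    """Time for the tumor to grow from V0 to v (inverse of eq. 2.9);
    defined only while s > m*log(v/V0), i.e., the restriction (2.14)."""
    return -math.log(1.0 - (M / S) * math.log(v / V0)) / M

def simulate_progression(rng):
    """One draw of (t_reg, t_dist, t_diagn) and the stage at diagnosis."""
    # Rejection-sample the first pair with V0 < V_reg < V_dist < VMAX
    while True:
        v_reg = rng.lognormvariate(1.1, 1.1)
        v_dist = rng.lognormvariate(2.8, 2.8)
        if V0 < v_reg < v_dist < VMAX:
            break
    # V_diagn only constrained to (V0, VMAX)
    while True:
        v_diagn = rng.lognormvariate(3.91, 3.91)
        if V0 < v_diagn < VMAX:
            break
    t = tuple(inv_gompertz(v) for v in (v_reg, v_dist, v_diagn))
    # Stage at diagnosis: compare V_diagn with V_reg and V_dist
    stage = ("local" if v_diagn < v_reg
             else "regional" if v_diagn < v_dist else "distant")
    return t, stage
```

Adding Tmal to each element of t gives Treg, Tdist, and Tdiagn as in the bullet list above.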
Time to death from lung cancer
Given the age (Tdiagn), tumor size (ddiagn), and tumor stage at diagnosis, we can
simulate the time to death from lung cancer using the non-parametric estimates
CIF(t, g) obtained from the SEER data, as follows:
→ Draw ul1 ∼ Unif(0, 1) and ul2 ∼ Unif(0, 1)
→ Compare ul1 to the non-parametric estimate CIF(t, g) from the SEER data and
find the time whose estimate is closest to ul1, in order to specify the time
interval during which death from lung cancer can occur for this person.
→ Use ul2 to assign the specific time point (age) at which death occurs within
the pre-specified time interval3 (key assumption: the time at which death from
lung cancer occurs is uniformly distributed within the pre-specified interval).
Comparing the simulated times resulting from the aforementioned simulation procedure,
we ”tell the story” of the development and course of lung cancer over the
lifetime of each individual with a certain set of characteristics. This ”story” is the
predicted individual trajectory obtained after completing one micro-simulation. Table
2.1 recapitulates the main steps of the simulation algorithm followed in order
to predict the trajectory of an individual with certain baseline characteristics.
2 The decision about the quantities compared for the specification of the tumor stage at diagnosis may be very important when, for example, improvement of the algorithm's efficiency is a key issue, as is the case with MSM calibration (chapter II).
3 The length of the pre-specified time intervals varies, and is related to the discontinuity in the non-parametric estimate of the CIF.
Table 2.1: Continuous time MSM for lung cancer: simulation algorithm to predict the lung cancer trajectory of an individual.
1. Simulate baseline characteristics g=(age, gender, smoking history1).
2. Simulate age to death (Td other) from a cause other than lung cancer
given age, gender, and smoking status.
3. Simulate age to the onset of the first malignant cell (Tmal), given gender,
smoking status, smoking history (age at starting and quitting smoking), and smoking intensity.
4. Simulate ages at the beginning of the regional (Treg) and the distant
stage (Tdist) given the tumor growth rate.
5. Simulate age (Tdiagn) and tumor diameter (ddiagn) at diagnosis. Find the
tumor stage by comparing Tdiagn with Treg and Tdist.
6. Simulate age to death from lung cancer (Td lung) given the simulated
individual’s characteristics at diagnosis (Tdiagn and tumor stage).
7. Compare the simulated ages Td other, Tmal, Treg, Tdist, Tdiagn, and Td lung
to ”tell” a story for the specific individual with the g set of covariates, i.e.,
to predict that individual's trajectory.
1Smoking history includes: smoking status (never, former or current smoker), and smoking intensity (average
number of cigarettes smoked per day)
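One plausible reading of step 7 of Table 2.1 is sketched below: the simulated ages are compared, and death from other causes censors any lung cancer events that would happen after it. The event names and the censoring rule are illustrative assumptions, not the dissertation's exact R code.

```python
def trajectory(T_d_other, T_mal, T_reg, T_dist, T_diagn, T_d_lung):
    """Assemble one predicted lung cancer trajectory (step 7 of Table 2.1).

    Events are kept only if they happen before death from other causes;
    pass None for an event that was not simulated (e.g., no malignancy).
    Returns the ordered event list, the cause of death, and the age at death.
    """
    candidates = [("onset of first malignant cell", T_mal),
                  ("regional stage", T_reg),
                  ("distant stage", T_dist),
                  ("diagnosis", T_diagn),
                  ("lung cancer death", T_d_lung)]
    # Keep only events that occur before death from other causes
    story = [(name, age) for name, age in candidates
             if age is not None and age < T_d_other]
    if story and story[-1][0] == "lung cancer death":
        cause, age_at_death = "lung cancer", story[-1][1]
    else:
        cause, age_at_death = "other cause", T_d_other
    return story, cause, age_at_death
```

For example, an individual with a malignant cell at 66 but other-cause death at 70 dies of the other cause, while the same disease course with other-cause death at 90 ends in lung cancer death at 78.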
2.2.3 Software
To enhance transparency, the model is developed in the open source statistical
software R (version 2.15.2). A comprehensive R code describes the model structure (set
of transition rules and assumptions). Given the model parameters (either ad-hoc
or calibrated values) for an individual with specific characteristics (set of covariate
values), the model stipulates the times of the transitions to each state. Combining all
the simulated times gives the predicted trajectory of this specific individual
with regard to the development of lung cancer.
Handling random numbers
The implementation of the large number of simulations required for the evaluation of
a complex process using micro-simulation modeling necessitates special consideration
and treatment of the massive quantity of random numbers generated. For this
purpose we use the methodology described in Leydold and J. (2005) regarding the
generation of independent streams of random numbers for stochastic simulations,
which was motivated by the work on the object-oriented random number generator
(RNG) with streams and substreams presented in L'Ecuyer et al. (2002). The
adoption of this methodology, among other things, ensures the generation of
”statistically independent” streams, i.e., independent random numbers despite the
enormous quantity of random numbers produced, thus avoiding unintended
correlations between the several parts of the simulation algorithm. For the
implementation of this methodology, we use the built-in functions of the
”rlecuyer” R package.
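NumPy offers the same stream-spawning idea; the sketch below is a Python analog (not the rlecuyer mechanism itself) in which each model component gets its own reproducible, statistically independent stream, so that, for instance, extra draws in the progression module never perturb the other-cause mortality draws.

```python
import numpy as np

# One root seed for the whole experiment; spawn one independent child
# stream per simulation component (component names are illustrative).
root = np.random.SeedSequence(20140513)
streams = root.spawn(3)
rng_other, rng_onset, rng_progress = (np.random.default_rng(s) for s in streams)

u_other = rng_other.uniform(size=5)   # draws for other-cause mortality
u_onset = rng_onset.uniform(size=5)   # draws for the TSCE onset time
```

Because spawning is deterministic given the root seed, rebuilding the streams from the same root reproduces every component's draws exactly.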
2.3 Application
2.3.1 Ad-hoc values for model parameters
The MSM for lung cancer proposed here comprises a set of parameters representing
both latent and observable variables, as well as describing the distribution of certain
characteristics of the underlying process. Typically the stipulation of MSM parameters
involves extensive calibration procedures (chapter II). The goal of this section is
simply to exemplify the model's performance by running the MSM under hypothetical
scenarios. For this purpose we use ad-hoc point estimates for the model parameters;
below we describe the determination of those ad-hoc values, which can be used as
model inputs to run micro-simulations and predict individual trajectories of lung
cancer patients.
Onset of the first malignant cell
Several studies have tried to elucidate the biological process of lung carcinogenesis
by fitting the TSCE model to real data (75; 64; 41; 40; 72). As ad-hoc values for the
TSCE model parameters we use the point estimates reported in Hazelton et al. (40),
resulting from the analysis of the second Cancer Prevention Study (CPS II). Table
2.2 provides the complete list of parameters related to the specification of the age at
the onset of the first malignant cell, depicts the ad-hoc values (point estimates along
with 95% CIs) used for some of them, and indicates the type and order of calculations
used for the determination of the rest.
Tumor growth and disease progression
The ad-hoc values for the location and scale parameters of the logNormal
distribution describing the tumor volume at clinical detection come from the
Koscielny et al. (1985) study of the initiation of distant metastasis in breast cancer.
In particular, that paper compares two different patterns of tumor growth,
namely an exponential and a Gompertzian one, with respect to their fit to available
data on distributions of tumor volume at diagnosis, as well as tumor
doubling times. Results from this paper agree with findings from previous studies
(103) indicating that tumor growth in humans is better described by the
Gompertz function than by an exponential curve.
The relationship between the Gompertz distribution parameters (m, s) describing
the tumor growth results from the restriction related to the definition of the inverse
Gompertz function (eq. 2.9) for the specification of the age t when the tumor reaches
size V(t). According to this:

1 − (m/s)·log(V(t)/V0) > 0  ⇒  s > m·log(V(t)/V0),  ∀ V(t)

and hence, at the maximum tumor volume,

s > m·log(Vmax/V0)    (2.14)
Given the tumor volume at diagnosis (Vdiagn) we can calculate the age (Tdiagn) at
which the tumor reached this volume, using again (2.9). The doubling time as a
function of the age Tdiagn is:

DT = −(1/m)·log[1 − (m/s)·log(2)·exp(m·Tdiagn)]    (2.15)
For m = 0.00042 and s = 31·m, the mean doubling time is close to the observed one
recorded in previous studies (70), while (2.14) is satisfied. Finally, the logNormal
location and scale parameters for Vreg and Vdist are specified so as to reproduce
distributions of tumor stage at diagnosis by size similar to those observed
in the SEER data (Table 2.3).
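The restriction (2.14) and the doubling-time formula (2.15) can be checked numerically for the chosen parameter values; the sketch below does so, with the time unit following that of 1/m, which the text leaves implicit.

```python
import math

m = 0.00042
s = 31 * m
d0, dmax = 0.01, 130.0                     # diameters in mm (Table 2.4)

# Restriction (2.14): s > m * log(Vmax/V0) = 3m * log(dmax/d0),
# since volumes scale with the cube of the diameter.
assert s > m * 3.0 * math.log(dmax / d0)   # 31m > 28.4m, satisfied

def doubling_time(T_diagn):
    """Eq. (2.15): DT = -(1/m) log[1 - (m/s) log(2) exp(m T_diagn)]."""
    return -math.log(1.0 - (m / s) * math.log(2.0) * math.exp(m * T_diagn)) / m

dt0 = doubling_time(0.0)   # early doubling time, roughly 54 time units
```

As expected for Gompertzian growth, the doubling time lengthens as the tumor ages, reflecting the deceleration of growth near Vmax.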
Mortality data
Estimates of other-cause and lung cancer mortality rates are based on data from two
major sources: the National Health Interview Survey (NHIS) and the Surveillance,
Epidemiology, and End Results (SEER) program, respectively.
Both databases are representative of the US population and constitute the main
sources of information about baseline characteristics and health risk factors, as well as
incidence and mortality rates in the entire population. The NHIS is a national
cross-sectional survey aimed at monitoring national health patterns since 1957.
NHIS collects data about several demographic characteristics, risk factors, and the health
status of the US population. It also provides information about the age and cause
of death. From the large pool of available NHIS data we worked with the Integrated
Health Interview Series (IHIS) harmonized set of data. The IHIS variables are given
consistent codes and have been thoroughly documented to facilitate cross-temporal
comparisons. The SEER program provides information on cancer statistics in an
effort to reduce the burden of cancer among the US population. In particular, SEER
data record information regarding cancer incidence and mortality rates by certain
demographic characteristics of a geographic sample representing 28 percent of
the US population, collected since 1973.
We based our estimates of lung cancer incidence and mortality on SEER data
covering the interval from 1973 to 2008 and on IHIS data from 1986 to 2004.
The model is structured so as to predict the main events of interest, i.e., lung cancer
incidence and mortality, based on the gender, age, and smoking history of a person,
including the average ages at starting and quitting smoking as well as the average
smoking intensity. The NHIS data only provide information about age, gender,
smoking, and, when relevant, the cause of death. Information about smoking, in
particular, includes, for current smokers at the time of the study, the number (“heaviest
amount”) of cigarettes smoked per day, grouped into four categories, namely “less than
15”, “15-24”, “25-34”, and “35 or more” cigarettes. The SEER data, on the other hand,
also record age and gender, and in addition provide information regarding the age,
tumor size, and stage at clinical diagnosis, as well as the age and cause of death.
Therefore we need a way to combine the information coming from these two datasets,
both representative of the US population, in order to simulate the time and cause of
death given the age, gender, smoking history, and tumor stage at diagnosis.
diagnosis. As already mentioned, the cause of death is classified as lung cancer or
other cause.
Table 2.2: Ad-hoc values and calculations for the MSM parameters related to the onset of the first malignant cell.

Parameter  Males                                Females                               Type
X          10^7                                 10^7                                  fixed
v0         7.16·10^-8 (4.6·10^-8, 1.21·10^-7)   1.07·10^-7 (6.97·10^-8, 1.62·10^-7)   fixed
α0         7.7 (6.45, 12.99)                    15.82 (13.39, 42.12)                  fixed
γ0         0.09 (0.071, 0.106)                  0.071 (0.055, 0.088)                  fixed
v1         0.00 (0.00, 1.76)                    0.02 (0.00, 12.5)                     fixed
α1         0.6 (0.43, 0.91)                     0.5 (0.27, 0.86)                      fixed
α2         0.22 (0.12, 0.30)                    0.32 (0.14, 0.40)                     fixed
v          v0·(1 − v1)                                                                calculated
γ          γ0·(1 + α1·[q(t)]^α2)                                                      calculated
α          α0·(1 + α1·[q(t)]^α2)                                                      calculated
µ0         v0                                                                         calculated
µ          µ0                                                                         calculated
β0         α − µ − γ                                                                  calculated

Point estimates are extracted from the analysis of the CPS II study data (40).
Hazard function: h(t) = [νµX·(e^{(γ+2B)t} − 1)] / [γ + B·(e^{(γ+2B)t} + 1)],
where B = (1/2)·(−γ + √(γ² + 4αµ)).
Table 2.3: Tumor stage by size at diagnosis (SEER data).

Size       local        regional     distant       Total
Overall
≤ 2cm      6031 (48%)   2705 (21%)   3868 (31%)    12604
2-5cm      7050 (24%)   8348 (29%)   13894 (47%)   29292
≥ 5cm      1387 (9%)    4803 (29%)   10112 (62%)   16302
Males
≤ 2cm      2518 (44%)   1238 (22%)   1957 (34%)    5713
2-5cm      3445 (23%)   4352 (29%)   7228 (48%)    15025
≥ 5cm      810 (8%)     2921 (31%)   5857 (61%)    9588
Females
≤ 2cm      3513 (51%)   1467 (21%)   1911 (28%)    6891
2-5cm      3605 (25%)   3996 (28%)   6701 (47%)    14302
≥ 5cm      577 (9%)     1882 (28%)   4255 (63%)    6714
Table 2.4 provides a complete list of the ad-hoc values for the MSM parameters
related to tumor growth and disease progression. As already mentioned in the
simulation procedure for the specific parts of the model, non-parametric estimates of
the CIF from the NHIS and SEER data are used as fixed model inputs.
Table 2.4: Ad-hoc values and calculations for the model parameters related to the lung cancer progression.

Quantity                                                      Value

Tumor growth
Diameter of one malignant cell*                               d0 = 0.01 mm
Maximum tumor diameter*                                       dmax = 130 mm
Tumor volume of diameter d**                                  v = (π/6)·d³
Parameters of the Gompertz distribution**                     m = 0.00042, s = 31·m

Disease progression**
Parameters of the logNormal distribution for tumor
volume at the beginning of the regional stage                 µreg = 1.1, σreg = 1.1
Parameters of the logNormal distribution for tumor
volume at the beginning of the distant stage                  µdist = 2.8, σdist = 2.8
Parameters of the logNormal distribution for tumor
volume at diagnosis                                           µdiagn = 3.91, σdiagn = 3.91

* Values stipulated from the lung cancer literature.
** Values specified by the modeler to match data.
2.3.2 MSM output - Examples
In this section we present predictions from multiple runs of the MSM under different
scenarios. The focus is on lung cancer incidence and mortality for people 65 years
old at the beginning of the prediction period, which covers the entire lifespan. We
compare MSM outputs between males and females, for never, former, and current
smokers separately. For current smokers we also compare results given three different
average smoking intensities, i.e., 10, 30, and 50 cigarettes per day. Furthermore,
for former smokers we also include comparisons for different quitting ages,
i.e., 40, 50, and 60 years old. Current and former smokers are assumed to have
started smoking at the age of 20 years.
For each of these cases, we present the distributions (mean, standard deviation,
quartiles, minimum and maximum values) of the ages at the major states in
the lung cancer course, namely the age (T mal) at the onset of the first malignant cell,
which also marks the beginning of the local stage; the ages at the beginning of
the regional (T reg) and distant (T dist) stages; the age at diagnosis (T diagn); and
the age at death from lung cancer (T death). These age distributions pertain to
people for whom the model predicted development of and death from lung cancer. We
indicatively present the distributions for the aforementioned characteristic scenarios,
highlighting the effect of gender and smoking on the development of and death from
lung cancer. Lung cancer mortality is depicted using survival curves. In addition, we
report estimates of the probabilities of lung cancer death (Pd) and diagnosis (Pdiagn).
All the results presented in this section are based on sets of 100,000 micro-simulations
for each scenario.
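With 100,000 runs per scenario, the Monte Carlo standard error of a reported probability such as Pd follows the usual binomial formula; the sketch below, with a Bernoulli stub standing in for a full micro-simulation, shows that a Pd near 0.2% is estimated to within roughly ±0.014 percentage points (one standard error). The function and stub names are illustrative, not part of the dissertation's code.

```python
import math
import random

def estimate_pd(one_run, n=100_000, seed=1):
    """Monte Carlo estimate of a death probability and its binomial SE.

    one_run(rng) stands in for a complete micro-simulation and must
    return 1 if the simulated individual dies of lung cancer, else 0.
    """
    rng = random.Random(seed)
    deaths = sum(one_run(rng) for _ in range(n))
    p = deaths / n
    se = math.sqrt(p * (1.0 - p) / n)    # binomial standard error
    return p, se

# Illustration with a Bernoulli(0.002) stub in place of the real model
p_hat, se_hat = estimate_pd(lambda rng: 1 if rng.random() < 0.002 else 0)
```

The same calculation explains why the probabilities in Tables 2.5 to 2.14 are reported to three decimal places: with n = 100,000 the estimates are precise at roughly that level.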
Never smokers
Tables 2.5 and 2.6 compare lung cancer mortality (Pd) and the distributions of times to
each of the main lung cancer states of the MSM developed in this chapter, between
males and females who have never smoked. According to these tables,
men have a higher (almost double) probability of dying from lung cancer (0.218%)
than women (0.120%). Overall, the distributions of the predicted times are very
similar for the two genders, although slightly shifted to earlier ages for women. That
is, for those cases for which the model predicted death from lung cancer,
all the events of main interest in the lung cancer course happened at younger ages
for women than for men in the examples. This finding is in agreement
with recent findings on lung cancer incidence and mortality in never smokers (116),
indicating that women are more likely than men to have non-smoking-associated
lung cancer. Figure 2.2 confirms the small difference in lung cancer survival between
the two genders.
Table 2.5: Male, 65 years old, never smoker (Pd = 0.218%).

          Mean ± SD      Min     Q1      Median  Q3      Max
T mal     66.98 ± 8.50   44.42   60.59   66.93   73.88   85.38
T reg     74.49 ± 8.62   50.53   68.14   74.42   81.39   93.44
T dist    75.79 ± 8.53   52.31   69.51   75.52   82.59   94.48
T diagn   76.76 ± 8.08   51.88   70.58   76.88   83.39   92.26
T death   78.48 ± 7.49   65.15   72.47   78.56   84.63   92.77

Table 2.6: Female, 65 years old, never smoker (Pd = 0.120%).

          Mean ± SD      Min     Q1      Median  Q3      Max
T mal     64.67 ± 10.32  37.44   58.48   65.26   72.72   83.55
T reg     72.22 ± 10.20  44.78   65.63   72.57   80.63   91.14
T dist    73.51 ± 10.33  47.25   66.65   73.83   81.59   92.31
T diagn   74.65 ± 10.51  45.26   68.85   75.26   82.54   92.12
T death   78.25 ± 7.62   65.15   71.74   78.40   84.52   92.37
Figure 2.2: MSM predicted lung cancer survival for non-smokers, 65 years old.
Current smokers
In all the working examples for current smokers we examine different scenarios
(depending on the average smoking intensity, i.e., 10, 30, and 50 cigarettes per day) for
a person 65 years old who started smoking at the age of 20. Tables 2.7 and
2.8 present the results for a male and a female, respectively. As expected
(99; 47), we overall observe higher proportions of predicted lung cancer deaths for
males than for females. These proportions also increase with the smoking intensity;
namely, the heavier the smoker, the more probable the development of and death
from lung cancer. In addition, the entire course of the lung cancer is shifted towards
earlier ages as the average smoking intensity increases, i.e., the onset of the local,
regional, and distant stages, as well as the diagnosis and finally death from lung
cancer, occur at younger ages for heavy smokers.
Table 2.7: Male, 65 years old, current smoker, started smoking at age 20.

          Mean ± SD      Min     Q1      Median  Q3      Max
Average smoking intensity: 10 cigarettes per day (Pd = 6.91%)
T mal     65.87 ± 9.05   33.96   59.20   65.14   72.21   92.10
T reg     73.37 ± 9.08   41.07   66.61   72.65   79.83   98.56
T dist    74.71 ± 9.07   42.24   68.01   74.01   80.98   100.80
T diagn   75.97 ± 9.12   44.46   69.22   75.33   82.56   99.43
T death   78.93 ± 8.40   65.01   71.95   78.28   85.25   99.84
Average smoking intensity: 30 cigarettes per day (Pd = 8.75%)
T mal     63.98 ± 9.19   32.32   57.39   62.84   70.04   92.49
T reg     71.47 ± 9.23   38.02   64.86   70.35   77.60   100.60
T dist    72.82 ± 9.20   41.94   66.20   71.69   78.80   101.40
T diagn   74.08 ± 9.22   44.10   67.63   73.07   80.28   99.39
T death   77.41 ± 8.39   65.00   70.35   76.18   83.45   99.82
Average smoking intensity: 50 cigarettes per day (Pd = 9.30%)
T mal     62.99 ± 9.25   34.19   56.67   61.85   68.98   93.90
T reg     70.49 ± 9.26   40.15   64.10   69.35   76.47   102.60
T dist    71.83 ± 9.27   42.23   65.51   70.64   77.80   102.60
T diagn   73.15 ± 9.39   44.13   66.87   72.10   79.28   99.56
T death   76.85 ± 8.34   65.00   69.79   75.50   82.62   99.66
No large differences are noted in the time distributions between males and females
of the same smoking intensity group (Tables 2.7 and 2.8).
The plots in figure 2.3 verify the difference in lung cancer survival between the two
genders. In addition, according to these plots, the survival probability decreases as
the average smoking intensity increases from 10 to 30 cigarettes/day, while it remains
almost unchanged between 30 and 50 cigarettes/day.
Table 2.8: Female, 65 years old, current smoker, started smoking at age 20.

          Mean ± SD      Min     Q1      Median  Q3      Max
Average smoking intensity: 10 cigarettes per day (Pd = 3.53%)
T mal     66.59 ± 9.34   33.07   60.00   66.25   73.23   92.44
T reg     74.09 ± 9.34   40.89   67.42   73.74   80.90   99.66
T dist    75.42 ± 9.35   42.32   68.81   75.08   82.13   101.40
T diagn   76.60 ± 9.40   44.10   69.95   76.43   83.30   98.99
T death   79.74 ± 8.43   65.01   72.91   79.17   86.05   99.63
Average smoking intensity: 30 cigarettes per day (Pd = 5.27%)
T mal     63.85 ± 9.54   34.13   57.25   63.00   70.37   94.28
T reg     71.35 ± 9.56   41.20   64.75   70.56   77.88   102.40
T dist    72.71 ± 9.57   42.23   66.09   71.84   79.19   103.30
T diagn   73.99 ± 9.76   44.14   67.39   73.34   80.71   99.68
T death   77.66 ± 8.40   65.00   70.59   76.54   83.61   99.83
Average smoking intensity: 50 cigarettes per day (Pd = 5.95%)
T mal     62.72 ± 9.96   31.79   56.21   61.77   69.35   92.33
T reg     70.24 ± 9.95   39.15   63.70   69.20   76.86   99.04
T dist    71.59 ± 9.95   41.15   65.02   70.57   78.14   101.10
T diagn   72.81 ± 10.2   44.14   66.42   71.98   79.48   99.55
T death   77.23 ± 8.43   65.00   70.12   75.85   83.28   99.81
Figure 2.3: MSM predicted lung cancer survival for current smokers, 65 years old.
Former smokers
In the examples for former smokers, we investigate the effect of smoking intensity
(10, 30, and 50 cigarettes smoked per day on average) and quitting age (40,
50, and 60 years old) on the lung cancer course of a male (tables 2.9 to 2.11) and a
female (tables 2.12 to 2.14), both 65 years old, who started smoking at the age of 20.
As in the case of current smokers, the predicted proportions of lung cancer deaths are
higher for men than for women in the same smoking category. This tendency
is verified in several observational studies on lung cancer (99; 47); namely, men
with exactly the same characteristics are in general more susceptible to lung cancer
than women. Furthermore, we observe a positive correlation between the predicted
probability of death from lung cancer and the duration of smoking. This correlation
is more pronounced in heavier smokers (higher average smoking intensity). Similar
patterns hold for women. No large differences were found in the predicted times to
the main events of interest between males and females with the same characteristics.
Noteworthy is the fact that lung cancer survival for people (men or women) who
smoked for only 20 years of their lives (i.e., started and quit smoking at ages 20 and 40,
respectively) is very similar to that of non-smokers. Furthermore,
the negative effect of smoking on lung cancer survival is more prominent for longer
durations of smoking.
The survival plots in figure 2.4 confirm the similarity between the survival curves of
former smokers who started and quit smoking at ages 20 and 40, respectively, and those
of non-smokers. These plots also confirm the deteriorating effect of smoking on
lung cancer survival, which becomes more pronounced as the total number of years
of smoking and the average number of cigarettes smoked per day increase.
Table 2.9: Male, 65 years old, former smoker, starting and quitting smoking at 20 and 40 years old respectively.
Mean ± SD Min Q1 Q2 Q3 Max
Average smoking intensity: 10 cigarettes per day
(Pd = 0.23%)
T mal 63.92 ± 12.47 32.49 57.78 65.11 72.95 84.82
T reg 71.45 ± 12.37 39.94 65.82 72.56 80.55 92.14
T dist 72.76 ± 12.49 40.62 66.47 74.38 81.86 93.17
T diagn 73.56 ± 12.30 44.21 67.43 75.59 82.16 93.54
T death 77.22 ± 8.10 65.08 69.55 76.70 83.85 93.78
Average smoking intensity: 30 cigarettes per day
(Pd = 0.26%)
T mal 61.23 ± 14.11 30.96 53.80 64.36 71.72 84.44
T reg 68.76 ± 14.14 39.16 62.01 72.13 79.02 92.74
T dist 70.04 ± 14.03 40.43 63.49 73.24 80.93 93.26
T diagn 71.08 ± 14.25 44.22 65.14 74.06 81.68 92.78
T death 76.83 ± 8.15 65.06 69.49 75.85 83.53 93.17
Average smoking intensity: 50 cigarettes per day
(Pd = 0.27%)
T mal 59.01 ± 14.81 33.45 39.80 61.99 70.86 85.32
T reg 66.51 ± 14.75 39.52 47.74 69.47 78.45 92.67
T dist 67.79 ± 14.80 41.58 49.02 70.95 79.82 94.05
T diagn 68.81 ± 14.88 44.10 51.97 72.72 80.40 92.44
T death 75.74 ± 7.91 65.02 68.53 74.51 81.66 92.99
Table 2.10: Male, 65 years old, former smoker, starting and quitting smoking at 20 and 50 years old respectively.
Mean ± SD Min Q1 Q2 Q3 Max
Average smoking intensity: 10 cigarettes per day
(Pd = 0.37%)
T mal 56.27 ± 12.93 35.60 46.00 49.98 68.16 84.39
T reg 63.75 ± 12.86 42.85 53.53 58.59 75.43 92.62
T dist 65.10 ± 12.81 44.38 54.96 59.60 76.88 93.01
T diagn 65.66 ± 13.30 44.60 54.37 64.74 77.83 92.42
T death 75.65 ± 6.97 65.07 70.01 75.24 80.08 92.48
Average smoking intensity: 30 cigarettes per day
(Pd = 0.54%)
T mal 52.38 ± 11.84 32.84 44.76 48.53 59.58 84.69
T reg 59.93 ± 11.93 40.85 52.08 55.98 66.87 92.34
T dist 61.30 ± 11.86 41.40 53.60 57.12 68.30 95.15
T diagn 61.92 ± 12.41 44.43 52.54 57.46 69.80 92.89
T death 74.78 ± 6.52 65.04 69.67 74.41 78.70 93.11
Average smoking intensity: 50 cigarettes per day
(Pd = 0.70%)
T mal 50.05 ± 10.65 32.26 43.84 47.21 49.86 84.42
T reg 57.75 ± 10.68 40.57 51.40 54.82 58.59 93.35
T dist 58.99 ± 10.68 40.79 52.52 56.33 59.25 95.33
T diagn 59.29 ± 11.28 44.22 51.38 55.80 64.72 91.19
T death 74.17 ± 5.91 65.01 69.40 74.03 78.01 92.26
Table 2.11: Male, 65 years old, former smoker, starting and quitting smoking at 20 and 60 years old respectively.
Mean ± SD Min Q1 Q2 Q3 Max
Average smoking intensity: 10 cigarettes per day
(Pd = 2.04%)
T mal 56.45 ± 5.74 30.93 54.19 56.79 58.72 84.85
T reg 64.00 ± 5.84 38.41 61.56 64.29 66.24 92.80
T dist 65.33 ± 5.78 41.39 62.96 65.57 67.57 94.42
T diagn 66.84 ± 6.21 44.09 64.79 66.89 69.37 92.79
T death 71.48 ± 6.08 65.01 67.23 69.33 73.28 93.08
Average smoking intensity: 30 cigarettes per day
(Pd = 3.33%)
T mal 55.57 ± 5.45 31.83 53.43 56.20 58.27 84.73
T reg 63.11 ± 5.48 40.94 60.89 63.62 65.73 93.42
T dist 64.46 ± 5.49 41.27 62.26 65.04 67.02 95.02
T diagn 66.04 ± 6.11 44.26 64.38 66.55 69.04 93.36
T death 71.05 ± 5.75 65.00 67.02 69.21 72.68 93.47
Average smoking intensity: 50 cigarettes per day
(Pd = 3.93%)
T mal 55.07 ± 5.77 33.56 52.85 55.89 58.08 85.28
T reg 62.60 ± 5.80 40.81 60.32 63.34 65.58 91.77
T dist 63.95 ± 5.78 43.24 61.61 64.66 66.94 94.25
T diagn 65.53 ± 6.55 44.13 64.04 66.37 68.87 92.21
T death 71.14 ± 5.78 65.00 67.02 69.19 72.99 92.37
Table 2.12: Female, 65 years old, former smoker, starting and quitting smoking at 20 and 40 years old respectively.
Mean ± SD Min Q1 Q2 Q3 Max
Average smoking intensity: 10 cigarettes per day
(Pd = 0.17%)
T mal 62.70 ± 12.12 34.02 56.86 63.35 72.07 82.97
T reg 70.25 ± 12.17 40.38 63.76 71.31 79.84 91.54
T dist 71.56 ± 12.20 44.28 64.84 72.03 81.12 92.89
T diagn 72.23 ± 12.27 44.40 66.72 73.99 82.43 92.56
T death 76.93 ± 7.85 65.09 69.67 75.81 84.00 93.45
Average smoking intensity: 30 cigarettes per day
(Pd = 0.19%)
T mal 60.38 ± 13.62 33.13 53.55 62.48 71.32 85.05
T reg 67.93 ± 13.52 39.81 60.69 69.60 78.71 91.77
T dist 69.21 ± 13.45 42.34 62.65 70.98 80.07 93.86
T diagn 70.42 ± 13.57 44.33 64.65 72.39 81.60 91.65
T death 76.23 ± 7.92 65.16 69.34 74.61 83.36 92.25
Average smoking intensity: 50 cigarettes per day
(Pd = 0.24%)
T mal 55.47 ± 15.15 31.80 38.86 58.37 69.26 82.28
T reg 63.10 ± 14.99 40.25 46.75 65.66 76.62 89.58
T dist 64.44 ± 15.01 41.03 48.12 67.09 77.73 90.86
T diagn 65.51 ± 15.37 44.12 48.10 68.82 78.87 91.66
T death 74.28 ± 7.45 65.04 67.78 72.33 80.72 91.91
Table 2.13: Female, 65 years old, former smoker, starting and quitting smoking at 20 and 50 years old respectively.
Mean ± SD Min Q1 Q2 Q3 Max
Average smoking intensity: 10 cigarettes per day
(Pd = 0.24%)
T mal 56.67 ± 12.29 32.71 46.66 54.35 66.48 85.00
T reg 64.28 ± 12.24 40.34 54.11 61.94 73.96 92.43
T dist 65.67 ± 12.24 41.13 55.42 63.54 75.32 94.96
T diagn 66.21 ± 12.82 44.24 54.70 65.43 76.50 92.51
T death 75.72 ± 6.82 65.11 70.37 75.19 79.83 92.61
Average smoking intensity: 30 cigarettes per day
(Pd = 0.37%)
T mal 52.59 ± 12.07 32.50 44.02 48.34 61.33 85.63
T reg 60.10 ± 12.02 40.52 51.48 55.92 68.72 93.58
T dist 61.45 ± 12.08 40.93 53.02 57.03 70.03 94.06
T diagn 61.97 ± 12.69 44.09 52.13 57.28 72.62 93.68
T death 74.43 ± 6.51 65.00 68.81 74.06 78.27 93.68
Average smoking intensity: 50 cigarettes per day
(Pd = 0.58%)
T mal 50.30 ± 11.44 34.03 42.40 47.11 54.07 84.17
T reg 57.88 ± 11.45 40.30 49.91 54.73 61.67 92.65
T dist 59.14 ± 11.44 42.00 51.28 56.00 63.38 93.55
T diagn 59.68 ± 12.12 44.12 50.60 55.71 65.86 93.61
T death 73.65 ± 6.17 65.03 68.67 73.24 77.43 93.72
Table 2.14: Female, 65 years old, former smoker, starting and quitting smoking at 20 and 60 years old respectively.
Mean ± SD Min Q1 Q2 Q3 Max
Average smoking intensity: 10 cigarettes per day
(Pd = 1.01%)
T mal 56.64 ± 6.41 35.88 54.16 56.70 58.84 85.77
T reg 64.14 ± 6.39 43.36 61.33 64.21 66.39 91.84
T dist 65.49 ± 6.42 44.03 62.83 65.53 67.63 93.95
T diagn 66.85 ± 6.90 44.46 64.59 66.89 69.59 92.70
T death 71.63 ± 6.11 65.00 67.28 69.60 73.80 93.24
Average smoking intensity: 30 cigarettes per day
(Pd = 1.94%)
T mal 55.09 ± 6.13 32.88 53.07 55.93 58.24 83.79
T reg 62.61 ± 6.13 41.27 60.30 63.37 65.69 91.65
T dist 63.92 ± 6.12 42.08 61.81 64.76 66.99 92.46
T diagn 65.44 ± 6.89 44.15 63.58 66.45 68.90 92.70
T death 71.27 ± 5.60 65.01 67.34 69.49 73.30 93.07
Average smoking intensity: 50 cigarettes per day
(Pd = 2.49%)
T mal 54.38 ± 6.33 29.45 51.98 55.50 57.86 83.55
T reg 61.98 ± 6.38 36.55 59.36 63.06 65.40 93.28
T dist 63.29 ± 6.34 37.82 60.85 64.36 66.71 93.38
T diagn 64.74 ± 7.13 44.14 62.34 65.98 68.62 91.61
T death 71.34 ± 5.79 65.00 67.01 69.50 73.57 92.41
Figure 2.4: MSM predicted lung cancer survival for former smokers, 65 years old, starting smoking at age 20.
2.4 Discussion
The main purpose of the natural history MSM for lung cancer developed in
this thesis is to serve as a tool for the exploration of the statistical
properties of micro-simulation models in general. To this end we developed a
simplified yet valid model that follows current practices in micro-simulation modeling
while adequately describing the natural history of the disease.
The MSM aims at combining some of the best practices currently followed in this
domain, while remaining simple enough to serve as an efficient tool for the exploration
of the statistical properties of this type of model. It is a continuous time MSM;
namely, events can take place at any time point. Depending on the degree of
discretization, this assumption is sometimes more reasonable than the very restrictive
one of fixed time lengths imposed by a discrete time MSM.
Furthermore, it combines some of the most widely used models for the description
of several distinctive stages of the natural history of lung cancer, including both
biological and epidemiological models. More specifically, it uses the biological Two
Stage Clonal Expansion (TSCE) model (75) to describe the risk for the onset of
the first malignant cell. In particular, the model employs the exact solutions for
the expression of the hazard rates and the survival probabilities. Moolgavkar and
Luebeck (74) comment on the inaccuracy of the approximations, which can lead to
serious data misinterpretation, and emphasize the need to use the exact
solutions instead.
Moreover, the model employs the Gompertz function to simulate tumor growth.
Several studies have shown that this function fits available data well, hence it is
preferable for simulating this process compared to other growth curves found in the
literature (e.g., the exponential). Finally, the model breaks down the time from the
local stage to death (lag time) into three sub-intervals (local to regional, regional to
distant, distant to death) instead of assuming, e.g., a fixed or Gamma-distributed
lag time (40; 72). This approach enables a more detailed representation of the natural
course of lung cancer, and hence a more accurate prediction of the times to the events
of interest.
However, perhaps the most attractive feature of this MSM is that it is developed in
R. Development in an open source statistical package enhances the transparency
of the model and facilitates research on the statistical properties of MSMs in general.
Using ad-hoc estimates for the model parameters (as described in section 4), we make
predictions for hypothetical scenarios by running multiple micro-simulations for each
case. The results seem plausible compared to what was expected based on relevant
studies and reports about lung cancer. We note, for example, higher lung cancer
mortality in men compared to women, as well as positive correlation of the negative
smoking effects, on the course of lung cancer, with the total smoking duration and
intensity.
These examples are provided only as an indication that the model performs
reasonably well. Large deviations from the truth are attributed to the inability of the
ad-hoc values for the MSM parameters to reproduce real figures. Thorough
calibration exercises are necessary to achieve proximity between MSM predictions and
real data. This is one of the main objectives of the next chapter, where we
perform a thorough calibration and validation of this MSM using real data.
Some of the most serious limitations of this MSM are that it does not involve any
screening or treatment component and that it does not take into account the
detection of benign lesions. Moreover, the complexity of the model was kept to
a minimum because the main objective of this chapter was to develop a streamlined
MSM that sufficiently describes the natural history of lung cancer while serving as a
handy tool for exploring the statistical properties of MSMs in general.
Improvement of the MSM with respect to those limitations was beyond the scope
of this thesis. However, working examples demonstrate the potential of the model to
be used in real-life scenarios. From this perspective, future work includes enhancing
the MSM's performance by increasing its level of complexity and incorporating
additional components, e.g., screening and treatment components. Another
immediate goal is to refine the R code and publish it as a library in CRAN, the
package repository of the open source statistical software R. This will enhance
the transparency of the model and give many potential users the opportunity
to use it, either as a tool for further research and development of statistical methods
related to micro-simulation models, or to simulate entire populations or sub-groups
of patients and assist, for instance, decision making in lung cancer research.
Chapter 3
Calibration methods in MSMs - a comparative analysis
The second chapter of this thesis pertains to the calibration of micro-simulation
models. The main goal is to provide a comparative analysis of two different approaches to
this problem, a Bayesian and an Empirical one. The Bayesian calibration adapts the
methodology described in Rutter et al. (90). The Empirical method aims at combining
broadly applied practices for empirically calibrating MSM parameters (92).
Both methods are implemented for the calibration of the streamlined MSM for the
natural history of lung cancer developed in the previous chapter. The entire procedure
is conducted using the open source statistical software R 3.0.1. The comparative
analysis comprises graphical, qualitative, and quantitative discrepancy measures of
the results the two methods produce. This is a first attempt at a thorough comparison
between two calibration methods in the context of MSMing, with a focus on the
statistical aspects of these procedures. The chapter concludes with suggestions about the
best method, under certain circumstances, based on an overall assessment of the
calibration results according to these measures.
The chapter begins with a description of the two calibration methods that will
be implemented. It continues with a detailed discussion of the serious computational
restrictions related to implementing the calibration of the MSM in R. Emphasis
is put on the need for HPC techniques to deal with the particularities of the
code involved. There follows a description of the simulation study designed
for the purposes of the comparative analysis, along with detailed results from this
analysis. The chapter concludes with some general remarks about the performance
of the two calibration methods with respect to both MSM validation and the
computational requirements and restrictions imposed by each method. We comment
on the advantages and disadvantages of the two methods, and we refer to future
work related to this chapter.
3.1 Background
3.1.1 Calibration vs estimation in statistical theory
Calibration pertains to the specification of model parameters to fit observed
quantities of interest. The term has many instances in the statistical literature and is
closely related to the development of stochastic predictive models. Calibration is also
used in the context of fitting complex deterministic mathematical models (Kennedy
and O'Hagan, 2001; Campbell, 2006). The terms "calibration", "estimation", and
"model fitting" are often used interchangeably in the modeling literature (Vanni
et al., 2011). In the context of ordinary statistical modeling (e.g., generalized linear
models), calibration is considered an "inverse prediction" problem. Simply stated,
given a new value of the response variable, the question is what set of values for the
predictor variables in the model could produce the quantity of interest with high
probability. Moreover, this specification of the model parameters usually refers to point
estimation rather than characterization of the distribution of the model parameters. In this
thesis, we consider calibration as a "model tuning" procedure aiming to specify
those sets of model parameter values which, when used as model inputs, can predict
with a desired amount of accuracy the pre-specified target summaries from the
available data.
In the context of the specification of MSM parameters, calibration seems more
relevant than estimation. This is because in micro-simulation modeling it is possible
for more than one set of parameter values to reproduce results close to the observed
quantities of interest. In addition, some of the model parameters represent latent
variables (i.e., unobserved quantities); hence, model identification problems may
arise. Therefore, purely analytical estimation procedures aimed at finding the single
set of parameter values that best fits the observed data (e.g., MLE) are not useful
for this specific problem. Identifying the collection of acceptable sets of parameter
values is preferable instead, since these sets can provide an idea of the underlying
correlation structure of the model parameters. In addition, those sets can be used to
capture and express the model parameter uncertainty in the produced outputs.
According to Vanni et al. (2011), the goal of a calibration process is manifold and
includes the specification of unobserved/unobservable model parameters, of parameters
that are observed with some level of uncertainty, and of the correlation among the model
parameters (both observable and unobservable), as well as the approximation of the joint
distribution of the model parameters. This last goal can be achieved only if the
calibration process yields more than one combination of parameter values. The set of
all plausible combinations of values can be used as an approximation of both the
marginal and the underlying joint distributions of the MSM's parameters and
outputs. This result of the calibration procedure is extremely useful in the context
of MSMing. Unlike typical statistical models, e.g., generalized linear models, where
the output variable is directly expressed as a function of the model's parameters and
covariates, in MSMs there is usually no closed-form expression for the relation among
the model input, output, and parameters. Indeed, it is very difficult to
identify and quantify the correlation mechanisms that govern the model's structure,
because of the complicated relationships dominating the process described by the
MSM. This complexity also often gives rise to identifiability problems.
3.1.2 Calibration methods for MSMs
Vanni et al. (2011) provide a systematic overview of the calibration procedure that
should be followed in the development of economic evaluation mathematical models
in general. According to this paper, the calibration procedure comprises decisions on
seven essential steps: the model parameters to calibrate, the calibration
targets, the goodness-of-fit (GOF) measure(s), the search strategy over the range of
possible parameter values, the convergence criteria, the stopping rule for the calibration
process, and the integration, presentation, and use of the model calibration results.
Several methods have been proposed in the literature specifically for the calibration
of MSMs in medical decision making. Stout et al. (106) classify the model parameter
estimation methods currently used in cancer simulation models into two broad
categories: purely analytical methods and calibration methods. This classification
is also relevant in the context of micro-simulation modeling calibration.
Purely analytical methods refer to direct estimation of the model parameters (e.g.,
MLE (3; 22; 87; 75)) without reference to model fit. On the contrary, calibration
methods derive model parameters through an efficient search of the parameter space
and can be further categorized into undirected and directed methods. Undirected
methods involve an exhaustive grid search (65; 59) of the parameter space or a grid
search using some sampling design (e.g., random sampling (23; 4; 115; 50), Latin
Hypercube Sampling (LHS) (49; 5; 69; 95), etc.). Directed methods, on the other hand,
aim at finding the optimum set of parameter values using, for example, the Nelder-Mead
(77; 11; 18) or some other optimization algorithm (118; 107; 53; 52). In addition to
these two broad calibration categories, Bayesian (90; 117; 14) calibration
methods are also often used in micro-simulation modeling.
We could further split the various calibration methods for complex models into
empirical and theoretical. The characteristics that actually differentiate these two
categories lie in the nature of the search strategy, the convergence criteria, and the
stopping rule, as well as in the interpretation of the produced results. In an empirical
method, for instance, the search strategy usually involves some sort of random
sampling within the multivariate parameter space, the convergence criteria and the
stopping rules are usually arbitrary (sometimes even based on convenience), and
the interpretation of the results (set(s) of values for the calibrated model parameters)
is often abstruse. Theoretical methods, on the other hand, involve structured
search strategies and stopping rules (e.g., optimization algorithms, the Gibbs sampler,
etc.), while the interpretation of the results is easier and based on a sound theoretical
background (e.g., the joint posterior distribution of the calibrated parameters in
Bayesian calibration).
3.1.3 Assessing calibration results
Calibration methods aim at producing models that fit observed data well. Hence,
the evaluation of a calibration method is closely related to model validation.
There are several means, qualitative and quantitative, to assess the performance of
a predictive model. Within the scope of MSMing, to our knowledge, no systematic
work has yet been carried out on the assessment of a calibrated model. Usually
the performance of a calibrated MSM is evaluated only with plots that compare the
MSM predictions with the respective observed data (4; 94; 23). In such situations,
the conclusions about the quality and adequacy of the MSM are arbitrary and
entail a certain amount of subjectivity. Plots should rather be used in conjunction
with measures (GoF statistics) that quantify the deviation of the MSM outputs
from the observed quantities of interest. The most popular among the quantitative
measures applied for MSM validation is the chi-square GoF statistic. Bayesian
calibration methods provide additional means for assessing the overall performance of
a calibrated MSM, i.e., comparison of the observed quantities of interest (calibration
targets) with the corresponding posterior predictive distributions. Other GOF
measures employing, e.g., profile likelihoods are also suggested in the literature
(18).
3.2 Methods
3.2.1 Notation
In this section, we introduce some notation that will be used throughout the
remainder of this document.
M(θ) : micro-simulation model
θ = [θ1, θ2, . . . , θK]^T : vector of model parameters
Z = [Za, Zg, Zs, Zd]^T : vector of covariates (baseline characteristics): age (Za),
gender (Zg), smoking status (Zs), and smoking intensity (Zd),
measured as the average number of cigarettes smoked per day
Y = [Y1, Y2, . . . , YJ] : vector of data, i.e., summary statistics found in the literature
that describe quantities of interest in the natural history model
π(θ) : joint prior distribution of θ
π(θk) : prior distribution of θk
h(θ|Y, Z) : joint posterior distribution of θ
f(Y|g(θ), Z) : data distribution; this distribution depends on a function g(·)
of the model parameters θ and the model covariates
(the functional form of g(·) is unknown and hard to specify)
h(θk|θ(−k), Y, Z) : full conditional distribution of parameter θk given θ(−k), Y, and Z
θ(−k) : the θ vector excluding the θk component (k = 1, 2, . . . , K,
where K is the total number of MSM parameters)
Mm(θ, SN) : MSM predictions after running the model m times in total on
the input sample SN of size N, given θ
3.2.2 Bayesian Calibration Method
The first method applies Bayesian reasoning to the calibration of MSMs. The goal
is to incorporate, in a sound way, both prior beliefs about the MSM parameters
and observed data found in the literature on the natural history of lung cancer into
the MSM calibration procedure. To this end we apply the Bayesian calibration method
described in detail in Rutter et al. (90), which aims at drawing values from the joint
posterior distribution h(θ|Y) of the model parameters. This method essentially
involves a sufficiently large number of Gibbs sampler iterations that result in draws
from the full conditional distributions h(θk|θ(−k), Y, Z). Due to the model's complexity,
the algorithm also involves approximate Metropolis-Hastings (MH) steps embedded
within each Gibbs sampler iteration in order to draw from the unknown forms of the
full conditional distributions.
In particular, within each Gibbs sampler step, we implement multiple iterations of a
random-walk Metropolis-Hastings algorithm. Given a symmetric jumping distribution,
the MH algorithm accepts a new value θk* with transition probability:

a(\theta_k, \theta_k^*) =
\begin{cases}
\min\{ r_k(\theta_k, \theta_k^*),\, 1 \} & \text{if } \pi_k(\theta_k) \prod_{j=1}^{J} f_j(y_j \mid g(\theta)) > 0 \\
1 & \text{if } \pi_k(\theta_k) \prod_{j=1}^{J} f_j(y_j \mid g(\theta)) = 0
\end{cases}
\qquad (3.1)
Assuming that the micro-simulation model M(θ) and the data distributions f(Y | g(θ))
are correctly specified, we use M(θ) to simulate M draws from fj(Yj | gj(θ)), where j
indicates the j-th covariate class. We use maximum-likelihood estimation (MLE)
to estimate the data distribution parameter; e.g., for Binomial and Poisson counts
the estimate of gj(θ) is the average g_j(θ) = (1/M) Σ_{i=1}^{M} Y_{ij}. We then use
these estimates to calculate the transition probability a(θk, θk*) based on:
r_k(\theta_k, \theta_k^*) =
\frac{\pi_k(\theta_k^*) \prod_{j=1}^{J} f_j(y_j \mid g_j(\theta_k^*, \theta_{(-k)}))}
     {\pi_k(\theta_k) \prod_{j=1}^{J} f_j(y_j \mid g_j(\theta))}
\qquad (3.2)
The Bayesian calibration method results in a V×K matrix of calibrated values,
denoted ΘBayes, whose rows represent a random sample from the joint posterior
distribution h(θ|Y) of the MSM parameters. This sample is used to estimate both
the posterior distributions of the calibrated MSM parameters and the posterior
predictive distributions of the quantities of interest.
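As a rough sketch of one such approximate MH update, the fragment below replaces the likelihood term fj(yj | gj(θ)) with a Poisson likelihood evaluated at a simulation-based estimate of gj(θ), obtained by averaging M runs of a toy stand-in model. The toy model, the uniform prior bounds, and all parameter values are hypothetical illustrations, not the actual lung cancer MSM (which is written in R).

```python
import math
import random

random.seed(1)

def run_model(theta, n=200):
    """Toy stand-in for one micro-simulation run: a simulated count
    out of n individuals whose mean is n * theta (illustrative only)."""
    return sum(1 for _ in range(n) if random.random() < theta)

def log_lik(y_obs, g):
    """Poisson log-likelihood of the observed target y_obs given the
    simulation-based estimate g of its mean."""
    return y_obs * math.log(g) - g - math.lgamma(y_obs + 1)

def mh_step(theta, y_obs, M=50, step=0.02, lo=0.01, hi=0.5):
    """One random-walk MH update for a single parameter, with g(theta)
    re-estimated from M model runs (uniform prior on [lo, hi])."""
    theta_star = theta + random.uniform(-step, step)
    if not lo <= theta_star <= hi:
        return theta                      # prior density is zero: reject
    g_cur = sum(run_model(theta) for _ in range(M)) / M
    g_new = sum(run_model(theta_star) for _ in range(M)) / M
    log_r = log_lik(y_obs, g_new) - log_lik(y_obs, g_cur)
    return theta_star if math.log(random.random()) < log_r else theta

# A short chain targeting an observed count of 40 out of 200 (theta near 0.2)
theta, chain = 0.1, []
for _ in range(200):
    theta = mh_step(theta, y_obs=40)
    chain.append(theta)
```

Note the key feature of the approximate scheme: the likelihood is never available in closed form, so every acceptance decision pays the price of 2 × M model runs.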
3.2.3 Empirical Calibration Method
Several empirical calibration methods for micro-simulation models have been
suggested in the literature (section 3.1.2). Most of them comprise some type of sampling
for searching the multidimensional parameter space, the stipulation of some proximity
measure between observed and predicted quantities of interest, and the selection of a set
of parameter vectors satisfying pre-specified convergence criteria. In many cases the
result of an empirical calibration procedure is a set of parameter vectors rather than a
single one for the calibrated model.
For the development of our generic empirical calibration method we focus on two key
elements of the procedure, i.e., the search algorithm and the convergence criteria
involved. In this section we describe how we combine popular practices, found in the
literature of MSMing, so as to create a generic Empirical calibration method that
will be compared with the Bayesian method previously described.
When the model dimensionality permits, it is possible to use exhaustive grid search
algorithms to search the parameter space (65). For models comprising many parameters
(as is usually the case in micro-simulation modeling), random sampling rather
than exhaustive grid search is preferred for searching the parameter space
(23; 4; 115; 50). Alternatively, a more efficient sampling scheme can be used
to sample from the multidimensional parameter space, namely Latin
Hypercube Sampling (LHS) (69; 104).
The LHS method was introduced by McKay et al. (69) as an extremely efficient
sampling scheme that outperforms both simple random and stratified sampling.
LHS and its variations (16) increase the realization efficiency of the algorithm while
preventing the introduction of bias and reducing the effect of extreme values on the
resulting estimates. Another very attractive feature of LHS is that it allows
for characterizing the uncertainty of, and conducting sensitivity analyses on, complex
deterministic or stochastic models.
Applications of the method are found in several instances of model calibration in medical
research (49). In Blower and Dowlatabadi (5) we find an application of LHS
to a deterministic complex model as a technique to explore the effect of uncertainty in
the parameter values on the predicted outcomes. Another very interesting application of
LHS is found in Cronin et al. (13), where the method is used in conjunction with
a response surface analysis as an efficient way to explore the parameter space and
investigate the relationship between the parameter values and the respective model
outputs.
The second very important feature of the calibration procedure we focus on is the
specification of the convergence criteria used to identify acceptable parameter sets. The
most commonly used discrepancy measures in the context of calibrating complex
models are chi-square and likelihood statistics (115). These two are also the most
typical measures used for the overall assessment of the calibrated model fit. It is
noteworthy, however, that in many instances of empirically calibrated complex
models, the assessment of the overall model fit is completely arbitrary and based
solely on graphical comparisons between observed and predicted quantities of
interest (94; 23; 4).
Latin Hypercube Sample
Before continuing to the description of the Empirical calibration method, we first
discuss the particularities of the Latin Hypercube Sampling (LHS) design.
Let θk ∈ Rk, where Rk is the range of plausible values for θk. We divide Rk into
N equiprobable (according to the pre-specified distribution we assume for each θk)
intervals, and we assign the integers 1 through N to them. We create
a sequence of K vectors, each of which is a random permutation of the integers
1, 2, ..., N. For each θk we randomly draw a value from the indexed interval according to
the K vectors of random permutations previously created. Alternatively, the middle
point of each interval could be used. The result of this procedure is an N×K matrix M
whose columns are the K vectors of random values, one for each of the model parameters.
The mik element of this matrix corresponds to the value extracted from the i-th indexed
interval of the θk parameter. The i-th row of the matrix is a sample point from
the parameter space. The matrix M is the Latin Hypercube Sample extracted
from a single replication of this sampling design.
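The single-replication construction above can be sketched as follows, here with independent uniform marginals; the function name and the two parameter ranges are hypothetical illustrations (the thesis itself relies on the maximinLHS function of the R lhs package).

```python
import random

random.seed(42)

def latin_hypercube(n, ranges):
    """One replication of an LHS of size n with independent uniform
    marginals; `ranges` is a list of (low, high) intervals, one per
    parameter. Returns n sample points (rows) of K coordinates."""
    cols = []
    for lo, hi in ranges:
        width = (hi - lo) / n
        perm = list(range(n))
        random.shuffle(perm)          # random permutation of the n strata
        # draw one uniform value from each of the n equiprobable intervals
        cols.append([lo + (i + random.random()) * width for i in perm])
    return list(zip(*cols))

# Hypothetical 2-parameter example with NLHS = 5
sample = latin_hypercube(5, [(0.00001, 0.0016), (0.0001, 8.0)])
```

Each parameter's range is hit exactly once per stratum, so even a small sample spreads evenly over every one-dimensional margin, which is the property that makes LHS more efficient than simple random sampling.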
In the Empirical calibration method we implement the LHS design as a more
efficient search algorithm for the multi-dimensional parameter space than simple
random sampling. The goal of a single implementation of this design is to collect
a sample of NLHS values for each of the K parameters (where NLHS is the size of the
LHS design). To this end, the range of each parameter is divided into NLHS equiprobable
(according to the pre-specified underlying distribution) intervals. For each parameter
we create a different permutation of the NLHS intervals, and we subsequently draw
a value from each corresponding interval, following the underlying distribution. In
particular, we utilize the 'maximinLHS' R function (lhs library), which aims to optimize
the collected sample by maximizing the distance between the design points. The set
of NLHS points (i.e., vectors of parameter values) is the sample extracted from the
multivariate parameter space using the LHS design.
Figure 3.1: Single implementation of LHS of size NLHS=5 for extracting values from a 2-dimensional parameter space (θ1 and θ2).
Figure 3.2: Single implementation of LHS of size NLHS=5 for extracting values from a 2-dimensional parameter space (θ1 and θ2).
Figures 3.1 and 3.2 present examples of the application of the LHS design in two
dimensions. In each of these examples the LHS is used to extract a sample from the
bivariate space stipulated by two of the MSM parameters to be calibrated, i.e., θ1 = m
∈ [0.00001, 0.0016], and θ2 = mdiagn ∈ [0.0001, 8]. The grid indicates the partition
of the bivariate space based on equiprobable marginal intervals for each parameter.
The dots in each graph represent the set of points of the Latin Hypercube sample.
The figures depict four samples, for different sizes (NLHS=5 and 20) and different
points extracted from the individual intervals (center vs random).
A limitation of LHS is that a single implementation of the design can only
result in a restricted number (NLHS) of vectors of parameter values, rendering it
inefficient for searching the multi-dimensional parameter space of an MSM. To
overcome this obstacle we suggest the recurrent implementation of the aforementioned
design, in order to collect a large enough sample for the purposes of the calibration
procedure.
Description
The second method combines some basic concepts of the empirical calibration
procedures found in the literature of MSMs, which are based on a random search of the
parameter space. It further suggests the adoption of the LHS design as a more efficient
tool for searching the multi-dimensional parameter space. In particular, this empirical
method implements the LHS design multiple times to extract a large number of sets
of parameter values. This sample is then checked for "acceptable" sets, i.e., for sets of
parameter values that produce model outputs close to the observed ones. The goal of
this method is to eventually collect a sample representative of the underlying
population of all the sets of parameter values that are "acceptable" according to some
convergence criteria.
Let Y ∼ f(Y|Λ) denote the data of interest (calibration data). We implement the LHS
design L times in total. Since each repetition of the LHS provides NLHS sets of
parameter values (where NLHS is the size of the LHS design), this empirical
calibration method essentially results in Nemp = NLHS × L sets of parameter values in
total. For each set of parameter values we run the MSM M(θ) a sufficient
number of times, M, and we calculate estimates of the data distribution parameters,
g_j(θ) = (1/M) Σ_{m=1}^{M} Y_{mj} (as in the Bayesian calibration method). Given
these estimates we calculate the log-likelihood as:
l(g(\theta) \mid Y) = \sum_{j=1}^{J} l_j(g_j(\theta) \mid Y_j)
We want to test the null hypothesis H0 : Λ = Λ0, where Λ0 is the vector of
calibration targets, versus the alternative H1 : Λ ≠ Λ0. For this test we use the
deviance statistic:
deviance statistic:
D = −2[l(g(θ)|Y )− l(Λs|Y )
]= −2
J∑j=1
[l(g(θj)|yj)− l(λsj|yj)
](3.3)
where l(Λs|Y ) is the likelihood of the saturated model.
Under H0 the deviance statistic D follows a chi-square distribution with ν degrees
of freedom, one for each tested mean in the calibration target vector. Among the
sets of θ values for which H0 is not rejected, we randomly draw V (to match the
Bayesian procedure) vectors of parameter values. Hence, the result of the empirical
calibration method is again a V×K matrix of calibrated values, denoted ΘEmp, whose
rows represent a random sample from the population of all "acceptable" sets
of parameter values according to the pre-specified convergence criterion (here,
the population of parameter values resulting in the highest log-likelihood given the
calibration data). These calibrated values can be used in a way analogous to the one
suggested for the Bayesian calibration results, in order to provide estimates of the
empirical distributions of the calibrated MSM parameters, as well as the empirical
distributions of the predicted quantities of interest.
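For Poisson-distributed targets, the deviance-based acceptance check can be sketched as below; the three targets, the two candidate predictions, and the use of the 0.95 chi-square quantile for 3 degrees of freedom (7.815) are hypothetical choices for illustration only.

```python
import math

def poisson_dev(y, mu):
    """Deviance contribution of one Poisson target:
    2 * [l(y | y) - l(y | mu)] = 2 * [y * log(y / mu) - (y - mu)]."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - (y - mu))

def accept(targets, preds, crit=7.815):
    """Keep a parameter set if the total deviance of its predictions
    against the calibration targets stays below the chi-square 0.95
    quantile (7.815 for the 3 degrees of freedom used here)."""
    d = sum(poisson_dev(y, mu) for y, mu in zip(targets, preds))
    return d, d < crit

targets = [30, 55, 12]                                  # hypothetical targets
d_good, ok_good = accept(targets, [28.0, 58.0, 11.0])   # close: accepted
d_bad, ok_bad = accept(targets, [15.0, 80.0, 4.0])      # far off: rejected
```

In the full procedure, this check would be applied to each of the Nemp candidate vectors, with preds replaced by the simulation-based estimates g_j(θ) and the critical value matched to the number of calibration targets.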
3.2.4 Calibration outputs: interpretation and use
An important aspect of the calibration of an MSM is what the anticipated outputs
of this procedure should be. To answer this question we have to consider both the
conceptual aspect of the problem and real-life practice. In the comparative
analysis presented here, we suggest that the results of both methods be a collection
of parameter vectors rather than a single point estimate for each MSM parameter.
This type of calibration output is preferable, especially in the case of complex MSMs,
for several reasons.
First of all, the nature of the problem itself dictates this form of calibration output.
As already mentioned (section 3.1.1), the complexity of an MSM renders the parameter
specification a calibration rather than a point estimation problem. It is possible for
more than one set of parameter values to produce equivalent outputs, i.e., predictions
"close" to what has been observed. Therefore, we wish to collect a sample of
these equivalent sets rather than find the single set that maximizes some convergence
criterion. Second, the matrix of calibrated values can reveal interesting relationships
between the MSM parameters, which usually represent unobservable (latent)
variables. Understanding these relationships may also be useful for improving
the model's structure, so that the MSM better describes the underlying
process and, therefore, has enhanced predictive ability. Third, by
using a matrix of calibrated values rather than point estimates of the MSM
parameters, we are able to capture a major source of MSM uncertainty, i.e., parameter
uncertainty, and convey the effect it has on the final results.
The Bayesian method results in the ΘBayes matrix of calibrated values, representing
a sample from the joint posterior distribution of the MSM parameters given the data
(calibration targets). The Empirical method results in the ΘEmp matrix, essentially
comprising a sample of vectors from the joint distribution of the "acceptable"
parameter values, namely those fulfilling the convergence criteria. In both cases the
matrix of calibrated parameter values can be used to fulfill the aforementioned
purposes of presenting the MSM characteristics (joint and marginal distributions of
the model parameters) as well as model predictions of the quantities of interest. In
particular, for each one of these sets of parameter values (i.e., for each row of the
ΘBayes or ΘEmp matrix) we run the model M times and we summarize the results
in order to estimate the quantities of interest, given a specific input sample SN. We
denote by Y = MM(Θ, SN) the predictions from a calibrated MSM with matrix Θ of
values for the calibrated MSM parameters and input sample SN. Averages, medians,
etc., can be used as point estimates, while measures of variability, such as the variance
or the interquartile range, provide an indication of the model uncertainty, including
sampling variability and parameter uncertainty.
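This post-calibration use of the Θ matrix can be sketched as follows; the toy one-parameter model, the stand-in rows of Θ, and the choice M=20 are hypothetical illustrations, not the thesis model.

```python
import random
import statistics

random.seed(7)

def run_msm(row, n=100):
    """Toy stand-in for M(theta, S_N): one simulated count on an input
    sample of n individuals (illustrative only)."""
    return sum(1 for _ in range(n) if random.random() < row[0])

def summarize(theta_matrix, M=20):
    """For each calibrated parameter vector (row of Theta), run the
    model M times; pool all runs to report a point estimate and an
    interquartile range for the predicted quantity of interest."""
    pooled = [run_msm(row) for row in theta_matrix for _ in range(M)]
    q = statistics.quantiles(pooled, n=4)      # quartile cut points
    return {"mean": statistics.mean(pooled),
            "median": statistics.median(pooled),
            "iqr": (q[0], q[2])}

theta_cal = [(0.18,), (0.22,), (0.20,)]        # stand-in rows of Theta
out = summarize(theta_cal)
```

Because the pooled runs vary both across rows of Θ (parameter uncertainty) and within each row (sampling variability), the spread of the pooled predictions reflects both sources of model uncertainty at once.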
3.3 High Performance Computing in R
3.3.1 Software for MSMs
There is a wide range of programming languages for the development of MSMs.
Kopec et al. (2010), in their comprehensive review of the quality of MSMs used
in Medicine, provide a list of programming languages and existing toolkits currently
used for the implementation of MSMs. Java, C#, and C++ are very popular languages
for the development of such models. Other toolkits, such as TreeAge, are also found
in the MSM bibliography. There are also some MSMs (MicMac (24), JAMSIM (66))
that embed the R statistical programming language, though only to provide the
user with the enhanced statistical and graphical capabilities of R for
post-simulation processing. This means that the R software is only involved in the
analysis of the MSM outputs rather than in the actual micro-simulations.
For reasons explained here, the streamlined MSM for the natural history of lung
cancer is written in R. To our knowledge, this is the very first attempt to develop
and implement an MSM exclusively in R. The R open source statistical software is
widely used by statisticians across the entire statistical spectrum. The
implementation of an MSM in R not only allows the wide use of this new, very
attractive technology in medical decision making, even by people not very familiar
with this field, but also enhances the transparency of the model and facilitates
the research and development of statistical methods related to this technology.
The release of the code, e.g., in the form of a special library in the open source
statistical software R, is a very attractive feature, especially to model developers, who
can actually read the code and thus understand the particularities of, and compare
the structure of, similar MSMs. Researchers unfamiliar with the technical details
of an MSM, on the other hand, who intend to use the model as a tool in medical
decision making, e.g., to simulate and make predictions for large cohorts, are highly
interested in being able to simulate and compare different scenarios. This is another,
perhaps more powerful, aspect of a model's transparency related to the release of the
freely available source code.
The streamlined MSM, which describes the natural history of lung cancer, can provide
a handy tool for exploring the statistical properties of MSMs in general.
Although the idea of writing an MSM in R is very exciting and attractive, the
implementation can prove to be a daunting task. Even the term "micro-simulation"
modeling suggests extensive computations and rather time-consuming processes.
A simple implementation of the model, e.g., to make predictions for a single
person or even for a relatively small sample of persons (as in the case of the tables
presented in the first chapter), although not instantaneous, is definitely a feasible
and relatively easy task to carry out. However, the development of such a model
from scratch requires, among other things, calibration and overall assessment of the
model (goodness-of-fit tests, validation, etc.), namely processes that can prove hard to
design and implement and extremely time consuming to run. In the following
paragraphs we attempt to give an idea of what the computational burden, in terms of
the required running times, for such processes can be. To this end, we provide as an
example our experience from the implementation of the two calibration methods for
the comparative analysis described in this chapter.
3.3.2 Example: computational burden of two MSM calibration methods
The objective of the second chapter is the calibration of the streamlined MSM for
the natural history of lung cancer with two different methods, a Bayesian and an
Empirical one. Trying to keep this problem as simple as possible, we focus our
interest on only four MSM parameters. As outlined in the description of the two
methods, each calibration procedure aims at identifying the most suitable among a
total of 100,000 candidate vectors of parameter values.
The Empirical calibration method entails the simultaneous check of the values in
a candidate parameter vector. In our case, the whole procedure involves testing
100,000 vectors of parameter values in total. In addition, each vector drawn
from the multi-dimensional parameter space is totally independent of the others.
The Bayesian calibration method is somewhat more complicated in that it requires a
sequential check of parameter values. That is, each update of the parameter chain depends
on the parameter values suggested in the previous step. In our case, the Bayesian
calibration method entails testing 4·100,000 parameter values in total. Therefore,
the architecture of the Bayesian calibration method allows parallelization of the
process to a much more restricted extent.
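The structural difference between the two search patterns can be sketched as follows. This is a hypothetical illustration, not thesis code: goodness_of_fit, the toy target of 0.5, and the greedy acceptance rule are invented stand-ins for the actual MSM runs and the Metropolis-Hastings machinery.

```python
import random

def goodness_of_fit(theta):
    # invented stand-in for running M*N micro-simulations and scoring
    # the predicted rates against the calibration targets
    return -sum((t - 0.5) ** 2 for t in theta)

def empirical_search(n_candidates, dim, rng):
    # undirected search: every candidate vector is drawn independently,
    # so all evaluations could run at the same time
    candidates = [[rng.random() for _ in range(dim)] for _ in range(n_candidates)]
    return max(candidates, key=goodness_of_fit)

def bayesian_chain(n_steps, dim, rng):
    # chain update: each proposal is built from the previous state,
    # so the steps must run one after another
    theta = [rng.random() for _ in range(dim)]
    for _ in range(n_steps):
        proposal = [t + rng.gauss(0, 0.05) for t in theta]
        if goodness_of_fit(proposal) > goodness_of_fit(theta):
            theta = proposal
    return theta

rng = random.Random(1)
best = empirical_search(200, 4, rng)
chain_end = bayesian_chain(200, 4, rng)
```

The loop in empirical_search could be split across any number of processors; the loop in bayesian_chain cannot, which is exactly the restriction discussed above.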
As described in the simulation study section, in our example each calibration update
entails implementing the micro-simulation model M=10 times on a sample of
N=5,000 people. That is, checking one combination of parameter values (Empirical
calibration) or one parameter value (Bayesian calibration) requires 50,000 micro-simulations
in total. Hence, for 100,000 updates of all (four) MSM parameters, we
need 50,000·100,000=5·10^9 and 50,000·100,000·4=2·10^10 micro-simulations for the
Empirical and the Bayesian calibration method, respectively. Given these numbers,
we realize how time-consuming the implementation of just a single calibration method
can be, let alone a comparative analysis between two of them.
These numbers confirm that the calibration of an MSM falls into the "embarrassingly
parallel" category of computational problems (89), meaning that the entire task
can be split into numerous, completely independent, repeated computations, each of
which can be executed by a separate processor in parallel. Hence, instead of "endless"
running times, an "embarrassingly parallel" procedure such as the calibration of an
MSM can be approached using high performance computing (HPC) techniques and
run within plausible times. A closer look at Table 3.1, which presents the required
times to run M·N micro-simulations under different settings, verifies that, in
the absence of HPC, the calibration of an MSM is simply impossible.
3.3.3 Parallel Computing
In order to overcome the time limitations posed by the extensive computations involved
in calibrating an MSM, we harness the idea of parallel computing. This can
be achieved by distributing the independent computations simultaneously to multiple
computer clusters (nodes) that we have set up for this purpose. These clusters
may comprise a single machine with one or more processors, or multiple
machines connected by a communications network. Hence we distinguish between
two major types of parallelization, implicit and explicit, depending on
the composition of the computer clusters used. It is crucial to decide which
type of parallelization (available in R) to work with, so as to maximize the benefit
from the advanced high performance computing techniques developed for this
statistical software.
Tierney (2008) describes the notions of implicit and explicit parallel computing
within the R context. According to this paper, implicit parallelization pertains basically
to exploiting multiple processors of one machine, as well as internal R functions,
to speed up calculations (e.g., vectorized arithmetic operations, 'apply'-like functions,
etc.). It essentially takes advantage of the parallelism inherent in the program. This
method does not require any special intervention (set-up) from the user, hence it is
much easier to implement and can prove very beneficial, especially for large vectors
(e.g., n>2000). Nevertheless, as can be seen from the example provided in Table
3.1, implicit parallelization offers the researcher only a limited ability
to improve the efficiency of an R program, and is definitely not the solution to the
extremely time-consuming algorithms involved in the MSM calibration process.
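The effect this kind of optimization exploits, namely replacing interpreted element-by-element work with a single call into optimized built-in code, can be illustrated outside R as well. A minimal Python sketch of the same idea (R's vectorized arithmetic and 'apply'-like functions play the role of the built-in `sum` here):

```python
import time

n = 1_000_000
xs = list(range(n))

t0 = time.perf_counter()
total_loop = 0
for x in xs:              # interpreted loop: one dispatch per element
    total_loop += x
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
total_builtin = sum(xs)   # single call into optimized built-in code
t_builtin = time.perf_counter() - t0

# same result; the built-in call is typically an order of magnitude faster
assert total_loop == total_builtin
```

The gain is real but bounded by a single machine, which is why implicit parallelization alone cannot rescue the calibration problem.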
Explicit parallelization, on the other hand, provides the user with the ability to set
up computer clusters (multiple computers with multiple processors each) so as to
distribute the independent computations to a wider range of resources in parallel.
Hence, explicit parallelization can substantially improve the efficiency of algorithms
involving "embarrassingly parallel" computations, as in the case of calibrating an
MSM. This type of parallelization requires more work and a certain amount of
computer science knowledge to set up the cluster and distribute the algorithm
accordingly.
After having improved the time required for one micro-simulation, by using the
most efficient (to the best of our knowledge) built-in R functions, the next step
is to take advantage of high performance computing techniques so as to carry out
the calibration computations within realistic time intervals. Schmidberger et al.
(2009) provide a comprehensive account of R packages with advanced techniques for
performing parallel computing in R. According to this paper, the two R packages that
stand out as best serving the implementation of parallel computing on computer
clusters are 'snow' and 'Rmpi'.
For the purposes of the comparative analysis we will mainly be using the 'snow'
library to set up computer clusters using the Message Passing Interface (MPI) low-level
communication mechanism. This R library has intermediate- and high-level
functions for parallel computing. For the calibration purposes we make use of the
high-level ones, which are basically parallel versions of the 'apply'-like R functions.
By using the possibilities the 'snow' package offers for parallel computing, we can
overcome R's single-threaded nature and spread the computational burden across
multiple machines and CPUs (McCallum and Weston (2012)). Information about
the 'snow' built-in functions can be found in the relevant R documentation for this
package, while some examples of the implementation of parallel computing in R using
the snow package can be found in Tierney (2008) and Matloff (2013).
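The scatter/gather pattern behind snow's high-level functions can be sketched as follows. This is an illustrative Python analogue, not thesis code: run_batch is an invented stand-in for one batch of micro-simulations, and a thread pool stands in for the cluster of R worker processes that snow would manage.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def run_batch(seed):
    # invented stand-in for one batch of micro-simulations with a
    # candidate parameter vector; only the scatter/gather shape matters
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(10_000)) / 10_000

seeds = list(range(8))

# serial reference: one batch after another
serial = [run_batch(s) for s in seeds]

# parallel map: scatter the independent batches over a pool of workers
# and gather the results, as snow's parSapply-style functions do across
# R worker processes or machines
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(run_batch, seeds))

assert serial == parallel  # the pool preserves input order
```

In the actual calibration the workers are separate machines, so the independent batches genuinely run at the same time.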
3.3.4 Code architecture
Another very important decision to be made, regarding the problem of improving
the efficiency of the calibration methods, is which chunk of the code should be parallelized.
Obviously, the sequential nature of the Bayesian calibration method leaves
much smaller scope for parallelization than the Empirical one, with its random,
undirected search approach (independent draws of values from the multidimensional
parameter space). This is also the case when comparing the efficiency of undirected
with any directed search method, due to the sequential nature of the latter, since
each step in a directed method depends on the result of the preceding one.
The sequential nature of the Bayesian method drives the decision, for more efficient
results, to perform in parallel the M·N=50,000 micro-simulations involved in each
parameter update. The independent draws of vectors from the multi-dimensional
parameter space involved in the Empirical calibration, on the other hand, allow for a
greater extent of parallelization, which is only restricted by the size of the induced
tables relative to the respective memory limits.² In our case, we take advantage
of the architecture of the Empirical calibration method to further parallelize the
testing of 20 parameter vectors at a time, i.e., M·N·N_LHS=50·1000·20=10^6 micro-simulations
in total (where N_LHS is the size of the Latin Hypercube Sample). Thereafter, in order
to test 100,000 parameter vectors, we need to repeat this procedure 100,000/20=5,000
times in total.
However, in order to make the most of parallelization, we have to make sure that
the R code for predicting one trajectory (one micro-simulation) is as efficient as
possible. Hence, there is one more step before we move forward to the implementation
of parallel computing, i.e., improving the efficiency of the R algorithm for a single
micro-simulation. A very helpful tool for this task is R's built-in 'Rprof' profiler,
which provides functions that enable relatively easy profiling of the execution of R
expressions.

² There are several methodologies, and respective packages developed in R, that harness high-performance computing techniques to deal with large-memory, or even out-of-memory, data problems (Eddelbuettel, 2013).
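For readers outside R, the profiling step can be mimicked with Python's standard cProfile module; slow_concat below is an invented example of an inefficient structure that a profile report would flag, in the same way Rprof flagged the costly structures in our code.

```python
import cProfile
import io
import pstats

def slow_concat(n):
    # invented example of an inefficient structure a profiler would flag,
    # analogous to the costly operations Rprof exposes in R code
    s = ""
    for i in range(n):
        s += str(i)
    return s

profiler = cProfile.Profile()
profiler.enable()
slow_concat(5_000)
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()   # the report names the hot spots, here slow_concat
```

The profile output points directly at the functions where time accumulates, which is what guided the replacements described next.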
The tables in the Appendix, with R profiling results, indicate the degree of improvement we
achieved in our code by simply replacing time-consuming R structures with their more
efficient counterparts. More specifically, by simply replacing "data.frame" with "list"
in all instances in the R code, we managed to make the program almost twice as fast (e.g.,
the total running time dropped from 5.86 to 2.88 time units). By also replacing the
approximate integration of the hazard function for the onset of the first malignant
cell with the respective definite cumulative hazard function, we achieved a further
22.2% reduction (from 2.88 to 2.24 time units).
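The second optimization can be illustrated with a toy hazard. The linear hazard below is an invented stand-in for the thesis's onset hazard; the point is that a closed-form cumulative hazard replaces many hazard evaluations per call with a single exact one.

```python
A = 0.003   # invented slope; the real onset hazard is specified in Chapter 2

def hazard(t):
    # toy linear hazard h(t) = A*t, standing in for the onset hazard
    return A * t

def cumulative_hazard_numeric(t, steps=10_000):
    # trapezoidal approximation of the integral of h over [0, t]:
    # many hazard evaluations per call
    dt = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * dt)
    return total * dt

def cumulative_hazard_closed(t):
    # definite integral of A*s ds over [0, t]: one evaluation per call
    return 0.5 * A * t * t

numeric = cumulative_hazard_numeric(40.0)
closed = cumulative_hazard_closed(40.0)
assert abs(numeric - closed) < 1e-9
```

When such a call sits inside every simulated trajectory, removing the inner integration loop compounds into the kind of reduction reported above.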
We have described, so far, the gain in running time achieved when we optimized
the R code internally, by performing R profiling and improving the code's
efficiency accordingly. This process involved work on the efficiency of the code's
architecture, i.e., removing unnecessary computations and parts of the code, replacing
loops with vectorized R functions, etc. Furthermore, we replaced complicated, time-consuming
R functions and structures with more efficient counterparts (e.g., saving
results from a function in 'list' instead of 'data.frame' format). In this way we were
able to substantially reduce the time required to run 50,000 micro-simulations, from
5532.77 secs (≈1.5 hours) to 2114.92 secs (≈35 minutes). However, even with this
significant improvement, running 50,000·100,000=5·10^9 micro-simulations, to calibrate
just one of the MSM parameters (Bayesian method) or all candidate sets of parameter
values (Empirical method), would require the prohibitive time of 6.7 years. Having
optimized the efficiency of the algorithm inside the R code, we turned our attention to HPC
techniques, in order to perform parallel computing in R. We focused on the particularities
of each calibration method to reduce the respective computational burden
to the minimum by implementing relevant techniques accordingly.
3.3.5 Algorithm efficiency: Bayesian vs Empirical Calibration
To better understand the gain in computational burden, as well as to compare
the two methods in terms of their efficiency, we calculate the required running time to
calibrate all four MSM parameters with each method. As already mentioned, in order
to implement the Bayesian calibration method and obtain one chain of 100,000 values
for the joint posterior of the calibrated MSM parameters, we need to repeat the set of
M·N=50,000 micro-simulations 4·100,000 times in total. Hence the required time for
the Bayesian calibration is 4.26·100,000·4 secs ≈ 19.7 days (Table 3.1), if we use a cluster
of 64 nodes (8 computers with 8 CPUs each). An analogous task with the Empirical
calibration method requires testing 100,000 vectors from the parameter space in
total. By further parallelizing the process and simultaneously computing, e.g., 20·50·
1000 = 10^6 micro-simulations, we can calibrate the four MSM parameters much faster
than with the Bayesian method.³ According to Table 3.1, the required time to run
that many micro-simulations in parallel is 105.4 seconds ≈ 1.7 minutes. To complete
the Empirical calibration procedure, we have to run this set of micro-simulations
100,000/20=5,000 times in total. Hence the required time to calibrate the four MSM
parameters with the Empirical method is ≈ 6.1 days. Depending on the available
resources, we can achieve a further reduction in the running times. In our case, for
example, if we further split the Empirical calibration process into three independent
pieces, we can receive the results from this method in ≈ 2 days (i.e., almost 10
times faster than with the Bayesian method). Consequently, we realize that the
architecture of this method provides for parallelization to a significant extent, with a
corresponding reduction in the required time, unlike the Bayesian calibration method
or any directed search method. Hence an Empirical method for calibrating an MSM
can prove much more practical (efficient) than a Bayesian one, with respect to
the required running time.

³ Depending on the HPC techniques we use and the capacity of the computer clusters, we can further parallelize this problem and achieve an even larger reduction in the time required to run this procedure.
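The arithmetic above can be checked directly from the Table 3.1 timings (4.26 s and 105.4 s per batch on the 64-node MPI cluster):

```python
SECONDS_PER_DAY = 86_400

# Bayesian: one update costs 4.26 s (50,000 micro-simulations);
# 100,000 updates for each of the 4 parameters, run sequentially.
bayesian_days = 4.26 * 100_000 * 4 / SECONDS_PER_DAY

# Empirical: 10**6 micro-simulations (20 candidate vectors at a time)
# cost 105.4 s, repeated 100,000 / 20 = 5,000 times.
empirical_days = 105.4 * (100_000 // 20) / SECONDS_PER_DAY
```

This reproduces the ≈19.7 days and ≈6.1 days quoted in the text.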
Table 3.1 describes improvements in the efficiency of an embarrassingly parallel algorithm
involving M·N micro-simulations. This 'journey' begins from the completely trivial
case, i.e., performing the computations on a single machine without addressing R's
single-threaded nature and without taking care of time-consuming R functions
and structures. From this starting point, the process of improving the efficiency of
the algorithm 'travels' through the notion of implicit parallelism to eventually reach
the optimal solution by using explicit parallelism with properly set-up computer
clusters (a network of multiple computers with multiple processors each) and relevant
HPC R techniques, packages, and toolkits. The ultimate gain from this process, e.g.,
when performing M·N=50·1000=50,000 micro-simulations, can reach the impressive
figure of three orders of magnitude (5532.77/4.26≈1299). The implementation
of HPC for parallel computing with R was actually what made it feasible to calibrate
the MSM for lung cancer using this open source statistical software.
3.3.6 Concluding remarks
The problem of calibrating an MSM in R falls into the category of "embarrassingly
parallel" computations and necessitates the use of high performance computing.
In the previous paragraphs we explained the computational considerations imposed
by the calibration of an MSM in R, using as an example the implementation of the
two calibration methods described in this chapter. According to this example, the
Empirical calibration method is much more efficient than the Bayesian one, since it
Type   Nodes    M     N      Time (secs)   Reduction Ratio*   Notes
-      1        50    1000     5532.77     -                  no parallel computing or profiling
                100   1000    11065.61     -
                20    2500     5534.07     -
                40    2500    11077.26     -
SOCK   1        50    1000     2114.92     2.62               no parallel computing, after profiling
                100   1000     4229.86     2.62
                20    2500     2115.41     2.62
                40    2500     4234.31     2.62
SOCK   10       50    1000       16.55     334.31             implicit parallel computing, after profiling
                100   1000       34.82     317.79
                20    2500       17.24     321.00
                40    2500       34.25     323.42
SOCK   32       50    1000       15.97     346.45             implicit parallel computing, after profiling
                100   1000       34.06     324.89
                20    2500       16.11     343.52
                40    2500       34.37     322.29
SOCK   50       50    1000       16.02     345.37             implicit parallel computing, after profiling
                100   1000       33.78     327.58
                20    2500       16.06     344.59
                40    2500       34.58     320.34
SOCK   100      50    1000       15.85     349.29             implicit parallel computing, after profiling
                100   1000       34.20     323.56
                20    2500       16.10     343.73
                40    2500       34.41     321.92
MPI    32       50    1000        7.45     742.65             explicit parallel computing, after profiling
                100   1000       12.93     855.81
                20    2500        6.19     894.03
                40    2500       12.91     858.04
MPI    64       50    1000        4.26     1298.77            explicit parallel computing, after profiling
                100   1000        7.93     1395.41
                1000  1000      105.4      -
                20    2500        3.31     1671.92
                40    2500        8.17     1355.85

SOCK: sockets
MPI: Message Passing Interface
* Ratio of reduction in running time achieved compared to no processing (no parallel computing or profiling)

Table 3.1: Algorithm efficiency: Required time (in seconds) to run M·N micro-simulations using different computing capacities.
can actually run 10 times faster. This relative efficiency between the two methods
also applies when comparing undirected with directed search algorithms for
calibration, due to the conceptual similarities they bear with the Empirical and the
Bayesian calibration method, respectively.
The most impressive finding from this exercise was the ultimate gain we achieved in
performing a set of parallel computations, which reaches three orders of
magnitude compared to the initial time, namely the time required before any work on
the architecture of the R code or any type of parallelization. The running
times reported in this section exemplify the imperative need for HPC methods in order
to render the development of an MSM in R feasible, with all the beneficial effects
such an attempt will have on overall research in the area.
3.4 Comparative Analysis
The main objective of this chapter is the comparison between an empirical and a
Bayesian approach to the calibration problem of micro-simulation models (MSMs)
in medical decision making (MDM). The streamlined MSM, developed in the first
chapter, is used as a tool for the implementation of both calibration methods, described
in section 3.2 of the thesis. To our knowledge, this is the first attempt at a
comprehensive and systematic comparison of two calibration methods in the context
of micro-simulation modeling. In the following paragraphs we describe the study
design for the quantitative and qualitative comparison of the two methods.
3.4.1 Input Data
The MSM for the natural history of lung cancer (described in Chapter 2)
takes into account three baseline characteristics, namely age, gender, and smoking
habits, in order to predict a person's trajectory. The smoking habits comprise the
person's smoking status at the beginning of the prediction period, i.e., current,
former, or never smoker, as well as, when relevant, the smoking intensity, expressed as
the average number of cigarettes smoked per day. In order to keep the dimensionality
of the problem to an easily manageable level, we restrict our interest to male current
smokers.
We combine information found in census data (US 1980 census) and other relevant
statistics (Statistical Abstract of the US, 1980) in order to simulate the baseline
characteristics of a large sample representative of the US population. This large
sample will be the "pool" from which several sub-samples will be drawn and used
as input to the MSM for the purposes of both model calibration and assessment.
Assuming that the entry year is 1980, we predict 26 years ahead and calibrate the
MSM to the observed lung cancer incidence rates reported in the SEER 2002-2006 data.
We simulate the age distribution based on information found in the US 1980 census
about males who are current smokers. Given the age group, we simulate the smoking
intensity for each individual following the distribution of the average number of
cigarettes smoked per day, as reported in the Statistical Abstract of the US, 1980.
Because these tables report the smoking intensity in groups (i.e., <15,
15-24, 25-34, and >34 cigarettes/day), we first draw the smoking intensity category
given age, and then randomly draw an integer from the selected group, assuming
a uniform distribution for the smoking intensity within that group. This integer
eventually expresses the average number of cigarettes smoked per day for the particular
individual.
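The two-stage draw described above can be sketched as follows; the category probabilities and the upper bound of the open-ended '>34' group are illustrative placeholders, not the Statistical Abstract figures.

```python
import random

# category -> inclusive range of cigarettes/day; the cap of 60 on the
# open-ended '>34' group is an arbitrary illustrative choice
GROUPS = {"<15": (1, 14), "15-24": (15, 24), "25-34": (25, 34), ">34": (35, 60)}
# placeholder category probabilities (in the thesis these come from the
# Statistical Abstract of the US, 1980, conditional on age group)
PROBS = {"<15": 0.25, "15-24": 0.40, "25-34": 0.25, ">34": 0.10}

def draw_intensity(rng):
    # stage 1: draw the smoking-intensity category
    cat = rng.choices(list(PROBS), weights=list(PROBS.values()))[0]
    # stage 2: draw an integer uniformly within the selected group
    lo, hi = GROUPS[cat]
    return rng.randint(lo, hi)

rng = random.Random(42)
sample = [draw_intensity(rng) for _ in range(1_000)]
```

In the actual simulation the category probabilities vary by age group, so stage 1 conditions on the individual's previously drawn age.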
Following this procedure we simulate a large sample of NL=100,000 individuals
representative of the 1980 US reference population. This will be our simulated "true"
population for which predictions about lung cancer incidence are to be made
using the MSM. For the purposes of both model calibration and validation, two
sub-samples will be drawn from this simulated population. In particular, we randomly
draw two sub-samples of size n=5,000 each. The first one will be the input to the
MSM for the implementation of the two calibration methods. We refer to that sample
as the "calibration input" (smpl.C5000). The second one, referred to as the "validation
input", will be used for validating the calibration results (smpl.V5000). Furthermore,
other sub-samples will also be randomly drawn from the same NL=100,000
simulated population, to serve other purposes of the comparative analysis presented
in this chapter (e.g., samples to produce calibration plots, etc.).
Table 3.2 presents the age distributions of the samples used as input for the comparative
analysis of the two calibration methods. We denote by smpl100,000 the sample
of 100,000 people used for the calculations of the calibration targets (see section
Age (years)   US 1980 (smpl100,000)   Calibration (smpl.C5000)   Validation (smpl.V5000)
17-39                53672                    530                       554
40-44                 7078                     79                        63
45-49                 6827                     65                        70
50-54                 7122                     68                        80
55-59                 6699                     72                        53
60-64                 5876                     60                        54
65-69                 4833                     48                        55
70-74                 3503                     43                        25
75-79                 2233                     22                        23
80-84                 1284                      8                        16
>85                    873                      5                         7

Table 3.2: Age distributions of the samples (input data) used for the comparative analysis of the two calibration methods.
3.4.3), smpl.C5000 the sample of size n=5,000 that was used as input for both
calibration methods, as well as for the internal validation of the results, and smpl.V5000 the
sample used for the external validation of the calibrated models. All these samples
are representative of the US 1980 population, i.e., the age and smoking intensity
distributions of these samples resemble the corresponding observed data about male
current smokers reported in the 1980 US census and the Statistical Abstract of the
US from the same year.
3.4.2 MSM parameters to calibrate
The streamlined MSM for the natural history of lung cancer that we developed in the
first chapter involves numerous parameters describing different parts of the model.
In order to be able to run the procedures in plausible times, instead of performing an
exhaustive calibration we run a restricted one, focusing our interest on only
four MSM parameters. All the rest are kept fixed, according to known relationships
found in the literature or plausible assumptions made to simplify the calibration problem
(Chapter 2). In particular:
• we keep the MSM parameters pertaining to the onset of the first malignant
cell fixed to the quantities found in the literature about male current smokers
(Table 2.2);
• from the Gompertz(m,s) distribution for the tumor growth, we only calibrate
m, assuming s=31·m (section 2.3.1);
• from the log-Normal distributions of the disease progression part of the MSM,
we only calibrate the location parameters (i.e., m_diagn, m_reg, and m_dist), assuming
that location=scale (i.e., means = standard deviations);
• our prior beliefs (i.e., prior distributions and plausible intervals for the MSM
parameters to calibrate) are in accordance with findings in the literature on the
natural history of lung cancer (section 2.3.1).
3.4.3 Calibration Targets
In order to keep the calibration problem as simple as possible, we only calibrate our
model to lung cancer incidence by age group. As the reference point we use the observed
rates in the SEER 2002-2006 data, so as to reproduce plausible numbers (Table 3.3). The
calibration exercise relies on the strong assumption that the lung cancer incidence
rates, conditional on gender and smoking status, remain unchanged throughout the
26-year prediction period (from 1980 to 2006) and are close to the reported
SEER 2002-2006 rates. Another problem when calibrating the lung cancer natural
history model is the occurrence of rare events, especially at ages less than 55 years
old. To overcome this problem, we combine the eleven 5-year age groups presented
in the SEER data into three, i.e., <60, 60-80, and >80 years old. In this way we are able
to observe all the lung cancer incidence rates even when we use as input a sample
of people of moderate size (e.g., n=500). We assume that the lung cancer cases y_j
follow a Poisson(λ_j) distribution, where λ_j is the rate of the jth age group, expressed
as the number of cases per 100,000 person-years (PYs).
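The grouping and rate computation can be sketched as follows; the counts and person-years are invented illustrative numbers, not SEER or model output.

```python
def age_group(age):
    # the three coarse groups used for the calibration targets
    if age < 60:
        return "<60"
    return "[60,80)" if age < 80 else ">80"

# (age, lung cancer cases, person-years) -- invented illustrative numbers
records = [(45, 2, 30_000), (55, 3, 25_000), (65, 40, 20_000),
           (75, 45, 18_000), (85, 30, 8_000)]

cases, pys = {}, {}
for age, c, py in records:
    g = age_group(age)
    cases[g] = cases.get(g, 0) + c
    pys[g] = pys.get(g, 0) + py

# rates expressed as cases per 100,000 person-years
rates = {g: 100_000 * cases[g] / pys[g] for g in cases}
```

Collapsing the fine age groups in this way guarantees non-zero counts in every group even for moderate input samples.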
Age group   Observed   Observed (combined)    Predicted     Predicted (combined)
<40            5.2      41.9   (<60)           11 ± 4.48     41 ± 4.25   (<60)
[40, 45)      10.0                             15 ± 3.3
[45, 50)      26.3                             21 ± 2.83
[50, 55)      56.7                             49 ± 4.18
[55, 60)     111.3                            107 ± 6.46
[60, 65)     208.4     387.4  ([60, 80))      192 ± 9.01    391 ± 15.96 ([60, 80))
[65, 70)     329.3                            392 ± 16.32
[70, 75)     455.7                            481 ± 18.62
[75, 80)     556.2                            498 ± 19.89
[80, 85)     554.5     498.1  (>80)           504 ± 19.93   464 ± 19.36 (>80)
>85          441.7                            425 ± 18.79

Table 3.3: Observed (2002-2006 SEER data) and predicted (M=100, N=100,000, θfix) lung cancer incidence rates (cases/100,000 person·years) by age group.
We ran an ad hoc analysis to identify combinations of parameter values that give
plausible predictions, i.e., predictions close to the observed quantities (SEER data
2002-2006). For this purpose we implemented the MSM on the simulated US 1980
population of N=100,000 male current smokers. Given their simulated baseline
characteristics, i.e., age and smoking intensity, we predicted twenty-six years ahead,
that is, we predicted lung cancer incidence in 2006. We implemented the model
M=100 times in order to increase the accuracy of our predictions. At the end of the
prediction period, we combined the results (predicted lung cancer cases per 100,000
person-years) by age group. Following the results of this ad hoc analysis, we identified
a set of values θfix = [θ1^c, θ2^c, θ3^c, θ4^c]^T = [0.00038, 2, 1.1, 2.8]^T, for which the MSM
predicts lung cancer incidence rates per age group close to the observed quantities
(SEER data). Table 3.3 presents the predicted lung cancer incidence rates for these
fixed parameter values. We set these rates, Yclbr = [y1-clbr, y2-clbr, y3-clbr]^T = [41, 391,
464]^T, to be the calibration targets for each of the two calibration methods.
The reason for choosing Yclbr as the calibration targets, rather than the respective
observed rates in the SEER data, is that we wanted to control for the effect that the input
sample, as well as the structure of the model, would have on the MSM predictions.
In this way, any deviations of the predictions from the reference points can be
attributed to a greater extent to the real underlying differences between the two
calibration methods, rather than to other factors that are nuisance for the purposes
of this comparative analysis.
We repeated a similar procedure, running the model M=2,000 times in total for
θfix, using this time the "calibration input" sample. We again predicted the lung
cancer incidence twenty-six years ahead and combined the results by age group, thus
arriving at the vector of rates Yfix = [y1-fix, y2-fix, y3-fix]^T = [50, 353, 452]^T. We use
this vector later on to validate the results from the two calibration methods. The
reason is that the output of an MSM depends, to some extent, on the input sample;
therefore, even for the same θfix and the same total number of micro-simulations (i.e.,
M·N=10^7), the output can be slightly different (Yclbr vs Yfix).
Using the notation introduced in section 3.2.1, we define Yclbr = M100(θfix, smpl100,000)
and Yfix = M2000(θfix, smpl.C5000) to be the two reference points to which the predictions
from the two calibrated MSMs will be compared. Consequently, we have three reference
points when comparing the results from the two methods: θfix for the calibrated
parameters, as well as Yclbr and Yfix for the predicted lung cancer incidence rates.
3.4.4 Simulation Study
The ultimate goal of this chapter is the quantitative and qualitative comparison of the
two calibration methods for MSMs, the Bayesian and the Empirical one. To this end
we design a simulation study that allows for comparisons of multiple aspects of the
calibration procedure. The simulation study pertains to the implementation of both
methods to calibrate the parameters of the streamlined MSM for the natural history
of lung cancer. In particular, we calibrate all four (θ1, θ2, θ3, θ4) MSM parameters, and
compare the results from the two methods using both qualitative and quantitative
measures, as well as graphical means.
Methods comparability
In order to ensure comparability of the two methods, we have to calibrate the same
set of MSM parameters θ to the same calibration targets Yclbr, using the same input
data ("calibration sample"). In addition, the prior information about θ in the
Bayesian calibration method has to be consistent with the plausible intervals assumed
in the Empirical calibration method, while the estimation of the model outputs of
interest should be based on the same number of embedded micro-simulation runs
(simulation study size).
The results from each calibration method, i.e., point estimates for the MSM parameters
and predicted outputs, are influenced by the several sources of uncertainty (Chapter
1) inherent in the model. Failure to recognize this problem and take precautions to
control for it may produce misleading results and, consequently, erroneous conclusions
from the comparative analysis. Structural uncertainty cannot be examined in our
case, since both methods are implemented for the calibration of exactly the same
MSM. We account for selection uncertainty and sampling variability (both related
to the calibration data) by setting the same calibration targets. Moreover, we
account for the effect of simulation (Monte Carlo) variability by implementing
the MSM multiple times on the same input sample, and taking point estimates
(means, standard deviations) of the outputs of interest (calibration targets or individual
trajectories). Parameter uncertainty, on the other hand, is an integral part of
the calibration method itself, and is captured by the determination of distributions,
Characteristics            Bayesian                                  Empirical
Parameters to calibrate    θ=[θ1, θ2, θ3, θ4]^T                      θ=[θ1, θ2, θ3, θ4]^T
Calibration targets        Yclbr (relatively easy to combine         Yclbr (when there is more than one
                           more than one source of information)      calibration target, a rule to
                                                                     combine them must be specified)
GoF                        log-likelihood (inherent in the           Deviance
                           approximate MH algorithm)
Convergence criteria       Trace plots,                              χ² test (α=5%)
                           convergence diagnostics
Stopping rule              V sets of values for θ from the           V sets of values for θ from the
                           converged sets                            converged sets
Result                     Random draws from the joint               Random draws from the empirical
                           posterior distribution of the             joint distribution of the
                           model parameters θ                        "acceptable" values for θ

Table 3.4: Implementation of the two calibration methods on the MSM for lung cancer according to the seven-step approach presented in Vanni et al. (2011)
rather than point estimates, of the resulting calibrated MSM parameters. This characteristic
will provide an additional means of comparison between the two methods.
Furthermore, we use the same sample of baseline characteristics (calibration input)
for the implementation of the two calibration methods, in order to eliminate the
effect of population heterogeneity on the comparative analysis results.
Finally, the calibration results from the two methods are integrated following exactly
the same procedure (e.g., using percentiles to describe the distribution of the calibrated
parameters and the MSM outputs). Table 3.4 juxtaposes the implementation of the two
calibration methods from the point of view of the seven-step approach presented in Vanni
et al. (2011).
Simulation study size
The accuracy of the calibration results depends heavily on the total number of micro-simulations involved in the computations. As already mentioned, we focus our interest on the calibration of an MSM that describes the natural history of lung cancer for males who are current smokers. In order to account for the effect of simulation variability, we implement the MSM M times on the input data (calibration sample of baseline characteristics). Each time the model predicts n trajectories 26 years ahead, one for each person in the input sample. We summarize the results at the end of the prediction period, i.e., we calculate the lung cancer rates per age group. This procedure results in M predictions per age group. As a point estimate of the predicted lung cancer incidence rates we use the averages of the M predicted values by age group. The accuracy of the calibration results is thus highly related to the size of the simulation study, i.e., the total M·n micro-simulations involved in the calculations.
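This averaging scheme is easy to sketch. The snippet below uses a hypothetical Poisson stand-in for the MSM (run_msm_once, TRUE_RATES, and the seed are illustrative assumptions, not the dissertation's model) to show how the M runs are collapsed into a mean ± sd point estimate per age group, and why larger M·n shrinks the variability:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for the MSM: one run returns lung cancer incidence
# rates (cases/100,000 person-years) per age group for an input sample of
# size n. The "true" rates mimic the chapter's calibration targets.
TRUE_RATES = {"<60": 41, "[60,80)": 391, ">80": 464}

def run_msm_once(n):
    """One model run: Poisson counts rescaled to cases per 100,000."""
    return {g: rng.poisson(r * n / 100_000) * 100_000 / n
            for g, r in TRUE_RATES.items()}

def point_estimates(M, n):
    """Implement the model M times; summarize mean and sd per age group."""
    runs = [run_msm_once(n) for _ in range(M)]
    return {g: (float(np.mean([r[g] for r in runs])),
                float(np.std([r[g] for r in runs], ddof=1)))
            for g in TRUE_RATES}

# Larger M*n shrinks the sd of the point estimates, at higher running cost.
small, large = point_estimates(10, 500), point_estimates(10, 5000)
```

With the real model the per-run predictions come from the micro-simulation itself; only the averaging step is shown here.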
Age group      M=10           M=20           M=30           M=50           M=100
n=500
  <60          34 ± 38.37     35 ± 47.18     44 ± 68.96     56 ± 78.4      38 ± 57.71
  [60, 80)     370 ± 202.63   364 ± 208.72   382 ± 240.71   365 ± 218.9    361 ± 216.48
  >80          369 ± 316.64   434 ± 331.91   460 ± 259.59   466 ± 309.12   486 ± 315.44
n=1000
  <60          41 ± 35.32     43 ± 44.23     41 ± 41.53     43 ± 49.18     43 ± 45.3
  [60, 80)     415 ± 160.57   392 ± 171.92   401 ± 160.01   393 ± 156.06   405 ± 155.36
  >80          480 ± 218.8    430 ± 174.45   496 ± 195.51   464 ± 185.99   435 ± 172.29
n=2500
  <60          47 ± 29.72     41 ± 24.63     41 ± 25.41     40 ± 24.75     41 ± 24.78
  [60, 80)     386 ± 101.47   372 ± 103.92   406 ± 100.36   395 ± 103.44   390 ± 101.5
  >80          466 ± 115.75   472 ± 123.32   492 ± 123.07   465 ± 120.89   476 ± 121.27
n=5000
  <60          44 ± 23.72     40 ± 15.8      41 ± 19.01     43 ± 19.6      40 ± 19.16
  [60, 80)     398 ± 82.34    397 ± 67.61    393 ± 73.56    402 ± 69.93    409 ± 73.82
  >80          454 ± 71.26    456 ± 97.57    444 ± 79.38    478 ± 93.72    464 ± 89.67
M: total number of micro-simulations per individual
n: input sample size
Table 3.5: Predicted lung cancer incidence rates (cases/100,000 person·years) per age group, for different study sizes (M·n). Calibration targets: Yclbr=[y1−clbr, y2−clbr, y3−clbr]T=[41, 391, 464]T
A key issue in the study design is the choice of the M·n combination of total micro-simulations involved in the calculations of each calibration method. There is a trade-off between the achieved accuracy in the predictions and the required running time. Our goal was to identify a combination that provides accurate predictions within plausible running times. To this end, we investigated different M·n combinations in order to specify the one that best serves the purposes of the simulation study.
We randomly extracted sub-samples of size n=500, 1000, 2500, and 5000, from the
N=100,000 simulated 1980 US population. These samples were subsequently used as
input to predict lung cancer incidence rates 26 years ahead, implementing the model
M=10, 20, 30, 50, and 100 times respectively. Table 3.5 presents the predicted
lung cancer incidence rates (average ± sd) per age group for each scenario. Figure
3.3 provides a graphical representation of table 3.5. According to this table, the
combination of M=10 and n=5000 seems adequate to produce sufficiently accurate results in plausible times.⁴ When making this decision, the focus was both on model accuracy (bias and variability of MSM predictions) and on the total required running time.
⁴ The required running time for M·n=50,000 micro-simulations is close to 5 secs using 64 cores (8 nodes; 8 cpus).
Figure 3.3: Predicted (mean±sd) lung cancer incidence rates (cases/100,000 person·years) by age group, for different M·n combinations, given fixed MSM parameter values (θfix=[0.00038, 2, 1.1, 2.8]T).
Implementation
We use both the Bayesian (3.2.2) and the Empirical (3.2.3) method to calibrate all four MSM parameters θ=[θ1, θ2, θ3, θ4]T. We calibrate the MSM to three targets (Yclbr=[y1−clbr, y2−clbr, y3−clbr]T=[41, 391, 464]T), i.e., the predicted lung cancer incidence rates per age group for the fixed values of the MSM parameters θfix=[0.00038, 2, 1.1, 2.8]T (Table 3.3). For each θk we use a Truncated Normal distribution (TN(µθk, sdθk), with µθk=sdθk) to specify either the prior for the Bayesian method or the distribution of plausible parameter values for the Empirical method. In particular, we set:
• m = θ1 ∼ TN(µ(θ1) = sd(θ1) = 0.0008, L(θ1) = 0.00001, U(θ1) = 0.0016)
• mdiagn = θ2 ∼ TN(µ(θ2) = sd(θ2) = 4, L(θ2) = 0.0001, U(θ2) = 8)
• mreg = θ3 ∼ TN(µ(θ3) = sd(θ3) = 2.2, L(θ3) = 0.0001, U(θ3) = 4.4)
• mdist = θ4 ∼ TN(µ(θ4) = sd(θ4) = 5.6, L(θ4) = 0.0001, U(θ4) = 11.2)

Suppose that we apply the Bayesian method in order to calibrate only θ1=m and θ2=mdiagn, and produce a chain of length B=100,000 for each parameter. To implement the Gibbs sampler with the embedded approximate MH algorithm, we follow the steps:

1. Set θ1 = θ1^(0), θ2 = θ2^(0) (starting values), and keep θ3 = θ3^(c) = 1.1, θ4 = θ4^(c) = 2.8 (fixed). Denote by θ^(0) = [θ1^(0), θ2^(0), θ3^(c), θ4^(c)]^T the vector with the starting values for the MSM parameters.

   (a) Given θ^(0), run the micro-simulation model M(θ) on the calibration sample (n=1000) to predict individual trajectories 26 years ahead, and calculate the predicted lung cancer cases y_mj by age group j.

   (b) Repeat step (a) M=50 times (m=1, 2, ..., M), resulting in M predicted lung cancer incidence counts per age group. These y_mj counts are considered random draws from Poisson(λj) distributions.

   (c) Calculate the likelihood ∏_{j=1}^{J} fj(y_{j−clbr} | λj = gj(θ^(0))). The λj are functions of the MSM parameters. Due to the model's complexity the form g(·) is hard to derive; therefore we approximate these quantities using the respective MLEs (sample means), i.e., λj = gj(θ) = (1/M) Σ_{m=1}^{M} y_mj.

2. Propose a new value θ1*.

3. Repeat steps (a) through (c) for θ* = [θ1*, θ2^(0), θ3^(c), θ4^(c)]^T.

4. Calculate the ratio
   r1(θ1, θ1*) = [π(θ1*) ∏_{j=1}^{J} fj(yj | gj(θ*))] / [π(θ1^(0)) ∏_{j=1}^{J} fj(yj | gj(θ^(0)))]
   and accept θ1* with probability α(θ1^(0), θ1*) (section 3.2.2).

5. Set θ1^(0) = θ1* if we accept θ1*; otherwise keep θ1^(0).

6. Propose a new value θ2*.

7. Repeat steps (a) through (c) for θ* = [θ1^(0), θ2*, θ3^(c), θ4^(c)]^T.

8. Calculate the ratio
   r2(θ2, θ2*) = [π(θ2*) ∏_{j=1}^{J} fj(yj | gj(θ*))] / [π(θ2^(0)) ∏_{j=1}^{J} fj(yj | gj(θ^(0)))]
   and accept θ2* with probability α(θ2^(0), θ2*) (section 3.2.2).

9. Set θ2^(0) = θ2* if we accept θ2*; otherwise keep θ2^(0).

The resulting [θ1^(0), θ2^(0)] values from the aforementioned process constitute one update for the chains of the calibrated parameter values. We repeat steps (1) through (9) B=100,000 times. From the total of B=100,000 values, we collect for each chain V=1,000 values by selecting every 50th iteration from the last 50,000 values. The resulting V=1,000 vectors comprise a sample representative of the joint posterior distribution of the MSM parameters and together correspond to the ΘBayes matrix of calibrated parameter values of the MSM for lung cancer.
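The inner Metropolis step with the approximated Poisson likelihood can be sketched as follows for θ1 alone; cycling the same update over each free parameter in turn gives the Metropolis-within-Gibbs scan described above. The surrogate mapping in lambda_hat, the random-walk proposal scale, and the short chain length are illustrative assumptions, not the chapter's actual model or tuning:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
Y_CLBR = np.array([41, 391, 464])        # calibration targets y_{j-clbr}

# Truncated-normal prior for theta_1 = m: mu = sd = 0.0008 on [1e-5, 1.6e-3]
MU = SD = 0.0008
PRIOR = stats.truncnorm((0.00001 - MU) / SD, (0.0016 - MU) / SD, loc=MU, scale=SD)

def lambda_hat(theta1, M=50):
    """Steps (a)-(c): approximate lambda_j by the mean of M model runs.
    The mapping below is a toy surrogate, NOT the real micro-simulation."""
    base = Y_CLBR * np.sqrt(theta1 / 0.00038)
    return rng.poisson(base, size=(M, 3)).mean(axis=0)

def log_target(theta1):
    """Log prior plus approximate Poisson log likelihood."""
    if not (0.00001 <= theta1 <= 0.0016):
        return -np.inf                   # outside the TN support: auto-reject
    lam = np.maximum(lambda_hat(theta1), 1e-9)
    return PRIOR.logpdf(theta1) + stats.poisson.logpmf(Y_CLBR, lam).sum()

theta1, chain = 0.0008, []
for b in range(200):                     # B = 100,000 in the actual study
    prop = theta1 + rng.normal(0.0, 0.0001)   # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_target(prop) - log_target(theta1):
        theta1 = prop                    # accept theta_1*
    chain.append(theta1)
```

Because the likelihood is re-estimated from noisy model runs at every evaluation, the acceptance ratio is itself approximate, which is exactly the "approximate MH" flavor of the algorithm above.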
We follow an analogous procedure to calibrate one, or any combination of two, three
or all four parameters of the MSM. Figures 3.18 and 3.19 depict in flow charts the
implementation of the Bayesian method to calibrate θ1.
For the Empirical calibration method, we implement an LHS of moderate (NLHS=10)
size, L=10,000 times, thus resulting in Nemp= NLHS· L = 100,000 vectors of param-
eter values in total. For each vector of parameter values we implement the micro-
simulation model M=10 times, and we calculate the corresponding predicted lung
cancer incidence rates per age group. As in the Bayesian calibration method, we
assume that the calibration targets yj (lung cancer cases per age group j, j={1,
2, 3}) are count data from Poisson distributions, i.e., yj ∼ Poisson(λj=gj(θ)).
Since the form g(·) is hard to derive, we use the M=10 draws predicted by the model to calculate estimates of the parameters of these Poisson distributions, i.e., λj = gj(θ) = (1/M) Σ_{m=1}^{M} y_mj. Here, the deviance statistic follows a chi-square distribution with ν = 3 d.f.; hence, stipulating a 5% level of statistical significance, we select those sets satisfying Di < 7.81, thus resulting in Nemp “acceptable” sets of parameter values in total.
Among those we randomly extract V=1,000 vectors (with replacement if necessary, i.e., if the procedure results in fewer than 1,000 “acceptable” vectors). The resulting vectors comprise a sample representative of the joint distribution of the “acceptable” MSM parameters, according to the Empirical calibration criteria, and together correspond to the ΘEmp matrix of values for the calibrated parameters of the lung cancer MSM (section 3.2.3).
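A minimal sketch of the Empirical method's sampling-and-filtering loop, assuming scipy's qmc.LatinHypercube for the LHS draws and a toy surrogate (predict) in place of the real MSM; the surrogate, seed, and reduced L are illustrative assumptions:

```python
import numpy as np
from scipy import stats
from scipy.stats import qmc

rng = np.random.default_rng(1)
Y_CLBR = np.array([41, 391, 464])            # calibration targets

# Plausible-value distributions: TN with mu = sd, as specified above
def tn(mu, lo, hi):
    return stats.truncnorm((lo - mu) / mu, (hi - mu) / mu, loc=mu, scale=mu)

DISTS = [tn(0.0008, 0.00001, 0.0016), tn(4.0, 0.0001, 8.0),
         tn(2.2, 0.0001, 4.4), tn(5.6, 0.0001, 11.2)]

def predict(theta, M=10):
    """Mean of M noisy predicted counts per age group. Toy surrogate for
    the MSM; the real mapping g(.) is the simulation model itself."""
    base = Y_CLBR * np.sqrt(theta[0] / 0.00038)
    return rng.poisson(base, size=(M, 3)).mean(axis=0)

# LHS of size N_LHS = 10, repeated L = 100 times (L = 10,000 in the study),
# mapped to the parameter space through the TN quantile functions
sampler = qmc.LatinHypercube(d=4, seed=7)
u = np.vstack([sampler.random(10) for _ in range(100)])
Theta = np.column_stack([d.ppf(u[:, k]) for k, d in enumerate(DISTS)])

# Poisson deviance of each parameter set; keep sets with D_i < chi2_{0.95,3}
crit = stats.chi2.ppf(0.95, df=3)            # ~7.81
lam = np.array([np.maximum(predict(th), 1e-9) for th in Theta])
dev = 2 * (stats.poisson.logpmf(Y_CLBR, Y_CLBR)
           - stats.poisson.logpmf(Y_CLBR, lam)).sum(axis=1)
acceptable = Theta[dev < crit]               # the "acceptable" sets
```

With the real model, predict would be replaced by M=10 runs of the micro-simulation, and L raised to 10,000.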
Figures 3.1 and 3.2 provide a graphical representation of the mechanism for the
extraction of NLHS vectors of values from a two-dimensional parameter space, for
different NLHS sizes. Two sets of graphs are presented in each figure. In the left
graph the extracted value is the center of each selected interval, whilst in the right
one the value is randomly chosen from the respective interval.
3.4.5 Terms of comparison
The results from both methods are sets of values describing the joint distribution of
the calibrated MSM parameters. The resulting sets represent random draws from the
joint posterior distribution, or the empirical joint distribution of the values satisfying
the convergence criteria, for the Bayesian and the Empirical method respectively. For
the purposes of this comparative analysis, each method results in V=1000 vectors
from the multi-variate parameter space.
We use these results to make predictions for the quantities of interest, i.e., lung cancer
incidence rates by age group. In particular, for each vector of parameter values we
implement the MSM multiple (M=50) times, and we produce point estimates (means)
of the respective quantities. The resulting (V=1000) mean incidence rates for each
age group represent random draws from the posterior or the empirical predictive
distribution. Depending on the input sample, the predictions can be used for the
purposes of internal (smpl.C5000) or external validation (smpl.V5000).
We compare the two methods using qualitative and quantitative measures as well as
graphical representations of the results produced.
In particular we provide:
1. Density plots (parameters and predictions)
We compare the density plots of the marginal distributions of the calibrated MSM parameters, as well as the distributions of the predicted calibration data (lung cancer incidence rates by age group). We use the Kullback-Leibler distance (60) to assess the relative entropy between the probability distributions resulting from the two methods, with respect to either each calibrated parameter or the predictions by age group. Low values of this distance indicate similarity of the two distributions, provided that they do not present large differences in the overall shape (e.g., different skewness, and higher order moments in general). We also apply the Kolmogorov-Smirnov test to check whether results from the two methods come from the same underlying distribution. When the null hypothesis is not rejected (similar results from the two methods), we include in the graph the respective p-value. In the density plots for the calibrated MSM parameters we also include the respective prior distributions. For the predictions obtained from each calibrated MSM, we present two different sets of results, one for the internal and the other for the external validation of the calibrated MSM, using as input the calibration (smpl.C5000) and the validation (smpl.V5000) samples respectively.
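A sketch of these two comparisons on hypothetical draws (the normal samples below merely stand in for the V=1000 calibrated values from each method; the histogram-based KL estimator and its bin count are implementation choices, not the dissertation's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical V=1000 calibrated values of one parameter from each method
bayes = rng.normal(2.22, 1.09, 1000)
empirical = rng.normal(2.25, 1.11, 1000)

def symmetric_kl(x, y, bins=30):
    """Histogram estimate of the symmetric KL distance KL(p||q) + KL(q||p)."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    p = np.histogram(x, bins=bins, range=(lo, hi))[0] + 1e-9  # avoid log(0)
    q = np.histogram(y, bins=bins, range=(lo, hi))[0] + 1e-9
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

kl = symmetric_kl(bayes, empirical)
ks_stat, ks_p = stats.ks_2samp(bayes, empirical)
# A small kl and a non-significant ks_p suggest the two methods produced
# similar marginal distributions for this parameter.
```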
2. Correlation and contour plots (parameters)
We also provide correlation (scatter) plots of all the pairs of calibrated MSM parameters, as well as contour plots to identify high density points in the resulting bivariate distributions. The scatter plots are accompanied by the Pearson correlation coefficient.
3. Calibration and box plots (predictions)
In an attempt to provide additional means to compare the two methods, we
use the calibration results (sets of values for the calibrated MSM parameters)
to predict lung cancer incidence rates based on different samples of baseline
characteristics (input data). In particular, we extracted 20 different samples
in total, each of size n=5000, representative of the 1980 US population of
males, current smokers. Each sample includes individual level data on age and
smoking intensity. For each one of these 20 samples we apply the model M=50
times, to predict lung cancer incidence rates by age group. We use the sample
mean as a point estimate of the predicted quantity by age group. Repetition of this process for each set of values for the calibrated MSM parameters results in 1000 predicted rates for each age group.
Using these estimates we produce calibration and box plots to compare the two methods. In the calibration plots we plot the point estimates of the predictions from the Bayesian method versus the respective ones from the Empirical method, for each one of the 20 different samples that were used as input data. If the two methods produce similar results, the points in this plot should be scattered along the x=y line. We juxtapose the box-plots of the predictions from each calibration method, for each one of the 20 samples, by age group. The extent of overlap between the respective box-plots indicates the equivalence of the results produced by each method.
4. Discrepancy measures
We also provide four quantitative (two univariate and two multivariate) mea-
sures of discrepancy to compare the predictions from the two methods, namely
the mean absolute (MAD) and mean squared (MSD) deviations, as well as the
Euclidean, and the Mahalanobis distances.
The univariate measures of discrepancy are defined as:

MAD = (1/V) Σ_{v=1}^{V} |y_vj − y_j| / y_j    (3.4)

MSD = (1/V) Σ_{v=1}^{V} ((y_vj − y_j) / y_j)²    (3.5)
where y_vj is the point estimate for the lung cancer incidence rate of the jth age group given the vth vector of MSM parameters, and y_j is the jth component of the vector used as reference point.
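A sketch of equations (3.4)-(3.5) on hypothetical predictions (y_ref, the noise scales, and V are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
y_ref = np.array([41.0, 391.0, 464.0])       # reference point, e.g. Y_clbr

# Hypothetical V=1000 point estimates of the rates per age group
V = 1000
Y = y_ref + rng.normal(0.0, [9.0, 40.0, 42.0], size=(V, 3))

rel = (Y - y_ref) / y_ref                    # deviations weighted by y_j
MAD = np.abs(rel).mean(axis=0)               # eq. (3.4), per age group j
MSD = (rel ** 2).mean(axis=0)                # eq. (3.5), per age group j
overall_MAD = float(np.abs(rel).mean())
overall_MSD = float((rel ** 2).mean())
```

Dividing by y_j is what makes the deviations comparable across age groups with very different incidence levels.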
The multivariate distances, on the other hand, given M-dimensional vectors x and a constant vector c (center), are defined as:

D_M = sqrt( (x − c)^T · S^{−1} · (x − c) )    (3.6)

where c represents the center of the multidimensional space. In the Euclidean distance S is the identity matrix, while in the Mahalanobis distance S is the respective covariance matrix of the x vectors.
In the case of the calibrated parameters, this statistic measures the distance of
each x vector of MSM parameter values from the c=θfix vector of fixed values
assumed in the simulation study. When it comes to MSM predictions, these
distances measure the deviation of each vector of predictions from the vector
used as reference point (Yclbr or Yfix).
Multivariate distances are usefully applied in conjunction with the univariate ones, since they provide an idea of the combined deviation of the MSM predictions from the reference points (here, lung cancer incidence rates per age group). Furthermore, the Mahalanobis distance adds objectivity to the comparison of the results from the two MSMs, since it weighs the relevant deviation based on the underlying covariance matrix. For instance, the distance from c of a vector x with large variance, as well as the distances from c of two highly correlated vectors (x1 and x2), are downweighted (and vice versa). Hence, the final results are not distorted by potentially high correlations or different orders of magnitude between the involved quantities of interest.
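Equation (3.6) and the contrast between the two choices of S can be sketched as follows; the simulated parameter vectors, their scales, and the induced θ1-θ2 correlation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
theta_fix = np.array([0.00038, 2.0, 1.1, 2.8])     # center c

# Hypothetical V=1000 calibrated vectors: theta_1 is four orders of
# magnitude smaller than the rest and correlated with theta_2.
V = 1000
X = rng.normal([0.00034, 0.0, 2.2, 5.8], [4e-5, 1.0, 1.1, 2.7], size=(V, 4))
X[:, 1] = 2.9 + 1.5 * X[:, 1] + 2e4 * (X[:, 0] - 0.00034)

def dist(X, c, S):
    """sqrt((x-c)^T S^{-1} (x-c)) for every row x of X, as in eq. (3.6)."""
    d = X - c
    Sinv = np.linalg.inv(S)
    return np.sqrt(np.einsum("ij,jk,ik->i", d, Sinv, d))

euclid = dist(X, theta_fix, np.eye(4))               # S = identity
mahal = dist(X, theta_fix, np.cov(X, rowvar=False))  # S = covariance of x's
# The Euclidean distance is dominated by the large-magnitude coordinates,
# while the Mahalanobis distance rescales by the covariance, so theta_1's
# tiny scale and the theta_1-theta_2 correlation no longer distort it.
```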
Both mean deviations regarding the MSM predictions are weighted based on
the size of the respective lung cancer incidence rates by age group. We use
two reference points for these calculations. The first one is the set of calibration targets Yclbr = M100(θfix, smpl100,000), i.e., the set of MSM predictions for θfix, smpl100,000 and M=100. The second is the set of MSM predictions Yfix = M2000(θfix, smpl.C5000), i.e., the model's output again for θfix, but using the calibration input (smpl.C5000) and running the model M=2000 times in total.
As already mentioned (section 3.4.3), the reasoning behind the second comparison is that, even if the calibration procedure resulted in the vector θfix, there would be a deviation between the MSM's predictions and the calibration targets, even for the same number of total micro-simulations (10^7), if instead of smpl100,000 we used the calibration input (smpl.C5000). This deviation has to do with two sources of uncertainty inherent in the MSM (Chapter 1), namely, population heterogeneity (different composition of the two input samples) and stochastic uncertainty.
By comparing the MSM predictions resulting from each calibration method using the calibration input sample (smpl.C5000) with the model's output for θfix using exactly the same input sample, we control for the effect of the population heterogeneity on the final results. Therefore, any deviations between the MSM predictions and Yfix can be attributed, to a greater extent, to the real underlying differences between the two calibration methods, rather than being distorted by the population heterogeneity.
3.5 Results
3.5.1 Parameters
Table 3.6 and figure 3.4 compare the marginal distributions of the calibrated param-
eters from each method. There is a considerable overlap between the results from
the two calibration methods. This overlap is more prominent in the case of θ3=mreg
and θ4=mdist, where the Kolmogorov-Smirnov test cannot reject the null hypothesis
that the respective pairs of distributions represent the same underlying populations at α = 0.1%. In these two cases, noteworthy also is the proximity between the marginal distributions of the calibrated parameters and the assumed priors, indicating a potential identifiability problem for these two MSM parameters. Furthermore, the relative entropy, assessed by the (symmetric) Kullback-Leibler distance, is very close to 0.5 for all MSM parameters except θ1=m.
Both methods include the fixed values assumed in the simulation study (θfix=[0.00038, 2, 1.1, 2.8]T) within the range of the calibrated MSM parameters. This is an indication that both methods produce reasonable results. However, the marginal distributions are centered away from these fixed values. In most cases, the respective fixed parameter value lies outside the Interquartile Range (IQR), the only exceptions being θ1=m for the Empirical and θ2=mdiagn for the Bayesian method.
Contour plots (figures 3.6, 3.7) reveal bivariate associations between the calibrated
parameters. The underlying patterns are similar for the two methods. There is a
strong correlation between θ1 and θ2 in both methods. Furthermore, changes in the
θ3 and θ4 do not seem to considerably affect the respective θ1 values (points on the
respective plots are gathered around a conceivable line perpendicular to the θ1 axis).
The three parameters θ2, θ3, θ4 seem totally unrelated to each other.
Identifying highly correlated parameters can prove very helpful for the further devel-
opment and improvement of the MSM, as well as an extremely interesting discovery
for experts investigating the described phenomenon (here, lung cancer). With respect to the development of the MSM, strong correlations may indicate redundant parameters and suggest a more parsimonious version of the model by expressing some parameters as functions of others with which they are highly correlated. Regarding the true process described by the MSM, strong correlations may reveal relationships between the underlying mechanisms, previously unknown or disregarded by the experts, and hence advance the overall research on the phenomenon along new interesting paths.
The two multidimensional discrepancy measures lead to contradictory conclusions. According to the Euclidean distance, the values for the calibrated MSM parameters resulting from the Bayesian method are closer to the fixed values specified in the simulation study (θfix) than those from the Empirical method. Noteworthy, however, is the fact that, although the univariate (figure 3.4, table 3.6) and bivariate analyses (figures 3.6-3.7), as well as the Euclidean distance (figure 3.5), suggest that there are some discrepancies between the calibrated values, when considering the multidimensional parameter space centered at θfix, the Mahalanobis distances (figure 3.5) indicate quite similar results between the two methods. As already mentioned (section 3.4.5), conclusions based on the Euclidean distance may be misleading, since this measure can be distorted by several factors, e.g., high correlations or different orders of magnitude between the involved quantities of interest. In the case of the calibrated MSM parameters, there is a high correlation between two of them (θ1=m and θ2=mdiagn), while θ1 differs by almost four orders of magnitude from each of the other MSM parameters.
Figure 3.4: Density plots, Kullback-Leibler distance, and Kolmogorov-Smirnov p-value, comparing the marginal distributions of the calibrated MSM parameters between the two calibration methods.
Figure 3.5: Distributions of multidimensional distances of the calibrated MSM parameters from the fixed values assumed in the simulation study (θfix).
Method      Min         Q1          Median      Mean        Q3          Max         Fixed value   Deviation (±SD) (P*) (%)
θ1 = m
Bayesian    2.14·10^−4  3.15·10^−4  3.40·10^−4  3.38·10^−4  3.64·10^−4  4.60·10^−4  3.8·10^−4     4·10^−5 (3.9·10^−5) (88) (11)
Empirical   2.78·10^−4  3.71·10^−4  3.97·10^−4  3.97·10^−4  4.20·10^−4  5.00·10^−4  3.8·10^−4     −2·10^−5 (3.7·10^−5) (33) (4.47)
θ2 = mdiagn
Bayesian    1.49·10^−3  1.52        2.65        2.87        3.94        7.95        2             −0.87 (1.77) (36) (43.25)
Empirical   7.42·10^−3  2.77        4.25        4.36        6.10        7.98        2             −2.36 (2.02) (13) (117.8)
θ3 = mreg
Bayesian    0.019       1.37        2.16        2.22        3.06        4.40        1.1           −1.12 (1.09) (18) (101.5)
Empirical   0.013       1.33        2.21        2.25        3.18        4.39        1.1           −1.15 (1.11) (18) (104.7)
θ4 = mdist
Bayesian    0.071       3.59        5.62        5.76        8.02        11.18       2.8           −2.96 (2.72) (16) (105.5)
Empirical   0.0011      3.24        5.73        5.63        8.13        11.20       2.8           −2.83 (3.00) (22) (101.2)
* Percentile of the predictive distribution the fixed value corresponds to.
Table 3.6: Summary statistics of the calibrated MSM parameters.
Figure 3.6: Contour plots depicting the bivariate parameter distributions of the Bayesian calibrated MSM. Contours drawn at α=0.95, 0.5 and 0.05 of the bivariate distribution.
Figure 3.7: Contour plots depicting the bivariate parameter distributions of the Empirically calibrated MSM. Contours drawn at α=0.95, 0.5 and 0.05 of the bivariate distribution.
3.5.2 Predictions
The marginal distributions of the predicted lung cancer incidence rates from both
methods include the respective calibration targets in their range (figures 3.8, 3.9).
Moreover, predictions from the Bayesian MSM include the calibration targets in
their IQRs for both internal and external validation, the only exception being the “60-80yrs” age group (table 3.7). On the contrary, calibration targets lie outside the respective IQRs of the predictions from the empirically calibrated MSM, the only exception being the “>80yrs” age group in the external validation case. Consequently, although there is a large overlap between the two methods regarding the ranges of the predicted lung cancer incidence rates by age group (table 3.7), the respective distributions are very different (KS-test p-value<0.001 in every age group). Predicted values from the Bayesian calibrated model are more dispersed than those from the Empirical one, while the bias of the methods varies across the age groups and the type of validation. However, both calibrated models overall predict lung cancer incidence better in the “>80yrs” group, i.e., the group with more cases in it.
As already described in section 3.4.4, the predictions from each calibrated model
resulted from running the model M=50 times for each of the V=1000 calibrated
parameter vectors (Θ matrices), given a specific input sample SN . In the case of
the internal validation SN=smpl.C5000, i.e., the sample used in the calibration pro-
cedure, while in the external validation SN=smpl.V5000, i.e., another sample of the
same size N=5000. Both input samples are extracted from the simulated 1980 US
population (N=100,000). We calculated the MAD and MSD discrepancy measures for the calibrated MSMs under four different scenarios, depending on the input sample used (internal and external validation) as well as the reference point (Yclbr or Yfix, section 3.4.3). Table 3.8 depicts the predictions involved in the calculations of the MAD and MSD discrepancy measures presented in table 3.9. Note here that, when comparing the MSMs' results with Yclbr, the predictions involved in the calculations resulted from different MSM input samples. However, when Yfix is the reference point, in the internal validation the predictions refer to the same input sample (smpl.C5000), while in the external validation the predictions refer to samples of the same size (N=5000).
According to the overall MSD and MAD values (table 3.9), when comparing the predictions to the calibration targets (Yclbr), it is unclear which method outperforms the other. However, noteworthy is the fact that, when looking at the results by age group, the Bayesian calibrated MSM predicts lung cancer incidence better than the Empirically calibrated one for younger people (“<60yrs”), i.e., for the group with fewer observed cases in it. This finding holds for both internal and external validation, and indicates that the Bayesian method results in a set of values for the model parameters that, when used as MSM input, leads to better predictions of rare events.
When it comes to deviations from Yfix, the Empirically calibrated MSM overall
results in smaller discrepancies than the Bayesian one. This finding, in conjunction
with the note that predictions in this case refer to input samples that are either the
same (internal validation) or of the same size (external validation), suggests that the
Bayesian method is probably more robust to the sample of baseline characteristics
used as input in the calibration procedure.
To better understand this, recall that θfix is a vector of ad-hoc values for the model parameters, therefore independent of the input samples used for the predictions. The matrices ΘBAYES and ΘEMP, on the other hand, depend on the input sample (smpl.C5000) used in the calibration procedure. Furthermore, the predictions obtained by the MSM depend on the structure of the model, which remains unchanged, the parameter values, and the input sample used. According to Table 3.8, the predictions obtained from each model depend on the matrices of calibrated
values Θ. In addition, in the internal validation case, these predictions also depend
on the input sample (smpl.C5000) used in the calibration procedure, while in the ex-
ternal validation case they depend on a slightly different input sample of the same
size (N=5000), from the same reference population (smpl.V5000).
Therefore, the proximity between the MSM predictions and Yfix provides an indi-
cation of how strongly the results of each calibration method (ΘBAYES and ΘEMP)
depend on the input sample used in the calibration procedure. The stronger this
relationship is, namely the closer the MSM predictions are to the reference vector
Yfix, the less “robust” the method is to the input sample used in the calibration
procedure.
Looking at the multivariate version of the aforementioned four sets of comparisons
and the respective discrepancy measures (figure 3.10), we have a clearer idea of the
combined deviation of the MSM predictions from the reference vectors. According to
the Euclidean distance, predictions from the Empirically calibrated MSM are considerably closer to the reference vectors than those from the Bayesian model in all cases (internal and external validation). This finding was expected because, according to the respective univariate distributions (table 3.7, figures 3.8-3.9), predictions from the Bayesian MSM are much more dispersed than the ones from the Empirical model in the “60-80yrs” and “>80yrs” age groups. Furthermore, although predictions in the “<60yrs” group are less dispersed and centered around the calibration target, this is not reflected in the Euclidean distance, since this measure does not take into account the relative magnitudes of the quantities of interest.
The Mahalanobis distances change the overall conclusions considerably. According to this measure, the Bayesian calibrated MSM seems to perform equally well in all instances, and only marginally better when comparing predictions with Yfix in the external validation case, compared to the Empirically calibrated model. This finding essentially reflects the fact that the superiority of the Bayesian MSM in the “<60yrs” age group is offset by the better predictions of the Empirical MSM in the other two age groups, as indicated by the univariate discrepancy measures applied for the internal validation of the model (table 3.9). On the contrary, in accordance with the univariate analysis, the Mahalanobis distance suggests that the predictions from the Empirical MSM are closer to Yfix than the respective ones from the Bayesian model.
The calibration graphs (figure 3.11) plot the average predicted values by age group for each one of twenty different samples (of size N=5000 each) used as input to the MSM. As expected, these numbers lie on a straight line, denoting that the implementation of the two calibrated MSMs on the same input results in analogous outcomes.
The box-plots (figure 3.12) and the respective summary statistics (table 3.10) are in accordance with the conclusions from the density plots, i.e., they indicate that, overall, the Empirical method leads to less dispersed predictions. Noteworthy is also the fact that, looking at the medians, the predictions from the Empirical MSM are consistently higher than those from the Bayesian model. However, the Bayesian calibrated MSM tends to make more accurate predictions (medians closer to the respective calibration targets) for the “<60yrs” and “>80yrs” age groups.
Figure 3.8: INTERNAL VALIDATION: Density plots depicting the marginal distributions of the predicted lung cancer incidence rates (cases/100,000 person·years) by age group, compared to calibration targets Yclbr = M100(θfix, smpl100,000), and Yfix = M2000(θfix, smpl.C5000). [KL-dist: Kullback-Leibler distance]
Figure 3.9: EXTERNAL VALIDATION: Density plots depicting the marginal distributions of the predicted lung cancer incidence rates (cases/100,000 person·years) by age group, compared to calibration targets Yclbr = M100(θfix, smpl100,000), and Yfix = M2000(θfix, smpl.C5000). [KL-dist: Kullback-Leibler distance]
Summary          INTERNAL Validation          EXTERNAL Validation
statistics       Bayesian      Empirical      Bayesian      Empirical
<60 years old
Min              14.05         31.22          15.24         30.48
Q1               32.70         44.97          32.40         44.02
Median           39.87         49.8           39.55         48.93
Mean±Sd          39.36±9.19    49.8±6.61      38.8±9.05     49.0±6.42
Q3               45.94         54.7           45.35         53.7
Max              66.32         69.7           64.44         68.9
Target value     41
Bias (%)         1.64 (4)      −8.79 (21.4)   2.20 (5.4)    −8.00 (19.5)
60-80 years old
Min              208.9         313.6          212.5         307.1
Q1               308.1         358.1          301.2         350.5
Median           342.2         373.6          335.2         365.7
Mean±Sd          336.6±40.45   372.9±19.77    329.4±40.46   365.1±19.72
Q3               369.2         387.6          361.1         380.0
Max              426.1         423.1          425.1         415.1
Target value     391
Bias (%)         54.4 (13.9)   18.1 (4.63)    61.6 (15.8)   25.9 (6.6)
>80 years old
Min              370.6         383.8          361.6         389.3
Q1               433.4         458.9          423.2         447.9
Median           458.8         476.4          449.5         467.2
Mean±Sd          465.0±41.72   476.1±26.70    453.8±40.46   465.7±26.37
Q3               494.5         495.2          482.5         483.5
Max              622.9         562.0          568.0         556.3
Target value     464
Bias (%)         −1.0 (0.2)    −12.1 (2.6)    10.2 (2.2)    −1.7 (0.4)
Bias (%): deviation of the mean from the target value of the calibration procedure
Table 3.7: Summary statistics of the predicted lung cancer incidence rates by age group, by implementing the MSM on both the calibration and validation input samples.
Figure 3.10: Distributions of the Mahalanobis distances of the calibrated MSM predictions from Yclbr and Yfix (internal and external validation).
Figure 3.11: Calibration plots.
Figure 3.12: Box plots.
Internal Validation
Reference point                    Bayesian Calibration        Empirical Calibration
Yclbr = M100(θfix, smpl100,000)    M50(ΘBayes, smpl.C5000)     M50(ΘEmp, smpl.C5000)
Yfix = M2000(θfix, smpl.C5000)

External Validation
Reference point                    Bayesian Calibration        Empirical Calibration
Yclbr = M100(θfix, smpl100,000)    M50(ΘBayes, smpl.V5000)     M50(ΘEmp, smpl.V5000)
Yfix = M2000(θfix, smpl.C5000)

Table 3.8: Comparisons: predictions vs reference points involved in the calculations of the MAD and MSD discrepancy measures for the two calibrated MSMs (table 3.9).
3.6 Calibration Methods Refinement
Another very important finding is that, when applying the Pearson χ2 GoF test, only
34.5% and 59.7% of the predictions from the Bayesian calibrated MSM “pass” the
test at the α=95% and 99% levels, respectively. The corresponding percentages for the
Empirically calibrated MSM are much higher, i.e., 77.8% and 98.8%, respectively. Analogous
findings hold in the case of the external validation of the models, with the percentages
of predictions satisfying the GoF test being 31.4% and 54.3% for the Bayesian
method, and 73.1% and 96.8% for the Empirical one. This observation motivated
a complementary sub-analysis, based on N=100 random draws from
the sets of calibrated parameter values (along with their predictions) “passing” the
95% GoF test, from each method.
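A minimal sketch of this selection step, with hypothetical names throughout (the χ2 critical value shown, 5.991, is the 0.95 quantile for 2 degrees of freedom and is purely illustrative, as are the toy targets and parameter sets; the dissertation's own implementation is in R):

```python
import random

def pearson_chisq(predicted, targets):
    """Pearson X^2 GoF statistic between predicted and target rates."""
    return sum((o - e) ** 2 / e for o, e in zip(predicted, targets))

def gof_subsample(param_sets, predictions, targets, critical, n_draws=100, seed=1):
    """Keep the parameter vectors whose predictions 'pass' the GoF test,
    then draw n_draws of them (with replacement if fewer survive)."""
    passing = [(p, y) for p, y in zip(param_sets, predictions)
               if pearson_chisq(y, targets) <= critical]
    rng = random.Random(seed)
    if len(passing) >= n_draws:
        return rng.sample(passing, n_draws)
    return [rng.choice(passing) for _ in range(n_draws)]

# Toy illustration: three age-group incidence targets; the second parameter
# vector produces predictions far from the targets and is filtered out.
targets = [41.0, 391.0, 464.0]
preds = [[40.0, 390.0, 465.0], [60.0, 300.0, 600.0]]
params = [{"m": 3.8e-4}, {"m": 4.7e-4}]
kept = gof_subsample(params, preds, targets, critical=5.991, n_draws=5)
```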
The results from this supplementary analysis are somewhat different from those of the main
analysis. The most prominent differences, as expected, are related to the
performance of the Bayesian calibrated model. The distributions of the calibrated
parameters, as well as the predictions resulting from this model, are much less
dispersed compared to the full analysis. The posterior distribution of θ2=mdiagn is
                     Internal Validation
              Bayesian Calibration                    Empirical Calibration
              <60 yrs  60-80 yrs  >80 yrs  Overall    <60 yrs  60-80 yrs  >80 yrs  Overall
Yclbr  MAD    0.0400   0.1392     0.0220   0.0600     0.2145   0.0464     0.0261   0.0957
       MSD    0.0518   0.0306     0.0081   0.0300     0.0720   0.0047     0.0040   0.0269
Yfix   MAD    0.2128   0.0465     0.0288   0.0960     0.0041   0.0563     0.0534   0.0379
       MSD    0.0790   0.0153     0.0093   0.0345     0.0175   0.0063     0.0063   0.0100

                     External Validation
              Bayesian Calibration                    Empirical Calibration
              <60 yrs  60-80 yrs  >80 yrs  Overall    <60 yrs  60-80 yrs  >80 yrs  Overall
Yclbr  MAD    0.0563   0.1575     0.0221   0.0777     0.1951   0.0663     0.0037   0.0884
       MSD    0.0516   0.0355     0.0082   0.0318     0.0626   0.0069     0.0032   0.0242
Yfix   MAD    0.2240   0.0668     0.0039   0.0982     0.0200   0.0342     0.0303   0.0282
       MSD    0.0830   0.0176     0.0082   0.0362     0.0169   0.0043     0.0043   0.0085

Table 3.9: Measures of discrepancy to assess overall MSM predictive performance. MADs and MSDs of the model's predictions from Yclbr = M100(θfix, smpl100,000) and Yfix = M2000(θfix, smpl.C5000). [Bold numbers indicate the method with the smaller discrepancy.]
                       Summary statistics
Method        Min     Q1      Median   Q3      Max
< 60 yrs
  Bayesian    16.3    32.9    40.1     46.2    65.3
  Empirical   32.2    45.1    50.1     55.0    68.8
60-80 yrs
  Bayesian    215.7   297.7   330.3    355.9   420.5
  Empirical   304.5   345.6   360.1    374.7   410.7
> 80 yrs
  Bayesian    363.0   436.0   463.2    496.7   584.3
  Empirical   408.6   461.2   480.2    497.4   548.8

Table 3.10: Mean values of the main summary statistics (minimum, maximum and quartiles) of the predicted lung cancer incidence rates by age group for 20 different MSM input samples (figure 3.12).
now centered around the respective fixed value. Predictions are improved for the
“60-80 yrs” age group. The Bayesian calibrated MSM still performs better when it
comes to rare events, while the overall performance of this model is now better than
that of the Empirical one (table 3.13) when predictions are compared with the calibration
targets.
Figure 3.13: Sub-analysis: Density plots comparing the marginal distributions of the calibrated MSM parameters between the two calibration methods.
Method      Min        Q1         Median     Mean       Q3         Max        Fixed value  Deviation (±SD) (P*k) (%)

θ1 = m
 Bayesian   3.02·10−4  3.45·10−4  3.65·10−4  3.65·10−4  3.81·10−4  4.50·10−4  3.8·10−4     1.5·10−5 (2.64·10−5) (72) (3.9)
 Empirical  3.15·10−4  3.71·10−4  3.94·10−4  3.95·10−4  4.21·10−4  4.72·10−4  3.8·10−4     −1.51·10−5 (3.8·10−5) (30) (3.97)

θ2 = mdiagn
 Bayesian   0.1058     1.12       1.83       1.74       2.26       3.93       2            0.26 (0.82) (59) (13)
 Empirical  7.42·10−3  2.74       4.60       4.41       6.33       7.98       2            −2.41 (2.24) (16) (120.5)

θ3 = mreg
 Bayesian   0.402      1.41       2.06       2.16       2.92       4.27       1.1          −1.06 (0.96) (13) (96.4)
 Empirical  0.316      1.37       2.14       2.21       3.09       4.35       1.1          −1.11 (1.09) (16) (101)

θ4 = mdist
 Bayesian   0.157      3.76       5.70       5.58       7.68       10.7       2.8          −2.78 (2.74) (17) (99.3)
 Empirical  0.094      2.82       5.90       5.49       7.83       11.0       2.8          −2.69 (3.00) (25) (96.1)

* Percentile of the predictive distribution the fixed value corresponds to.

Table 3.11: Sub-analysis: Summary statistics of the calibrated MSM parameters.
Figure 3.14: Sub-analysis: Contour plots depicting the bivariate parameter distributions of the Bayesian calibrated MSM. Contours drawn at α=0.95, 0.5 and 0.05 of the bivariate distribution.

Figure 3.15: Sub-analysis: Contour plots depicting the bivariate parameter distributions of the Empirically calibrated MSM. Contours drawn at α=0.95, 0.5 and 0.05 of the bivariate distribution.

Figure 3.16: INTERNAL VALIDATION (sub-analysis): Density plots depicting the marginal distributions of the predicted lung cancer incidence rates (cases/100,000 person·years) by age group, compared to calibration targets Yclbr = M100(θfix, smpl100,000), and Yfix = M2000(θfix, smpl.C5000).

Figure 3.17: EXTERNAL VALIDATION (sub-analysis): Density plots depicting the marginal distributions of the predicted lung cancer incidence rates (cases/100,000 person·years) by age group, compared to calibration targets Yclbr = M100(θfix, smpl100,000), and Yfix = M2000(θfix, smpl.C5000).
                          INTERNAL Validation          EXTERNAL Validation
Summary statistics        Bayesian     Empirical       Bayesian     Empirical

< 60 years old
  Min                     37.24        37.24           34.73        37.11
  Q1                      42.96        45.31           41.99        44.73
  Median                  45.74        50.04           45.08        48.18
  Mean±Sd                 46.35±5.09   49.9±6.27       45.61±4.85   49.0±5.77
  Q3                      50.00        54.4            48.32        53.2
  Max                     59.83        61.9            61.06        60.7
  Target value: 41
  Bias (%)                -5.35 (13)   -8.9 (21.7)     -4.61 (11.2) -8.04 (19.6)

60-80 years old
  Min                     344.5        344.9           339.7        342.8
  Q1                      357.7        362.5           355.6        355.7
  Median                  367.2        374.9           362.6        366.3
  Mean±Sd                 368.3±13.71  364.6±13.4      329.4±40.46  368.1±15.53
  Q3                      379.0        387.8           372.4        379.8
  Max                     401.6        409.2           398.4        402.7
  Target value: 391
  Bias (%)                22.7 (5.8)   15.6 (3.99)     26.7 (6.8)   22.9 (5.9)

> 80 years old
  Min                     421.6        433.5           428.3        424.5
  Q1                      460.2        460.3           460.1        459.2
  Median                  481.6        480.0           480.0        472.1
  Mean±Sd                 480.1±24.67  476.8±20.77     478.3±23.68  472.0±17.22
  Q3                      498.2        495.0           496.0        482.6
  Max                     522.2        514.8           521.7        518.2
  Target value: 464
  Bias (%)                -16.1 (3.5)  -12.8 (2.8)     -14.3 (3.1)  -8.0 (1.7)

Bias(%): deviation of the mean from the target value of the calibration procedure.

Table 3.12: Sub-analysis: Summary statistics of the predicted lung cancer incidence rates by age group, obtained by implementing the MSM on both the calibration and validation input samples.
                     Internal Validation
              Bayesian Calibration                    Empirical Calibration
              <60 yrs  60-80 yrs  >80 yrs  Overall    <60 yrs  60-80 yrs  >80 yrs  Overall
yclbr  MAD    0.1305   0.0582     0.0347   0.0745     0.2158   0.0400     0.0276   0.0945
       MSD    0.0323   0.0046     0.0040   0.0136     0.0698   0.0034     0.0027   0.0253
yfix   MAD    0.0730   0.0432     0.0623   0.0595     0.0029   0.0633     0.0548   0.0403
       MSD    0.0156   0.0034     0.0068   0.0086     0.0156   0.0062     0.0051   0.0089

                     External Validation
              Bayesian Calibration                    Empirical Calibration
              <60 yrs  60-80 yrs  >80 yrs  Overall    <60 yrs  60-80 yrs  >80 yrs  Overall
yclbr  MAD    0.1125   0.0683     0.0307   0.0705     0.1961   0.0584     0.0173   0.0906
       MSD    0.0266   0.0058     0.0035   0.0119     0.0581   0.0049     0.0017   0.0216
yfix   MAD    0.0877   0.0320     0.0581   0.0593     0.0192   0.0429     0.0443   0.0355
       MSD    0.0170   0.0024     0.0061   0.0085     0.0136   0.0038     0.0034   0.0069

Table 3.13: Sub-analysis: Measures of discrepancy to assess overall MSM predictive performance. MADs and MSDs of the model's predictions from yclbr = MSM(smpl100,000, θfix) and yfix = MSM(smpl.C5000, θfix).
3.7 Discussion
In this chapter we presented a comparative analysis of two calibration methods for
micro-simulation modeling. We implemented both methods in the free statistical
software, R. We discussed the computational considerations and compared the results
of the two calibrated MSMs.
The comparative analysis showed that the Empirical calibration method is much
more efficient in terms of computational burden, since it can be orders of magnitude
faster than the Bayesian one. This finding also applies to the
comparison of undirected with any directed calibration method, due to the structural
similarities those methods respectively bear with the Empirical and the Bayesian
methods presented in this chapter. Furthermore, this chapter emphasizes the imperative
need for HPC techniques for calibrating any complicated predictive model,
including MSMs.
The two methods produced very similar results with respect to the distributions of
the calibrated MSM parameters, yielded analogous correlation structures, and
raised the same identifiability issues.
Predictions from the calibrated MSMs differ somewhat between the two methods.
The Bayesian MSM results in more dispersed predictions than the Empirical model,
although there are indications that it predicts rare events better. In addition, the
Bayesian method seems to be more robust to the input sample used in the calibration
procedure.
Finally, the supplementary analysis reveals a remarkable improvement in the results
from the Bayesian MSM. This finding is suggestive of two things. First, more
work should be done on the collection of the parameter vectors from the Bayesian
calibration method (e.g., length of converged chains, sampling rule for each one of
them, etc.). Second, the performance of the MSM can be considerably improved if
the Bayesian calibration method is followed by an additional step that further
refines the collection of the final sets of vectors for the calibrated parameters. As the
supplementary analysis has shown, such an improvement could be achieved if, for
example, we choose a subset of vectors for the MSM parameters that provide a good
fit of the model to observed data, according to some GoF criterion.
Future work will be directed towards a more detailed calibration of the streamlined MSM for
lung cancer developed in Chapter 2. We will aim at a complete calibration of the
MSM, so as to be able to predict individual trajectories for all possible combinations
of gender (male/female) and smoking status (never/former/current smokers). Furthermore,
we envisage the extension of the two calibration methods to account
for multiple calibration targets, i.e., to incorporate diverse information
from different stages of the natural history of lung cancer.
Figure 3.18: Flow chart of the implementation of the approximate MH algorithm of the Bayesian method to calibrate θ1.

Figure 3.19: Flow chart of the implementation of the Bayesian method to calibrate θ1. [A(θk) = π(θk) · \prod_{j=1}^{J} f_j(y_j | λ_j)]
Chapter 4
Assessing the predictive accuracy of MSMs
This chapter of the thesis is concerned with the assessment of the predictive
accuracy of MSMs, a quality characteristic that has not yet been studied in the literature.
The main outcome of interest for this assessment is the individual predicted time to
event; thus our approach is based on techniques applied in survival modeling. We
propose a set of available concordance indices, typically used for the assessment of
the predictive accuracy of survival models. In addition, we study the ability of
MSMs to predict times to events, and suggest the use of hypothesis testing to compare
observed with predicted survival distributions. We implement the suggested methods
in order to assess and compare the predictive accuracy of the two calibrated MSMs
resulting from the previous chapter, and we make recommendations on those that can
better capture the predictive quality of an MSM.
The chapter begins with background information on methods used for the assessment
of the predictive accuracy of complex models in general, as well as of survival models
in particular. It continues with a description of the methods suggested for the
assessment of the predictive accuracy of an MSM. We further describe the simulation
study conducted in order to compare the performance of the suggested methods. For
the purposes of this study, we applied the methods to each of the two calibrated
MSMs resulting from Chapter 3. A detailed analysis of the simulation results follows,
accompanied by suggestions on the most appropriate method to be used under
certain circumstances. The chapter concludes with future work in the field.
4.1 Background
4.1.1 Assessment of MSMs
An integral part of the development of a new MSM, as of any predictive model, is
the assessment of the model’s predictive accuracy (92; 105). After having discussed
in detail the two major building blocks in the development of an MSM, i.e., model
specification and calibration, this chapter is concerned with this property of the
model. Assessment of complex models in general encompasses the notions of model
validation (internal and external), sensitivity analysis, characterization of uncertainty,
and predictive accuracy (92; 105).

The development of an MSM is typically accompanied by a validation analysis. For
example, model validation may use empirical approaches (118; 3; 4; 65; 23), chi-square
(94; 18; 70) and likelihood statistics (94), as well as posterior estimates of
model parameters and posterior predictive distributions of model outcomes (90).
Validation has been discussed in detail in the previous chapter.
The assessment of uncertainty in MSMs, as in any other complex model, is also of
central concern, with a wide range of relevant references, from a brief introduction to
the problem of measuring uncertainty in complex decision analysis models (83), to the
development and implementation of complicated relevant methods. Such methods
include Bayesian approaches for characterizing uncertainty with emphasis on model
structure (12; 88), expression of patient heterogeneity and parameter uncertainty
(48; 55), applications of Probabilistic Sensitivity Analysis (PSA) (17; 7; 80; 81), etc.

In contrast to the assessment of uncertainty, the assessment of the predictive accuracy
of an MSM has not received systematic attention in the literature. However,
the assessment of this quality characteristic is essential, since, as noted subsequently,
one of the most important goals of MSMs is to accurately predict intervention
effects at the individual level, and, consequently, for homogeneous sub-groups of patients.
The study, implementation, and recommendation of statistical measures for assessing the
predictive accuracy of MSMs is the main objective of this chapter.
4.1.2 Predictive accuracy of MSMs
Micro-simulation models are broadly used to simulate entire populations with specific
characteristics and, often, under different hypothetical scenarios (interventions) (91).
The ultimate goal is to use these MSMs to make projections about the possible
evolution of the disease or even, when relevant, about the effect of an intervention
on the population, so as to inform health policy decisions (92).
However, there are also examples in the literature where individual-level data are
used to populate MSMs in order to test additional hypotheses or to enhance the
validity of the main findings of a study. McMahon et al. (2008), for instance,
populate the Lung Cancer Policy Model with individual-level data from the Mayo
CT screening single-arm trial, in order to simulate both the observed screening arm as
well as the missing control arm. They aimed in this way to compare the original findings
from the Mayo CT study with estimates of lung cancer incidence and mortality
from a hypothetical control arm with perfectly matched baseline characteristics.
Henderson et al. (2001), on the other hand, emphasize the importance of accurate
point estimates, especially of the predicted survival times, mentioning, among others,
the effect this accuracy may have on administering the most efficient treatment,
saving valuable resources, and guiding personal decisions regarding the remaining
lifespan of each individual. They also refer to other practical needs and
pressures imposed by the relevant health system, which can be vitally assisted by
informed decisions based on accurate survival time predictions. These arguments
coincide with one of the main goals of comparative effectiveness research (CER),
namely the development of adequate methodology to study differences in treatment
response between sub-groups of patients, as well as the enhancement of informed
medical decisions at the individual level (112; 25). Micro-simulation modeling is an essential
tool for predicting intervention effects on individuals, and, consequently, on homogeneous
subgroups; hence it can be an integral part of the conduct of CER studies.
The aforementioned examples of the use of MSMs to inform health decisions point
out the need for methods to assess the predictive accuracy of MSMs. Perhaps one
of the most important reasons for the lack of relevant research is that,
although very important, the prediction of accurate individual trajectories is a very
complicated task, the intricacy of which increases with the number of individual-level
characteristics involved. In this chapter we suggest methods from the literature that
could be used for the assessment of the predictive accuracy of this type of model.
The simulation study we conducted exemplifies the necessity of these methods in order
to compare two similar, “well” calibrated MSMs.
Predictive accuracy pertains to the ability of a model to correctly predict individual
outcomes. Steyerberg et al. (105) provide an overview of traditional and novel
measures for assessing the performance of prediction models in general. The authors
categorize methods into three broad categories, namely, measures of explained
variation (R2-statistics), other quadratic scores of the proximity between predictions
and actual outcomes (GoF statistics such as MSE, Deviance, Brier score, etc.), and
measures of the model’s discrimination ability (C-statistics, ROC curves).
Measures of explained variation (R2-statistics), although very interesting, are hard
to derive in the context of MSMs. Such an attempt would require systematic work on
identifying all sources of uncertainty inherent in an MSM, as well as on propagating
this uncertainty to the model outcomes. Research on this topic is part of the future
work related to this thesis. We also discussed GoF statistics in the previous chapter,
in the context of the calibration of an MSM. In that setting, we are mostly interested
in the comparison of the overall summary statistics predicted by the model with the
actual data (calibration data) found in the lung cancer literature, to determine a
“well” calibrated MSM.
In this chapter we focus on the accuracy of individual MSM predictions. The reason
is that it is possible for a “good” MSM, according to some overall GoF measures,
to perform poorly when it comes to individual predictions. The streamlined
MSM, for example, may predict a lung cancer incidence rate very close to the calibration
target for a specific age group. However, the individuals for whom the MSM
predicted lung cancer may differ considerably from those who actually did develop
lung cancer.
Depending on the outcome of interest (e.g., continuous, ordinal, binary or survival
data), as well as the type of the model’s predictions (e.g., prediction of the actual
outcome, risk score, survival probability, etc.), the predictive performance of an MSM
can be assessed using a variety of statistical measures. Since MSMs are designed to
predict individual patient trajectories, and in order to exploit the most comprehensive
predicted information, in this chapter we naturally consider MSMs as a special
type of survival model.
Assessing the predictive ability of survival models is a more complicated task than
for models for binary outcomes, such as logistic regression models. The complexity
in survival data analysis is due to the presence of censored observations,
for which the information about the event of interest is missing. The only thing
known for these observations is that, up to the censoring time, the subject had not
experienced the event of interest. The assessment of the performance of survival
models usually entails comparison of the predicted risks (rather than the predicted
survival times) with the observed outcomes, usually given a set of covariates. The reason
for this is that predicted survival times are not readily available from this type of
model.
Several measures for the assessment of the predictive accuracy of a survival model
have been suggested in the literature (46; 42; 100; 2; 9; 32; 93). An important class
of measures is that of concordance statistics (C-statistics), which focus on discrimination,
namely the desired property of the model to correctly classify subjects, given
a set of covariates, based on the predicted risk (57; 46).
The most widely used index, due to its simplicity, is the C-index proposed by Harrell
et al. (1996). Pencina and D’Agostino (2004) study the statistical properties of C
and show the relationship between this index and the modified Kendall’s τ. Similar
indices were studied by Gonen and Heller (2005), for the evaluation of Cox proportional
hazards models, and Uno et al. (2011). The latter is applicable to any type of
survival model that provides an explicit form of the predicted risk as a function of
the model parameters and covariates.
A common characteristic of the C-statistics proposed for the assessment of a survival
model is that they are all based on comparisons between the actual survival status and
a predicted risk score, a closed-form expression of which is obtained from the model.
The main reason is that actual predicted survival times are not readily available
from these commonly used survival models; they rather require some further
processing of the predicted risk, entailing a certain amount of subjectivity in the final
prediction. Furthermore, most of these models (proportional hazards and accelerated
failure time) imply a one-to-one correspondence between the predicted risk and the
expected survival times; therefore, these two quantities can be used interchangeably
to express a concordance relationship between observed and predicted outcomes.
Unlike most of the broadly used survival models, MSMs can predict times to events
and censoring status given the baseline characteristics of each individual, rather
than simple risk scores at specific time points. Therefore, assessing MSM predictive
accuracy should not solely involve concordance measures, because, in this way, a
significant portion of the predicted information (the actual predicted survival times)
is ignored. Investigators should rather use discrimination in conjunction with other
measures quantifying the proximity between predictions and actual outcomes on
an individual basis. Following this reasoning, we suggest here comparisons
between the predicted and the observed survival functions as a supplementary means
to concordance statistics for assessing the predictive accuracy of an MSM.
We have to note here that assessment of the predictive performance of commonly used
survival models (e.g., Cox proportional hazards) is also possible through comparison
of the observed with the predicted survival. However, a key issue in this assessment is
the methodology used for the estimation of the predicted survival from those models
(79; 78; 36; 45; 100; 32), especially when the model incorporates time-dependent
covariates. Since predictions are not readily available from these models,
the predicted survival is subject to additional assumptions (modeling mechanism)
beyond those stipulated in the model specification procedure. Therefore, assessment¹
of the predictive accuracy of such models depends not only on the model itself, but
also on the method used for obtaining the predicted survival. On the contrary, prediction
of survival times is usually an integral part of the outcome of an MSM (as is the case
with our streamlined MSM); therefore assessment of the predictive performance is
straightforward, and refers directly to the model itself and not to some other external
estimation procedure.

¹ A systematic review of methods used for the assessment of the predictive performance of risk prediction models can be found in Gerds et al. (2008)
A variety of statistics for comparing survival functions is available in the literature.
They include a set of tests based on the comparison of weighted Kaplan-Meier estimates
of the survival functions, such as the Log-Rank test (21), and tests based on
the weighted differences of the Nelson-Aalen estimates of the hazard rate, such as
the tests by Gehan (1965), Breslow (1970), and Tarone and Ware (1977). These tests,
although very popular, are not very powerful at detecting differences in crossing-hazards
situations. A class of statistics that has been proposed to amend this shortcoming
includes the Renyi-type and the Cramer-von Mises statistics. A detailed account of
the statistics used in this chapter for the comparison of two survival curves can
be found in the survival analysis book by Klein and Moeschberger (2003).
In the following sections we describe in detail the statistics proposed for the assessment
of the predictive accuracy of an MSM, as well as the simulation study conducted
for the comparison of those methods in an MSM setting.
4.2 Methods
4.2.1 Notation
In order to describe the statistics suggested in this chapter for the assessment of the
predictive accuracy of an MSM, we have to introduce some special notation.
Let X1, X2, ..., XN and X̂1, X̂2, ..., X̂N denote the observed and the predicted event
times, respectively, and Z1, Z2, ..., ZN the p×1 vectors of covariates in a sample of N
individuals. In our case, where the objective is to predict individual trajectories using
the MSM for lung cancer, the covariates comprise the age, gender and smoking history
of each individual. Let also Ti be the actual survival time and Di the corresponding
censoring variable, i.e., the time at which subject i is censored. We assume that D
is independent of T and Z. Let {(Ti, Zi, Di), i=1, ..., N} be N independent copies of
{(T, Z, D)}. For each individual i we only observe (Xi, Zi, ∆i), where Xi = min(Ti, Di) and

\Delta_i = \begin{cases} 1, & \text{if } X_i = T_i \\ 0, & \text{otherwise} \end{cases}
Furthermore, when comparing the survival between two samples, t1, t2, ..., tK denote
the distinct event times in the pooled sample, Ykj the number of individuals at risk,
and qkj the total number of events, observed in sample j (j=1,2) at time tk, where k=1,2,...,K.
In addition, Y_k = \sum_{j=1}^{2} Y_{kj} and q_k = \sum_{j=1}^{2} q_{kj} are the total number of individuals at risk
and the total number of events, respectively, in the pooled sample at time tk. Following
this notation, the Kaplan-Meier estimator of the survival function, for example in
the pooled sample, is:

\hat{S}(t) = \begin{cases} 1, & \text{if } t < t_1 \\ \prod_{t_k \le t} \left( 1 - \frac{q_k}{Y_k} \right), & \text{otherwise} \end{cases}    (4.1)

while the Nelson-Aalen estimator of the cumulative hazard is:

\hat{H}(t) = \begin{cases} 0, & \text{if } t < t_1 \\ \sum_{t_k \le t} \frac{q_k}{Y_k}, & \text{otherwise} \end{cases}    (4.2)
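The two estimators in eqs. 4.1 and 4.2 can be computed directly from right-censored data; a minimal sketch (function and variable names are ours, not the dissertation's R code):

```python
from collections import Counter

def km_na(times, events):
    """Kaplan-Meier survival (eq. 4.1) and Nelson-Aalen cumulative hazard
    (eq. 4.2) from right-censored data; events[i] = 1 if the event was
    observed at times[i], 0 if the subject was censored then."""
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    surv, haz = {}, {}
    s, h = 1.0, 0.0
    for tk in sorted(deaths):                      # distinct event times t_k
        y_k = sum(1 for t in times if t >= tk)     # number at risk Y_k
        q_k = deaths[tk]                           # number of events q_k
        s *= 1.0 - q_k / y_k                       # product-limit update
        h += q_k / y_k                             # cumulative hazard update
        surv[tk], haz[tk] = s, h
    return surv, haz

times  = [1, 2, 2, 3, 4, 5]     # observed times X_i (toy data)
events = [1, 1, 0, 1, 0, 1]     # event indicators Delta_i
S, H = km_na(times, events)
```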
4.2.2 Concordance statistics
Definition. Let (X1, T1), ..., (XN, TN) be a sample of bivariate, continuous observations.
The concordance (C) index for a pair of them, say (X1, T1) and (X2, T2),
is defined in general as (84):

C = pr(T_1 > T_2 | X_1 > X_2)    (4.3)
The concordance index has been widely used for the assessment of the predictive
accuracy of regression models for survival data. In this setting the C-index can take
either of the following two forms:

C = pr(g(Z_1) > g(Z_2) | T_1 < T_2)    (4.4)

or

C = pr(T_1 < T_2 | g(Z_1) > g(Z_2))    (4.5)

where Ti denotes the actual survival time and g(Zi) is some expression of the risk for the
ith individual as a function of the set Z of covariates.
In the first case (eq. 4.4), the concordance probability is defined conditionally on the
true value and can be considered an expression of the model’s sensitivity (i.e., the
probability that the model correctly classifies the observations given the ”truth”). The
second form of the concordance probability is defined conditionally on
the test value and is analogous to the predictive value of a diagnostic test, in that
it expresses the probability of having a certain ordering in the observed times given
what the model predicts for these specific data. Most of the C-statistics for survival
models are developed to estimate the conditional probability presented in equation
4.4 (39; 113), while estimates of the other conditional probability are also discussed
in the literature (34).
The concordance index can be used to quantify one of the key aspects of predictive
accuracy, namely the discrimination ability of a statistical model (105). It takes
values between 0.5 and 1. A C-index equal to 1 indicates perfect discrimination
ability, while values of the index closer to 0.5 indicate poor discrimination ability of
the model.
Harrell’s index

Perhaps the most well-known, easy to compute and, therefore, broadly used measure
of the discrimination ability of a survival model is Harrell’s C-statistic (39). Let us
consider all different pairs of subjects (i, j), i < j. A pair is said to be concordant if
(Xi < Xj and X̂i < X̂j) or (Xi > Xj and X̂i > X̂j). The overall C index suggested
by Harrell et al. (39) is defined as the proportion of concordant pairs among all usable
pairs in the sample. Every pair of subjects, at least one of whom has experienced the event
of interest, is usable. This index provides an estimate of the concordance probability
(eq. 4.4) as:

C_H = \frac{\sum_{i \neq j} \Delta_i I(X_i < X_j) I(\hat{X}_i < \hat{X}_j)}{\sum_{i \neq j} \Delta_i I(X_i < X_j)}    (4.6)
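Since the MSM supplies predicted event times, eq. 4.6 can be evaluated with the predicted times supplying the ordering; an illustrative sketch (names are ours, and ties are not handled specially):

```python
def harrell_c(obs_times, pred_times, deltas):
    """Harrell's C (eq. 4.6): among usable pairs (anchored on an observed
    event i with X_i < X_j), the fraction whose predicted times are ordered
    the same way as the observed times."""
    num = den = 0
    n = len(obs_times)
    for i in range(n):
        if deltas[i] != 1:
            continue                  # pairs are usable only through events
        for j in range(n):
            if i == j or not obs_times[i] < obs_times[j]:
                continue
            den += 1                  # usable pair
            if pred_times[i] < pred_times[j]:
                num += 1              # concordant pair
    return num / den

obs   = [2, 4, 6, 8]      # observed times X_i (toy data)
pred  = [1, 3, 7, 5]      # predicted times
delta = [1, 1, 1, 0]      # event indicators; the last subject is censored
c = harrell_c(obs, pred, delta)   # 5 of 6 usable pairs are concordant
```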
Uno’s index

Uno et al. (113) focus on the estimation of a truncated version of the concordance
probability (eq. 4.4), i.e.:

C = pr(g(Z_1) > g(Z_2) | T_1 < T_2, T_1 < \tau)    (4.7)

where τ is a pre-specified time point, the only restriction on which is that it should
be greater than the shortest censoring time observed. The truncation is introduced
to address the problem of the unstable estimation of the tail part of the survival
function.

Uno et al. employ an ”inverse probability weighting” technique (10), and propose a
non-parametric estimate of the concordance probability. The most important feature
of Uno’s C-statistic is that, unlike Harrell’s index, it does not depend on
the study-specific censoring distribution. Using a simulation study, Uno et al. (2011)
show that this index is in general robust to the choice of τ and that it performs better
than, or at least as well as, Harrell’s index most of the time.
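The weighting idea can be sketched as follows; this is a simplified illustration in the spirit of Uno et al. (2011), not their exact estimator (tie handling and the evaluation point of the censoring survival estimate are glossed over, and all names are ours). With no censoring it reduces to a truncated Harrell-type index:

```python
def censor_km(times, deltas, t):
    """Kaplan-Meier estimate G(t) of the censoring survival function:
    censorings (delta = 0) play the role of events."""
    g = 1.0
    for tk in sorted({x for x, d in zip(times, deltas) if d == 0}):
        if tk > t:
            break
        at_risk = sum(1 for x in times if x >= tk)
        cens = sum(1 for x, d in zip(times, deltas) if x == tk and d == 0)
        g *= 1.0 - cens / at_risk
    return g

def uno_c(times, deltas, pred_times, tau):
    """Truncated, inverse-probability-weighted concordance: pairs are
    anchored on events before tau and weighted by 1/G(X_i)^2, so the
    estimate does not lean on the study-specific censoring distribution."""
    num = den = 0.0
    n = len(times)
    for i in range(n):
        if deltas[i] != 1 or times[i] >= tau:
            continue
        w = censor_km(times, deltas, times[i]) ** -2
        for j in range(n):
            if i == j or not times[i] < times[j]:
                continue
            den += w
            # concordant: the earlier observed event also has the earlier
            # predicted event time (i.e., the higher predicted risk)
            if pred_times[i] < pred_times[j]:
                num += w
    return num / den

# With no censoring all weights are 1 and the index matches Harrell's C.
obs, pred, delta = [2, 4, 6, 8], [1, 3, 7, 5], [1, 1, 1, 1]
c_uno = uno_c(obs, delta, pred, tau=10)
```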
4.2.3 Hypothesis testing

The second set of methods proposed in this chapter for the assessment of the predictive
accuracy of an MSM comprises statistical tests for the comparison of the
predicted with the observed survival curve. In particular, we compute the log-rank
statistic, a Renyi-type statistic, and two different versions of a Cramer-von Mises
type statistic. Each of these statistics is used to test the null hypothesis H0 that
there is no difference in the survival distributions between the two samples (observed
versus predicted data).
Log-Rank statistic

We first apply the well-known and broadly used log-rank test (85), which, following
the notation previously introduced, employs the statistic:

Z = \frac{\sum_{k=1}^{K} \left( q_{k1} - Y_{k1} \frac{q_k}{Y_k} \right)}{\sqrt{\sum_{k=1}^{K} \frac{Y_{k1}}{Y_k} \left( 1 - \frac{Y_{k1}}{Y_k} \right) \left( \frac{Y_k - q_k}{Y_k - 1} \right) q_k}}    (4.8)

which under H0 has a standard normal distribution. The main limitation of this
test is that it does not perform very well in crossing-hazards situations.
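Eq. 4.8 translates almost line-for-line into code; a minimal sketch (names are ours):

```python
import math

def logrank_z(times1, deltas1, times2, deltas2):
    """Two-sample log-rank statistic (eq. 4.8); Z ~ N(0,1) under H0."""
    pooled = list(zip(times1, deltas1, [1] * len(times1))) + \
             list(zip(times2, deltas2, [2] * len(times2)))
    event_times = sorted({t for t, d, _ in pooled if d == 1})
    num = var = 0.0
    for tk in event_times:
        yk  = sum(1 for t, _, _ in pooled if t >= tk)             # Y_k
        yk1 = sum(1 for t, _, g in pooled if t >= tk and g == 1)  # Y_k1
        qk  = sum(1 for t, d, _ in pooled if t == tk and d == 1)  # q_k
        qk1 = sum(1 for t, d, g in pooled if t == tk and d == 1 and g == 1)
        num += qk1 - yk1 * qk / yk          # observed minus expected events
        if yk > 1:
            var += (yk1 / yk) * (1 - yk1 / yk) * ((yk - qk) / (yk - 1)) * qk
    return num / math.sqrt(var)

# Well-separated groups give a large |Z|; identical groups give Z = 0.
z_sep = logrank_z([1, 2], [1, 1], [3, 4], [1, 1])
z_same = logrank_z([1, 2, 3, 4], [1, 1, 1, 0], [1, 2, 3, 4], [1, 1, 1, 0])
```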
Renyi type tests

The Renyi-type statistics aim at comparing two (or more) survival distributions in
a way analogous to the Kolmogorov-Smirnov test for uncensored data (54). These
statistics are more powerful at detecting differences in crossing-hazards situations. In
our case, we implement the “log-rank” version of this test. The statistic used for
testing the null hypothesis is:

Q = \frac{\sup \{ |Z(t)|, t \le \tau \}}{\sigma(\tau)}    (4.9)

with

Z(t_\alpha) = \sum_{t_k \le t_\alpha} \left[ q_{k1} - Y_{k1} \left( \frac{q_k}{Y_k} \right) \right], \quad \alpha = 1, ..., K    (4.10)

and

\sigma^2(\tau) = \sum_{t_k \le \tau} \left( \frac{Y_{k1}}{Y_k} \right) \left( \frac{Y_{k2}}{Y_k} \right) \left( \frac{Y_k - q_k}{Y_k - 1} \right) q_k    (4.11)

where τ is the largest tk for which Yk1, Yk2 > 0.

The statistic Q under the null hypothesis can be approximated by the distribution
of sup{|B(x)|, 0 ≤ x ≤ 1}, where B is a standard Brownian motion process. Critical
values of Q can be found in the relevant tables. Taking the supremum of the absolute deviations
makes the test more powerful than the simple log-rank test at detecting
(existing) differences between two crossing survival curves.
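A sketch of eqs. 4.9-4.11, reusing the same bookkeeping as the log-rank statistic but tracking the running supremum of the numerator (names are ours):

```python
import math

def renyi_q(times1, deltas1, times2, deltas2):
    """Renyi-type (sup log-rank) statistic of eqs. 4.9-4.11: the running
    log-rank numerator Z(t) is tracked over time and its largest absolute
    value is scaled by sigma(tau)."""
    pooled = list(zip(times1, deltas1, [1] * len(times1))) + \
             list(zip(times2, deltas2, [2] * len(times2)))
    event_times = sorted({t for t, d, _ in pooled if d == 1})
    z = var = sup_z = 0.0
    for tk in event_times:
        yk1 = sum(1 for t, _, g in pooled if t >= tk and g == 1)
        yk2 = sum(1 for t, _, g in pooled if t >= tk and g == 2)
        if yk1 == 0 or yk2 == 0:
            break                      # tau: last t_k with both at risk
        yk = yk1 + yk2
        qk = sum(1 for t, d, _ in pooled if t == tk and d == 1)
        qk1 = sum(1 for t, d, g in pooled if t == tk and d == 1 and g == 1)
        z += qk1 - yk1 * qk / yk       # running Z(t), eq. 4.10
        sup_z = max(sup_z, abs(z))
        if yk > 1:
            var += (yk1 / yk) * (yk2 / yk) * ((yk - qk) / (yk - 1)) * qk
    return sup_z / math.sqrt(var)

q_same = renyi_q([1, 2, 3, 4], [1, 1, 1, 0], [1, 2, 3, 4], [1, 1, 1, 0])
q_sep = renyi_q([1, 2], [1, 1], [3, 4], [1, 1])
```

The resulting Q would then be compared against the tabulated critical values of sup|B(x)| mentioned above, which are not reproduced here.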
Cramer-von Mises tests
The last two statistics used for the comparison between observed and predicted survival belong to the Cramer-von Mises family, which is also analogous to the Kolmogorov-Smirnov test for comparing two cumulative distribution functions (54). Both statistics depend on weighted squared differences between the Nelson-Aalen estimates of the respective cumulative hazard functions. The first statistic used for this type of test is defined as:
Q_1 = \left(\frac{1}{\sigma^2(\tau)}\right) \sum_{t_k \le \tau} \left[H_1(t_k) - H_2(t_k)\right]^2 \left[\sigma^2(t_k) - \sigma^2(t_{k-1})\right] \qquad (4.12)
with t0 = 0, and the summation calculated over the distinct death times up to time τ, the largest tk for which Yk1, Yk2 > 0, i.e., the last death time at which there are still subjects at risk in both samples. Furthermore, Hj(tk) (j = 1, 2 for the two samples, observed and predicted) is the Nelson-Aalen estimator of the cumulative hazard function (section 4.2.1), with estimated variance:
\sigma_j^2(t) = \sum_{t_k \le t} \frac{q_{kj}}{Y_{kj}\,(Y_{kj} - 1)}, \qquad j = 1, 2 \qquad (4.13)
The Q1 statistic is based on the difference between H1(t) and H2(t), the variance of
which is estimated as:
\sigma^2(t) = \sigma_1^2(t) + \sigma_2^2(t) \qquad (4.14)
The alternative version of the Cramer-von Mises test applied in this chapter uses the statistic:

Q_2 = n \sum_{t_k \le \tau} \left[\frac{H_1(t_k) - H_2(t_k)}{1 + n\,\sigma^2(t_k)}\right]^2 \left[A(t_k) - A(t_{k-1})\right] \qquad (4.15)

where

A(t) = \frac{n\,\sigma^2(t)}{1 + n\,\sigma^2(t)}
Under the null hypothesis, Q_1 approximately follows the distribution of R_1 = \int_0^1 [B(x)]^2\,dx, where B(x) is a standard Brownian motion process, and Q_2 that of R_2 = \int_0^{A(\tau)} [B^0(x)]^2\,dx, where B^0(x) is a Brownian bridge process. Critical values of these two limiting distributions are also provided in published tables.
Note that there is some loss of power when using either of the two Cramer-von Mises tests compared to the log-rank test (97). However, Q1 performs almost equally well when the hazard rates of the two samples are proportional, while Q2 performs better than the other tests in the case of large early differences when the hazard rates cross.
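To make the construction of Q1 concrete, the following Python sketch accumulates the Nelson-Aalen estimates H1, H2 and the variance components of Eq. (4.13) over the pooled event times. It is an illustrative re-implementation under our own naming, not the dissertation's R code; Q2 follows the same pattern with the weights of Eq. (4.15).

```python
def cvm_q1(times1, events1, times2, events2):
    """Cramer-von Mises statistic Q1 of Eq. (4.12), built from Nelson-Aalen
    cumulative-hazard estimates H_j and their variances (Eq. 4.13)."""
    event_times = sorted({t for t, d in zip(times1 + times2, events1 + events2) if d})
    h1 = h2 = s1 = s2 = 0.0        # Nelson-Aalen estimates and variance components
    prev_sigma2, sigma2_tau, total = 0.0, 0.0, 0.0
    for tk in event_times:
        y1 = sum(t >= tk for t in times1)
        y2 = sum(t >= tk for t in times2)
        if y1 == 0 or y2 == 0:
            break                  # tau: last event time with both samples at risk
        q1 = sum(t == tk and d for t, d in zip(times1, events1))
        q2 = sum(t == tk and d for t, d in zip(times2, events2))
        h1 += q1 / y1              # Nelson-Aalen increments
        h2 += q2 / y2
        if y1 > 1:
            s1 += q1 / (y1 * (y1 - 1))      # Eq. (4.13), sample 1
        if y2 > 1:
            s2 += q2 / (y2 * (y2 - 1))      # Eq. (4.13), sample 2
        sigma2 = s1 + s2           # Eq. (4.14)
        total += (h1 - h2) ** 2 * (sigma2 - prev_sigma2)  # summand of Eq. (4.12)
        prev_sigma2 = sigma2_tau = sigma2
    return total / sigma2_tau
```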
4.2.4 Simulation Study
The purpose of the simulation study conducted in this chapter is to implement and compare the alternative approaches, highlight their differences, and make suggestions about the most suitable ones for assessing the predictive accuracy of an MSM. To this end, the methods were used to assess and compare the predictive accuracy of the two calibrated MSMs obtained in Chapter 3. These two MSMs have exactly the same structure and were calibrated to the same targets using two different calibration methods, a Bayesian and an Empirical one. The two methods resulted in different MSMs with respect to the set of values for the calibrated parameters.
As input we used a sample of N=5000 men (smpl.15000), current smokers, randomly drawn from the 1980 US population (smpl100,000, Chapter 3). Note that this sample is different from the one used for the implementation of the two calibration methods (smpl.C5000). As in Chapter 3, the baseline characteristics taken into account for predicting trajectories are age and smoking intensity, expressed as the average number of cigarettes smoked per day for each individual.
For the assessment of the predictive accuracy of the MSM we need to know the truth, namely if and when each person developed lung cancer. In the absence of real data on the time of development of lung cancer in the group used in the simulation study, we simulated the truth. Specifically, we use two simplified “toy” models which, given only age, predict time to death and time to lung cancer diagnosis for each individual. The first simplified model (truth model 1, toy.1) uses exponential distributions to predict these two time points, while the second one (truth model 2, toy.2) uses Gumbel distributions. The simulated truth about the censoring status is obtained by comparing the two predicted times for each individual. For instance, if the predicted time to death is larger than the predicted age at lung cancer diagnosis, the person is recorded as having had the event; otherwise the observation is censored at the age of death.
The parameters of the exponential and Gumbel distributions involved in these simulations were chosen ad hoc, so that the overall lung cancer incidence rates by age group (i.e., <60, 60-80, and >80 years old) approximate those reported in the 2002-2006 SEER data.
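The construction of the simulated truth described above can be sketched as follows. This Python sketch is purely illustrative: the distribution parameters are placeholders (the actual ad hoc, SEER-matched estimates are not reproduced here), and all names are ours.

```python
import math
import random

def gumbel_draw(rng, mu, beta):
    """Inverse-CDF draw from a Gumbel(mu, beta) distribution."""
    return mu - beta * math.log(-math.log(rng.random()))

def simulate_truth(ages, model="toy.1", seed=42):
    """Simulate the 'truth' for each individual: draw waiting times to lung
    cancer diagnosis and to death, and derive the event/censoring status.
    Distribution parameters below are illustrative placeholders, NOT the
    dissertation's SEER-matched estimates."""
    rng = random.Random(seed)
    records = []
    for age in ages:
        if model == "toy.1":                     # exponential waiting times
            t_lc = rng.expovariate(1 / 30.0)     # years to lung cancer diagnosis
            t_death = rng.expovariate(1 / 25.0)  # years to death
        else:                                    # toy.2: Gumbel waiting times
            t_lc = max(gumbel_draw(rng, 28.0, 8.0), 0.01)    # clamp to stay positive
            t_death = max(gumbel_draw(rng, 24.0, 8.0), 0.01)
        records.append({"age": age,
                        "time": min(t_lc, t_death),
                        "event": int(t_lc < t_death)})  # 1 = diagnosed before death
    return records
```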
We apply these two “toy” models to the input data (smpl.11000) in order to simulate the truth about the age at development of lung cancer for each individual. Subsequently, the same sample is used as input to each of the two calibrated MSMs resulting from Chapter 3, in order to also predict lung cancer incidence. The comparisons between the predictions and the simulated “truth” provide an indication of the adequacy of each proposed method for assessing the predictive performance of an MSM.
As indicated in section 3.2.4, the result of each calibration method is a set of V=1000 vectors for the four MSM parameters calibrated in the previous chapter. A single run of the MSM pertains to one implementation of the model, making predictions (one trajectory for each individual) for the input sample of interest, given one vector of parameter values. In the tables we present summary results of the model’s performance for different numbers V of parameter vectors (i.e., V=200, 400, 600, 800, and all 1000). In this way we are also able to investigate the effect the total number of micro-simulations has on the final conclusions from the applied statistics.
4.3 Results
4.3.1 Single run of the MSM
For V=1 we present Kaplan-Meier curves of the predicted against the observed sur-
vival functions. We also provide estimates of the suggested measures for assessing
the MSM’s predictive accuracy. Test statistics are accompanied by the respective
p-values.
The results from the implementation of the assessment methods on the MSMs, using only one vector of calibrated parameter values, indicate that the simulated “observed” lung cancer survival from the first toy model (toy.1, exponential distributions) is very close to the predictions from both models (Figures 4.1 and 4.2), although the survival function resulting from the predictions of the Bayesian calibrated MSM crosses the observed survival curve.
Table 4.1: Assessment of the predictive accuracy of the two calibrated MSMs: Predicted versus simulated (from toy.1 model) survival.

                                          Calibrated MSM
Method                              Bayesian          Empirical
Harrell's index                     0.779             0.754
Uno's index (τ = 100)               0.641             0.568
Uno's index (τ = 80)                0.733             0.710
Log-Rank χ² (p-value)               7.313 (0.00685)   3.013 (0.0826)
Renyi test Q (p-value)              4.03 (< 0.01)     2.11 (0.06)
Cramer-von Mises Q1 (p-value)       0.654 (> 0.01)    2.26 (< 0.025)
Cramer-von Mises Q2 (p-value)       1.66 (< 0.02)     0.326 (> 0.1)
Figure 4.1: Kaplan-Meier curves of the predicted versus the observed (simulated by the first toy model) survival.
The proximity between the predicted and the observed survival is also verified by most of the statistics applied for the assessment of the model (Table 4.1). The C-statistics are similar for the two models, with slightly higher values for the Bayesian model. The log-rank, Renyi type, and Cramer-von Mises (Q2) tests all reject the null hypothesis for the predictions from the Bayesian model but do not reject it for those from the Empirically calibrated MSM at α = 5%. However, we draw the opposite conclusions when looking at the Q1 statistic, according to which the observed survival is similar to the one predicted by the Bayesian model but differs from the one predicted by the Empirical MSM. The reason is probably that, as already mentioned, Q2 performs better than the other tests in cases like this, namely when the hazard rates cross and we observe relatively large, early differences between them.
When it comes to the comparison of the predictions with the simulated truth from the second toy model (Figure 4.2), the observed survival is very close to the one predicted by the Bayesian model, but differs considerably from the survival predicted by the Empirically calibrated MSM. This difference apparently cannot be captured by either of the C-statistics applied, since the respective estimates are very close for the two models (Table 4.2). In contrast, the difference is reflected in the results from all the statistical tests (log-rank, Renyi type, and Cramer-von Mises). None of these tests rejects the null hypothesis for the Bayesian model, while they all reject it for the Empirically calibrated model, at least at the α = 5% significance level.
Figure 4.2: Kaplan-Meier curves of the predicted versus the observed (simulated by the second toy model) survival.
Table 4.2: Assessment of the predictive accuracy of the two calibrated MSMs: Predicted versus simulated (from toy.2 model) survival.

                                          Calibrated MSM
Method                              Bayesian          Empirical
Harrell's index                     0.799             0.796
Uno's index (τ = 100)               0.762             0.719
Uno's index (τ = 80)                0.807             0.790
Log-Rank χ² (p-value)               0.027 (0.869)     18.52 (< 0.0001)
Renyi test Q (p-value)              1.894 (0.110)     4.317 (< 0.01)
Cramer-von Mises Q1 (p-value)       0.724 (> 0.01)    2.853 (< 0.01)
Cramer-von Mises Q2 (p-value)       0.325 (> 0.01)    1.318 (< 0.02)
4.3.2 Multiple runs of the MSM
We also assessed the predictive accuracy by running each of the two calibrated MSMs multiple times, i.e., for multiple vectors of values for the calibrated parameters. In particular, we run each MSM for five different cases, namely for V=200, 400, 600, 800, and 1000 vectors of parameter values, in order to also investigate the effect the total number of MSM runs has on the results of this assessment. We compare predictions with the simulated truth from both toy models. For each case we provide Kaplan-Meier estimates of the predicted versus the observed survival probabilities. We further provide summary statistics describing the results from the application of each statistical method for the assessment of the predictive accuracy of the model. In particular, we report means and standard deviations of the concordance statistics (Harrell’s and Uno’s index) from the V implementations of each of these measures on the MSM predictions. Furthermore, for the statistics comparing the observed with the predicted survival we report the percentage of times the test did not reject H0 at α = 5%, i.e., the hypothesis that the predicted survival is the same as the “observed” (simulated) one.
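The summaries reported for multiple runs can be sketched as follows; the function name and dictionary keys are illustrative choices of ours (the actual analysis was carried out in R):

```python
import statistics

def summarize_runs(c_indices, p_values, alpha=0.05):
    """Summaries of the kind reported in Tables 4.3-4.4: mean and sd of a
    C-statistic across the V runs, and the percentage of runs in which a
    hypothesis test did NOT reject H0 at level alpha."""
    return {
        "c_mean": statistics.mean(c_indices),
        "c_sd": statistics.stdev(c_indices),
        "pct_not_rejected": 100.0 * sum(p >= alpha for p in p_values) / len(p_values),
    }
```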
According to the produced graphs (Kaplan-Meier curves in Figures 4.3 to 4.12), as well as the respective tables with the summary statistics (Tables 4.3 and 4.4) from the implementation of the methods suggested in this chapter for the assessment of the predictive accuracy of an MSM, the total number V of MSM runs does not appear to affect the final conclusions. Even V=400 appears adequate to draw safe conclusions about the predictive accuracy of the two MSMs calibrated in the previous chapters.
The true survival, simulated by the first toy model, lies within the range of the predictions from both MSMs, for all five cases (i.e., for V=200, 400, 600, 800, and 1000). This means that, overall, the individual predictions from the two models are very close to the observed survival resulting from the first toy model. This proximity between the two survival curves is reflected in the summary statistics of all the methods suggested in this chapter (Table 4.3).
The estimates of Harrell’s and Uno’s indices are almost identical for the two models. The results from the applied tests are also very close for the two MSMs, with a small difference in the non-rejection rate of H0 in favor of the Bayesian calibrated MSM according to the first three tests. However, when looking at the Cramer-von Mises Q2 test, the difference between the non-rejection rates is larger and reversed, namely in favor of the Empirically calibrated MSM. This finding is in line with the characteristics of this specific test. As already mentioned, Q2 performs well when there is a large early difference in the hazard rates. The Kaplan-Meier plots reveal much more dispersed predicted survival curves at earlier time points for the Bayesian than for the Empirically calibrated MSM; consequently, the difference between predicted and observed survival is larger at those points for the Bayesian MSM. This difference is reflected in the results of the Q2 test.
Figure 4.3: Kaplan-Meier curves of the predicted (for V=200 vectors of the calibrated MSM parameters) versus the observed (simulated by the first toy model) survival.
Figure 4.4: Kaplan-Meier curves of the predicted (for V=400 vectors of the calibrated MSM parameters) versus the observed (simulated by the first toy model) survival.
Figure 4.5: Kaplan-Meier curves of the predicted (for V=600 vectors of the calibrated MSM parameters) versus the observed (simulated by the first toy model) survival.
Figure 4.6: Kaplan-Meier curves of the predicted (for V=800 vectors of the calibrated MSM parameters) versus the observed (simulated by the first toy model) survival.
Figure 4.7: Kaplan-Meier curves of the predicted (for V=1000 vectors of the calibrated MSM parameters) versus the observed (simulated by the first toy model) survival.
In the second example we compare the predictions with the “true” survival simulated using the second toy model. According to Figures 4.8 to 4.12, the observed survival curve lies, although marginally, within the range of the predicted survival curves from the Bayesian calibrated MSM. This is not the case for the Empirically calibrated MSM, for which a considerable part of the observed survival curve lies above the range of the predicted ones. This is an example of a possible scenario where two “well” calibrated MSMs, i.e., two MSMs almost equivalent according to some overall GoF measures, differ considerably when it comes to the individual predicted trajectories.
The estimates of both C-statistics are almost identical for the two models, indicating that a concordance index cannot adequately capture the differences between the predicted and the observed survival noted in the Kaplan-Meier curves. In contrast, the results from all the statistical tests of the two survival functions are very different between the two models, indicating that the Bayesian calibrated MSM
is more accurate than the Empirically calibrated one. The difference between the two models is more prominent in the results from the log-rank test, and smaller in the results from the Cramer-von Mises Q1 test.

Table 4.3: Assessment of the predictive accuracy of the two calibrated MSMs compared to the simulated truth from toy model 1: Summary statistics of the estimates of six different predictive accuracy measures.

Bayesian Calibrated MSM

        C-statistic (mean ± sd)*            Test (%)**
V       Harrell         Uno                 Log-Rank   Renyi   Cramer-von Mises
                                            (Z)        (Q)     (Q1)     (Q2)
200     0.7808±0.0099   0.6746±0.0605       80.50      80.00   49.50    79.00
400     0.7806±0.0095   0.6740±0.0560       79.75      82.75   52.75    83.00
600     0.7804±0.0096   0.6740±0.0559       80.33      82.17   52.50    82.83
800     0.7801±0.0096   0.6740±0.0555       80.50      83.25   50.88    84.88
1000    0.7802±0.0095   0.6741±0.0557       80.00      82.40   53.60    84.10

Empirically Calibrated MSM

        C-statistic (mean ± sd)*            Test (%)**
V       Harrell         Uno                 Log-Rank   Renyi   Cramer-von Mises
                                            (Z)        (Q)     (Q1)     (Q2)
200     0.7804±0.0092   0.6683±0.0587       73.50      73.00   49.00    98.50
400     0.7794±0.0090   0.6730±0.0567       71.25      71.00   48.00    98.50
600     0.7791±0.0089   0.6729±0.0555       71.50      73.17   45.50    99.00
800     0.7787±0.0090   0.6722±0.0546       71.13      73.25   45.13    99.13
1000    0.7787±0.0090   0.6718±0.0548       71.30      73.50   45.00    98.60

* Means and standard deviations of the C-index estimates from the V implementations.
** Percentage of times, in the V implementations, that the test did not reject H0 at α = 5%.
Figure 4.8: Kaplan-Meier curves of the predicted (for V=200 vectors of the calibrated MSM parameters) versus the observed (simulated by the second toy model) survival.
Figure 4.9: Kaplan-Meier curves of the predicted (for V=400 vectors of the calibrated MSM parameters) versus the observed (simulated by the second toy model) survival.
Figure 4.10: Kaplan-Meier curves of the predicted (for V=600 vectors of the calibrated MSM parameters) versus the observed (simulated by the second toy model) survival.
Figure 4.11: Kaplan-Meier curves of the predicted (for V=800 vectors of the calibrated MSM parameters) versus the observed (simulated by the second toy model) survival.
Figure 4.12: Kaplan-Meier curves of the predicted (for V=1000 vectors of the calibrated MSM parameters) versus the observed (simulated by the second toy model) survival.
4.4 Discussion
Given that MSMs can usually predict, among other outcomes, the actual survival time and censoring status for each individual, we consider them a special type of predictive survival model. In this chapter we implement two concordance indices broadly used for assessing the predictive accuracy of survival models. Furthermore, we suggest and implement four different hypothesis tests, the log-rank test, a Renyi type test, and two Cramer-von Mises tests, as alternative methods for assessing the predictive accuracy of an MSM. These tests compare the observed with the predicted survival curve.
It is important to note here that the suggested hypothesis testing methods account
for the effect of censoring in a competing risks setting, as is the case in the prediction
of lung cancer incidence and mortality given smoking. The MSM takes into account
the presence of competing risks when modeling mortality, and consequently the KM curve of the predicted survival times is adjusted accordingly.

Table 4.4: Assessment of the predictive accuracy of the two calibrated MSMs compared to the simulated truth from toy model 2: Summary statistics of the estimates of six different predictive accuracy measures.

Bayesian Calibrated MSM

        C-statistic (mean ± sd)*            Test (%)**
V       Harrell         Uno                 Log-Rank   Renyi   Cramer-von Mises
                                            (Z)        (Q)     (Q1)     (Q2)
200     0.7943±0.0083   0.7298±0.0307       29.50      27.00   29.00    62.50
400     0.7946±0.0079   0.7295±0.0306       26.00      24.25   25.25    58.00
600     0.7944±0.0078   0.7304±0.0308       26.33      24.00   26.17    58.17
800     0.7942±0.0076   0.7301±0.0307       24.75      22.50   24.25    57.25
1000    0.7943±0.0077   0.7305±0.0308       26.10      24.10   25.50    59.50

Empirically Calibrated MSM

        C-statistic (mean ± sd)*            Test (%)**
V       Harrell         Uno                 Log-Rank   Renyi   Cramer-von Mises
                                            (Z)        (Q)     (Q1)     (Q2)
200     0.7932±0.0079   0.7308±0.0296       3.00       9.00    12.50    29.00
400     0.7927±0.0081   0.7300±0.0323       3.50       8.00    15.75    27.75
600     0.7928±0.0081   0.7292±0.0322       2.50       7.50    15.83    27.50
800     0.7930±0.0080   0.7293±0.0323       2.38       7.63    16.13    28.63
1000    0.7928±0.0080   0.7292±0.0319       2.00       7.10    15.90    28.90

* Means and standard deviations of the C-index estimates from the V implementations.
** Percentage of times, in the V implementations, that the test did not reject H0 at α = 5%.

In the simulation study
we compared the predictions obtained by the MSM with the simulated truth, namely
a hypothetical observed KM curve that has been adjusted for the competing risks
problem. In practice, when implementing the hypothesis tests, it is advisable to
adjust the observed survival in order to account for the presence of competing risks,
so as to avoid bias in the survival estimates of the event of interest (54).
Summarizing the main findings from the simulation study, we first note that a single implementation of the MSM, for a randomly selected vector of parameter values (V=1), is not sufficient for comparing the predictive accuracy of two similar MSMs. Furthermore, as already indicated in section 3.2.4, MSM outputs based on more than one set of calibrated values for the model parameters allow parameter uncertainty to be conveyed to the final results. For these reasons multiple runs of the model are recommended instead. Based on the results presented in this chapter, V=400 runs of the model are deemed adequate to draw safe conclusions about the relative predictive accuracy of the two models.
In addition, concordance indices, although useful for measuring the overall discrimination ability of a model, may not be able to capture differences between distinct observed and predicted survival times. The reason is that concordance indices are based on the relative ranks of the observed and the predicted values rather than their actual magnitudes. The estimates of the two C-statistics applied in the simulation study are almost identical for the two models in all cases, and thus uninformative about the discrepancies observed, especially between the MSM predictions and the simulated “truth” from the second toy model. In the context of micro-simulation modeling, other statistical measures, such as estimates of the mean squared error of the individual predictions, are preferable for capturing this characteristic of an MSM.
In this chapter we also investigated the performance of several hypothesis tests for survival data. These tests aim at comparing observed and predicted survival distributions, and can provide an indication of the predictive accuracy of the model with respect to the overall survival estimates for the event of interest.
The simulation study showed that the hypothesis tests lead to the same conclusions when there are relatively large differences between the observed and the predicted survival, as in the comparisons with the simulated truth from the second toy model, where all tests indicated the same MSM to be more accurate. However, for less prominent differences the tests may lead to contradictory conclusions. The reason lies in the specifics of each test, namely which differences (earlier or later) each test weighs more heavily in the calculations, as well as whether or not it performs well in crossing-hazards situations. In such a case it is unclear whether the individual predictions from one MSM are more accurate than the respective ones from the other. Further investigation is therefore required, and the final conclusions will also depend on the type of differences we are most interested in detecting.
Furthermore, the log-rank and Renyi type tests lead to similar results about the predictive accuracy of the two models. However, the log-rank test proved slightly more sensitive than the Renyi type test in detecting the more prominent differences between the observed and the predicted survival curves.
A high-priority item for future work is to apply the suggested methods to assess the predictive accuracy of the two calibrated MSMs using real data from the National Lung Screening Trial (NLST) (28; 1). This is a large-scale, randomized, multicenter study aimed at comparing the effect of two different screening tests, i.e., low-dose helical computed tomography (CT) and chest radiography, on the lung cancer mortality of current and former heavy smokers. Another very interesting application will be the comparison of two structurally different, yet comparable, MSMs using the methods suggested in this chapter. Special attention and additional work are required on the correct incorporation of between-subject variability in the assessment, as well as on the expansion of the methods to base assessment results on multiple outcomes of interest. An additional objective for further research is the consideration of censoring in MSM individual predictions, as well as in the assessment of the predictive performance of this type of model.
Finally, another very interesting objective for further research is the construction and use of a predictive accuracy measure focused on the predictions obtained for each specific individual. Such a measure would be based on the mean squared differences between the individual predictions and the observed data (MSEP) (35; 36). These squared differences could refer to estimates of predicted versus observed survival probabilities, or to times to events, for each individual.
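In its simplest form, the MSEP idea sketched above reduces to the mean of squared individual differences; a minimal illustrative sketch (names are ours):

```python
def msep(predicted, observed):
    """Mean squared error of prediction: average squared difference between
    individual predicted and observed quantities (e.g., survival probabilities
    or times to event)."""
    assert len(predicted) == len(observed)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
```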
Chapter 5
Conclusions
The main objective of this thesis was to study statistical methods for the develop-
ment and evaluation of micro-simulation models. In this chapter we summarize the
findings, as well as future work related to this research.
We began the work for this dissertation by developing an MSM that describes the
natural history of lung cancer. This model was then used as a tool for the implemen-
tation and comparison of a Bayesian and an Empirical calibration method, aimed
at specifying sets of MSM parameter values that provide good fit to the observed
quantities of interest. Finally, we have adapted tools from survival data analysis to
evaluate the predictive accuracy of a calibrated MSM.
The streamlined MSM developed in Chapter 2 combines some of the best practices followed in modeling the natural history of lung cancer and can be used for valid predictions about the course of the disease. The development of this MSM in open-source statistical software (R 3.0.1) enhances the transparency of the model, facilitates research on the statistical properties of MSMs in general, and promotes the improvement and expansion of the model to describe the course of lung cancer in more detail, in collaboration with scientists from several fields.
The comparative analysis presented in Chapter 3 showed that both calibration methods produce extensively overlapping results, with respect both to the sets of values for the calibrated parameters and to the predictions obtained by each model. However, only the Bayesian calibration method provides a sound theoretical framework for the incorporation of prior beliefs in the model and for the interpretation of the results of the procedure. The ultimate goal of this method is to draw values from the joint posterior distribution of the MSM parameters.
Furthermore, the Bayesian method results in an MSM that performs better in the
prediction of rare events compared to the Empirical one. The predictions from the
Empirically calibrated MSM, on the other hand, are less dispersed. In addition, the
Empirical method is more efficient with respect to the computational time required
for the entire calibration.
The Bayesian approach, when focused on estimation, may not serve the purpose of model calibration. In fact, the performance of the Bayesian calibration method can be considerably improved by adding a “refinement” step to the procedure, aimed at selecting the subset of parameter values that provides a better fit of the MSM to the observed data, according to some pre-specified GoF measure.
Finally, Chapter 3 emphasizes the imperative need for High Performance Computing techniques in order to undertake a rather complicated task, such as the calibration of an MSM, in R. This is because the implementation of a calibration procedure involves multitudinous independent micro-simulations, which can be carried out in parallel, thus reducing the total running time required. R facilitates parallel processing via specially designed libraries that can set up and distribute the task to large computer clusters.
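The pattern of mapping independent micro-simulations over a pool of workers can be sketched as follows. This Python illustration uses threads for portability; an actual calibration would distribute the runs over processes or cluster nodes (e.g., via R's parallel libraries), and both function names here are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def run_msm_once(params):
    """Stand-in for one micro-simulation run with one calibrated parameter
    vector; a real run would simulate one trajectory per individual."""
    return sum(params)

def run_all(param_vectors, workers=4):
    """Map independent runs over a worker pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(run_msm_once, param_vectors))
```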
According to the simulation study conducted in Chapter 4, concordance statistics, although useful for assessing the overall discrimination ability of an MSM, may not capture differences between observed and predicted survival. The accuracy of an MSM, with respect to overall predicted survival, can be better assessed by applying hypothesis tests used in survival analysis to compare observed with predicted survival curves. These tests account for the effect of censoring in a competing risks setting, as in the case of the survival estimates of lung cancer incidence and mortality given smoking. All the tests suggested in that chapter lead to the same conclusion when the predictions obtained by the MSM are very different from the respective actual observations. Furthermore, the log-rank test proved more sensitive than the other tests in detecting the more prominent differences.
We intend to continue and extend our work in a number of important directions. First, we plan to extend the original MSM to incorporate more detailed information, as well as screening and treatment components, thus making it comparable to existing models of lung cancer. We also plan the publication of the MSM as a library in the CRAN package repository of the R statistical software.
We used the two methods presented in Chapter 3 to calibrate the MSM on data about male current smokers. We plan to perform a complete calibration of this MSM, that is, to calibrate the parameters so that the model will be able to predict individual trajectories within narrower subgroups defined by covariates beyond gender and smoking status. Furthermore, we will expand the methods so as to account for multiple calibration targets.
We also intend to apply the methods suggested in Chapter 4 to the assessment of the predictive accuracy of the MSM using actual data from the NLST study. It would also be informative to study how measures of predictive performance can be used to compare two completely different models, such as two structurally different MSMs for lung cancer. More research is also required to expand the methods so as to account for multiple outcomes of interest, as well as to incorporate between-subject variability in the calculations.
Another very interesting topic for further consideration would be the construction of a predictive accuracy measure focusing on discrepancies of the individual predictions from the observed data. This measure would be an estimate of the mean squared error of the MSM predictions (MSEP). The quantities involved in the calculation could be estimates of the survival probabilities for each particular individual, as well as times to event or censoring.
Bibliography
[1] Aberle, D. R., Adams, A. M., Berg, C. D., Black, W. C., Clapp, J. D., Fagerstrom, R. M., Gareen, I. F., Gatsonis, C., Marcus, P. M., Sicks, J. D., and the National Lung Screening Trial Research Team (2011), “Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening,” New England Journal of Medicine, 365, 395–409.
[2] Antolini, L., Boracchi, P., and Biganzoli, E. (2005), “A time-dependent dis-
crimination index for survival data,” Statistics in Medicine, 24, 3927–3944.
[3] Baker, R. (1998), “Use of a mathematical model to evaluate breast cancer screening policy,” Health Care Management Science, 1, 103–113.
[4] Berry, D. A., Inoue, L., Shen, Y., Venier, J., Cohen, D., Bondy, M., Theriault,
R., and Munsell, M. F. (2006), “Chapter 6: Modeling the Impact of Treatment
and Screening on U.S. Breast Cancer Mortality: A Bayesian Approach,” JNCI
Monographs, 2006, 30–36.
[5] Blower, S. M. and Dowlatabadi, H. (1994), “Sensitivity and Uncertainty Anal-
ysis of Complex Models of Disease Transmission: An HIV Model, as an Ex-
ample,” International Statistical Review / Revue Internationale de Statistique,
62, 229–243.
[6] Breslow, N. (1970), “Generalized Kruskal-Wallis Test for Comparing K Samples Subject to Unequal Patterns of Censorship,” Biometrika, 57, 579–594.
[7] Briggs, A. H., O’Brien, B. J., and Blackhouse, G. (2002), “Thinking outside
the box: Recent advances in the analysis and presentation of uncertainty in
cost-effectiveness studies,” Annual Review of Public Health, 23, 377–401.
[8] Campbell, K. (2006), “Statistical calibration of computer simulations,” Relia-
bility Engineering & System Safety, 91, 1358–1363.
[9] Chen, H. C., Kodell, R. L., Cheng, K. F., and Chen, J. J. (2012), “Assessment of performance of survival prediction models for cancer prognosis,” BMC Medical Research Methodology, 12.
[10] Cheng, S. C., Wei, L. J., and Ying, Z. (1995), “Analysis of transformation
models with censored data,” Biometrika, 82, 835–845.
[11] Chia, Y. L., Salzman, P., Plevritis, S. K., and Glynn, P. W. (2004),
“Simulation-based parameter estimation for complex models: a breast cancer
natural history modelling illustration,” Statistical Methods in Medical Research,
13, 507–524.
[12] Clyde, M. and George, E. I. (2004), “Model uncertainty,” Statistical Science,
19, 81–94.
[13] Cronin, K. A., Legler, J. M., and Etzioni, R. D. (1998), “Assessing uncertainty
in microsimulation modelling with application to cancer screening interventions,”
Statistics in Medicine, 17, 2509–2523.
[14] De Angelis, D., Sweeting, M., Ades, A. E., Hickman, M., Hope, V., and
Ramsay, M. (2009), “An evidence synthesis approach to estimating Hepatitis C
Prevalence in England and Wales.”
[15] Detterbeck, F. C. and Gibson, C. J. (2008), “Turning gray: The natural history
of lung cancer over time,” Journal of Thoracic Oncology, 3, 781–792.
[16] Deutsch, J. L. and Deutsch, C. V. (2012), “Latin hypercube sampling with
multidimensional uniformity,” Journal of Statistical Planning and Inference,
142, 763–772.
[17] Doubilet, P., Begg, C. B., Weinstein, M. C., Braun, P., and McNeil, B. J.
(1985), “Probabilistic Sensitivity Analysis Using Monte Carlo Simulation,”
Medical Decision Making, 5, 157–177.
[18] Draisma, G., Boer, R., Otto, S. J., van der Cruijsen, I. W., Damhuis, R. A. M.,
Schröder, F. H., and de Koning, H. J. (2003), “Lead Times and Overdetection
Due to Prostate-Specific Antigen Screening: Estimates From the European
Randomized Study of Screening for Prostate Cancer,” Journal of the National
Cancer Institute, 95, 868–878.
[19] Eddelbuettel, D. (2013), “CRAN Task View: High-Performance and Parallel
Computing with R,” http://cran.r-project.org/web/views/HighPerformanceComputing.html,
[Online; Retrieved: 15-March-2013].
[20] Fine, J. P. and Gray, R. J. (1999), “A proportional hazards model for the
subdistribution of a competing risk,” J Am Stat Assoc, 94, 496–509.
[21] Fleming, T. R. and Harrington, D. P. (1981), “A Class of Hypothesis Tests
for One and Two Sample Censored Survival Data,” Communications in Statistics,
Part A: Theory and Methods, 10, 763–794.
[22] Foy, M., Spitz, M. R., Kimmel, M., and Gorlova, O. Y. (2011), “A smoking-based
carcinogenesis model for lung cancer risk prediction,” International Journal
of Cancer.
[23] Fryback, D. G., Stout, N. K., Rosenberg, M. A., Trentham-Dietz, A., Kuru-
chittham, V., and Remington, P. L. (2006), “Chapter 7: The Wisconsin Breast
Cancer Epidemiology Simulation Model,” JNCI Monographs, 2006, 37–47.
[24] Gampe, J. and Zinn, S. (2009), “The Microsimulation tool of the MicMac project,”
2nd General Conference of the International Microsimulation Association, (Ottawa,
Canada).
[25] Garber, A. M. and Tunis, S. R. (2009), “Does Comparative-Effectiveness Re-
search Threaten Personalized Medicine?.” New England Journal of Medicine,
360, 1925–1927.
[26] Garg, M. L., Rao, B. R., and Redmond, C. K. (1970), “Maximum-Likelihood
Estimation of the Parameters of the Gompertz Survival Function,” Journal of
the Royal Statistical Society. Series C (Applied Statistics), 19, 152–159.
[27] Gatsonis, C. (2010), “The promise and realities of comparative effectiveness
research,” Statistics in Medicine, 29, 1977–1981.
[28] Gatsonis, C. A. and the National Lung Screening Trial Research Team (2011),
“The National Lung Screening Trial: Overview and Study Design,” Radiology, 258,
243–253.
[29] Geddes, D. M. (1979), “The natural history of lung cancer: a review based on
rates of tumour growth,” Br J Dis Chest, 73, 1–17.
[30] Gehan, E. A. (1965), “A Generalized Wilcoxon Test for Comparing Arbitrarily
Singly-Censored Samples,” Biometrika, 52, 203–223.
[31] Gerds, T. A., Cai, T. X., and Schumacher, M. (2008), “The performance of
risk prediction models,” Biometrical Journal, 50, 457–479.
[32] Gerds, T. A., Kattan, M. W., Schumacher, M., and Yu, C. (2013), “Estimating
a time-dependent concordance index for survival prediction models with
covariate dependent censoring,” Statistics in Medicine, 32, 2173–2184.
[33] Goldwasser, D. L. (2009), “Parameter estimation in mathematical models of
lung cancer,” Ph.D. thesis.
[34] Gonen, M. and Heller, G. (2005), “Concordance probability and discriminatory
power in proportional hazards regression,” Biometrika, 92, 965–970.
[35] Gorfine, M., Hsu, L., Zucker, D. M., and Parmigiani, G. (2013), “Calibrated
predictions for multivariate competing risks models,” Lifetime Data Anal.
[36] Graf, E., Schmoor, C., Sauerbrei, W., and Schumacher, M. (1999), “Assess-
ment and comparison of prognostic classification schemes for survival data,”
Statistics in Medicine, 18, 2529–2545.
[37] Gray, R. J. (1988), “A Class of K-Sample Tests for Comparing the Cumulative
Incidence of a Competing Risk,” Annals of Statistics, 16, 1141–1154.
[38] Habbema, J. D. F., van Oortmarssen, G. J., Lubbe, J. T. N., and van der
Maas, P. J. (1985), “The MISCAN simulation program for the evaluation of
screening for disease,” Computer Methods and Programs in Biomedicine, 20,
79–93.
[39] Harrell, F. E., Lee, K. L., and Mark, D. B. (1996), “Multivariable prognostic
models: Issues in developing models, evaluating assumptions and adequacy,
and measuring and reducing errors,” Statistics in Medicine, 15, 361–387.
[40] Hazelton, W. D., Clements, M. S., and Moolgavkar, S. H. (2005), “Multistage
carcinogenesis and lung cancer mortality in three cohorts,” Cancer Epidemiol-
ogy Biomarkers & Prevention, 14, 1171–1181.
[41] Hazelton, W. D., Luebeck, E. G., Heidenreich, W. F., and Moolgavkar, S. H.
(2001), “Analysis of a historical cohort of Chinese tin miners with arsenic,
radon, cigarette smoke, and pipe smoke exposures using the biologically based
two-stage clonal expansion model,” Radiation Research, 156, 78–94.
[42] Heagerty, P. J. and Zheng, Y. Y. (2005), “Survival model predictive accuracy
and ROC curves,” Biometrics, 61, 92–105.
[43] Heidenreich, W. F., Jacob, P., and Paretzke, H. G. (1997), “Exact solutions
of the clonal expansion model and their application to the incidence of solid
tumors of atomic bomb survivors,” Radiation and Environmental Biophysics,
36, 45–58.
[44] Heidenreich, W. F., Luebeck, E. G., and Moolgavkar, S. H. (1997), “Some
properties of the hazard function of the two-mutation clonal expansion model,”
Risk Analysis, 17, 391–399.
[45] Henderson, R., Jones, M., and Stare, J. (2001), “Accuracy of point predictions
in survival analysis,” Statistics in Medicine, 20, 3083–3096.
[46] Hielscher, T., Zucknick, M., Werft, W., and Benner, A. (2010), “On the prog-
nostic value of survival models with application to gene expression signatures,”
Statistics in Medicine, 29, 818–29.
[47] Howlader, N., Noone, A., Krapcho, M., Neyman, N., Aminou, R., Waldron,
W., Altekruse, S. F., Kosary, C., Ruhl, J., Tatalovich, Z., Cho, H., Mariotto,
A., Eisner, M., Lewis, D., Chen, H., Feuer, E., and Cronin, K. (2012), “SEER
Cancer Statistics Review, 1975–2009 (Vintage 2009 Populations),” National Cancer
Institute, Bethesda, MD; posted to the SEER web site, April 2012.
[48] Hunink, M. G. M., Koerkamp, B. G., Weinstein, M. C., Stijnen, T., and
Heijenbrok-Kal, M. H. (2010), “Uncertainty and Patient Heterogeneity in Med-
ical Decision Models,” Medical Decision Making, 30, 194–205.
[49] Jit, M., Choi, Y. H., and Edmunds, W. J. (2008), “Economic evaluation of
human papillomavirus vaccination in the United Kingdom,” BMJ (Clinical
research ed.), 337, a769.
[50] Karnon, J., Goyder, E., et al. (2007), “A review and critique of modelling
in prioritising and designing screening programmes,” Health Technology
Assessment, 11.
[51] Kennedy, M. C. and O’Hagan, A. (2001), “Bayesian calibration of computer
models,” Journal of the Royal Statistical Society Series B-Statistical Method-
ology, 63, 425–450.
[52] Kim, J. J., Kuntz, K. M., Stout, N. K., Mahmud, S., Villa, L. L., Franco,
E. L., and Goldie, S. J. (2007), “Multiparameter Calibration of a Natural
History Model of Cervical Cancer,” American Journal of Epidemiology, 166,
137–150.
[53] Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983), “Optimization by
Simulated Annealing,” Science, 220, 671–680.
[54] Klein, J. P. and Moeschberger, M. L. (2003), Survival Analysis: Techniques for
Censored and Truncated Data, New York: Springer.
[55] Koerkamp, B. G., Stijnen, T., Weinstein, M. C., and Hunink, M. G. M. (2011),
“The Combined Analysis of Uncertainty and Patient Heterogeneity in Medical
Decision Models,” Medical Decision Making, 31, 650–661.
[56] Kopec, J. A., Fines, P., Manuel, D. G., Buckeridge, D. L., Flanagan, W. M.,
Oderkirk, J., Abrahamowicz, M., Harper, S., Sharif, B., Okhmatovskaia, A.,
Sayre, E. C., Rahman, M. M., and Wolfson, M. C. (2010), “Validation of
population-based disease simulation models: a review of concepts and
methods,” BMC Public Health, 10.
[57] Korn, E. L. and Simon, R. (1990), “Measures of explained variation for survival
data,” Statistics in Medicine, 9, 487–503.
[58] Koscielny, S., Tubiana, M., Le, M. G., Valleron, A. J., Mouriesse, H., Contesso,
G., and Sarrazin, D. (1984), “Breast Cancer: Relationship between the Size of
the Primary Tumor and the Probability of Metastatic Dissemination,” British
Journal of Cancer, 49, 709–715.
[59] Koscielny, S., Tubiana, M., and Valleron, A. J. (1985), “A simulation model of
the natural history of human breast cancer,” Br J Cancer, 52, 515–524.
[60] Kullback, S. and Leibler, R. A. (1951), “On Information and Sufficiency,”
Annals of Mathematical Statistics, 22, 79–86.
[61] Laird, A. K. (1964), “Dynamics of Tumor Growth,” British Journal of Cancer,
18, 490–502.
[62] L’Ecuyer, P., Simard, R., Chen, E. J., and Kelton, W. D. (2002), “An object-oriented
random-number package with many long streams and substreams,”
Operations Research, 50, 1073–1075.
[63] L’Ecuyer, P. and Leydold, J. (2005), “rstream: Streams of Random Numbers for
Stochastic Simulation,” R News, 5, 16–20.
[64] Luebeck, E. G., Heidenreich, W. F., Hazelton, W. D., Paretzke, H. G., and
Moolgavkar, S. H. (1999), “Biologically based analysis of the data for the Col-
orado uranium miners cohort: Age, dose and dose-rate effects,” Radiation
Research, 152, 339–351.
[65] Mandelblatt, J., Schechter, C. B., Lawrence, W., Yi, B., and Cullen, J. (2006),
“Chapter 8: The SPECTRUM Population Model of the Impact of Screening
and Treatment on U.S. Breast Cancer Trends From 1975 to 2000: Principles
and Practice of the Model Methods,” JNCI Monographs, 2006, 47–55.
[66] Mannion, O., Lay-Yee, R., Wrapson, W., Davis, P., and Pearson, J. (2012),
“JAMSIM: a Microsimulation Modelling Policy Tool,” JASSS: The Journal of
Artificial Societies and Social Simulation, 15.
[67] Matloff, N. (2013), “Programming on Parallel Machines,”
http://heather.cs.ucdavis.edu/~matloff/158/PLN/ParProcBook.pdf, [Online;
Retrieved: 13-March-2013].
[68] McCallum, Q. E. and Weston, S. (2012), Parallel R, O’Reilly.
[69] McKay, M. D., Beckman, R. J., and Conover, W. J. (2000), “A Comparison
of Three Methods for Selecting Values of Input Variables in the Analysis of
Output from a Computer Code,” Technometrics, 42, 55–61.
[70] McMahon, P. M. (2005), “Policy assessment of medical imaging utilization:
methods and applications [doctoral thesis],” Ph.D. thesis.
[71] McMahon, P. M., Kong, C. Y., Johnson, B. E., Weinstein, M. C., Weeks, J. C.,
Kuntz, K. M., Shepard, J. A. O., Swensen, S. J., and Gazelle, G. S. (2008),
“Estimating long-term effectiveness of lung cancer screening in the Mayo CT
screening study,” Radiology, 248, 278–287.
[72] Meza, R., Hazelton, W. D., Colditz, G. A., and Moolgavkar, S. H. (2008),
“Analysis of lung cancer incidence in the nurses’ health and the health pro-
fessionals’ follow-up studies using a multistage carcinogenesis model,” Cancer
Causes & Control, 19, 317–328.
[73] Moeschberger, M. L. and Klein, J. P. (1995), “Statistical methods for depen-
dent competing risks,” Lifetime Data Analysis, 1, 195–204.
[74] Moolgavkar, S. H. and Luebeck, E. G. (2003), “Multistage carcinogenesis and
the incidence of human cancer,” Genes Chromosomes Cancer, 38, 302–6.
[75] Moolgavkar, S. H. and Luebeck, G. (1990), “Two-Event Model for Carcinogenesis:
Biological, Mathematical, and Statistical Considerations,” Risk Analysis,
10, 323–341.
[76] Mountain, C. F. (1997), “Revisions in the International System for Staging
Lung Cancer,” Chest, 111, 1710–1717.
[77] Nelder, J. A. and Mead, R. (1965), “A Simplex Method for Function Mini-
mization,” The Computer Journal, 7, 308–313.
[78] Nielsen, B. (1997), “Expected survival in the Cox model,” Scandinavian Jour-
nal of Statistics, 24, 275–287.
[79] Nieto, F. J. and Coresh, J. (1996), “Adjusting survival curves for confounders:
A review and a new method,” American Journal of Epidemiology, 143, 1059–
1068.
[80] Oakley, J. E. and O’Hagan, A. (2004), “Probabilistic sensitivity analysis of
complex models: a Bayesian approach,” Journal of the Royal Statistical Society
Series B-Statistical Methodology, 66, 751–769.
[81] O’Hagan, A., Stevenson, M., and Madan, J. (2007), “Monte Carlo probabilistic
sensitivity analysis for patient level simulation models: Efficient estimation of
mean and variance using ANOVA,” Health Economics, 16, 1009–1023.
[82] Orcutt, G. H. (1957), “A New Type of Socio-Economic System,” Review of
Economics and Statistics, 39, 116–123.
[83] Parmigiani, G. (2002), “Measuring uncertainty in complex decision analysis
models,” Statistical Methods in Medical Research, 11, 513–537.
[84] Pencina, M. J. and D’Agostino, R. B. (2004), “Overall C as a measure of dis-
crimination in survival analysis: model specific population value and confidence
interval estimation,” Statistics in Medicine, 23, 2109–2123.
[85] Peto, R. and Peto, J. (1972), “Asymptotically Efficient Rank Invariant Test
Procedures,” Journal of the Royal Statistical Society, Series A (General), 135,
185–207.
[86] Plevritis, S. K., Salzman, P., Sigal, B. M., and Glynn, P. W. (2007), “A nat-
ural history model of stage progression applied to breast cancer,” Statistics in
Medicine, 26, 581–595.
[87] Plevritis, S. K., Sigal, B. M., Salzman, P., Rosenberg, J., and Glynn, P. (2006),
“Chapter 12: A Stochastic Simulation Model of U.S. Breast Cancer Mortality
Trends From 1975 to 2000,” JNCI Monographs, 2006, 86–95.
[88] Poole, D. and Raftery, A. E. (2000), “Inference for deterministic simulation
models: The Bayesian melding approach,” J Am Stat Assoc, 95, 1244–1255.
[89] Rossini, A. J., Tierney, L., and Li, N. (2007), “Simple parallel statistical com-
puting in R,” Journal of Computational and Graphical Statistics, 16, 399–420.
[90] Rutter, C. M., Miglioretti, D. L., and Savarino, J. E. (2009), “Bayesian Cali-
bration of Microsimulation Models,” J Am Stat Assoc, 104, 1338–1350.
[91] Rutter, C. M. and Savarino, J. E. (2010), “An Evidence-Based Microsimulation
Model for Colorectal Cancer: Validation and Application,” Cancer Epidemiol-
ogy Biomarkers and Prevention, 19, 1992–2002.
[92] Rutter, C. M., Zaslavsky, A. M., and Feuer, E. J. (2011), “Dynamic Microsim-
ulation Models for Health Outcomes,” Medical Decision Making, 31, 10–18.
[93] Saha-Chaudhuri, P. and Heagerty, P. J. (2013), “Non-parametric estimation of
a time-dependent predictive accuracy curve,” Biostatistics, 14, 42–59.
[94] Salomon, J. A., Weinstein, M. C., Hammitt, J. K., and Goldie, S. J. (2002),
“Empirically calibrated model of hepatitis C virus infection in the United
States,” American Journal of Epidemiology, 156, 761–773.
[95] Santner, T. J., Williams, B. J., and Notz, W. (2003), The Design and analysis
of computer experiments, Springer series in statistics, New York: Springer.
[96] Schmidberger, M., Morgan, M., Eddelbuettel, D., Yu, H., Tierney, L., and
Mansmann, U. (2009), “State of the Art in Parallel Computing with R,” Jour-
nal of Statistical Software, 31, 1–27.
[97] Schumacher, M. (1984), “Two-Sample Tests of Cramér-von Mises Type and
Kolmogorov-Smirnov Type for Randomly Censored Data,” International
Statistical Review, 52, 263–281.
[98] Shi, L., Tian, H., McCarthy, W., Berman, B., Wu, S., and Boer, R. (2011),
“Exploring the uncertainties of early detection results: model-based
interpretation of the Mayo Lung Project,” BMC Cancer, 11, 92.
[99] Siegel, R., Naishadham, D., and Jemal, A. (2012), “Cancer statistics, 2012,”
CA Cancer J Clin, 62, 10–29.
[100] Simon, R. M., Subramanian, J., Li, M. C., and Menezes, S. (2011), “Using
cross-validation to evaluate predictive accuracy of survival risk classifiers based
on high-dimensional data,” Briefings in Bioinformatics, 12, 203–214.
[101] Sonnenberg, F. A. and Beck, J. R. (1993), “Markov Models in Medical
Decision Making: A Practical Guide,” Medical Decision Making, 13, 322–338.
[102] Spratt, J. S. and Spratt, T. L. (1964), “Rates of Growth of Pulmonary Metas-
tases and Host Survival,” Annals of Surgery, 159, 161–171.
[103] Steel, G. G. (1977), Growth kinetics of tumours : cell population kinetics in
relation to the growth and treatment of cancer, Oxford: Clarendon Press.
[104] Stein, M. (1987), “Large Sample Properties of Simulations Using Latin Hyper-
cube Sampling,” Technometrics, 29, 143–151.
[105] Steyerberg, E. W., Vickers, A. J., Cook, N. R., Gerds, T., Gonen, M.,
Obuchowski, N., Pencina, M. J., and Kattan, M. W. (2010), “Assessing the
Performance of Prediction Models: A Framework for Traditional and Novel Measures,”
Epidemiology, 21, 128–138.
[106] Stout, N. K., Knudsen, A. B., Kong, C. Y., McMahon, P. M., and Gazelle,
G. S. (2009), “Calibration Methods Used in Cancer Simulation Models and
Suggested Reporting Guidelines,” Pharmacoeconomics, 27, 533–545.
[107] Tan, S. Y. G. L., van Oortmarssen, G. J., de Koning, H. J., Boer, R., and
Habbema, J. D. F. (2006), “Chapter 9: The MISCAN-Fadia Continuous Tumor
Growth Model for Breast Cancer,” JNCI Monographs, 2006, 56–65.
[108] Tarone, R. E. and Ware, J. (1977), “Distribution-Free Tests for Equality of
Survival Distributions,” Biometrika, 64, 156–160.
[109] Department of Health and Human Services (2009), “Draft definition of Com-
parative Effectiveness Research for the Federal Coordinating Council,”
http://www.hhs.gov/recovery/programs/cer/draftdefinition.html.
[110] Thames, H. D., Buchholz, T. A., and Smith, C. D. (1999), “Frequency of first
metastatic events in breast cancer: Implications for sequencing of systemic and
local-regional treatment,” Journal of Clinical Oncology, 17, 2649–2658.
[111] Tierney, L. (2008), Implicit and Explicit Parallel Computing in R, Physica-
Verlag HD, chap. 4, pp. 43–51.
[112] Tunis, S. R., Benner, J., and McClellan, M. (2010), “Comparative effectiveness
research: Policy context, methods development and research infrastructure,”
Statistics in Medicine, 29, 1963–1976.
[113] Uno, H., Cai, T., Pencina, M. J., D’Agostino, R. B., and Wei, L. J. (2011), “On
the C-statistics for evaluating overall adequacy of risk prediction procedures
with censored survival data,” Statistics in Medicine, 30, 1105–1117.
[114] Vanni, T., Karnon, J., Madan, J., White, R. G., Edmunds, W. J., Foss, A. M.,
and Legood, R. (2011), “Calibrating models in economic evaluation: a seven-
step approach,” Pharmacoeconomics, 29, 35–49.
[115] Vanni, T., Legood, R., Franco, E. L., Villa, L. L., Luz, P. M., and Schwarts-
mann, G. (2011), “Economic evaluation of strategies for managing women with
equivocal cytological results in Brazil,” International Journal of Cancer, 129,
671–679.
[116] Wakelee, H. A., Chang, E. T., Gomez, S. L., Keegan, T. H., Feskanich, D.,
Clarke, C. A., Holmberg, L., Yong, L. C., Kolonel, L. N., Gould, M. K., and
West, D. W. (2007), “Lung cancer incidence in never smokers,” Journal of
Clinical Oncology, 25, 472–478.
[117] Welton, N. J. and Ades, A. E. (2005), “A model of toxoplasmosis incidence in
the UK: evidence synthesis and consistency of evidence,” Journal of the Royal
Statistical Society: Series C (Applied Statistics), 54, 385–404.
[118] Yamaguchi, N., Tamura, Y., Sobue, T., Akiba, S., Ohtaki, M., Baba, Y.,
Mizuno, S., and Watanabe, S. (1991), “Evaluation of Cancer Prevention Strate-
gies by Computerized Simulation Model: An Approach to Lung Cancer,” Can-
cer Causes & Control, 2, 147–155.