Statistical Methods in
Micro-Simulation Modeling:
Calibration and Predictive
Accuracy
by
Stavroula Chrysanthopoulou
B.S., Athens University of Economics and Business, 2003
Sc. M., University of Athens, 2007
A Dissertation submitted in partial fulfillment of the
requirements for the Degree of Doctor of Philosophy
in Biostatistics at Brown University
Providence, Rhode Island
May 2014
© Copyright 2014 by Stavroula Chrysanthopoulou
This dissertation by Stavroula Chrysanthopoulou is accepted in its present form
by the Department of Biostatistics, School of Public Health, as satisfying the
dissertation requirement for the degree of Doctor of Philosophy.
Date
Constantine Gatsonis, PhD (Advisor)
Recommended to the Graduate Council
Date
Carolyn Rutter, PhD (Reader)
Date
Xi Luo, PhD (Reader)
Date
Matthew Harrison, PhD (Reader)
Approved by the Graduate Council
Date
Peter Weber, Dean of the Graduate School
Curriculum Vitæ
Stavroula Chrysanthopoulou was born on May 2, 1980, in Athens, Greece.
She received her BSc degree in Statistics from Athens University of Economics and
Business (AUEB), in September 2003, and her MSc degree in Biostatistics from
University of Athens (UOA), in February 2007.
In September 2008 she was admitted to the PhD program in Biostatistics at Brown
University, from which she received her second MSc degree in Biostatistics in 2010.
She successfully defended her PhD dissertation, entitled “Statistical Methods in
Micro-Simulation Modeling: Calibration and Predictive Accuracy”, on September
13, 2013.
During her five-year career as a PhD candidate, she served as a teaching
assistant in the following courses, offered by the Department of Biostatistics at Brown
University:
• Introduction to Biostatistics (Fall semester, 2008)
• Applied Regression Models (Spring semester, 2009)
• Analysis of Life Time Data (Spring semester, 2012)
She presented a poster entitled “Relationship between breast biopsies and family
history of breast cancer” at the Brown University Public Health Research Day, in
Spring 2010.
She also presented part of her dissertation work as an invited speaker in the “Micro-
simulation Models for Health Policy: Advances and Applications” session, at the
Joint Statistical Meetings (JSM) 2013 conference in Montreal, Canada.
She has several years of working experience as:
⇒ 2003-2005: Consulting Biostatistician, mainly involved in the design and con-
duct of statistical analysis for biomedical papers.
⇒ 2005-2008: Statistical Consultant at Agilis SA-Statistics and Informatics, in-
volved with research on methods for official statistics in projects conducted by
the European Statistical Service (Eurostat).
Her research interests are focused on statistical methods for complex predictive mod-
els, such as Micro-simulation Models (MSMs) used in medical decision making, as
well as on High Performance Computing (HPC) techniques for complex statistical
computations using the open source statistical package R.
Acknowledgements
The five years of my life as a PhD candidate were full of valuable experiences, ex-
ceptional opportunities to improve myself both as a scientist and as a human being,
and of course a lot of challenging moments. In this beautiful “journey” I was blessed
by God to be surrounded by very important people, without whose support I would
never have been able to achieve my goal.
First and foremost I would like to thank my advisor, Professor Constantine Gatsonis,
for his willingness to work with me in this very interesting field, and his continuing
support and guidance that helped me to overcome all the obstacles and conduct this
important research. His intelligence, ethos, and integrity render him the perfect role
model for young scientists. I want to also express my gratitude to Dr Carolyn Rutter
for her valuable feedback as an expert in micro-simulation modeling, as well as for
the exceptional opportunities she provided me with to present my work and exchange
opinions with experts in the field. I would also like to thank Dr Matthew Harrison
for his felicitous comments and insight that helped me to improve the Empirical
calibration method, as well as to better organize and carry out the daunting task
of calibrating a micro-simulation model. Thanks also to Dr Xi Luo for serving as a
reader in my thesis committee.
I am also grateful to people from the Brown Center for Computation and Visual-
ization support group, especially Mark Howison and Aaron Shen for always being
very responsive and effective in helping me with the implementation of exhaustive
parallel processes in R. I also thank Dr Samir Soneji for his assistance in estimating
Cumulative Incidence Functions from the National Health Interview Survey data.
I also thank all the faculty, staff, and students of the Brown School of Public Health.
In particular, I want to thank all my professors from the Biostatistics department, the
staff of the Center of Statistical Sciences (CSS), and my classmates. Special thanks
go to Denise Arver and Elizabeth Clark for always being very responsive and con-
siderate.
Besides the people in the academic environment, I was also blessed to have a beau-
tiful family and some wonderful friends who were always there for me through all the
ups and downs of my career as a PhD candidate. To all these people I owe a great
deal of my achievement.
I have no words to express how blessed I am for growing up in a very loving and
caring family who always believed in and supported me. I want to thank my father
for the first nine, love-filled years of my life, as well as for being my good angel since
the day he passed away. There is no way to thank my wonderful mother enough, for
dedicating her life to my brother and me, and successfully filling both parental
roles for the past twenty-four years of my life. She has been, without exaggeration, the
best mother ever! I owe her all the good (if any) elements of my personality and a
large portion of the success in my life until now. For all these reasons I will always
be very grateful and proud of being her daughter.
I would also like to thank my brother Vassilios, for always being a good example
for me and undertaking a large portion of the burden as the protector of our family
after the loss of our father. I am also grateful to my brother’s family, his wife Ioanna
Andreopoulou, who I consider a true sister, and my two little “Princesses” Katerina
and Antonia, for the positive effect they have on me.
God has indeed been very generous with me by sending invaluable friends in my life.
I would first like to thank Dr Jessica Jalbert, Dr Dhiraj Catoor and Dr Sinan Karaveli
for helping me so much to settle in here in Providence. Special thanks
also go to the Perdikakis family, the parents Ann and Costas, and the children Rhea
and Damon Ray, Giana, and Dean for their support, caring and love. I am very
grateful for meeting and being part of this amazing family.
Last but not least, I would like to express my gratitude to my dear friend Nektaria
for her continuing support, kindness, and thoughtfulness, and most importantly
for the great honor of asking me to baptize her firstborn, Anna.
Unfortunately, due to space constraints, I must close this list by thanking from
the bottom of my heart all the aforementioned people, as well as the many other valuable
friends, relatives, and important persons in my life. Truly and deeply thankful for
their positive effect on my life, I dedicate my accomplishment to them all.
Abstract of “Statistical Methods in Micro-Simulation Modeling:
Calibration and Predictive Accuracy”
by Stavroula Chrysanthopoulou, Ph.D., Brown University, May 2014
This thesis presents research on statistical methods for the development and evalu-
ation of micro-simulation models (MSMs). We developed a streamlined, continuous
time MSM that describes the natural history of lung cancer, and used it as a tool for
the implementation and comparison of methods for calibration and assessment of pre-
dictive accuracy. We performed a comparative analysis of two calibration methods.
The first employs Bayesian reasoning to incorporate prior beliefs on model parame-
ters, and information from various sources about lung cancer, to derive posterior dis-
tributions for the calibrated parameters. The second is an Empirical method, which
combines searches of the multi-dimensional parameter space using Latin Hypercube
Sampling design with Goodness of Fit measures to specify parameter values that pro-
vide good fit to observed data. Furthermore, we studied the ability of the MSMs to
predict times to events, and suggested metrics, based on concordance statistics and
hypothesis tests for survival data. We conducted a simulation study to compare the
performance of MSMs in terms of their predictive accuracy. The entire methodology
was implemented in R 3.0.1. Developing an MSM in open source statistical
software enhances transparency and facilitates research on the statistical properties
of the model. Due to the complexity of MSMs, use of High Performance Computing
techniques in R is essential to their implementation. The analysis of the two
calibration methods showed that they result in extensively overlapping sets of values
for the calibrated MSM parameters and MSM outputs. However, the Bayesian method
performs better in the prediction of rare events, while the Empirical method proved
more efficient in terms of the computational burden. The assessment of predictive
accuracy showed that, among the methods suggested here, hypothesis tests outperform
concordance statistics, since they proved more sensitive in detecting differences
between predictions obtained by the MSM and actual individual-level data.
To my beloved family.
Contents
Abstract ix
1 Introduction 1
1.1 Micro-Simulation Models (MSMs) . . . . . . . . . . . . . 3
1.1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Applications in health care research . . . . . . . . . . . 3
1.1.3 Development of an MSM . . . . . . . . . . . . . . . . . . 7
1.2 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2 Micro-simulation model describing the natural
history of lung cancer 12
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Model description . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 Model components . . . . . . . . . . . . . . . . . . . . . . 17
2.2.2 Simulation Algorithm . . . . . . . . . . . . . . . . . . . . 26
2.2.3 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3.1 Ad-hoc values for model parameters . . . . . . . . . . . 32
2.3.2 MSM output - Examples . . . . . . . . . . . . . . . . . . 37
2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3 Calibration methods in MSMs - a comparative
analysis 54
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.1.1 Calibration vs estimation in statistical theory . . . . 55
3.1.2 Calibration methods for MSMs . . . . . . . . . . . . . . 57
3.1.3 Assessing calibration results . . . . . . . . . . . . . . . . 58
3.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
3.2.2 Bayesian Calibration Method . . . . . . . . . . . . . . . 61
3.2.3 Empirical Calibration Method . . . . . . . . . . . . . . 62
3.2.4 Calibration outputs: interpretation and use . . . . . . 69
3.3 High Performance Computing in R . . . . . . . . . . . . 71
3.3.1 Software for MSMs . . . . . . . . . . . . . . . . . . . . . . 71
3.3.2 Example: computational burden of two MSM cali-
bration methods . . . . . . . . . . . . . . . . . . . . . . . . 72
3.3.3 Parallel Computing . . . . . . . . . . . . . . . . . . . . . . 74
3.3.4 Code architecture . . . . . . . . . . . . . . . . . . . . . . . 76
3.3.5 Algorithm efficiency: Bayesian vs Empirical Cali-
bration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
3.3.6 Concluding remarks . . . . . . . . . . . . . . . . . . . . . 79
3.4 Comparative Analysis . . . . . . . . . . . . . . . . . . . . . . 82
3.4.1 Input Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
3.4.2 MSM parameters to calibrate . . . . . . . . . . . . . . . 84
3.4.3 Calibration Targets . . . . . . . . . . . . . . . . . . . . . . 85
3.4.4 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . 87
3.4.5 Terms of comparison . . . . . . . . . . . . . . . . . . . . . 96
3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.5.1 Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.5.2 Predictions . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.6 Calibration Methods Refinement . . . . . . . . . . . . . . 118
3.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4 Assessing the predictive accuracy of MSMs 133
4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
4.1.1 Assessment of MSMs . . . . . . . . . . . . . . . . . . . . . 134
4.1.2 Predictive accuracy of MSMs . . . . . . . . . . . . . . . 135
4.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.2.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
4.2.2 Concordance statistics . . . . . . . . . . . . . . . . . . . . 141
4.2.3 Hypothesis testing . . . . . . . . . . . . . . . . . . . . . . 145
4.2.4 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . 148
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.3.1 Single run of the MSM . . . . . . . . . . . . . . . . . . . 150
4.3.2 Multiple runs of the MSM . . . . . . . . . . . . . . . . . 154
4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5 Conclusions 167
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
List of Tables
2.1 MSM simulation algorithm . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2 MSM ad-hoc parameter estimates: Onset of the first malignant cell . 35
2.3 SEER data on lung cancer at diagnosis . . . . . . . . . . . . . . . . . 36
2.4 MSM ad-hoc parameter estimates: Lung cancer progression . . . . . . 37
2.5 Predicted times to events: Males - Non smokers . . . . . . . . . . . . 39
2.6 Predicted times to events: Females - Non smokers . . . . . . . . . . . 39
2.7 Predicted times to events: Males - Current smokers . . . . . . . . . . 40
2.8 Predicted times to events: Females - Current smokers . . . . . . . . . 41
2.9 Predicted times to events: Males - Former smokers, quitting smoking
at age 40 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.10 Predicted times to events: Males - Former smokers, quitting smoking
at age 50 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.11 Predicted times to events: Males - Former smokers, quitting smoking
at age 60 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.12 Predicted times to events: Females - Former smokers, quitting smok-
ing at age 40 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.13 Predicted times to events: Females - Former smokers, quitting smok-
ing at age 50 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.14 Predicted times to events: Females - Former smokers, quitting smok-
ing at age 60 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1 Code efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.2 Reference population age distribution . . . . . . . . . . . . . . . . . . 84
3.3 Observed lung cancer incidence rates . . . . . . . . . . . . . . . . . . 86
3.4 Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.5 Number of microsimulations . . . . . . . . . . . . . . . . . . . . . . . 90
3.6 Summary Statistics - parameters . . . . . . . . . . . . . . . . . . . . 105
3.7 Summary statistics - predictions . . . . . . . . . . . . . . . . . . . . . 114
3.8 Assessing MSM predictions . . . . . . . . . . . . . . . . . . . . . . . . 118
3.9 Discrepancy - predictions . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.10 Summary statistics - Box plots . . . . . . . . . . . . . . . . . . . . . . 120
3.11 Summary Statistics - parameters (sub-analysis) . . . . . . . . . . . . 122
3.12 Summary statistics - predictions (sub-analysis) . . . . . . . . . . . . . 127
3.13 Discrepancy - predictions (sub-analysis) . . . . . . . . . . . . . . . . . 128
4.1 Assessment (toy.1, V=1) . . . . . . . . . . . . . . . . . . . . . . . . . 150
4.2 Assessment (toy.2, V=1) . . . . . . . . . . . . . . . . . . . . . . . . . 153
4.3 Assessment (toy.1, V=200, 400, 600, 800, 1000) . . . . . . . . . . . . 159
4.4 Assessment (toy.2, V=200, 400, 600, 800, 1000) . . . . . . . . . . . . 163
List of Figures
2.1 Markov State diagram of the lung cancer MSM . . . . . . . . . . . . 16
2.2 Lung cancer mortality: Non-smokers . . . . . . . . . . . . . . . . . . 39
2.3 Lung cancer mortality: Current smokers . . . . . . . . . . . . . . . . 42
2.4 Lung cancer mortality: Former smokers . . . . . . . . . . . . . . . . . 50
3.1 LHS implementation (N=5) . . . . . . . . . . . . . . . . . . . . . . . 66
3.2 LHS implementation (N=20) . . . . . . . . . . . . . . . . . . . . . . . 66
3.3 Micro-simulation size . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.4 Density Plots - parameters . . . . . . . . . . . . . . . . . . . . . . . . 103
3.5 Mahalanobis distances - parameters . . . . . . . . . . . . . . . . . . . 104
3.6 Bayesian method: Contours of calibrated parameters . . . . . . . . . 106
3.7 Empirical method: Contours of calibrated parameters . . . . . . . . . 107
3.8 Density plots - predictions (internal validation) . . . . . . . . . . . . 112
3.9 Density plots - predictions (external validation) . . . . . . . . . . . . 113
3.10 Mahalanobis distances - predictions . . . . . . . . . . . . . . . . . . . 115
3.11 Calibration plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.12 Box plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.13 Density Plots - parameters (sub-analysis) . . . . . . . . . . . . . . . . 121
3.14 Bayesian method (sub-analysis): Contours of calibrated parameters . 123
3.15 Empirical method (sub-analysis): Contours of calibrated parameters . 124
3.16 Density plots - predictions - sub (internal validation) . . . . . . . . . 125
3.17 Density plots - predictions - sub (external validation) . . . . . . . . . 126
3.18 MH algorithm flow chart . . . . . . . . . . . . . . . . . . . . . . . . . 131
3.19 Bayesian Calibration flow chart . . . . . . . . . . . . . . . . . . . . . 132
4.1 KM curves - Observed vs Predicted survival (toy.1, V=1) . . . . . . . 151
4.2 KM curves - Observed vs Predicted survival (toy.2, V=1) . . . . . . . 153
4.3 KM curves - Observed vs Predicted survival (toy.1, V=200) . . . . . . 156
4.4 KM curves - Observed vs Predicted survival (toy.1, V=400) . . . . . . 156
4.5 KM curves - Observed vs Predicted survival (toy.1, V=600) . . . . . . 157
4.6 KM curves - Observed vs Predicted survival (toy.1, V=800) . . . . . . 157
4.7 KM curves - Observed vs Predicted survival (toy.1, V=1000) . . . . . 158
4.8 KM curves - Observed vs Predicted survival (toy.2, V=200) . . . . . . 160
4.9 KM curves - Observed vs Predicted survival (toy.2, V=400) . . . . . . 160
4.10 KM curves - Observed vs Predicted survival (toy.2, V=600) . . . . . . 161
4.11 KM curves - Observed vs Predicted survival (toy.2, V=800) . . . . . . 161
4.12 KM curves - Observed vs Predicted survival (toy.2, V=1000) . . . . . 162
Chapter 1
Introduction
Comparative Effectiveness Research (CER), a novel research framework aimed at
developing broad-based comparative evidence on the outcomes of diagnostic and
therapeutic procedures, has recently attracted significant scientific attention. An
important component of CER is the development of new methodologies for empir-
ical and modeling studies that generate information appropriate for health policy
decisions. Within this context, a class of predictive models, the micro-simulation
models (MSMs), has attracted considerable attention among researchers. MSMs
use information from various sources of medical research and clinical expertise to
simulate individual disease trajectories, i.e., trajectories that describe events asso-
ciated with the development of the target disease. The summarized results from
these individual trajectories are used to make predictions about long term effects of
a health policy intervention on a given population.
Micro-simulation models have been widely used in several fields. However, the sys-
tematic investigation of their statistical properties is only recently getting under way.
The main objective of this thesis is to address two of the key elements in the devel-
opment and evaluation of an MSM, namely, model calibration and prediction, from a
statistical point of view. To this end we first develop a streamlined micro-simulation
model that describes the natural history of lung cancer, and use it as a tool to explore
the statistical aspects of calibration and prediction for MSMs.
The thesis is divided into five chapters. The first chapter provides an introduction
and overview of the thesis. The second chapter focuses on the development of a
streamlined, continuous time MSM that describes the natural history of lung cancer
in the absence of screening and treatment interventions. This MSM serves as a
tool for the study of the statistical properties of MSMs in subsequent chapters.
In particular, the third chapter provides a comparative analysis of two calibration
methods, a Bayesian and an Empirical one, with application to this MSM for lung
cancer. The fourth chapter discusses the assessment of the predictive accuracy of an
MSM, using the lung cancer model. The dissertation concludes with a fifth chapter
which summarizes the main findings and conclusions, and outlines the plans for future
work on the study of the statistical properties of MSMs.
1.1 Micro-Simulation Models (MSMs)
1.1.1 Overview
Micro-simulation models (MSMs) are complex models designed to simulate individual
level data using Markov Chain Monte Carlo methods. The first applications of
MSMs were in social policy in the late 1950s (Orcutt (1957)). In recent years,
MSMs have come to be used extensively in health policy and medical decision
making. MSMs in health policy problems are used to describe the natural history of
a disease in individual members of a cohort, usually in conjunction with the effect of
some intervention. To this end MSMs use mathematical equations with stochastic
assumptions to describe in detail complex observed and latent characteristics of the
underlying process. The inherent intricacy of MSMs posed serious time and cost
constraints in their development and implementation, especially during the first years
of their use. However, the advances in scientific computing in recent years have
contributed considerably to the improvement and expansion of new methodologies
and applications of MSMs in general, and to medical decision making in particular.
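The basic mechanism just described, stochastic equations generating individual-level event histories that are then summarized, can be sketched in a few lines. (The thesis implements its MSM in R; the sketch below is in Python, and the exponential waiting times, `onset_rate`, and `death_rate` values are purely illustrative stand-ins, not the model or parameters of Chapter 2.)

```python
import random

def simulate_individual(rng, onset_rate=0.02, death_rate=0.1):
    """One simulated disease trajectory in continuous time:
    an exponential waiting time to disease onset, then another
    from onset to death. Rates are illustrative, not calibrated."""
    t_onset = rng.expovariate(onset_rate)            # healthy -> disease
    t_death = t_onset + rng.expovariate(death_rate)  # disease -> death
    return t_onset, t_death

def simulate_cohort(n, seed=1):
    """Run the micro-simulation for n individuals and summarize:
    individual trajectories plus a population-level quantity."""
    rng = random.Random(seed)
    trajectories = [simulate_individual(rng) for _ in range(n)]
    mean_onset = sum(t for t, _ in trajectories) / n
    return trajectories, mean_onset
```

The population-level summary (here, mean age at onset) is the kind of aggregate output that is later compared against observed data, while the individual trajectories themselves are the model's distinctive product.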
1.1.2 Applications in health care research
Rutter et al. (2011) provide a comprehensive review of micro-simulation models
used to predict health outcomes. The review highlights the usefulness of MSMs and
their continuously expanding role in medical decision making. It also indicates the
key steps in the development of a new MSM and discusses the essential checks of the
validity of the model. Finally the review points to the need for additional research on
the statistical properties of MSMs, especially the incorporation and characterization
of the model uncertainty.
Another very important application of MSMs is in the context of the Comparative
Effectiveness Research (CER), a rapidly growing area of research aimed at improving
health outcomes while reducing related costs. CER has recently attracted a great
deal of attention in the medical and scientific community. According to the US
Department of Health and Human Services (HHS) (109), CER is defined as:
“ the conduct and synthesis of systematic research comparing different inter-
ventions and strategies to prevent, diagnose, treat and monitor health conditions.
The purpose of this research is to inform patients, providers and decision-
makers, responding to their expressed needs, about which interventions are
most effective for which patients under specific circumstances. To provide this
information, CER must assess a comprehensive array of health-related out-
comes for diverse patient populations. Defined interventions compared may
include medications, procedures, medical and assistive devices and technologies,
behavioral change strategies, and delivery system interventions. This research
necessitates the development, expansion, and use of a variety of data sources
and methods to assess comparative effectiveness.”
Tunis et al. (112) provide a comprehensive introduction to CER in the context of
the recently enacted USA health care reform, and discuss the statistical challenges
in carrying out this research. The authors highlight the need for sufficient, credible,
relevant and timely evidence in the conduct of CER, and emphasize that “the primary
purpose of CER is to help health-care decision makers make informed decisions at
the level of individual care for patients and clinicians, and at the level of policy
determinations for payers and other policymakers”. The conduct of CER comprises
a great variety of novel and existing methods in medical research, all of which can
be classified into five broad categories: systematic reviews, decision modeling,
retrospective analysis, prospective observational studies, and experimental studies.
A key example of the use of CER in medical decision making, mentioned in both
the Tunis et al. (112) paper as well as the commentary by Gatsonis (27), is the
evaluation of diagnostic modalities for cancer. Both papers indicate the necessity for
individual-level information to assist decisions. However this type of information can
prove very costly, time-consuming or even totally impracticable due to the complex-
ity of the health-care setting. Therefore micro-simulation has risen to prominence as
a promising tool that can make projections about the impact of interventions (such
as screening) when applied to population cohorts, and inform health policies and
medical decision making. A characteristic example of the application of new mod-
eling techniques in Medical Decision Making (MDM) (including micro-simulation
modeling) is the research conducted by the Cancer Intervention and Surveillance
Modeling Network (CISNET) of NCI (http://cisnet.cancer.gov). The CISNET group
is a consortium of NCI-sponsored investigators with research interest focused on the
development and application of advanced statistical modeling. Its main objective
is to use advanced modeling techniques to better understand the effects of cancer
control interventions (prevention, screening, treatment, etc.) on individuals as well
as on population trends (incidence and mortality rates). The CISNET consortium
currently comprises five large groups focusing their research on five different types of
cancer: breast, colorectal, esophagus, lung and prostate cancer. Models developed
to describe each one of these types of cancer, can be used to guide health research
and priorities.
The complexity of an MSM can make its development a daunting task. However,
a valid MSM can be useful to many stakeholders. In particular, it can be used to
inform patients, providers and decision-makers and assist them in deciding on the
most effective and efficient intervention under certain circumstances. Despite their
complexity, MSMs hold some very “attractive” features that have distinguished them
from other useful tools for the conduct of CER. First, MSMs are designed to describe
and evaluate complicated processes when analytical formulas are not available. The
models focus on making predictions about individual patient trajectories rather than
describing the average patient. This, as already mentioned, is a key element of any
statistical tool used for the conduct of CER which is essentially patient-centered.
In addition, MSMs provide an easy way of representing time dependent transition
probabilities between major states of the disease course while, at the same time,
they facilitate the explicit incorporation of different sources of uncertainty intrinsic
to the system (stochastic, parameter, structural, etc). Furthermore they compile
and sometimes even reconcile contradictory facts about the disease process derived
from different sources (e.g. experimental studies, observational studies, expert opin-
ions, etc). MSMs also provide short or long-term predictions about the course of a
disease and the effect of interventions (e.g., screening schedule, treatment, etc) on a
population. In the case of simulating results from longitudinal studies, MSM based
projections can be available well in advance of the actual study conclusion. Finally,
MSMs can be used to produce large pseudo-samples, a very important feature es-
pecially in cases where conducting large, well-designed studies (e.g., large-scale
clinical trials) is precluded by time and/or cost constraints, or even by ethical
considerations.
An example of the application of MSMs in health care is their wide use to evaluate
and compare cancer screening programs. In this setting an MSM is used to describe
the main stages of the natural history of the specific type of cancer and to model
the effect of screening on several aspects of a patient’s lifetime (e.g., survival time,
quality of life, etc). In many instances, the course of cancer can be divided into five
main stages: the disease free state, the onset of the malignancy (local state), the
involvement of detectable lymph nodes metastases (regional state), the involvement
of distant metastases and the death either from cancer or from other causes. Modelers
may be interested in all or only some of these stages. Several papers have studied
each of these disease states separately and have tried to fit complex mathematical
models on real data (41; 40; 43; 72; 75; 15; 61; 26; 33; 58; 59; 70; 102; 103). These
models aim to combine information from the biological process of the disease with
observed outcomes and describe the entire phenomenon in as much detail as possible.
Micro-simulation modeling can be used to combine all the models that describe the
essential parts of a disease process, and use the Monte Carlo method to simulate
individual patients’ trajectories.
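A minimal Monte Carlo walk through the five main stages listed above might look as follows. (This is a discrete-time Python sketch with hypothetical monthly transition probabilities and a competing risk of other-cause death; the lung cancer MSM developed in Chapter 2 is continuous-time and far more detailed.)

```python
import random

STATES = ["disease_free", "local", "regional", "distant", "death"]

# Hypothetical monthly probabilities of progressing to the next stage.
PROGRESS = {"disease_free": 0.001, "local": 0.02, "regional": 0.05, "distant": 0.10}
OTHER_CAUSE_DEATH = 0.0008  # competing risk: death from other causes, per month

def simulate_trajectory(rng, max_months=1200):
    """Monte Carlo walk through the five main stages of the disease course.
    Returns a history of (month, state) events for one individual."""
    state, history = "disease_free", [(0, "disease_free")]
    for month in range(1, max_months + 1):
        if rng.random() < OTHER_CAUSE_DEATH:      # death from other causes
            history.append((month, "death"))
            return history
        if rng.random() < PROGRESS[state]:        # progress to the next stage
            state = STATES[STATES.index(state) + 1]
            history.append((month, state))
            if state == "death":                  # death from cancer
                return history
    return history
```

Summarizing many such trajectories (e.g., the fraction dying of cancer, or the distribution of age at each transition) yields exactly the kind of cohort-level projections described above.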
1.1.3 Development of an MSM
The development of a micro-simulation model is a complex undertaking involving,
as in any other statistical predictive model, three major building blocks, namely
model specification, calibration, and assessment.
Model specification refers to defining the structure of the model that will be used to
describe, analyze, and/or simulate the phenomenon of interest, including the nature
of the model (e.g., regression, Markov, etc), as well as the set of rules and assumptions
imposed. For a new MSM describing the natural history of a disease, in particular,
model specification entails identification of the major distinct states of the disease,
as well as stipulation of the transition rules among them, including the relevant
mathematical equations and distributions that describe the underlying stochastic process.
Calibration is the process of determining values of the model parameters so that the
model provides good fit to available data about the phenomenon of interest. In the
context of micro-simulation modeling, calibration is analogous to the parameter
estimation performed in ordinary statistical models (e.g., GLMs).
Assessment pertains to the model's predictive performance, comprising overall
performance and discrimination ability (105). Overall performance can be expressed
as the percentage of explained variation of the system (R^2 statistics) as well
as the proximity between observed and predicted quantities of interest (GoF statistics).
Discrimination, on the other hand, is the model's ability to correctly classify subjects
(e.g., patients) with different characteristics based on the individual predictions
about the outcome of interest. The goal of this thesis is to explore these building
blocks through the development of a new, streamlined MSM describing the natural
history of lung cancer.
The main purpose of an MSM is to predict individual trajectories for the phenomenon
it describes (in medical decision making, disease trajectories). These individual trajectories can be
point estimates of several quantities of interest (outputs) including time to events
(e.g., time to the development of lung cancer), binary responses (e.g., death from
lung cancer), or even estimates of continuous quantities (e.g., tumor diameter at
diagnosis).
As in any other type of statistical analysis, it is important to accompany point
estimates with measures of variability that convey their precision.
In order to do so in the context of MSMs, it is very important to understand all
possible sources of uncertainty inherent in the model, and find a way to incorporate
them into the model estimates. Rutter et al. (92) identify the following sources of
uncertainty in MSMs:
• population heterogeneity : differences between individuals in the population
of interest, with a significant effect on the observed outcomes
• parameter uncertainty : variability due to the estimation of unknown model
parameters
• selection uncertainty : incorporation of information based solely on a small
portion of studies from the pool of available studies on the specific topic
• sampling variability : variability owing to the fact that the calibration data
are summary statistics estimated from a finite sample from the population of
interest
• stochastic uncertainty : variability due to the random number generation
procedure followed in the Monte Carlo approach for the evaluation and imple-
mentation of the MSM
• structural uncertainty : variability caused by incomplete knowledge of the exact
mechanism of the phenomenon described by the MSM and related to the model
assumptions (uncertainty about the functional form of the model)
All the methods presented in this thesis take into account the problem of identifying
and characterizing an MSM's uncertainty.
1.2 Thesis Outline
The remainder of the thesis is divided into four chapters. Chapter 2 presents the develop-
ment of a streamlined continuous-time micro-simulation model (MSM) that describes
the natural history of lung cancer in the absence of screening and treatment compo-
nents. The chapter begins with an extensive literature review on lung cancer natural
history modeling and surveys the use of MSMs in this area. It continues
with the determination of the major distinct stages of the disease and a
description of the set of rules and assumptions governing the MSM.
We kept the number of covariate classes to a minimum in order to achieve a man-
ageable level of model complexity. The set of covariates in the model therefore
comprises each individual's gender, age, smoking history (ages at starting and
quitting smoking), and smoking habits (smoking intensity, based on the average
number of cigarettes smoked per day). Published results on several stages of the
lung cancer course are used for an ad hoc specification of the model parameters.
The chapter also describes in detail the simulation algorithm used to implement the
model, and illustrates the MSM's performance by running the model under several
characteristic, real-life scenarios and comparing its predictions with established
knowledge in lung cancer research. The main objective in building this MSM is to
serve as a tool for the comparative evaluation of the statistical methodologies for
model calibration, validation, and assessment of predictive accuracy described in
subsequent parts of the thesis.
The third chapter discusses the calibration of an MSM. Here, the literature
review includes references to methods used specifically for the calibration of MSMs
in medical decision making. The main objective of this chapter is to provide a
comparative analysis of two calibration methods for MSMs. To this end a simulation
study is designed and conducted, the results of which form the basis of the
comparative analysis.
The first method is the Bayesian calibration developed by Rutter et al. (90) and
implemented on an MSM for colorectal cancer. The second method is a new empirical
calibration method. The idea underlying this method is to combine some of the best
modeling practices currently applied for the empirical calibration of several types of
MSMs, including search algorithms over the multidimensional parameter space,
GoF statistics to assess overall model performance, convergence criteria, stopping
rules, etc. A key component of the new method is the incorporation of the
widely used Latin Hypercube Sampling (LHS) design into the search algorithm,
allowing a more efficient (compared to simple random sampling) exploration
of the multidimensional parameter space of a (usually rather involved) MSM.
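To illustrate the idea, a Latin Hypercube draw over a hypothetical three-dimensional parameter space can be sketched in a few lines of Python (the thesis's implementation is in R; the bounds below are arbitrary, illustrative choices):

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Latin Hypercube Sample: each parameter range is split into
    n_samples equal strata, one value is drawn per stratum, and the
    strata are randomly permuted independently in each dimension."""
    rng = random.Random(seed)
    dims = len(bounds)
    samples = [[0.0] * dims for _ in range(n_samples)]
    for d, (lo, hi) in enumerate(bounds):
        strata = list(range(n_samples))
        rng.shuffle(strata)                      # decouple the dimensions
        for i, s in enumerate(strata):
            u = (s + rng.random()) / n_samples   # a point inside stratum s
            samples[i][d] = lo + u * (hi - lo)
    return samples

# Hypothetical 3-dimensional parameter space of a toy MSM
pts = latin_hypercube(10, [(0.0, 1.0), (1e-7, 1e-5), (0.5, 2.0)])
```

Unlike a simple random sample of the same size, every coordinate of `pts` contains exactly one value in each of the ten strata of its range, so no region of the parameter space is left unexplored.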
Both the Bayesian and the empirical calibration methods are implemented on the
continuous-time MSM for the natural history of lung cancer described in Chapter 2.
The comparison of the methods uses both qualitative criteria (e.g., efficiency, prac-
ticality, interpretation of calibration results, etc.) and quantitative measures
of overall model performance (GoF statistics), including both internal and
external validation. Internal validation pertains to assessing model performance us-
ing exactly the same data that were used during the calibration procedure, whereas
external validation uses different data. In addition, graphical means of assessing
model performance are also provided. The results from this comparison inform
recommendations regarding the use of these two, as well as similar, approaches
in practice.
Although MSMs are very widely used, to our knowledge no systematic work has yet
been carried out on the assessment of an MSM's predictive accuracy. The fourth chap-
ter is concerned with the assessment of the predictive accuracy of a "well" calibrated
MSM. Micro-simulation models are considered here as a special type of predictive
survival model, since they predict actual survival times, unlike other broadly used
survival models which predict hazard rates or ratios (e.g., Cox Proportional Haz-
ards, Accelerated Failure Time, etc., models). The extensive literature review aims at
identifying measures of predictive accuracy used in the context of survival modeling
that could also be applied to the assessment of an MSM.
Two broadly used methodologies are applied to the two calibrated MSMs resulting
from Chapter 3, namely concordance statistics and methods for comparing
predicted with observed survival curves. These approaches are adapted to the par-
ticularities of MSMs. The chapter compares the two methodologies, summarizes
findings from a simulation study, and concludes with suggestions about useful statis-
tics for the assessment of the predictive accuracy of an MSM.
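As a concrete example of the first methodology, a Harrell-type concordance statistic for predicted survival times can be sketched as follows (a minimal Python illustration on made-up toy data, not the thesis's implementation):

```python
from itertools import combinations

def concordance(pred_times, obs_times, events):
    """Harrell-type C-statistic for predicted survival times.
    A pair is usable when the shorter observed time is an event;
    it is concordant when the predictions are ordered the same way."""
    conc = usable = 0.0
    for i, j in combinations(range(len(obs_times)), 2):
        if obs_times[j] < obs_times[i]:      # orient: i has shorter observed time
            i, j = j, i
        if not events[i] or obs_times[i] == obs_times[j]:
            continue                         # censored first or tied: not usable
        usable += 1
        if pred_times[i] < pred_times[j]:
            conc += 1
        elif pred_times[i] == pred_times[j]:
            conc += 0.5
    return conc / usable

# Toy example: predictions perfectly ordered with the outcomes
c = concordance([2, 4, 6, 8], [1, 3, 5, 7], [1, 1, 1, 0])
```

Here `c` equals 1.0 because every usable pair is concordant; a value of 0.5 would indicate predictions no better than chance.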
In the last chapter of this thesis (Chapter 5) we summarize the main findings and
include future work related to our research.
Chapter 2
Micro-simulation model describing the natural his-
tory of lung cancer
In this chapter we develop a new, streamlined, continuous-time micro-simulation
model (MSM) that describes the natural history of lung cancer in the absence of
any screening or treatment component. This is a predictive model that simulates
individual patient trajectories given a certain set of covariates, namely the age, gender
and smoking history. The model structure is in line with existing methods, and combines
findings from several sources related to lung cancer research. This new MSM predicts
the course of lung cancer for each individual, from the initiation of the first malignant
cell, to the tumor progression to regional and distant stages, until death from lung
cancer (or some other cause), or the end of the prediction period. The main goal is
for the model to serve as a tool to explore, in subsequent chapters, some properties
of the MSMs from a statistical point of view. In particular, the research focus will
be on model calibration and assessment of the model’s predictive accuracy. The
model is developed using the open-source statistical software R (version 3.0.1), in order to
enhance its transparency and explore the potential of this software for
the development of MSMs in general.
The chapter begins with background information regarding MSMs currently used
to describe the natural history of lung cancer. The main part of the chapter is
dedicated to the description of the new, streamlined MSM for lung cancer that we
develop here. The second section describes the main model components, namely the
distinct disease states, the set of transition rules between them, the distributions
and mathematical equations describing the particularities of the process as well as
an account of the model parameters. Thereafter, we present in detail the simulation
algorithm followed to predict individual trajectories. The next section pertains to the
explanation of the process followed for the determination of some ad-hoc values for
the model parameters in conjunction with a brief description of the data used for this
purpose. Model performance is exemplified by running the MSM under hypotheti-
cal scenarios, i.e., for different individual baseline characteristics, including smoking
habits. The chapter concludes with a discussion of the model's overall performance,
advances, and shortcomings, as well as future work on this topic.
2.1 Background
Micro-simulation models (MSMs) are complex models designed to simulate individual-
level data using Monte Carlo methods. Several micro-simulation
models have been developed to describe the natural history of lung cancer.
Two of the most comprehensive and widely used are the Lung Cancer Policy
Model (LCPM) developed by McMahon (70), and the MIcro-simulation SCreening
ANalysis (MISCAN) model by Habbema et al. (38). Other, simplified versions of
MSMs for lung cancer can also be found in the literature (Goldwasser (33), Hazelton
et al. (40), etc.).
The LCPM is a discrete-time epidemiological MSM that combines information related
to multiple stages of lung cancer, based mainly on epidemiological models. The
MISCAN model, on the other hand, is a continuous-time MSM that additionally takes
into account the biology of the tumor cells (a latent process). Notably,
all the MSMs that have been developed to describe the course of lung cancer
take smoking history and smoking habits into account when predicting lung
cancer risk and mortality.
McMahon et al. (71) and Shi et al. (98) present two representative applications of
the aforementioned models in medical decision making. The first paper presents
the application of the Lung Cancer Policy Model (LCPM) to assess the long-term
effectiveness of lung cancer screening in the Mayo CT study, an extended, single-arm
study aiming to evaluate the effect of helical CT screening for lung cancer on current
and former smokers. Here, the LCPM micro-simulation model is used to simulate
the end results of interest for pseudo-individuals of a hypothetical control arm, i.e.
in the absence of any screening program.
The second paper refers to the application of the MISCAN micro-simulation model
for lung cancer to explore a number of hypotheses that could potentially explain the
controversial finding of the Mayo Lung Project (MLP), namely the increase in lung
cancer survival since the time of diagnosis without a corresponding reduction in lung
cancer mortality. In this case, the authors modify the MISCAN model parameters
accordingly so as to simulate pseudo-individuals under different tested scenarios that
could possibly explain that controversial finding, such as over-diagnosis, screening
sensitivity, and population heterogeneity. They subsequently fit each model to real
data from the MLP randomized clinical trial and compare its goodness of fit (GoF)
to that of the simplest model, i.e., the one in which the model parameters related to
the hypotheses of interest are set to their neutral values. For instance, a parameter
for indolent cancers is introduced in the model to account for possible effect of over-
diagnosis. Only a notable improvement in the GoF measure (deviance) would strongly
support the validity of the scenario under consideration. For example, if the model
with the indolent cancer parameter does not decrease the deviance relative to that
of the simpler model, then the micro-simulation result does not support over-
diagnosis as the reason for the controversial finding of the Mayo Lung Project.
In both papers, it is noteworthy that results from the MSM applications are
presented only as point estimates of the quantities of interest, lacking any measure of
precision. This is typical of studies involving micro-simulation modeling.
2.2 Model description
We have developed a new, streamlined, continuous-time MSM that describes the
natural history of lung cancer in the absence of any screening or treatment compo-
nent. This is a Markov model in the sense that it satisfies the Markov property,
i.e., the transition to any subsequent state depends exclusively upon the state in
which the process currently resides.
The Markov state diagram in figure 2.1 depicts the five distinctive states of the
model, i.e. the disease free state (S0), the onset of the first malignant cell (local
state, S1), the beginning of the regional (lymph node involvement, S2), and distant
stage (involvement of distant metastases, S3), and eventually the death (S4) state. In
the same figure hij denotes the hazard rate characterizing the transition from state
i to state j.
Death can be attributed to either lung cancer or other causes. In order to consider
that a lung cancer death occurred, the individual has to move from state S3 to S4.
That is, the model assumes that death from lung cancer can occur only after the
tumor is already in distant stage.
Figure 2.1: Markov State diagram of the lung cancer MSM
The model essentially consists of the absorbing state of death (S4) and four “tunnel”
states. The “tunnel” states are consecutive states stipulating the specific course of
the phenomenon described in the Markov state diagram (101). According to the
Markov state diagram presented in figure 2.1, from a disease free state at some time
point the first malignant cell initiates (local stage), and proliferates up to the point of
lymph nodes involvement (transition to regional stage). The tumor progresses from
this stage to the involvement of distant metastases, and eventually causes death from
lung cancer unless death from some other cause precedes. As already mentioned, a
key model assumption is that it is very unlikely to observe death from lung cancer
without previous involvement of distant metastases.
The development and course of lung cancer in a person’s lifetime according to this
model is stipulated by a set of transition rules described in detail hereafter. Estimates
of the model parameters are obtained from a thorough literature review on the topic
including various sources (e.g. RCTs, case-control and cohort studies, meta-analyses,
expert opinions, etc). These estimates are used in the present chapter as ad-hoc
values for working examples of MSM’s performance, while, in subsequent chapters
they will serve as guidance for the specification of plausible values for the MSM
parameters. Simulations at the individual level are carried out using the Monte
Carlo method. In particular, this approach involves generating a large number
of individual trajectories, resulting in many independent and identically
distributed natural histories in each covariate class. These trajectories are summarized
to give an indication of the predicted quantities of interest, e.g., lung cancer
incidence and mortality rates, overall and by covariate group.
2.2.1 Model components
Onset of the first malignant cell
We model the onset of the first malignant cell using the exact solutions for the
hazard rate and the survival probability of the biological two-stage clonal expansion
(TSCE) model (75). For piecewise constant parameters, the hazard function for the
development of the first malignant cell is (44):

h(t) = νµX·(e^{(γ+2B)t} − 1) / [γ + B·(e^{(γ+2B)t} + 1)]   (2.1)

with γ = α − β − µ and B = (1/2)·(−γ + √(γ² + 4αµ)),

where X is the total number of normal cells, ν is the normal cell initiation rate, α is the
division rate of initiated cells, β is the apoptosis rate (death or differentiation) of
initiated cells, and µ is the malignant conversion rate of initiated cells.
Following equation 2.1, the cumulative hazard function is:

H(t) = (νµX / (γ + B)) · [ −t + (1/B)·log( (γ + B + B·e^{(γ+2B)t}) / (γ + 2B) ) ]   (2.2)
Previous empirical data analyses with the TSCE model, exploring the dose-response
relationship of smoking with lung cancer incidence, indicated that power laws are good
approximations to this relationship (40; 41). The same studies provide X = 10^7
as a plausible figure for the total number of normal stem cells. Furthermore,
the TSCE multistage model allows tests for differences in the initiation, promotion,
and malignant conversion rates of the course of lung cancer between population sub-
groups. Previous analyses of lung cancer incidence data in the Nurses' Health Study
(NHS) and the Health Professionals Follow-up Study (HPFS) revealed a significant
difference in tobacco-induced promotion and malignant conversion rates between
males and females (72).
We incorporate these findings about the effect of smoking on the onset of the first
malignant cell into our model. In particular, if q(t) denotes the smoking intensity at
age t, expressed as the average number of cigarettes smoked per day, its effect on the
α and γ rates is described by the following power-law relationships:

α = α0·(1 + α1·q(t)^{a2}) and γ = γ0·(1 + α1·q(t)^{a2})

where γ0 and α0 are the coefficients for non-smokers. To account for differences
between men and women, as well as between smoking habits, we assume different
hazards (as functions of age t) corresponding to all possible combinations of gender
(male/female) and smoking status (never/former/current smoker). For each individual,
the time period from birth (t = 0) to the onset of the first malignant cell can be split into
k intervals on which the hazard rate is constant and depends on the person's smoking
status (smoking or not) within that interval. For simplicity, we allow at most
two possible change points in a lifetime: the age at starting (τ1) and the age
at quitting (τ2) smoking, where relevant.
The survival function S(t) for the development of lung cancer is:

S(t) = exp{−H(t)} = exp{ −∫_0^t h(x) dx }   (2.3)
Depending on the smoking status of each person we discern the following three
possible scenarios:

• Never smoker:

S(t) = exp{ −∫_0^t h(x) dx }   (2.4)

• Current smoker:

S(t) = exp{ −∫_0^{τ1} h(x) dx − ∫_{τ1}^t h(x) dx }   (2.5)

• Former smoker:

S(t) = exp{ −∫_0^{τ1} h(x) dx − ∫_{τ1}^{τ2} h(x) dx − ∫_{τ2}^t h(x) dx }   (2.6)
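The three smoking scenarios amount to accumulating the TSCE cumulative hazard piecewise over the smoking intervals. As an illustration only (the thesis implements the model in R), the following Python sketch numerically integrates the hazard of equation 2.1 with interval-specific parameters; all parameter values are hypothetical, not fitted, and simply switching the parameters at the change points approximates rather than reproduces the exact piecewise TSCE solution:

```python
import math

def tsce_hazard(t, nu, mu, X, alpha, beta):
    """Hazard of the first malignant cell under the TSCE model (eq. 2.1)."""
    gamma = alpha - beta - mu
    B = 0.5 * (-gamma + math.sqrt(gamma ** 2 + 4 * alpha * mu))
    e = math.exp((gamma + 2 * B) * t)
    return nu * mu * X * (e - 1) / (gamma + B * (e + 1))

def survival(t, change_points, params):
    """S(t) = exp{-H(t)} with the hazard integrated piecewise over the
    smoking intervals defined by change_points (e.g. [tau1, tau2]);
    params[k] holds the TSCE parameters on the k-th interval."""
    cuts = [0.0] + [c for c in change_points if c < t] + [t]
    H = 0.0
    for k in range(len(cuts) - 1):
        a, b = cuts[k], cuts[k + 1]
        p = params[min(k, len(params) - 1)]
        n = 200
        step = (b - a) / n
        for i in range(n):                       # trapezoid rule on [a, b]
            x0, x1 = a + i * step, a + (i + 1) * step
            H += 0.5 * (tsce_hazard(x0, **p) + tsce_hazard(x1, **p)) * step
    return math.exp(-H)

# Hypothetical parameter sets for non-smoking and smoking periods
base = dict(nu=1e-7, mu=1e-7, X=1e7, alpha=1.0, beta=0.96)
smoke = dict(base, alpha=1.5, beta=1.4)
s60 = survival(60.0, [20.0, 40.0], [base, smoke, base])   # former smoker at age 60
```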
Tumor growth
Several studies have shown an inverse correlation between tumor growth rate and
tumor size; that is, the growth rate is usually non-constant and decreases steadily.
According to these studies the Gompertz function provides a good approximation
of tumor growth for most cancer types and describes this process more
efficiently than, e.g., the exponential distribution (15). The Gompertz model suggests
the proliferation of tumor cells by a modified exponential process in which successive
doubling times occur at increasingly longer time intervals (61). Hence the Gompertz
function stipulates a shorter pre-clinical period than the exponential model, and
longer survival after diagnosis.
The model assumes Gompertzian (61) tumor growth, i.e., the tumor volume at age
t is:

V(t)/V0 = e^{(s/m)·(1 − e^{−mt})}   (2.7)

where V0 and V(t) represent the initial tumor volume (volume of the first malignant
cell) and the tumor volume at age t, respectively, and m, s are the location and scale
parameters of the Gompertz function.

The hazard rate of the Gompertz distribution as a function of time t is (26):

r(t) = s · e^{−mt}   (2.8)
The time at which the tumor has reached volume V(t) can be found using the inverse
of the Gompertz function:

t = −(1/m)·log[ 1 − (m/s)·log( V(t)/V0 ) ]   (2.9)
For this equation to be defined, the values of the Gompertz parameters (m, s) should
be chosen so that:

1 − (m/s)·log( V(t)/V0 ) > 0 for all attainable volumes ⇒ s > m·log( Vmax/V0 )   (2.10)

This constraint is very important, especially when specifying the
model parameters, either as ad hoc values or in a regular calibration setting.
Moreover, assuming spherical tumor growth (i.e., symmetric in all directions),
the tumor volume at age t is a function of its diameter d(t) at that age, and is calculated
using the sphere volume formula:

V(t) = (π/6)·[d(t)]³   (2.11)

The tumor volume limits are stipulated by the minimum and maximum possi-
ble diameters. The minimum diameter (diameter of one cancerous cell) is set to
d0 = 0.01 mm (70; 29; 15), while the maximum diameter (tumor diameter that causes
death) is set to dmax = 13 cm (15).
In order to keep the model parameterization to a minimum, so that the model is
more flexible and easily handled for the purposes of subsequent analyses (calibration
and assessment), we assume the same Gompertz distribution for all tumors irrespec-
tive of lung cancer type.
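Equations 2.7-2.11 translate directly into code. The following Python sketch (illustrative only; the model itself is written in R, and the parameter values below are arbitrary choices satisfying constraint 2.10) maps a tumor volume to the time since onset and back:

```python
import math

D0, DMAX = 0.01, 130.0            # min/max diameter in mm (0.01 mm, 13 cm)
V0 = math.pi / 6 * D0 ** 3        # volume of one cancerous cell (eq. 2.11)
VMAX = math.pi / 6 * DMAX ** 3

def gompertz_time(V, m, s):
    """Inverse Gompertz growth curve (eq. 2.9): time since the first
    malignant cell at which the tumor reaches volume V."""
    x = 1 - (m / s) * math.log(V / V0)
    if x <= 0:
        raise ValueError("(m, s) violate constraint 2.10 for this volume")
    return -math.log(x) / m

def gompertz_volume(t, m, s):
    """Forward Gompertz growth curve (eq. 2.7)."""
    return V0 * math.exp((s / m) * (1 - math.exp(-m * t)))

# Arbitrary example parameters; constraint 2.10 requires s > m*log(VMAX/V0)
m = 0.05
s = m * math.log(VMAX / V0) * 1.2
t = gompertz_time(VMAX / 100, m, s)   # years from onset to this volume
```

The round trip `gompertz_volume(gompertz_time(V)) == V` holds by construction, which is exactly how the simulation algorithm later converts simulated volumes into ages.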
Disease progression
Disease progression of an existing lung cancer can occur via nodal involvement and
distant metastases (70). Current MSMs for lung cancer (70; 33) adopt, in their disease
progression components, methodologies developed to describe the progression of breast
cancer (26; 86; 59; 110).
Previous studies (59; 58; 102; 103) have shown that, given Gompertzian tumor
growth, the distribution of tumor volumes at specific time points can be adequately
described by the log-normal distribution. In particular, let (Vreg, Treg), (Vdist,
Tdist), and (Vdiagn, Tdiagn) be the pairs of tumor volume and age at the beginning of
the regional and distant stages, and at diagnosis (clinical detection), respectively.
We use the distributions logNormal(µreg, σ²reg), logNormal(µdist, σ²dist), and
logNormal(µdiagn, σ²diagn) to simulate the tumor volumes Vreg, Vdist, and Vdiagn,
respectively.
In addition, the simulated tumor volumes are subject to the following restrictions:
V0 < Vreg < Vdist < Vmax and V0 < Vdiagn < Vmax (2.12)
Given the tumor volume and its growth rate we can find the time (age) at which
the tumor has reached the specific volume. The tumor progression according to
the MSM for lung cancer proposed here, relies on several key assumptions. First
of all, there is a positive correlation between tumor size and the probability of
symptomatic detection, i.e., the larger the tumor, the higher the probability of
clinical detection. The local stage begins when the first malignant
cell develops. The transition from regional to distant stage is defined to occur at
the moment distant metastatic disease first becomes detectable by usual clinical
care. In addition, the transition to the distant stage presupposes a tumor already
in the regional stage, which in turn develops only after the transition to the local stage.
Finally, another very important assumption implied by this model is that there are
no large differences in growth rates or in the tumor size and stage distributions
across the different covariate classes (age-gender-smoking status groups).
The disease progression model also implies that no symptomatic detection is possible
due to lymph node involvement or benign lesions, whereas patients with symptom-
detected distant metastases are, by assumption, stage M1 (according to the TNM staging
system (76)) with probability 1. Furthermore, the conditional distribution
of the tumor stage given its size at clinical diagnosis is assumed multinomial. When
defining the ad hoc values for the model parameters we use the frequencies of local,
regional, and distant cancers by size at diagnosis observed in the SEER data, presented
in Table 2.3. According to this table there are no large differences between males and
females; hence we assume the same tumor volume distributions for the two genders,
and fit the overall size information.
Survival
Competing risks
In a multi-state model such as the MSM for the natural history of lung cancer presented
here, calculation of survival probabilities is a rather complicated task due to the
presence of competing risks. The competing risks issue arises when individuals are
subject to risk factors that can cause two or more mutually exclusive events (54).
Smoking, for instance, is strongly associated with both lung cancer death and other-
cause death. Hence, when modeling lung cancer mortality while taking into account
risk factors such as age, smoking habits, etc., death from other causes is the competing
risk, since it precludes death from lung cancer.
A significant amount of work has been done on the problem of competing risks, a
concise summary of which can be found in Moeschberger and Klein (1995). The
usual practice is to assume independence among the competing risks and use some
conventional non-parametric (e.g. Kaplan-Meier estimator) or semi-parametric (e.g.
Cox Proportional Hazards model) method to estimate the survival probabilities.
In cases where the independence assumption is not valid, more complicated methods
should be applied. The reason is that simple Kaplan-Meier estimators of the net survival
probabilities by cause of death are not sufficient to describe the mortality rates in this
setting. Crude probabilities, expressing the probability of death from a specific
cause after adjusting for the other causes of death, should be used instead. One way
of expressing these crude probabilities is through a cause-specific sub-distribution
function, i.e., the Cumulative Incidence Function (CIF).
In the natural history model for lung cancer each person faces the risk of dying from
lung cancer (main event of interest) or dying from some other cause (competing
risk). In order to express the lung cancer survival probability accounting for the
competing risk, we employ the CIF techniques described in Gray (1988), and Fine
and Gray (1999) that have also been incorporated in the R statistical package library
“cmprsk”. In particular, let Yi be the number of individuals at risk, li the number
of those who died from lung cancer, and oi the number of those who have died from
some other cause by time ti. Here t1 <t2 <...tk represent all the distinct times at
which a competing risk occurs. In this setting, li+oi is the total number of individuals
experiencing any of the competing risks (here death from any cause) at time ti.
The CIF in this case is defined as:

CIF(t) = Σ_{ti ≤ t} { Π_{j=1}^{i−1} [ 1 − (lj + oj)/Yj ] } · (li / Yi),  if t1 ≤ t,
CIF(t) = 0,  otherwise.   (2.13)
Note here that, for t1 ≤ t, CIF(t) = Σ_{ti ≤ t} S(ti−)·(li/Yi), where S(ti−) is the
Kaplan-Meier estimator of the overall (all-cause) survival, evaluated just before
time ti. Hence, the CIF estimates the probability that
the event of interest (death from lung cancer) will occur before time t, and before
the occurrence of any competing risk (death from other causes).
Note here that, as already mentioned, a very important assumption made here is that
death from lung cancer is unlikely to occur without the previous detection of distant
metastases (symptomatic or not). We compute the CIF using combined information
from the NHIS and the SEER data (section 2.3.1).
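Equation 2.13 can be computed directly from individual event times. The thesis relies on the R package "cmprsk"; the Python sketch below, on made-up data with no censoring, implements the same estimator via its Kaplan-Meier product form:

```python
def cif(times, causes, t, cause_of_interest=1):
    """Nonparametric CIF (eq. 2.13). causes[i] is 1 for death from
    lung cancer and 2 for death from another cause (no censoring here)."""
    distinct = sorted(set(times))
    surv = 1.0          # all-cause Kaplan-Meier survival, S(ti-)
    total = 0.0
    for ti in distinct:
        if ti > t:
            break
        Y = sum(1 for x in times if x >= ti)                 # at risk at ti
        l = sum(1 for x, c in zip(times, causes)
                if x == ti and c == cause_of_interest)       # events of interest
        o = sum(1 for x, c in zip(times, causes)
                if x == ti and c != cause_of_interest)       # competing events
        total += surv * l / Y
        surv *= 1 - (l + o) / Y
    return total

# Made-up death times and cause codes (1 = lung cancer, 2 = other cause)
times = [2, 3, 3, 5, 8, 9]
causes = [1, 2, 1, 1, 2, 1]
p = cif(times, causes, t=6)
```

With no censoring, the two cause-specific CIFs sum to 1 once everyone has died, a useful sanity check on any implementation.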
Other cause mortality
Given the main covariates of interest, namely age, gender, smoking status (current,
former, or never smoker), and smoking intensity (average number of
cigarettes smoked per day), we use the non-parametric CIF estimates
obtained from the observed NHIS data. The MSM simulations depend on the strong
assumption that the death patterns observed in these data do not change dramatically
over time, and hence remain relevant to the prediction period of interest.
Lung cancer mortality
Using the SEER data we obtain non-parametric estimates of the CIF given the individ-
ual's characteristics at the time of clinical (symptomatic) detection of lung cancer.
In particular, the CIF estimates are grouped by age (5-year bins), gender, and tumor
size (diameter ≤ 2 cm, 2-5 cm, > 5 cm) at diagnosis. Given these estimates
we can simulate the time to death from lung cancer after symptomatic detection
using an inverse-CIF search approach.
2.2.2 Simulation Algorithm
In this section we describe in detail the algorithm we follow in order to run a single
micro-simulation, i.e., to predict the lung cancer trajectory of an individual with
certain baseline characteristics.
Simulate baseline characteristics
For each person we either have access to, or simulate, baseline characteristics that
will be used as model input to make predictions. In particular, for each
sample for which predictions regarding the course of lung cancer are to be made, we
either have the individual records or some information regarding the distribution of
the main covariates of interest, i.e., age, gender, and smoking history. The smoking
history includes the age at starting and quitting smoking (where relevant) as well
as the smoking intensity, expressed as the average number of cigarettes smoked per
day. Given the form of the available information (individual records or overall sam-
ple distributions) we simulate the baseline characteristics using the bootstrap method
(randomly drawing with replacement from the available data). The set of baseline
characteristics stipulates the covariate class g to which each individual belongs.
Time to death from other causes
→ Draw uo1 ∼ Unif(0, 1) and uo2 ∼ Unif(0, 1)
→ Compare uo1 to the non-parametric estimate CIFg(t) from the NHIS data and
find the estimate closest to uo1, in order to specify the time interval during
which death from other causes can occur for this person. That is, for the time
t minimizing |uo1 − CIFg(t)|, we assume that death from a cause other than lung cancer
for that person may occur within the interval [t, min{ti : ti > t}].
→ Use uo2 to assign the specific time point (age) at which death occurs within
the pre-specified interval (key assumption: the time at
which death from other causes occurs is uniformly distributed within the pre-
specified interval).
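The two uniform draws above can be sketched as follows; `cif_grid` is a hypothetical table of (age, CIF) pairs standing in for the NHIS-based non-parametric estimate, and the code is illustrative Python rather than the R used in the thesis:

```python
import random

def sample_death_age(cif_grid, rng):
    """Inverse-CIF sampling: locate the grid age whose CIF value is
    closest to u1, then place the death uniformly between that age and
    the next grid age (the algorithm's key uniformity assumption)."""
    u1, u2 = rng.random(), rng.random()
    ages = [a for a, _ in cif_grid]
    # grid index minimizing |u1 - CIF_g(t)|
    i = min(range(len(cif_grid)), key=lambda k: abs(u1 - cif_grid[k][1]))
    lo = ages[i]
    hi = ages[i + 1] if i + 1 < len(ages) else lo
    return lo + u2 * (hi - lo)      # uniform within [lo, hi]

# Hypothetical CIF estimate on a coarse age grid
grid = [(40, 0.05), (50, 0.15), (60, 0.35), (70, 0.65), (80, 0.90)]
age = sample_death_age(grid, random.Random(1))
```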
Time to the onset of the first malignant cell
Given the baseline covariates we simulate the time (age) to the first malignant cell
(Tmal) based on the exact formulas of the hazard function according to the TSCE
model, as described in section 2.2.1. In particular:
→ Draw um1 ∼ Unif(0, 1)
→ Use numerical integration1 to find age t such that S(t) = um1 ⇒ t = S−1(um1)
1Given the S(t) we use the ”uniroot” function in R to solve the expression exp{−∫ t
0h(x)dx} -
S(t)=0 for t, where t is the age at the onset of the first malignant cell in years.
27
where S(t) is the survival function (eq. 2.3) and h(t) the respective hazard rate
(eq. 2.1). Depending on the smoking status, the survival probability is given by
equations (2.4)-(2.6). For each patient we either have the detailed smoking history,
i.e., the exact ages τ1 and τ2 at starting and quitting smoking, or we can estimate
the average ages at starting and quitting smoking from available data (e.g., McMahon (2005)).
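In R the inversion is carried out with uniroot; a minimal pure-Python analog of the same idea (numerical integration of the hazard followed by root finding, here by bisection) is sketched below. The generic hazard argument stands in for the TSCE hazard of eq. 2.1; the integration scheme and tolerances are illustrative assumptions.

```python
import math

def survival(t, hazard, n=2000):
    """S(t) = exp(-integral_0^t h(x) dx), trapezoidal rule with n panels."""
    if t <= 0:
        return 1.0
    step = t / n
    hs = [hazard(i * step) for i in range(n + 1)]
    integral = sum((hs[i] + hs[i + 1]) / 2.0 * step for i in range(n))
    return math.exp(-integral)

def invert_survival(u, hazard, t_max=200.0, tol=1e-6):
    """Solve S(t) = u for t by bisection (the role uniroot plays in R).
    Assumes S is decreasing in t and S(t_max) < u < S(0) = 1."""
    lo, hi = 0.0, t_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if survival(mid, hazard) > u:   # not enough cumulative hazard yet
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check, for a constant hazard h = 0.05 per year the inversion recovers the closed-form quantile −log(u)/0.05.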
Disease progression
Assuming the same parameters for the tumor growth, volume and stage at diagnosis
across the covariate classes of interest, we simulate the tumor progression as follows:
→ Draw Vreg ∼ logNormal(µreg, σ²reg), and Vdist ∼ logNormal(µdist, σ²dist)
→ Repeat the previous step until drawing the first pair (Vreg, Vdist) with:
V0 < Vreg < Vdist < Vmax
→ Draw Vdiagn ∼ logNormal(µdiagn, σ²diagn) with V0 < Vdiagn < Vmax
→ Calculate the tumor diameters dreg, ddist, and ddiagn using the sphere volume
formula (eq. 2.11).
→ Find the times (ages) treg, tdist, and tdiagn using the inverse Gompertz function
(eq. 2.9).
→ Simulate ages at the beginning of the regional (Treg) and distant stage (Tdist),
as well as age at diagnosis, given age at the onset of the first malignant cell
(Tmal), as:
– Treg = Tmal + treg
– Tdist = Tmal + tdist
– Tdiagn = Tmal + tdiagn
→ Find the tumor stage at diagnosis by comparing Vdiagn to Vreg and Vdist (or,
alternatively, Tdiagn to Treg and Tdist)2
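The progression steps above can be sketched as follows, plugging in the Table 2.4 parameter values. This is an illustrative Python sketch rather than the dissertation's R code: the lognormal parameterisation follows Python's random.lognormvariate (mean and standard deviation of the underlying normal), and the resulting times come out in the units implied by 1/m, which the text leaves implicit.

```python
import math
import random

# Ad-hoc values from Table 2.4 (assumed here for illustration)
M, S = 0.00042, 31 * 0.00042          # Gompertz parameters m and s
D0, DMAX = 0.01, 130.0                # cell and maximum diameters, in mm
V0 = math.pi / 6.0 * D0 ** 3          # sphere volume of diameter d (eq. 2.11)
VMAX = math.pi / 6.0 * DMAX ** 3

def inv_gompertz(v):
    """Time for the tumor to grow from V0 to v (inverse of eq. 2.9);
    defined only while s > m*log(v/V0), i.e., the restriction (2.14)."""
    return -math.log(1.0 - (M / S) * math.log(v / V0)) / M

def simulate_progression(rng):
    """One draw of (t_reg, t_dist, t_diagn) and the stage at diagnosis."""
    # Rejection-sample the first pair with V0 < V_reg < V_dist < VMAX
    while True:
        v_reg = rng.lognormvariate(1.1, 1.1)
        v_dist = rng.lognormvariate(2.8, 2.8)
        if V0 < v_reg < v_dist < VMAX:
            break
    # V_diagn only constrained to (V0, VMAX)
    while True:
        v_diagn = rng.lognormvariate(3.91, 3.91)
        if V0 < v_diagn < VMAX:
            break
    t = tuple(inv_gompertz(v) for v in (v_reg, v_dist, v_diagn))
    # Stage at diagnosis: compare V_diagn with V_reg and V_dist
    stage = ("local" if v_diagn < v_reg
             else "regional" if v_diagn < v_dist else "distant")
    return t, stage
```

Adding Tmal to each element of t gives Treg, Tdist, and Tdiagn as in the bullet list above.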
Time to death from lung cancer
Given the age (Tdiagn), tumor size (ddiagn), and tumor stage at diagnosis, we can
simulate the time to death from lung cancer using the non-parametric estimates
CIF(t, g) obtained from the SEER data, as follows:
→ Draw ul1 ∼ Unif(0, 1) and ul2 ∼ Unif(0, 1)
→ Compare ul1 to the non-parametric estimate CIF(t, g) from the SEER data and
find the time whose estimate is closest to ul1, in order to specify the time
interval during which death from lung cancer can occur for this person.
→ Use ul2 to assign the specific time point (age) at which death occurs within
the pre-specified time interval3 (key assumption: the time at which death from
lung cancer occurs is uniformly distributed within the pre-specified interval).
Comparing the simulated times resulting from the aforementioned simulation procedure,
we ”tell the story” of the development and course of lung cancer over the
lifetime of each individual with a certain set of characteristics. This ”story” is the
predicted individual trajectory obtained after completing one micro-simulation. Table
2.1 recapitulates the main steps of the simulation algorithm followed in order
to predict the trajectory of an individual with certain baseline characteristics.
2 The decision about the quantities compared for the specification of the tumor stage at diagnosis may be very important when, for example, improvement of the algorithm's efficiency is a key issue, as is the case with MSM calibration (chapter II).
3 The length of the pre-specified time intervals varies, and is related to the discontinuity in the non-parametric estimate of the CIF.
Table 2.1: Continuous time MSM for lung cancer: simulation algorithm to predict the lung cancer trajectory of an individual.
1. Simulate baseline characteristics g=(age, gender, smoking history1).
2. Simulate age to death (Td other) from a cause other than lung cancer
given age, gender, and smoking status.
3. Simulate age to the onset of the first malignant cell (Tmal), given gender,
smoking status, smoking history (age at starting and quitting smoking), and smoking intensity.
4. Simulate ages at the beginning of the regional (Treg) and the distant
stage (Tdist) given the tumor growth rate.
5. Simulate age (Tdiagn) and tumor diameter (ddiagn) at diagnosis. Find the
tumor stage by comparing Tdiagn with Treg and Tdist.
6. Simulate age to death from lung cancer (Td lung) given the simulated
individual’s characteristics at diagnosis (Tdiagn and tumor stage).
7. Compare the simulated ages Td other, Tmal, Treg, Tdist, Tdiagn, and Td lung
to ”tell” a story for the specific individual with the g set of covariates, i.e.,
to predict that individual's trajectory.
1Smoking history includes: smoking status (never, former or current smoker), and smoking intensity (average
number of cigarettes smoked per day)
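One plausible reading of step 7 of Table 2.1 is sketched below: the simulated ages are compared, and death from other causes censors any lung cancer events that would happen after it. The event names and the censoring rule are illustrative assumptions, not the dissertation's exact R code.

```python
def trajectory(T_d_other, T_mal, T_reg, T_dist, T_diagn, T_d_lung):
    """Assemble one predicted lung cancer trajectory (step 7 of Table 2.1).

    Events are kept only if they happen before death from other causes;
    pass None for an event that was not simulated (e.g., no malignancy).
    Returns the ordered event list, the cause of death, and the age at death.
    """
    candidates = [("onset of first malignant cell", T_mal),
                  ("regional stage", T_reg),
                  ("distant stage", T_dist),
                  ("diagnosis", T_diagn),
                  ("lung cancer death", T_d_lung)]
    # Keep only events that occur before death from other causes
    story = [(name, age) for name, age in candidates
             if age is not None and age < T_d_other]
    if story and story[-1][0] == "lung cancer death":
        cause, age_at_death = "lung cancer", story[-1][1]
    else:
        cause, age_at_death = "other cause", T_d_other
    return story, cause, age_at_death
```

For example, an individual with a malignant cell at 66 but other-cause death at 70 dies of the other cause, while the same disease course with other-cause death at 90 ends in lung cancer death at 78.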
2.2.3 Software
To enhance transparency, the model is developed in the open source statistical
software R (version 2.15.2). A comprehensive R code describes the model structure (set
of transition rules and assumptions). Given the model parameters (either ad-hoc
or calibrated values) for an individual with specific characteristics (set of covariate
values), the model stipulates the times of the transitions to each state. Combining all
the simulated times gives the predicted trajectory of this specific individual
with regard to the development of lung cancer.
Handling random numbers
The implementation of the large number of simulations required for the evaluation of
a complex process using micro-simulation modeling necessitates special consideration
and treatment of the massive quantity of random numbers generated. For this
purpose we use the methodology described in Leydold and J. (2005) regarding the
generation of independent streams of random numbers for stochastic simulations,
which was motivated by the work on the object-oriented random number generator
(RNG) with streams and substreams presented in L'Ecuyer et al. (2002). The
adoption of this methodology, among other things, ensures the generation of
”statistically independent” streams, i.e., independent random numbers despite the
enormous quantity of random numbers produced, thus avoiding unintended
correlations between the several parts of the simulation algorithm. For the
implementation of this methodology, we use the built-in functions of the
”rlecuyer” R package.
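NumPy offers the same stream-spawning idea; the sketch below is a Python analog (not the rlecuyer mechanism itself) in which each model component gets its own reproducible, statistically independent stream, so that, for instance, extra draws in the progression module never perturb the other-cause mortality draws.

```python
import numpy as np

# One root seed for the whole experiment; spawn one independent child
# stream per simulation component (component names are illustrative).
root = np.random.SeedSequence(20140513)
streams = root.spawn(3)
rng_other, rng_onset, rng_progress = (np.random.default_rng(s) for s in streams)

u_other = rng_other.uniform(size=5)   # draws for other-cause mortality
u_onset = rng_onset.uniform(size=5)   # draws for the TSCE onset time
```

Because spawning is deterministic given the root seed, rebuilding the streams from the same root reproduces every component's draws exactly.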
2.3 Application
2.3.1 Ad-hoc values for model parameters
The MSM for lung cancer proposed here comprises a set of parameters representing
both latent and observable variables, as well as describing the distribution of certain
characteristics of the underlying process. Typically the stipulation of MSM parameters
involves extensive calibration procedures (chapter II). The goal of this section is
simply to exemplify the model's performance by running the MSM under hypothetical
scenarios. For this purpose we use ad-hoc point estimates for the model parameters;
below we describe the determination of those ad-hoc values, which can be used as
model inputs to run micro-simulations and predict individual trajectories of lung
cancer patients.
Onset of the first malignant cell
Several studies have tried to elucidate the biological process of lung carcinogenesis
by fitting the TSCE model to real data (75; 64; 41; 40; 72). As ad-hoc values for the
TSCE model parameters we use the point estimates reported in Hazelton et al. (40),
resulting from the analysis of the second Cancer Prevention Study (CPS II). Table
2.2 provides the complete list of parameters related to the specification of the age at
the onset of the first malignant cell, depicts the ad-hoc values (point estimates along
with 95% CIs) used for some of them, and indicates the type and order of calculations
used for the determination of the rest.
Tumor growth and disease progression
The ad-hoc values for the location and scale parameters of the logNormal
distribution describing the tumor volume at clinical detection come from the
Koscielny et al. (1985) study of the initiation of distant metastasis in breast cancer.
In particular, that paper compares two different patterns of tumor growth,
namely an exponential and a Gompertzian one, with respect to their fit to available
data on distributions of tumor volume at diagnosis, as well as tumor
doubling times. Results from this paper agree with findings from previous studies
(103) indicating that tumor growth in humans is better described by the
Gompertz function than by an exponential curve.
The relationship between the Gompertz distribution parameters (m, s) describing
the tumor growth results from the restriction related to the definition of the inverse
Gompertz function (eq. 2.9) for the specification of the age t when the tumor reaches
size V(t). According to this:

1 − (m/s)·log(V(t)/V0) > 0  ⇒  s > m·log(V(t)/V0),  ∀ V(t)

and hence, at the maximum tumor volume,

s > m·log(Vmax/V0)    (2.14)
Given the tumor volume at diagnosis (Vdiagn) we can calculate the age (Tdiagn) at
which the tumor reached this volume, using again (2.9). The doubling time as a
function of the age Tdiagn is:

DT = −(1/m)·log[1 − (m/s)·log(2)·exp(m·Tdiagn)]    (2.15)
For m = 0.00042 and s = 31·m, the mean doubling time is close to the observed one
recorded in previous studies (70), while (2.14) is satisfied. Finally, the logNormal
location and scale parameters for Vreg and Vdist are specified so as to reproduce
distributions of tumor stage at diagnosis by size similar to those observed
in the SEER data (Table 2.3).
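The restriction (2.14) and the doubling-time formula (2.15) can be checked numerically for the chosen parameter values; the sketch below does so, with the time unit following that of 1/m, which the text leaves implicit.

```python
import math

m = 0.00042
s = 31 * m
d0, dmax = 0.01, 130.0                     # diameters in mm (Table 2.4)

# Restriction (2.14): s > m * log(Vmax/V0) = 3m * log(dmax/d0),
# since volumes scale with the cube of the diameter.
assert s > m * 3.0 * math.log(dmax / d0)   # 31m > 28.4m, satisfied

def doubling_time(T_diagn):
    """Eq. (2.15): DT = -(1/m) log[1 - (m/s) log(2) exp(m T_diagn)]."""
    return -math.log(1.0 - (m / s) * math.log(2.0) * math.exp(m * T_diagn)) / m

dt0 = doubling_time(0.0)   # early doubling time, roughly 54 time units
```

As expected for Gompertzian growth, the doubling time lengthens as the tumor ages, reflecting the deceleration of growth near Vmax.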
Mortality data
Estimates of other-cause and lung cancer mortality rates are based on data from two
major sources: the National Health Interview Survey (NHIS) and the Surveillance,
Epidemiology, and End Results (SEER) program, respectively.
Both databases are representative of the US population and constitute the main
sources of information about baseline characteristics and health risk factors, as well as
incidence and mortality rates in the entire population. The NHIS is a national
cross-sectional survey aimed at monitoring national health patterns since 1957.
NHIS collects data about several demographic characteristics, risk factors, and the health
status of the US population. It also provides information about the age and cause
of death. From the large pool of available NHIS data we worked with the Integrated
Health Interview Series (IHIS) harmonized set of data. The IHIS variables are given
consistent codes and have been thoroughly documented to facilitate cross-temporal
comparisons. The SEER program provides information on cancer statistics in an
effort to reduce the burden of cancer among the US population. In particular, SEER
data record information regarding cancer incidence and mortality rates by certain
demographic characteristics of a geographic sample representing 28 percent of
the US population, collected since 1973.
We based our estimates of lung cancer incidence and mortality on SEER data
covering the interval from 1973 to 2008 and on IHIS data from 1986 to 2004.
The model is structured so as to predict the main events of interest, i.e., lung cancer
incidence and mortality, based on the gender, age, and smoking history of a person,
including the average ages at starting and quitting smoking as well as the average
smoking intensity. The NHIS data only provide information about age, gender,
smoking, and, when relevant, the cause of death. Information about smoking, in
particular, includes, for current smokers at the time of the study, the number (“heaviest
amount”) of cigarettes smoked per day, grouped into four categories, namely “less than
15”, “15-24”, “25-34”, and “35 or more” cigarettes. The SEER data, on the other hand,
also record age and gender, and in addition provide information regarding the age,
tumor size, and stage at clinical diagnosis, as well as the age and cause of death.
Therefore we need a way to combine the information coming from these two datasets,
both representative of the US population, in order to simulate the time and cause of
death given the age, gender, smoking history, and tumor stage at diagnosis.
diagnosis. As already mentioned, the cause of death is classified as lung cancer or
other cause.
Table 2.2: Ad-hoc values and calculations for the MSM parameters related to the onset of the first malignant cell.

Parameter  Males                                Females                               Type
X          10^7                                 10^7                                  fixed
v0         7.16·10^-8 (4.6·10^-8, 1.21·10^-7)   1.07·10^-7 (6.97·10^-8, 1.62·10^-7)   fixed
α0         7.7 (6.45, 12.99)                    15.82 (13.39, 42.12)                  fixed
γ0         0.09 (0.071, 0.106)                  0.071 (0.055, 0.088)                  fixed
v1         0.00 (0.00, 1.76)                    0.02 (0.00, 12.5)                     fixed
α1         0.6 (0.43, 0.91)                     0.5 (0.27, 0.86)                      fixed
α2         0.22 (0.12, 0.30)                    0.32 (0.14, 0.40)                     fixed
v          v0·(1 − v1)                                                                calculated
γ          γ0·(1 + α1·[q(t)]^α2)                                                      calculated
α          α0·(1 + α1·[q(t)]^α2)                                                      calculated
µ0         v0                                                                         calculated
µ          µ0                                                                         calculated
β0         α − µ − γ                                                                  calculated

Point estimates are extracted from the analysis of the CPS II study data (40).
Hazard function: h(t) = [νµX·(e^{(γ+2B)t} − 1)] / [γ + B·(e^{(γ+2B)t} + 1)],
where B = (1/2)·(−γ + √(γ² + 4αµ)).
Table 2.3: Tumor stage by size at diagnosis (SEER data).

Size       local        regional     distant       Total
Overall
≤ 2cm      6031 (48%)   2705 (21%)   3868 (31%)    12604
2-5cm      7050 (24%)   8348 (29%)   13894 (47%)   29292
≥ 5cm      1387 (9%)    4803 (29%)   10112 (62%)   16302
Males
≤ 2cm      2518 (44%)   1238 (22%)   1957 (34%)    5713
2-5cm      3445 (23%)   4352 (29%)   7228 (48%)    15025
≥ 5cm      810 (8%)     2921 (31%)   5857 (61%)    9588
Females
≤ 2cm      3513 (51%)   1467 (21%)   1911 (28%)    6891
2-5cm      3605 (25%)   3996 (28%)   6701 (47%)    14302
≥ 5cm      577 (9%)     1882 (28%)   4255 (63%)    6714
Table 2.4 provides a complete list of the ad-hoc values for the MSM parameters
related to tumor growth and disease progression. As already mentioned in the
simulation procedure for the specific parts of the model, non-parametric estimates of
the CIF from the NHIS and SEER data are used as fixed model inputs.
Table 2.4: Ad-hoc values and calculations for the model parameters related to the lung cancer progression.

Quantity                                                      Value

Tumor growth
Diameter of one malignant cell*                               d0 = 0.01 mm
Maximum tumor diameter*                                       dmax = 130 mm
Tumor volume of diameter d**                                  v = (π/6)·d³
Parameters of the Gompertz distribution**                     m = 0.00042, s = 31·m

Disease progression**
Parameters of the logNormal distribution for tumor
volume at the beginning of the regional stage                 µreg = 1.1, σreg = 1.1
Parameters of the logNormal distribution for tumor
volume at the beginning of the distant stage                  µdist = 2.8, σdist = 2.8
Parameters of the logNormal distribution for tumor
volume at diagnosis                                           µdiagn = 3.91, σdiagn = 3.91

* Values stipulated from the lung cancer literature.
** Values specified by the modeler to match data.
2.3.2 MSM output - Examples
In this section we present predictions from multiple runs of the MSM under different
scenarios. The focus is on lung cancer incidence and mortality for people 65 years
old at the beginning of the prediction period, which covers the entire lifespan. We
compare MSM outputs between males and females, for never, former, and current
smokers separately. For current smokers we also compare results given three different
average smoking intensities, i.e., 10, 30, and 50 cigarettes per day. Furthermore,
for former smokers we also include comparisons for different quitting ages,
i.e., 40, 50, and 60 years old. Current and former smokers are assumed to have
started smoking at the age of 20 years.
For each of these cases, we present the distributions (mean, standard deviation,
quartiles, minimum and maximum values) of the ages at the major states in
the lung cancer course, namely the age (T mal) at the onset of the first malignant cell,
which also marks the beginning of the local stage; the ages at the beginning of
the regional (T reg) and distant (T dist) stages; the age at diagnosis (T diagn); and
the age at death from lung cancer (T death). These age distributions pertain to
people for whom the model predicted development of and death from lung cancer. We
indicatively present the distributions for the aforementioned characteristic scenarios,
highlighting the effect of gender and smoking on the development of and death from
lung cancer. Lung cancer mortality is depicted using survival curves. In addition, we
report estimates of the probabilities of lung cancer death (Pd) and diagnosis (Pdiagn).
All the results presented in this section are based on sets of 100,000 micro-simulations
for each scenario.
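With 100,000 runs per scenario, the Monte Carlo standard error of a reported probability such as Pd follows the usual binomial formula; the sketch below, with a Bernoulli stub standing in for a full micro-simulation, shows that a Pd near 0.2% is estimated to within roughly ±0.014 percentage points (one standard error). The function and stub names are illustrative, not part of the dissertation's code.

```python
import math
import random

def estimate_pd(one_run, n=100_000, seed=1):
    """Monte Carlo estimate of a death probability and its binomial SE.

    one_run(rng) stands in for a complete micro-simulation and must
    return 1 if the simulated individual dies of lung cancer, else 0.
    """
    rng = random.Random(seed)
    deaths = sum(one_run(rng) for _ in range(n))
    p = deaths / n
    se = math.sqrt(p * (1.0 - p) / n)    # binomial standard error
    return p, se

# Illustration with a Bernoulli(0.002) stub in place of the real model
p_hat, se_hat = estimate_pd(lambda rng: 1 if rng.random() < 0.002 else 0)
```

The same calculation explains why the probabilities in Tables 2.5 to 2.14 are reported to three decimal places: with n = 100,000 the estimates are precise at roughly that level.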
Never smokers
Tables 2.5 and 2.6 compare lung cancer mortality (Pd) and the distributions of times to
each of the main lung cancer states of the MSM developed in this chapter, between
males and females who have never smoked. According to these tables,
men have a higher (almost double) probability of dying from lung cancer (0.218%)
than women (0.120%). Overall, the distributions of the predicted times are very
similar for the two genders, although slightly shifted to earlier ages for women. That
is, for those cases for which the model predicted death from lung cancer,
all the events of main interest in the lung cancer course happened at younger ages
for women than for men in the examples. This finding is in agreement
with recent findings on lung cancer incidence and mortality in never smokers (116),
indicating that women are more likely than men to have non-smoking-associated
lung cancer. Figure 2.2 confirms the small difference in lung cancer survival between
the two genders.
Table 2.5: Male, 65 years old, never smoker (Pd = 0.218%).

          Mean ± SD      Min     Q1      Median  Q3      Max
T mal     66.98 ± 8.50   44.42   60.59   66.93   73.88   85.38
T reg     74.49 ± 8.62   50.53   68.14   74.42   81.39   93.44
T dist    75.79 ± 8.53   52.31   69.51   75.52   82.59   94.48
T diagn   76.76 ± 8.08   51.88   70.58   76.88   83.39   92.26
T death   78.48 ± 7.49   65.15   72.47   78.56   84.63   92.77

Table 2.6: Female, 65 years old, never smoker (Pd = 0.120%).

          Mean ± SD      Min     Q1      Median  Q3      Max
T mal     64.67 ± 10.32  37.44   58.48   65.26   72.72   83.55
T reg     72.22 ± 10.20  44.78   65.63   72.57   80.63   91.14
T dist    73.51 ± 10.33  47.25   66.65   73.83   81.59   92.31
T diagn   74.65 ± 10.51  45.26   68.85   75.26   82.54   92.12
T death   78.25 ± 7.62   65.15   71.74   78.40   84.52   92.37
Figure 2.2: MSM predicted lung cancer survival for non-smokers, 65 years old.
Current smokers
In all the working examples for current smokers we examine different scenarios
(depending on the average smoking intensity, i.e., 10, 30, and 50 cigarettes per day) for
a person 65 years old who started smoking at the age of 20. Tables 2.7 and
2.8 present the results for a male and a female, respectively. As expected
(99; 47), we overall observe higher proportions of predicted lung cancer deaths for
males than for females. These proportions also increase with the smoking intensity;
namely, the heavier the smoker, the more probable the development of and death
from lung cancer. In addition, the entire course of the lung cancer is shifted towards
earlier ages as the average smoking intensity increases, i.e., the onset of the local,
regional, and distant stages, as well as the diagnosis and finally death from lung
cancer, occur at younger ages for heavy smokers.
Table 2.7: Male, 65 years old, current smoker, started smoking at age 20.

          Mean ± SD      Min     Q1      Median  Q3      Max
Average smoking intensity: 10 cigarettes per day (Pd = 6.91%)
T mal     65.87 ± 9.05   33.96   59.20   65.14   72.21   92.10
T reg     73.37 ± 9.08   41.07   66.61   72.65   79.83   98.56
T dist    74.71 ± 9.07   42.24   68.01   74.01   80.98   100.80
T diagn   75.97 ± 9.12   44.46   69.22   75.33   82.56   99.43
T death   78.93 ± 8.40   65.01   71.95   78.28   85.25   99.84
Average smoking intensity: 30 cigarettes per day (Pd = 8.75%)
T mal     63.98 ± 9.19   32.32   57.39   62.84   70.04   92.49
T reg     71.47 ± 9.23   38.02   64.86   70.35   77.60   100.60
T dist    72.82 ± 9.20   41.94   66.20   71.69   78.80   101.40
T diagn   74.08 ± 9.22   44.10   67.63   73.07   80.28   99.39
T death   77.41 ± 8.39   65.00   70.35   76.18   83.45   99.82
Average smoking intensity: 50 cigarettes per day (Pd = 9.30%)
T mal     62.99 ± 9.25   34.19   56.67   61.85   68.98   93.90
T reg     70.49 ± 9.26   40.15   64.10   69.35   76.47   102.60
T dist    71.83 ± 9.27   42.23   65.51   70.64   77.80   102.60
T diagn   73.15 ± 9.39   44.13   66.87   72.10   79.28   99.56
T death   76.85 ± 8.34   65.00   69.79   75.50   82.62   99.66
No large differences are noted in the time distributions between males and females
of the same smoking intensity group (Tables 2.7 and 2.8).
The plots in figure 2.3 verify the difference in lung cancer survival between the two
genders. In addition, according to these plots, the survival probability decreases as
the average smoking intensity increases from 10 to 30 cigarettes/day, while it remains
almost unchanged between 30 and 50 cigarettes/day.
Table 2.8: Female, 65 years old, current smoker, started smoking at age 20.

          Mean ± SD      Min     Q1      Median  Q3      Max
Average smoking intensity: 10 cigarettes per day (Pd = 3.53%)
T mal     66.59 ± 9.34   33.07   60.00   66.25   73.23   92.44
T reg     74.09 ± 9.34   40.89   67.42   73.74   80.90   99.66
T dist    75.42 ± 9.35   42.32   68.81   75.08   82.13   101.40
T diagn   76.60 ± 9.40   44.10   69.95   76.43   83.30   98.99
T death   79.74 ± 8.43   65.01   72.91   79.17   86.05   99.63
Average smoking intensity: 30 cigarettes per day (Pd = 5.27%)
T mal     63.85 ± 9.54   34.13   57.25   63.00   70.37   94.28
T reg     71.35 ± 9.56   41.20   64.75   70.56   77.88   102.40
T dist    72.71 ± 9.57   42.23   66.09   71.84   79.19   103.30
T diagn   73.99 ± 9.76   44.14   67.39   73.34   80.71   99.68
T death   77.66 ± 8.40   65.00   70.59   76.54   83.61   99.83
Average smoking intensity: 50 cigarettes per day (Pd = 5.95%)
T mal     62.72 ± 9.96   31.79   56.21   61.77   69.35   92.33
T reg     70.24 ± 9.95   39.15   63.70   69.20   76.86   99.04
T dist    71.59 ± 9.95   41.15   65.02   70.57   78.14   101.10
T diagn   72.81 ± 10.2   44.14   66.42   71.98   79.48   99.55
T death   77.23 ± 8.43   65.00   70.12   75.85   83.28   99.81
Figure 2.3: MSM predicted lung cancer survival for current smokers, 65 years old.
Former smokers
In the examples for former smokers, we investigate the effect of smoking intensity
(10, 30, and 50 cigarettes smoked per day on average) and quitting age (40,
50, and 60 years old) on the lung cancer course of a male (tables 2.9 to 2.11) and a
female (tables 2.12 to 2.14), both 65 years old, who started smoking at the age of 20.
As in the case of current smokers, the predicted proportions of lung cancer deaths are
higher for men than for women in the same smoking category. This tendency
is verified in several observational studies on lung cancer (99; 47); namely, men
with exactly the same characteristics are in general more susceptible to lung cancer
than women. Furthermore, we observe a positive correlation between the predicted
probability of death from lung cancer and the duration of smoking. This correlation
is more pronounced in heavier smokers (higher average smoking intensity). Similar
patterns hold for women. No large differences were found in the predicted times to
the main events of interest between males and females with the same characteristics.
Noteworthy is the fact that lung cancer survival for people (men or women) who
smoked for only 20 years of their lives (i.e., started and quit smoking at ages 20 and 40,
respectively) is very similar to that of non-smokers. Furthermore,
the negative effect of smoking on lung cancer survival is more prominent for longer
durations of smoking.
The survival plots in figure 2.4 confirm the similarity between the survival curves of
former smokers who started and quit smoking at ages 20 and 40, respectively, and those
of non-smokers. These plots also confirm the deteriorating effect of smoking on
lung cancer survival, which becomes more pronounced as the total number of years
of smoking and the average number of cigarettes smoked per day increase.
Table 2.9: Male, 65 years old, former smoker, starting and quitting smoking at 20 and 40 years old respectively.
Mean ± SD Min Q1 Q2 Q3 Max
Average smoking intensity: 10 cigarettes per day
(Pd = 0.23%)
T mal 63.92 ± 12.47 32.49 57.78 65.11 72.95 84.82
T reg 71.45 ± 12.37 39.94 65.82 72.56 80.55 92.14
T dist 72.76 ± 12.49 40.62 66.47 74.38 81.86 93.17
T diagn 73.56 ± 12.30 44.21 67.43 75.59 82.16 93.54
T death 77.22 ± 8.10 65.08 69.55 76.70 83.85 93.78
Average smoking intensity: 30 cigarettes per day
(Pd = 0.26%)
T mal 61.23 ± 14.11 30.96 53.80 64.36 71.72 84.44
T reg 68.76 ± 14.14 39.16 62.01 72.13 79.02 92.74
T dist 70.04 ± 14.03 40.43 63.49 73.24 80.93 93.26
T diagn 71.08 ± 14.25 44.22 65.14 74.06 81.68 92.78
T death 76.83 ± 8.15 65.06 69.49 75.85 83.53 93.17
Average smoking intensity: 50 cigarettes per day
(Pd = 0.27%)
T mal 59.01 ± 14.81 33.45 39.80 61.99 70.86 85.32
T reg 66.51 ± 14.75 39.52 47.74 69.47 78.45 92.67
T dist 67.79 ± 14.80 41.58 49.02 70.95 79.82 94.05
T diagn 68.81 ± 14.88 44.10 51.97 72.72 80.40 92.44
T death 75.74 ± 7.91 65.02 68.53 74.51 81.66 92.99
Table 2.10: Male, 65 years old, former smoker, starting and quitting smoking at 20 and 50 years old respectively.
Mean ± SD Min Q1 Q2 Q3 Max
Average smoking intensity: 10 cigarettes per day
(Pd = 0.37%)
T mal 56.27 ± 12.93 35.60 46.00 49.98 68.16 84.39
T reg 63.75 ± 12.86 42.85 53.53 58.59 75.43 92.62
T dist 65.10 ± 12.81 44.38 54.96 59.60 76.88 93.01
T diagn 65.66 ± 13.30 44.60 54.37 64.74 77.83 92.42
T death 75.65 ± 6.97 65.07 70.01 75.24 80.08 92.48
Average smoking intensity: 30 cigarettes per day
(Pd = 0.54%)
T mal 52.38 ± 11.84 32.84 44.76 48.53 59.58 84.69
T reg 59.93 ± 11.93 40.85 52.08 55.98 66.87 92.34
T dist 61.30 ± 11.86 41.40 53.60 57.12 68.30 95.15
T diagn 61.92 ± 12.41 44.43 52.54 57.46 69.80 92.89
T death 74.78 ± 6.52 65.04 69.67 74.41 78.70 93.11
Average smoking intensity: 50 cigarettes per day
(Pd = 0.70%)
T mal 50.05 ± 10.65 32.26 43.84 47.21 49.86 84.42
T reg 57.75 ± 10.68 40.57 51.40 54.82 58.59 93.35
T dist 58.99 ± 10.68 40.79 52.52 56.33 59.25 95.33
T diagn 59.29 ± 11.28 44.22 51.38 55.80 64.72 91.19
T death 74.17 ± 5.91 65.01 69.40 74.03 78.01 92.26
Table 2.11: Male, 65 years old, former smoker, starting and quitting smoking at 20 and 60 years old respectively.
Mean ± SD Min Q1 Q2 Q3 Max
Average smoking intensity: 10 cigarettes per day
(Pd = 2.04%)
T mal 56.45 ± 5.74 30.93 54.19 56.79 58.72 84.85
T reg 64.00 ± 5.84 38.41 61.56 64.29 66.24 92.80
T dist 65.33 ± 5.78 41.39 62.96 65.57 67.57 94.42
T diagn 66.84 ± 6.21 44.09 64.79 66.89 69.37 92.79
T death 71.48 ± 6.08 65.01 67.23 69.33 73.28 93.08
Average smoking intensity: 30 cigarettes per day
(Pd = 3.33%)
T mal 55.57 ± 5.45 31.83 53.43 56.20 58.27 84.73
T reg 63.11 ± 5.48 40.94 60.89 63.62 65.73 93.42
T dist 64.46 ± 5.49 41.27 62.26 65.04 67.02 95.02
T diagn 66.04 ± 6.11 44.26 64.38 66.55 69.04 93.36
T death 71.05 ± 5.75 65.00 67.02 69.21 72.68 93.47
Average smoking intensity: 50 cigarettes per day
(Pd = 3.93%)
T mal 55.07 ± 5.77 33.56 52.85 55.89 58.08 85.28
T reg 62.60 ± 5.80 40.81 60.32 63.34 65.58 91.77
T dist 63.95 ± 5.78 43.24 61.61 64.66 66.94 94.25
T diagn 65.53 ± 6.55 44.13 64.04 66.37 68.87 92.21
T death 71.14 ± 5.78 65.00 67.02 69.19 72.99 92.37
Table 2.12: Female, 65 years old, former smoker, starting and quitting smoking at 20 and 40 years old respectively.
Mean ± SD Min Q1 Q2 Q3 Max
Average smoking intensity: 10 cigarettes per day
(Pd = 0.17%)
T mal 62.70 ± 12.12 34.02 56.86 63.35 72.07 82.97
T reg 70.25 ± 12.17 40.38 63.76 71.31 79.84 91.54
T dist 71.56 ± 12.20 44.28 64.84 72.03 81.12 92.89
T diagn 72.23 ± 12.27 44.40 66.72 73.99 82.43 92.56
T death 76.93 ± 7.85 65.09 69.67 75.81 84.00 93.45
Average smoking intensity: 30 cigarettes per day
(Pd = 0.19%)
T mal 60.38 ± 13.62 33.13 53.55 62.48 71.32 85.05
T reg 67.93 ± 13.52 39.81 60.69 69.60 78.71 91.77
T dist 69.21 ± 13.45 42.34 62.65 70.98 80.07 93.86
T diagn 70.42 ± 13.57 44.33 64.65 72.39 81.60 91.65
T death 76.23 ± 7.92 65.16 69.34 74.61 83.36 92.25
Average smoking intensity: 50 cigarettes per day
(Pd = 0.24%)
T mal 55.47 ± 15.15 31.80 38.86 58.37 69.26 82.28
T reg 63.10 ± 14.99 40.25 46.75 65.66 76.62 89.58
T dist 64.44 ± 15.01 41.03 48.12 67.09 77.73 90.86
T diagn 65.51 ± 15.37 44.12 48.10 68.82 78.87 91.66
T death 74.28 ± 7.45 65.04 67.78 72.33 80.72 91.91
Table 2.13: Female, 65 years old, former smoker, starting and quitting smoking at 20 and 50 years old respectively.
Mean ± SD Min Q1 Q2 Q3 Max
Average smoking intensity: 10 cigarettes per day
(Pd = 0.24%)
T mal 56.67 ± 12.29 32.71 46.66 54.35 66.48 85.00
T reg 64.28 ± 12.24 40.34 54.11 61.94 73.96 92.43
T dist 65.67 ± 12.24 41.13 55.42 63.54 75.32 94.96
T diagn 66.21 ± 12.82 44.24 54.70 65.43 76.50 92.51
T death 75.72 ± 6.82 65.11 70.37 75.19 79.83 92.61
Average smoking intensity: 30 cigarettes per day
(Pd = 0.37%)
T mal 52.59 ± 12.07 32.50 44.02 48.34 61.33 85.63
T reg 60.10 ± 12.02 40.52 51.48 55.92 68.72 93.58
T dist 61.45 ± 12.08 40.93 53.02 57.03 70.03 94.06
T diagn 61.97 ± 12.69 44.09 52.13 57.28 72.62 93.68
T death 74.43 ± 6.51 65.00 68.81 74.06 78.27 93.68
Average smoking intensity: 50 cigarettes per day
(Pd = 0.58%)
T mal 50.30 ± 11.44 34.03 42.40 47.11 54.07 84.17
T reg 57.88 ± 11.45 40.30 49.91 54.73 61.67 92.65
T dist 59.14 ± 11.44 42.00 51.28 56.00 63.38 93.55
T diagn 59.68 ± 12.12 44.12 50.60 55.71 65.86 93.61
T death 73.65 ± 6.17 65.03 68.67 73.24 77.43 93.72
Table 2.14: Female, 65 years old, former smoker, starting and quitting smoking at 20 and 60 years old respectively.
Mean ± SD Min Q1 Q2 Q3 Max
Average smoking intensity: 10 cigarettes per day
(Pd = 1.01%)
T mal 56.64 ± 6.41 35.88 54.16 56.70 58.84 85.77
T reg 64.14 ± 6.39 43.36 61.33 64.21 66.39 91.84
T dist 65.49 ± 6.42 44.03 62.83 65.53 67.63 93.95
T diagn 66.85 ± 6.90 44.46 64.59 66.89 69.59 92.70
T death 71.63 ± 6.11 65.00 67.28 69.60 73.80 93.24
Average smoking intensity: 30 cigarettes per day
(Pd = 1.94%)
T mal 55.09 ± 6.13 32.88 53.07 55.93 58.24 83.79
T reg 62.61 ± 6.13 41.27 60.30 63.37 65.69 91.65
T dist 63.92 ± 6.12 42.08 61.81 64.76 66.99 92.46
T diagn 65.44 ± 6.89 44.15 63.58 66.45 68.90 92.70
T death 71.27 ± 5.60 65.01 67.34 69.49 73.30 93.07
Average smoking intensity: 50 cigarettes per day
(Pd = 2.49%)
T mal 54.38 ± 6.33 29.45 51.98 55.50 57.86 83.55
T reg 61.98 ± 6.38 36.55 59.36 63.06 65.40 93.28
T dist 63.29 ± 6.34 37.82 60.85 64.36 66.71 93.38
T diagn 64.74 ± 7.13 44.14 62.34 65.98 68.62 91.61
T death 71.34 ± 5.79 65.00 67.01 69.50 73.57 92.41
Figure 2.4: MSM predicted lung cancer survival for former smokers, 65 years old, starting smoking at age 20.
2.4 Discussion
The main purpose of the natural history MSM for lung cancer developed in
this thesis is to serve as a tool for the exploration of the statistical
properties of micro-simulation models in general. To this end we developed a
simplified yet valid model that follows current practices in micro-simulation modeling
while adequately describing the natural history of the disease.
The MSM aims at combining some of the best practices currently followed in this
domain, while remaining simple enough to serve as an efficient tool for the exploration
of the statistical properties of this type of model. It is a continuous time MSM;
namely, events can take place at any time point. Depending on the degree of
discretization, this assumption is sometimes more reasonable than the very restrictive
one of fixed time lengths imposed by a discrete time MSM.
Furthermore, it combines some of the most widely used models for the description
of several distinctive stages of the natural history of lung cancer, including both
biological and epidemiological models. More specifically, it uses the biological Two
Stage Clonal Expansion (TSCE) model (75) to describe the risk for the onset of
the first malignant cell. In particular, the model employs the exact solutions for
the expression of the hazard rates and the survival probabilities. Moolgavkar and
Luebeck (74) comment on the inaccuracy of the approximations, which can lead to
serious data misinterpretation, and emphasize the need to use the exact
solutions instead.
Moreover, the model employs the Gompertz function to simulate tumor growth.
Several studies have shown that this function fits available data well, hence it is
preferable for simulating this process compared to other growth curves found in the
literature (e.g., the exponential). Finally, the model breaks down the time from the
local stage to death (lag time) into three sub-intervals (local to regional, regional to
distant, distant to death) instead of assuming, e.g., a fixed or Gamma-distributed
lag time (40; 72). This approach enables a more detailed representation of the natural
course of lung cancer, and hence a more accurate prediction of the times to the events
of interest.
However, perhaps the most attractive feature of this MSM is that it is developed in
R. Development in an open source statistical package enhances the transparency
of the model and facilitates research on the statistical properties of MSMs in general.
Using ad-hoc estimates for the model parameters (as described in section 4), we make
predictions for hypothetical scenarios by running multiple micro-simulations for each
case. The results seem plausible compared to what was expected based on relevant
studies and reports about lung cancer. We note, for example, higher lung cancer
mortality in men compared to women, as well as positive correlation of the negative
smoking effects, on the course of lung cancer, with the total smoking duration and
intensity.
These examples are provided only as an indication that the model performs
reasonably well. Large deviations from the truth are attributed to the inability of the
ad-hoc values for the MSM parameters to reproduce real figures. Thorough
calibration exercises are necessary to achieve proximity between MSM predictions and
real data. This is one of the main objectives of the next chapter, where we
perform a thorough calibration and validation of this MSM using real data.
Some of the most serious limitations of this MSM are that it does not involve any
screening or treatment component and that it does not take into account the
detection of benign lesions. Moreover, the complexity of the model was kept to
a minimum because the main objective of this chapter was to develop a streamlined
MSM that sufficiently describes the natural history of lung cancer while serving as a
handy tool for exploring the statistical properties of MSMs in general.
Improvement of the MSM with respect to those limitations was beyond the scope
of this thesis. However, working examples demonstrate the potential of the model to
be used in real-life scenarios. From this perspective, future work includes enhancing
the MSM's performance by increasing its level of complexity and incorporating
additional components, e.g., screening and treatment components. Another
immediate goal is to refine the R code and publish it as a library in CRAN, the
package repository of the open source statistical software R. This will enhance
the transparency of the model and give many potential users the opportunity
to use it, either as a tool for further research and development of statistical methods
related to micro-simulation models, or to simulate entire populations or sub-groups
of patients and assist, for instance, decision making in lung cancer research.
Chapter 3
Calibration methods in MSMs - a comparative analysis
The second chapter of this thesis pertains to the calibration of micro-simulation
models. The main goal is to provide a comparative analysis of two different approaches to
this problem, a Bayesian and an Empirical one. The Bayesian calibration adapts the
methodology described in Rutter et al. (90). The Empirical method aims at combining
broadly applied practices for empirically calibrating MSM parameters (92).
Both methods are implemented for the calibration of the streamlined MSM for the
natural history of lung cancer developed in the previous chapter. The entire procedure
is conducted using the open source statistical software R 3.0.1. The comparative
analysis comprises graphical, qualitative, and quantitative discrepancy measures of
the results the two methods produce. This is a first attempt at a thorough comparison
between two calibration methods in the context of MSMing, with a focus on the
statistical aspects of these procedures. The chapter concludes with suggestions about the
best method, under certain circumstances, based on an overall assessment of the
calibration results according to these measures.
The chapter begins with a description of the two calibration methods that will
be implemented. It continues with a detailed discussion of the serious computational
restrictions related to implementing the calibration of the MSM in R. Emphasis
is put on the need for HPC techniques to deal with the particularities of the
code involved. There follows a description of the simulation study designed
for the purposes of the comparative analysis, along with detailed results from this
analysis. The chapter concludes with some general remarks about the performance
of the two calibration methods with respect to both MSM validation and the
computational requirements and restrictions imposed by each method. We comment
on the advantages and disadvantages of the two methods, and we refer to future
work related to this chapter.
3.1 Background
3.1.1 Calibration vs estimation in statistical theory
Calibration pertains to the specification of model parameters to fit observed
quantities of interest. The term has many instances in the statistical literature and is
closely related to the development of stochastic predictive models. Calibration is also
used in the context of fitting complex deterministic mathematical models (Kennedy
and O'Hagan, 2001; Campbell, 2006). The terms "calibration", "estimation", and
"model fitting" are often used interchangeably in the modeling literature (Vanni
et al., 2011). In the context of ordinary statistical modeling (e.g., generalized linear
models), calibration is considered an "inverse prediction" problem. Simply stated,
given a new value of the response variable, the question is what set of values for the
predictor variables in the model could produce the quantity of interest with high
probability. Moreover, this specification of the model parameters usually refers to point
estimation rather than characterization of the distribution of the model parameters. In this
thesis, we consider calibration as a "model tuning" procedure aiming to specify
those sets of model parameter values which, when used as model inputs, can predict
with a desired amount of accuracy the pre-specified target summaries from the
available data.
In the context of the specification of MSM parameters, calibration seems more
relevant than estimation. This is because in micro-simulation modeling it is possible
for more than one set of parameter values to reproduce results close to the observed
quantities of interest. In addition, some of the model parameters represent latent
variables (i.e., unobserved quantities); hence, model identification problems may
arise. Therefore, purely analytical estimation procedures aimed at finding the single
set of parameter values that best fits the observed data (e.g., MLE) are not useful
for this specific problem. Identifying the collection of acceptable sets of parameter
values is preferable instead, since these sets can provide an idea of the underlying
correlation structure of the model parameters. In addition, those sets can be used to
capture and express the model parameter uncertainty in the produced outputs.
According to Vanni et al. (2011), the goal of a calibration process is manifold and
includes the specification of unobserved/unobservable model parameters, of parameters
that are observed with some level of uncertainty, and of the correlation among the model
parameters (both observable and unobservable), as well as the approximation of the joint
distribution of the model parameters. This last goal can be achieved only if the
calibration process yields more than one combination of parameter values. The set of
all plausible combinations of values can be used as an approximation of both the
marginal and the underlying joint distributions of the MSM's parameters and
outputs. This result of the calibration procedure is extremely useful in the context
of MSMing. Unlike typical statistical models, e.g., generalized linear models, where
the output variable is directly expressed as a function of the model's parameters and
covariates, in MSMs there is usually no closed-form expression for the relation among
the model input, output, and parameters. Indeed, it is very difficult to
identify and quantify the correlation mechanisms that govern the model's structure,
because of the complicated relationships dominating the process described by the
MSM. This complexity also often gives rise to identifiability problems.
3.1.2 Calibration methods for MSMs
Vanni et al. (2011) provide a systematic overview of the calibration procedure that
should be followed in the development of economic evaluation mathematical models
in general. According to this paper, the calibration procedure comprises decisions on
seven essential steps: the model parameters to calibrate, the calibration
targets, the goodness-of-fit (GOF) measure(s), the search strategy over the range of
possible parameter values, the convergence criteria, the stopping rule for the calibration
process, and the integration, presentation, and use of the model calibration results.
Several methods have been proposed in the literature specifically for the calibration
of MSMs in medical decision making. Stout et al. (106) classify the model parameter
estimation methods currently used in cancer simulation models into two broad
categories: purely analytical methods and calibration methods. This classification
is also relevant in the context of micro-simulation modeling calibration.
Purely analytical methods refer to direct estimation of the model parameters (e.g.,
MLE (3; 22; 87; 75)) without reference to model fit. On the contrary, calibration
methods derive model parameters through an efficient search of the parameter space
and can be further categorized into undirected and directed methods. Undirected
methods involve an exhaustive grid search (65; 59) of the parameter space or a grid
search using some sampling design (e.g., random sampling (23; 4; 115; 50), Latin
Hypercube Sampling (LHS) (49; 5; 69; 95), etc.). Directed methods, on the other hand,
aim at finding the optimum set of parameter values using, for example, the Nelder-Mead
(77; 11; 18) or some other optimization algorithm (118; 107; 53; 52). In addition to
these two broad calibration categories, Bayesian (90; 117; 14) calibration
methods are also often used in micro-simulation modeling.
We could further split the various calibration methods for complex models into
empirical and theoretical. The characteristics that actually differentiate these two
categories lie in the nature of the search strategy, the convergence criteria, and the
stopping rule, as well as in the interpretation of the produced results. In an empirical
method, for instance, the search strategy usually involves some sort of random
sampling within the multivariate parameter space, the convergence criteria and the
stopping rules are usually arbitrary (sometimes even based on convenience), and
the interpretation of the results (set(s) of values for the calibrated model parameters)
is often abstruse. Theoretical methods, on the other hand, involve structured
search strategies and stopping rules (e.g., optimization algorithms, the Gibbs sampler,
etc.), while the interpretation of the results is easier and based on a sound theoretical
background (e.g., the joint posterior distribution of the calibrated parameters in
Bayesian calibration).
3.1.3 Assessing calibration results
Calibration methods aim at producing models that fit observed data well. Hence,
the evaluation of a calibration method is closely related to model validation.
There are several means, qualitative and quantitative, to assess the performance of
a predictive model. Within the scope of MSMing, to our knowledge, no systematic
work has yet been carried out on the assessment of a calibrated model. Usually
the performance of a calibrated MSM is evaluated only with plots that compare the
MSM predictions with the respective observed data (4; 94; 23). In such situations,
the conclusions about the quality and adequacy of the MSM are arbitrary and
entail a certain amount of subjectivity. Plots should rather be used in conjunction
with measures (GoF statistics) that quantify the deviation of the MSM outputs
from the observed quantities of interest. The most popular among the quantitative
measures applied for MSM validation is the chi-square GoF statistic. Bayesian
calibration methods provide additional means for assessing the overall performance of
a calibrated MSM, i.e., comparison of the observed quantities of interest (calibration
targets) with the corresponding posterior predictive distributions. Other GOF
measures employing, e.g., profile likelihoods are also suggested in the literature
(18).
3.2 Methods
3.2.1 Notation
In this section, we introduce some notation that will be used throughout the
remainder of this document.
M(θ) : micro-simulation model
θ = [θ1, θ2, . . . , θK]^T : vector of model parameters
Z = [Za, Zg, Zs, Zd]^T : vector of covariates (baseline characteristics): age (Za),
gender (Zg), smoking status (Zs), and smoking intensity (Zd),
measured as the average number of cigarettes smoked per day
Y = [Y1, Y2, . . . , YJ] : vector of data, i.e., summary statistics found in the literature
that describe quantities of interest in the natural history model
π(θ) : joint prior distribution of θ
π(θk) : prior distribution of θk
h(θ|Y, Z) : joint posterior distribution of θ
f(Y|g(θ), Z) : data distribution; this distribution depends on a function g(·)
of the model parameters θ and the model covariates
(the functional form of g(·) is unknown and hard to specify)
h(θk|θ(−k), Y, Z) : full conditional distribution of parameter θk given θ(−k), Y, and Z
θ(−k) : the θ vector excluding the θk component (k = 1, 2, . . . , K,
where K is the total number of MSM parameters)
Mm(θ, SN) : MSM predictions after running the model m times in total on
the input sample SN of size N, given θ
3.2.2 Bayesian Calibration Method
The first method applies Bayesian reasoning to the calibration of MSMs. The goal
is to incorporate, in a sound way, both prior beliefs about the MSM parameters
and observed data found in the literature on the natural history of lung cancer into
the MSM calibration procedure. To this end we apply the Bayesian calibration method
described in detail in Rutter et al. (90), which aims at drawing values from the joint
posterior distribution h(θ|Y) of the model parameters. This method essentially
involves a sufficiently large number of Gibbs sampler iterations that result in draws
from the full conditional distributions h(θk|θ(−k), Y, Z). Due to the model's complexity,
the algorithm also involves approximate Metropolis-Hastings (MH) steps embedded
within each Gibbs sampler iteration in order to draw from the unknown forms of the
full conditional distributions.
In particular, within each Gibbs sampler step, we implement multiple iterations of a
random-walk Metropolis-Hastings algorithm. Given a symmetric jumping distribution,
the MH algorithm accepts a new value θk* with transition probability:

a(\theta_k, \theta_k^*) =
\begin{cases}
\min\{ r_k(\theta_k, \theta_k^*),\, 1 \} & \text{if } \pi_k(\theta_k) \prod_{j=1}^{J} f_j(y_j \mid g(\theta)) > 0 \\
1 & \text{if } \pi_k(\theta_k) \prod_{j=1}^{J} f_j(y_j \mid g(\theta)) = 0
\end{cases}
\qquad (3.1)
Assuming that the micro-simulation model M(θ) and the data distributions f(Y | g(θ))
are correctly specified, we use M(θ) to simulate M draws from fj(Yj | gj(θ)), where j
indicates the j-th covariate class. We use maximum-likelihood estimation (MLE)
to estimate the data distribution parameter; e.g., for Binomial and Poisson counts
the estimate of gj(θ) is the average g_j(θ) = (1/M) Σ_{i=1}^{M} Y_{ij}. We then use
these estimates to calculate the transition probability a(θk, θk*) based on:
r_k(\theta_k, \theta_k^*) =
\frac{\pi_k(\theta_k^*) \prod_{j=1}^{J} f_j(y_j \mid g_j(\theta_k^*, \theta_{(-k)}))}
     {\pi_k(\theta_k) \prod_{j=1}^{J} f_j(y_j \mid g_j(\theta))}
\qquad (3.2)
The Bayesian calibration method results in a V×K matrix of calibrated values,
denoted ΘBayes, whose rows represent a random sample from the joint posterior
distribution h(θ|Y) of the MSM parameters. This sample is used to estimate both
the posterior distributions of the calibrated MSM parameters and the posterior
predictive distributions of the quantities of interest.
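As a rough sketch of one such approximate MH update, the fragment below replaces the likelihood term fj(yj | gj(θ)) with a Poisson likelihood evaluated at a simulation-based estimate of gj(θ), obtained by averaging M runs of a toy stand-in model. The toy model, the uniform prior bounds, and all parameter values are hypothetical illustrations, not the actual lung cancer MSM (which is written in R).

```python
import math
import random

random.seed(1)

def run_model(theta, n=200):
    """Toy stand-in for one micro-simulation run: a simulated count
    out of n individuals whose mean is n * theta (illustrative only)."""
    return sum(1 for _ in range(n) if random.random() < theta)

def log_lik(y_obs, g):
    """Poisson log-likelihood of the observed target y_obs given the
    simulation-based estimate g of its mean."""
    return y_obs * math.log(g) - g - math.lgamma(y_obs + 1)

def mh_step(theta, y_obs, M=50, step=0.02, lo=0.01, hi=0.5):
    """One random-walk MH update for a single parameter, with g(theta)
    re-estimated from M model runs (uniform prior on [lo, hi])."""
    theta_star = theta + random.uniform(-step, step)
    if not lo <= theta_star <= hi:
        return theta                      # prior density is zero: reject
    g_cur = sum(run_model(theta) for _ in range(M)) / M
    g_new = sum(run_model(theta_star) for _ in range(M)) / M
    log_r = log_lik(y_obs, g_new) - log_lik(y_obs, g_cur)
    return theta_star if math.log(random.random()) < log_r else theta

# A short chain targeting an observed count of 40 out of 200 (theta near 0.2)
theta, chain = 0.1, []
for _ in range(200):
    theta = mh_step(theta, y_obs=40)
    chain.append(theta)
```

Note the key feature of the approximate scheme: the likelihood is never available in closed form, so every acceptance decision pays the price of 2 × M model runs.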
3.2.3 Empirical Calibration Method
Several empirical calibration methods for micro-simulation models have been
suggested in the literature (section 3.1.2). Most of them comprise some type of sampling
for searching the multidimensional parameter space, the stipulation of some proximity
measure between observed and predicted quantities of interest, and the selection of a set
of parameter vectors satisfying pre-specified convergence criteria. In many cases the
result of an empirical calibration procedure is a set of parameter vectors rather than a
single one for the calibrated model.
For the development of our generic empirical calibration method we focus on two key
elements of the procedure, i.e., the search algorithm and the convergence criteria
involved. In this section we describe how we combine popular practices, found in the
literature of MSMing, so as to create a generic Empirical calibration method that
will be compared with the Bayesian method previously described.
When the model dimensionality permits, it is possible to use exhaustive grid search
algorithms to search the parameter space (65). For models comprising many parameters
(as is usually the case in micro-simulation modeling), random sampling rather
than exhaustive grid search is preferred for searching the parameter space
(23; 4; 115; 50). Alternatively, a more efficient sampling scheme can be used
to sample from the multidimensional parameter space, namely Latin
Hypercube Sampling (LHS) (69; 104).
The LHS method was introduced by McKay et al. (69) as an extremely efficient
sampling scheme that outperforms both simple random and stratified sampling.
LHS and its variations (16) increase the realization efficiency of the algorithm while
preventing the introduction of bias and reducing the effect of extreme values on the
resulting estimates. Another very attractive feature of LHS is that it allows
for characterizing the uncertainty of, and conducting sensitivity analyses on, complex
deterministic or stochastic models.
Applications of the method are found in several instances of model calibration in medical
research (49). In Blower and Dowlatabadi (5) we find an application of LHS
to a deterministic complex model as a technique to explore the effect of uncertainty in
the parameter values on the predicted outcomes. Another very interesting application of
LHS is found in Cronin et al. (13), where the method is used in conjunction with
a response surface analysis as an efficient way to explore the parameter space and
investigate the relationship between the parameter values and the respective model
outputs.
The second very important feature of the calibration procedure we focus on is the
specification of the convergence criteria used to identify acceptable parameter sets. The
most commonly used discrepancy measures in the context of calibrating complex
models are chi-square and likelihood statistics (115). These two are also the most
typical measures used for the overall assessment of the calibrated model fit. It is
noteworthy, however, that in many instances of empirically calibrated complex
models, the assessment of the overall model fit is completely arbitrary and based
solely on graphical comparisons between observed and predicted quantities of
interest (94; 23; 4).
Latin Hypercube Sample
Before continuing to the description of the Empirical calibration method, we first
discuss the particularities of the Latin Hypercube Sampling (LHS) design.
Let θk ∈ Rk, where Rk is the range of plausible values for θk. We divide Rk into
N equiprobable (according to the pre-specified distribution we assume for each θk)
intervals, and we assign the integers 1 through N to them. We create
a sequence of K vectors, each of which is a random permutation of the integers
1, 2, ..., N. For each θk we randomly draw a value from the indexed interval according to
the K vectors of random permutations previously created. Alternatively, the middle
point of each interval could be used. The result of this procedure is an N×K matrix M
whose columns are the K vectors of random values, one for each of the model parameters.
The mik element of this matrix corresponds to the value extracted from the i-th indexed
interval of the θk parameter. The i-th row of the matrix is a sample point from
the parameter space. The matrix M is the Latin Hypercube Sample extracted
from a single replication of this sampling design.
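The single-replication construction above can be sketched as follows, here with independent uniform marginals; the function name and the two parameter ranges are hypothetical illustrations (the thesis itself relies on the maximinLHS function of the R lhs package).

```python
import random

random.seed(42)

def latin_hypercube(n, ranges):
    """One replication of an LHS of size n with independent uniform
    marginals; `ranges` is a list of (low, high) intervals, one per
    parameter. Returns n sample points (rows) of K coordinates."""
    cols = []
    for lo, hi in ranges:
        width = (hi - lo) / n
        perm = list(range(n))
        random.shuffle(perm)          # random permutation of the n strata
        # draw one uniform value from each of the n equiprobable intervals
        cols.append([lo + (i + random.random()) * width for i in perm])
    return list(zip(*cols))

# Hypothetical 2-parameter example with NLHS = 5
sample = latin_hypercube(5, [(0.00001, 0.0016), (0.0001, 8.0)])
```

Each parameter's range is hit exactly once per stratum, so even a small sample spreads evenly over every one-dimensional margin, which is the property that makes LHS more efficient than simple random sampling.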
In the Empirical calibration method we implement the LHS design as a more
efficient search algorithm for the multi-dimensional parameter space than simple
random sampling. The goal of a single implementation of this design is to collect
a sample of NLHS values for each of the K parameters (where NLHS is the size of the
LHS design). To this end, the range of each parameter is divided into NLHS equiprobable
(according to the pre-specified underlying distribution) intervals. For each parameter
we create a different permutation of the NLHS intervals, and we subsequently draw
a value from each corresponding interval, following the underlying distribution. In
particular, we utilize the 'maximinLHS' R function (lhs library), which aims to optimize
the collected sample by maximizing the distance between the design points. The set
of NLHS points (i.e., vectors of parameter values) is the sample extracted from the
multivariate parameter space using the LHS design.
Figure 3.1: Single implementation of LHS of size NLHS=5 for extracting values from a 2-dimensional parameter space (θ1 and θ2).
Figure 3.2: Single implementation of LHS of size NLHS=5 for extracting values from a 2-dimensional parameter space (θ1 and θ2).
Figures 3.1 and 3.2 present examples of the application of the LHS design in two
dimensions. In each of these examples the LHS is used to extract a sample from the
bivariate space stipulated by two of the MSM parameters to be calibrated, i.e., θ1 = m
∈ [0.00001, 0.0016], and θ2 = mdiagn ∈ [0.0001, 8]. The grid indicates the partition
of the bivariate space based on equiprobable marginal intervals for each parameter.
The dots in each graph represent the set of points of the Latin Hypercube sample.
The figures depict four samples, for different sizes (NLHS=5 and 20) and different
points extracted from the individual intervals (center vs random).
A limitation of LHS is that a single implementation of the design can only
result in a restricted number (NLHS) of vectors of parameter values, rendering it
inefficient for searching the multi-dimensional parameter space of an MSM. To
overcome this obstacle we suggest the recurrent implementation of the aforementioned
design, in order to collect a large enough sample for the purposes of the calibration
procedure.
Description
The second method combines some basic concepts of the empirical calibration
procedures found in the literature of MSMs, which are based on a random search of the
parameter space. It further suggests the adoption of the LHS design as a more efficient
tool for searching the multi-dimensional parameter space. In particular, this empirical
method implements the LHS design multiple times to extract a large number of sets
of parameter values. This sample is then checked for "acceptable" sets, i.e., for sets of
parameter values that produce model outputs close to the observed ones. The goal of
this method is to eventually collect a sample representative of the underlying
population of all the sets of parameter values that are "acceptable" according to some
convergence criteria.
Let Y ∼ f(Y|Λ) denote the data of interest (calibration data). We implement the LHS
design L times in total. Since each repetition of the LHS provides NLHS sets of
parameter values (where NLHS is the size of the LHS design), this empirical
calibration method essentially results in Nemp = NLHS × L sets of parameter values in
total. For each set of parameter values we run the MSM M(θ) a sufficient
number of times, M, and we calculate estimates of the data distribution parameters,
g_j(θ) = (1/M) Σ_{m=1}^{M} Y_{mj} (as in the Bayesian calibration method). Given
these estimates we calculate the log-likelihood as:
l(g(\theta) \mid Y) = \sum_{j=1}^{J} l_j(g_j(\theta) \mid Y_j)
We want to test the null hypothesis H0 : Λ = Λ0, where Λ0 is the vector of
calibration targets, versus the alternative H1 : Λ ≠ Λ0. For this test we use the
deviance statistic:
deviance statistic:
D = −2[l(g(θ)|Y )− l(Λs|Y )
]= −2
J∑j=1
[l(g(θj)|yj)− l(λsj|yj)
](3.3)
where l(Λs|Y ) is the likelihood of the saturated model.
Under H0 the deviance statistic D follows a chi-square distribution with ν degrees
of freedom, one for each tested mean in the calibration target vector. Among the
sets of θ values for which H0 is not rejected, we randomly draw V (to match the
Bayesian procedure) vectors of parameter values. Hence, the result of the empirical
calibration method is again a V×K matrix of calibrated values, denoted ΘEmp, whose
rows represent a random sample from the population of all "acceptable" sets
of parameter values according to the pre-specified convergence criterion (here,
the population of parameter values resulting in the highest log-likelihood given the
calibration data). These calibrated values can be used in a way analogous to the one
suggested for the Bayesian calibration results, in order to provide estimates of the
empirical distributions of the calibrated MSM parameters, as well as the empirical
distributions of the predicted quantities of interest.
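For Poisson-distributed targets, the deviance-based acceptance check can be sketched as below; the three targets, the two candidate predictions, and the use of the 0.95 chi-square quantile for 3 degrees of freedom (7.815) are hypothetical choices for illustration only.

```python
import math

def poisson_dev(y, mu):
    """Deviance contribution of one Poisson target:
    2 * [l(y | y) - l(y | mu)] = 2 * [y * log(y / mu) - (y - mu)]."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - (y - mu))

def accept(targets, preds, crit=7.815):
    """Keep a parameter set if the total deviance of its predictions
    against the calibration targets stays below the chi-square 0.95
    quantile (7.815 for the 3 degrees of freedom used here)."""
    d = sum(poisson_dev(y, mu) for y, mu in zip(targets, preds))
    return d, d < crit

targets = [30, 55, 12]                                  # hypothetical targets
d_good, ok_good = accept(targets, [28.0, 58.0, 11.0])   # close: accepted
d_bad, ok_bad = accept(targets, [15.0, 80.0, 4.0])      # far off: rejected
```

In the full procedure, this check would be applied to each of the Nemp candidate vectors, with preds replaced by the simulation-based estimates g_j(θ) and the critical value matched to the number of calibration targets.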
3.2.4 Calibration outputs: interpretation and use
An important aspect of the calibration of an MSM is what the anticipated outputs
of this procedure should be. To answer this question we have to consider both the
conceptual aspect of the problem and real-life practice. In the comparative
analysis presented here, we suggest that the results of both methods be a collection
of parameter vectors rather than a single point estimate for each MSM parameter.
This type of calibration output is preferable, especially in the case of complex MSMs,
for several reasons.
First of all, the nature of the problem itself dictates this form of calibration output.
As already mentioned (section 3.1.1), the complexity of an MSM renders the parameter
specification a calibration rather than a point estimation problem. It is possible for
more than one set of parameter values to produce equivalent outputs, i.e., predictions
"close" to what has been observed. Therefore, we wish to collect a sample of
these equivalent sets rather than find the single set that maximizes some convergence
criterion. Second, the matrix of calibrated values can reveal interesting relationships
between the MSM parameters, which usually represent unobservable (latent)
variables. Understanding these relationships may also be useful for improving
the model's structure, so that the MSM better describes the underlying
process and, therefore, has enhanced predictive ability. Third, by
using a matrix of calibrated values rather than point estimates of the MSM
parameters, we are able to capture a major source of MSM uncertainty, i.e., parameter
uncertainty, and convey the effect it has on the final results.
The Bayesian method results in the ΘBayes matrix of calibrated values, representing
a sample from the joint posterior distribution of the MSM parameters given the data
(calibration targets). The Empirical method results in the ΘEmp matrix, essentially
comprising a sample of vectors from the joint distribution of the "acceptable"
parameter values, namely those fulfilling the convergence criteria. In both cases the
matrix of calibrated parameter values can be used to fulfill the aforementioned
purposes of presenting the MSM characteristics (joint and marginal distributions of
the model parameters) as well as model predictions of the quantities of interest. In
particular, for each one of these sets of parameter values (i.e., for each row of the
ΘBayes or ΘEmp matrix) we run the model M times and we summarize the results
in order to estimate the quantities of interest, given a specific input sample SN. We
denote by Y = MM(Θ, SN) the predictions from a calibrated MSM with matrix Θ of
values for the calibrated MSM parameters and input sample SN. Averages, medians,
etc., can be used as point estimates, while measures of variability, such as the variance
or the interquartile range, provide an indication of the model uncertainty, including
sampling variability and parameter uncertainty.
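This post-calibration use of the Θ matrix can be sketched as follows; the toy one-parameter model, the stand-in rows of Θ, and the choice M=20 are hypothetical illustrations, not the thesis model.

```python
import random
import statistics

random.seed(7)

def run_msm(row, n=100):
    """Toy stand-in for M(theta, S_N): one simulated count on an input
    sample of n individuals (illustrative only)."""
    return sum(1 for _ in range(n) if random.random() < row[0])

def summarize(theta_matrix, M=20):
    """For each calibrated parameter vector (row of Theta), run the
    model M times; pool all runs to report a point estimate and an
    interquartile range for the predicted quantity of interest."""
    pooled = [run_msm(row) for row in theta_matrix for _ in range(M)]
    q = statistics.quantiles(pooled, n=4)      # quartile cut points
    return {"mean": statistics.mean(pooled),
            "median": statistics.median(pooled),
            "iqr": (q[0], q[2])}

theta_cal = [(0.18,), (0.22,), (0.20,)]        # stand-in rows of Theta
out = summarize(theta_cal)
```

Because the pooled runs vary both across rows of Θ (parameter uncertainty) and within each row (sampling variability), the spread of the pooled predictions reflects both sources of model uncertainty at once.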
3.3 High Performance Computing in R
3.3.1 Software for MSMs
There is a wide range of programming languages for the development of MSMs.
Kopec et al. (2010), in their comprehensive review of the quality of MSMs used
in Medicine, provide a list of programming languages and existing toolkits currently
used for the implementation of MSMs. Java, C#, and C++ are very popular languages
for the development of such models. Other toolkits, such as TreeAge, are also found
in the MSM bibliography. There are also some MSMs (MicMac (24), JAMSIM (66))
that embed the R statistical programming language, though only to provide the
user with the enhanced statistical and graphical capabilities of R for
post-simulation processing. This means that the R software is only involved in the
analysis of the MSM outputs rather than in the actual micro-simulations.
For reasons explained here, the streamlined MSM for the natural history of lung
cancer is written in R. To our knowledge, this is the very first attempt to develop
and implement an MSM exclusively in R. The R open source statistical software is
widely used by statisticians across the entire statistical spectrum. The
implementation of an MSM in R not only allows the wide use of this new, very
attractive technology in medical decision making, even by people not very familiar
with this field, but also enhances the transparency of the model and facilitates
the research and development of statistical methods related to this technology.
The release of the code, e.g., in the form of a special library in the open source
statistical software R, is a very attractive feature, especially to model developers, who
can actually read the code and thus understand the particularities of, and compare
the structure of, similar MSMs. Researchers unfamiliar with the technical details
of an MSM, on the other hand, who intend to use the model as a tool in medical
decision making, e.g., to simulate and make predictions for large cohorts, are highly
interested in being able to simulate and compare different scenarios. This is another,
perhaps more powerful, aspect of a model's transparency related to the release of the
freely available source code.
The streamlined MSM, which describes the natural history of lung cancer, can provide
a handy tool for exploring the statistical properties of MSMs in general.
Although the idea of writing an MSM in R is very exciting and attractive, the
implementation can prove to be a daunting task. Even the term "micro-simulation"
modeling suggests extensive computations and rather time-consuming processes.
A simple implementation of the model, e.g., to make predictions for a single
person or even for a relatively small sample of persons (as in the case of the tables
presented in the first chapter), although not instantaneous, is definitely a feasible
and relatively easy task to carry out. However, the development of such a model
from scratch requires, among other things, calibration and overall assessment of the
model (goodness-of-fit tests, validation, etc.), namely processes that can prove hard to
design and implement and extremely time consuming to run. In the following
paragraphs we attempt to give an idea of what the computational burden, in terms of
the required running times, for such processes can be. To this end, we provide as an
example our experience from the implementation of the two calibration methods for
the comparative analysis described in this chapter.
3.3.2 Example: computational burden of two MSM calibration methods
The objective of the second chapter is the calibration of the streamlined MSM for
the natural history of lung cancer with two different methods, a Bayesian and an
Empirical one. Trying to keep this problem as simple as possible, we focus our
interest on only four MSM parameters. As outlined in the description of the two
methods, each calibration procedure aims at identifying the most suitable among a
total of 100,000 candidate vectors of parameter values.
The Empirical calibration method entails the simultaneous check of the values in
a candidate parameter vector. In our case, the whole procedure involves testing
100,000 vectors of parameter values in total. In addition, each vector drawn
from the multi-dimensional parameter space is totally independent of the others.
The Bayesian calibration method is somewhat more complicated in that it requires a
sequential check of parameter values. That is, each update of the parameter chain depends
on the parameter values suggested in the previous step. In our case, the Bayesian
calibration method entails testing 4·100,000 parameter values in total. Therefore,
the architecture of the Bayesian calibration method allows parallelization of the
process to a much more restricted extent.
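The structural difference between the two search patterns can be sketched as follows. This is a hypothetical illustration, not thesis code: goodness_of_fit, the toy target of 0.5, and the greedy acceptance rule are invented stand-ins for the actual MSM runs and the Metropolis-Hastings machinery.

```python
import random

def goodness_of_fit(theta):
    # invented stand-in for running M*N micro-simulations and scoring
    # the predicted rates against the calibration targets
    return -sum((t - 0.5) ** 2 for t in theta)

def empirical_search(n_candidates, dim, rng):
    # undirected search: every candidate vector is drawn independently,
    # so all evaluations could run at the same time
    candidates = [[rng.random() for _ in range(dim)] for _ in range(n_candidates)]
    return max(candidates, key=goodness_of_fit)

def bayesian_chain(n_steps, dim, rng):
    # chain update: each proposal is built from the previous state,
    # so the steps must run one after another
    theta = [rng.random() for _ in range(dim)]
    for _ in range(n_steps):
        proposal = [t + rng.gauss(0, 0.05) for t in theta]
        if goodness_of_fit(proposal) > goodness_of_fit(theta):
            theta = proposal
    return theta

rng = random.Random(1)
best = empirical_search(200, 4, rng)
chain_end = bayesian_chain(200, 4, rng)
```

The loop in empirical_search could be split across any number of processors; the loop in bayesian_chain cannot, which is exactly the restriction discussed above.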
As described in the simulation study section, in our example each calibration update
entails implementing the micro-simulation model M=10 times on a sample of
N=5,000 people. That is, checking one combination of parameter values (Empirical
calibration) or one parameter value (Bayesian calibration) requires 50,000 micro-simulations
in total. Hence, for 100,000 updates of all (four) MSM parameters, we
need 50,000·100,000=5·10^9 and 50,000·100,000·4=2·10^10 micro-simulations for the
Empirical and the Bayesian calibration method, respectively. Given these numbers,
we realize how time-consuming the implementation of just a single calibration method
can be, let alone a comparative analysis between two of them.
These numbers confirm that the calibration of an MSM falls into the "embarrassingly
parallel" category of computational problems (89), meaning that the entire task
can be split into numerous, completely independent, repeated computations, each of
which can be executed by a separate processor in parallel. Hence, instead of "endless"
running times, an "embarrassingly parallel" procedure such as the calibration of an
MSM can be approached using high performance computing (HPC) techniques and
run within plausible times. A closer look at Table 3.1, which presents the required
times to run M·N micro-simulations under different settings, verifies that, in
the absence of HPC, the calibration of an MSM is simply impossible.
3.3.3 Parallel Computing
In order to overcome the time limitations posed by the extensive computations involved
in calibrating an MSM, we harness the idea of parallel computing. This can
be achieved by distributing the independent computations simultaneously to multiple
computer clusters (nodes) that we have set up for this purpose. These clusters
may comprise a single machine with one or more processors, or multiple
machines connected by a communications network. Hence we distinguish between
two major types of parallelization, implicit and explicit, depending on
the composition of the computer clusters used. It is crucial to decide which
type of parallelization (available in R) to work with, so as to maximize the benefit
from the advanced high performance computing techniques developed for this
statistical software.
Tierney (2008) describes the notions of implicit and explicit parallel computing
within the R context. According to this paper, implicit parallelization pertains basically
to exploiting multiple processors of one machine, as well as internal R functions,
to speed up calculations (e.g., vectorized arithmetic operations, 'apply'-like functions,
etc.). It essentially takes advantage of the parallelism inherent in the program. This
method does not require any special intervention (set-up) from the user, hence it is
much easier to implement and can prove very beneficial, especially for large vectors
(e.g., n>2000). Nevertheless, as can be seen from the example provided in Table
3.1, implicit parallelization offers the researcher only a limited ability
to improve the efficiency of an R program, and is definitely not the solution to the
extremely time-consuming algorithms involved in the MSM calibration process.
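The effect this kind of optimization exploits, namely replacing interpreted element-by-element work with a single call into optimized built-in code, can be illustrated outside R as well. A minimal Python sketch of the same idea (R's vectorized arithmetic and 'apply'-like functions play the role of the built-in `sum` here):

```python
import time

n = 1_000_000
xs = list(range(n))

t0 = time.perf_counter()
total_loop = 0
for x in xs:              # interpreted loop: one dispatch per element
    total_loop += x
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
total_builtin = sum(xs)   # single call into optimized built-in code
t_builtin = time.perf_counter() - t0

# same result; the built-in call is typically an order of magnitude faster
assert total_loop == total_builtin
```

The gain is real but bounded by a single machine, which is why implicit parallelization alone cannot rescue the calibration problem.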
Explicit parallelization, on the other hand, provides the user with the ability to set
up computer clusters (multiple computers with multiple processors each) so as to
distribute the independent computations to a wider range of resources in parallel.
Hence, explicit parallelization can substantially improve the efficiency of algorithms
involving "embarrassingly parallel" computations, as in the case of calibrating an
MSM. This type of parallelization requires more work and a certain amount of
computer science knowledge to set up the cluster and distribute the algorithm
accordingly.
After having improved the time required for one micro-simulation, by using the
most efficient (to the best of our knowledge) built-in R functions, the next step
is to take advantage of high performance computing techniques so as to carry out
the calibration computations within realistic time intervals. Schmidberger et al.
(2009) provide a comprehensive account of R packages with advanced techniques for
performing parallel computing in R. According to this paper, the two R packages that
stand out as best serving the implementation of parallel computing on computer
clusters are 'snow' and 'Rmpi'.
For the purposes of the comparative analysis we will mainly be using the 'snow'
library to set up computer clusters using the Message Passing Interface (MPI) low-level
communication mechanism. This R library has intermediate- and high-level
functions for parallel computing. For the calibration purposes we make use of the
high-level ones, which are basically parallel versions of the 'apply'-like R functions.
By using the possibilities the 'snow' package offers for parallel computing, we can
overcome R's single-threaded nature and spread the computational burden across
multiple machines and CPUs (McCallum and Weston (2012)). Information about
the 'snow' built-in functions can be found in the relevant R documentation for this
package, while some examples of the implementation of parallel computing in R using
the snow package can be found in Tierney (2008) and Matloff (2013).
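The scatter/gather pattern behind snow's high-level functions can be sketched as follows. This is an illustrative Python analogue, not thesis code: run_batch is an invented stand-in for one batch of micro-simulations, and a thread pool stands in for the cluster of R worker processes that snow would manage.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def run_batch(seed):
    # invented stand-in for one batch of micro-simulations with a
    # candidate parameter vector; only the scatter/gather shape matters
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(10_000)) / 10_000

seeds = list(range(8))

# serial reference: one batch after another
serial = [run_batch(s) for s in seeds]

# parallel map: scatter the independent batches over a pool of workers
# and gather the results, as snow's parSapply-style functions do across
# R worker processes or machines
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(run_batch, seeds))

assert serial == parallel  # the pool preserves input order
```

In the actual calibration the workers are separate machines, so the independent batches genuinely run at the same time.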
3.3.4 Code architecture
Another very important decision to be made, regarding the problem of improving
the efficiency of the calibration methods, is which chunk of the code should be parallelized.
Obviously, the sequential nature of the Bayesian calibration method leaves
much smaller scope for parallelization than the Empirical one, with its random,
undirected search approach (independent draws of values from the multidimensional
parameter space). This is also the case when comparing the efficiency of undirected
with any directed search method, due to the sequential nature of the latter, since
each step in a directed method depends on the result of the preceding one.
The sequential nature of the Bayesian method drives the decision, for more efficient
results, to perform in parallel the M·N=50,000 micro-simulations involved in each
parameter update. The independent draws of vectors from the multi-dimensional
parameter space involved in the Empirical calibration, on the other hand, allow for a
greater extent of parallelization, which is only restricted by the size of the induced
tables relative to the respective memory limits.² In our case, we take advantage
of the architecture of the Empirical calibration method to further parallelize the
testing of 20 parameter vectors at a time, i.e., M·N·N_LHS=50·1000·20=10^6 micro-simulations
in total (where N_LHS is the size of the Latin Hypercube Sample). Thereafter, in order
to test 100,000 parameter vectors, we need to repeat this procedure 100,000/20=5,000
times in total.
However, in order to make the most of parallelization, we have to make sure that
the R code for predicting one trajectory (one micro-simulation) is as efficient as
possible. Hence, there is one more step before we move forward to the implementation
of parallel computing, i.e., improving the efficiency of the R algorithm for a single
micro-simulation. A very helpful tool for this task is R's built-in 'Rprof' profiler,
which provides functions that enable relatively easy profiling of the execution of R
expressions.

² There are several methodologies, and respective packages developed in R, that harness high-performance computing techniques to deal with large-memory, or even out-of-memory, data problems (Eddelbuettel, 2013).
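For readers outside R, the profiling step can be mimicked with Python's standard cProfile module; slow_concat below is an invented example of an inefficient structure that a profile report would flag, in the same way Rprof flagged the costly structures in our code.

```python
import cProfile
import io
import pstats

def slow_concat(n):
    # invented example of an inefficient structure a profiler would flag,
    # analogous to the costly operations Rprof exposes in R code
    s = ""
    for i in range(n):
        s += str(i)
    return s

profiler = cProfile.Profile()
profiler.enable()
slow_concat(5_000)
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()   # the report names the hot spots, here slow_concat
```

The profile output points directly at the functions where time accumulates, which is what guided the replacements described next.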
The tables in the Appendix, with R profiling results, indicate the degree of improvement we
achieved in our code by simply replacing time-consuming R structures with their more
efficient counterparts. More specifically, by simply replacing "data.frame" with "list"
in all instances in the R code, we managed to make the program almost twice as fast (e.g.,
the total running time dropped from 5.86 to 2.88 time units). By also replacing the
approximate integration of the hazard function for the onset of the first malignant
cell with the respective definite cumulative hazard function, we achieved a further
22.2% reduction (from 2.88 to 2.24 time units).
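The second optimization can be illustrated with a toy hazard. The linear hazard below is an invented stand-in for the thesis's onset hazard; the point is that a closed-form cumulative hazard replaces many hazard evaluations per call with a single exact one.

```python
A = 0.003   # invented slope; the real onset hazard is specified in Chapter 2

def hazard(t):
    # toy linear hazard h(t) = A*t, standing in for the onset hazard
    return A * t

def cumulative_hazard_numeric(t, steps=10_000):
    # trapezoidal approximation of the integral of h over [0, t]:
    # many hazard evaluations per call
    dt = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * dt)
    return total * dt

def cumulative_hazard_closed(t):
    # definite integral of A*s ds over [0, t]: one evaluation per call
    return 0.5 * A * t * t

numeric = cumulative_hazard_numeric(40.0)
closed = cumulative_hazard_closed(40.0)
assert abs(numeric - closed) < 1e-9
```

When such a call sits inside every simulated trajectory, removing the inner integration loop compounds into the kind of reduction reported above.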
We have described, so far, the gain in running time achieved when we optimized
the R code internally, by performing R profiling and improving the code's
efficiency accordingly. This process involved work on the efficiency of the code's
architecture, i.e., removing unnecessary computations and parts of the code, replacing
loops with vectorized R functions, etc. Furthermore, we replaced complicated, time-consuming
R functions and structures with more efficient counterparts (e.g., saving
results from a function in 'list' instead of 'data.frame' format). In this way we were
able to substantially reduce the time required to run 50,000 micro-simulations, from
5532.77 secs (≈1.5 hours) to 2114.92 secs (≈35 minutes). However, even with this
significant improvement, running 50,000·100,000=5·10^9 micro-simulations, to calibrate
just one of the MSM parameters (Bayesian method) or all candidate sets of parameter
values (Empirical method), would require the prohibitive time of 6.7 years. Having
optimized the efficiency of the algorithm inside the R code, we turned our attention to HPC
techniques, in order to perform parallel computing in R. We focused on the particularities
of each calibration method to reduce the respective computational burden
to the minimum by implementing relevant techniques accordingly.
3.3.5 Algorithm efficiency: Bayesian vs Empirical Calibration
To better understand the gain in computational burden, as well as to compare
the two methods in terms of their efficiency, we calculate the required running time to
calibrate all four MSM parameters with each method. As already mentioned, in order
to implement the Bayesian calibration method and obtain one chain of 100,000 values
for the joint posterior of the calibrated MSM parameters, we need to repeat the set of
M·N=50,000 micro-simulations 4·100,000 times in total. Hence the required time for
the Bayesian calibration is 4.26·100,000·4 secs ≈ 19.7 days (Table 3.1), if we use a cluster
of 64 nodes (8 computers with 8 CPUs each). An analogous task with the Empirical
calibration method requires testing 100,000 vectors from the parameter space in
total. By further parallelizing the process and simultaneously computing, e.g., 20·50·
1000 = 10^6 micro-simulations, we can calibrate the four MSM parameters much faster
than with the Bayesian method.³ According to Table 3.1, the required time to run
that many micro-simulations in parallel is 105.4 seconds ≈ 1.7 minutes. To complete
the Empirical calibration procedure, we have to run this set of micro-simulations
100,000/20=5,000 times in total. Hence the required time to calibrate the four MSM
parameters with the Empirical method is ≈ 6.1 days. Depending on the available
resources, we can achieve a further reduction in the running times. In our case, for
example, if we further split the Empirical calibration process into three independent
pieces, we can receive the results from this method in ≈ 2 days (i.e., almost 10
times faster than with the Bayesian method). Consequently, we realize that the
architecture of this method provides for parallelization to a significant extent, with a
corresponding reduction in the required time, unlike the Bayesian calibration method
or any directed search method. Hence an Empirical method for calibrating an MSM
can prove much more practical (efficient) than a Bayesian one, with respect to
the required running time.

³ Depending on the HPC techniques we use and the capacity of the computer clusters, we can further parallelize this problem and achieve an even larger reduction in the time required to run this procedure.
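The arithmetic above can be checked directly from the Table 3.1 timings (4.26 s and 105.4 s per batch on the 64-node MPI cluster):

```python
SECONDS_PER_DAY = 86_400

# Bayesian: one update costs 4.26 s (50,000 micro-simulations);
# 100,000 updates for each of the 4 parameters, run sequentially.
bayesian_days = 4.26 * 100_000 * 4 / SECONDS_PER_DAY

# Empirical: 10**6 micro-simulations (20 candidate vectors at a time)
# cost 105.4 s, repeated 100,000 / 20 = 5,000 times.
empirical_days = 105.4 * (100_000 // 20) / SECONDS_PER_DAY
```

This reproduces the ≈19.7 days and ≈6.1 days quoted in the text.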
Table 3.1 describes improvements in the efficiency of an embarrassingly parallel algorithm
involving M·N micro-simulations. This 'journey' begins from the completely trivial
case, i.e., performing the computations on a single machine without addressing R's
single-threaded nature and without taking care of time-consuming R functions
and structures. From this starting point, the process of improving the efficiency of
the algorithm 'travels' through the notion of implicit parallelism to eventually reach
the optimal solution by using explicit parallelism with properly set-up computer
clusters (a network of multiple computers with multiple processors each) and relevant
HPC R techniques, packages, and toolkits. The ultimate gain from this process, e.g.,
when performing M·N=50·1000=50,000 micro-simulations, can reach the impressive
figure of three orders of magnitude (5532.77/4.26≈1299). The implementation
of HPC for parallel computing with R was actually what made it feasible to calibrate
the MSM for lung cancer using this open source statistical software.
3.3.6 Concluding remarks
The problem of calibrating an MSM in R falls into the category of "embarrassingly
parallel" computations and necessitates the use of high performance computing.
In the previous paragraphs we explained the computational considerations imposed
by the calibration of an MSM in R, using as an example the implementation of the
two calibration methods described in this chapter. According to this example, the
Empirical calibration method is much more efficient than the Bayesian one, since it
Type   Nodes    M     N      Time (secs)   Reduction Ratio*   Notes
-      1        50    1000     5532.77     -                  no parallel computing or profiling
                100   1000    11065.61     -
                20    2500     5534.07     -
                40    2500    11077.26     -
SOCK   1        50    1000     2114.92     2.62               no parallel computing, after profiling
                100   1000     4229.86     2.62
                20    2500     2115.41     2.62
                40    2500     4234.31     2.62
SOCK   10       50    1000       16.55     334.31             implicit parallel computing, after profiling
                100   1000       34.82     317.79
                20    2500       17.24     321.00
                40    2500       34.25     323.42
SOCK   32       50    1000       15.97     346.45             implicit parallel computing, after profiling
                100   1000       34.06     324.89
                20    2500       16.11     343.52
                40    2500       34.37     322.29
SOCK   50       50    1000       16.02     345.37             implicit parallel computing, after profiling
                100   1000       33.78     327.58
                20    2500       16.06     344.59
                40    2500       34.58     320.34
SOCK   100      50    1000       15.85     349.29             implicit parallel computing, after profiling
                100   1000       34.20     323.56
                20    2500       16.10     343.73
                40    2500       34.41     321.92
MPI    32       50    1000        7.45     742.65             explicit parallel computing, after profiling
                100   1000       12.93     855.81
                20    2500        6.19     894.03
                40    2500       12.91     858.04
MPI    64       50    1000        4.26     1298.77            explicit parallel computing, after profiling
                100   1000        7.93     1395.41
                1000  1000      105.4      -
                20    2500        3.31     1671.92
                40    2500        8.17     1355.85

SOCK: sockets
MPI: Message Passing Interface
* Ratio of reduction in running time achieved compared to no processing (no parallel computing or profiling)

Table 3.1: Algorithm efficiency: Required time (in seconds) to run M·N micro-simulations using different computing capacities.
can actually run 10 times faster. This relative efficiency between the two methods
also applies when comparing undirected with directed search algorithms for
calibration, due to the conceptual similarities they bear with the Empirical and the
Bayesian calibration method, respectively.
The most impressive finding from this exercise was the ultimate gain we achieved in
performing a set of parallel computations, which reaches three orders of
magnitude compared to the initial time, namely the time required before any work on
the architecture of the R code or any type of parallelization. The running
times reported in this section exemplify the imperative need for HPC methods in order
to render the development of an MSM in R feasible, with all the beneficial effects
such an attempt will have on overall research in the area.
3.4 Comparative Analysis
The main objective of this chapter is the comparison between an empirical and a
Bayesian approach to the calibration problem of micro-simulation models (MSMs)
in medical decision making (MDM). The streamlined MSM, developed in the first
chapter, is used as a tool for the implementation of both calibration methods, described
in section 3.2 of the thesis. To our knowledge, this is the first attempt at a
comprehensive and systematic comparison of two calibration methods in the context
of micro-simulation modeling. In the following paragraphs we describe the study
design for the quantitative and qualitative comparison of the two methods.
3.4.1 Input Data
The MSM for the natural history of lung cancer (described in Chapter 2)
takes into account three baseline characteristics, namely age, gender, and smoking
habits, in order to predict a person's trajectory. The smoking habits comprise the
person's smoking status at the beginning of the prediction period, i.e., current,
former, or never smoker, as well as, when relevant, the smoking intensity, expressed as
the average number of cigarettes smoked per day. In order to keep the dimensionality
of the problem to an easily manageable level, we restrict our interest to male current
smokers.
We combine information found in census data (US 1980 census) and other relevant
statistics (Statistical Abstract of the US, 1980) in order to simulate the baseline
characteristics of a large sample representative of the US population. This large
sample will be the "pool" from which several sub-samples will be drawn and used
as input to the MSM for the purposes of both model calibration and assessment.
Assuming that the entry year is 1980, we predict 26 years ahead and calibrate the
MSM to the observed lung cancer incidence rates reported in the SEER 2002-2006 data.
We simulate the age distribution based on information found in the US 1980 census
about males who are current smokers. Given the age group, we simulate the smoking
intensity for each individual following the distribution of the average number of
cigarettes smoked per day, as reported in the Statistical Abstract of the US, 1980.
Because these tables report the smoking intensity in groups (i.e., <15,
15-24, 25-34, and >34 cigarettes/day), we first draw the smoking intensity category
given age, and then randomly draw an integer from the selected group, assuming
a uniform distribution for the smoking intensity within that group. This integer
eventually expresses the average number of cigarettes smoked per day for the particular
individual.
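The two-stage draw described above can be sketched as follows; the category probabilities and the upper bound of the open-ended '>34' group are illustrative placeholders, not the Statistical Abstract figures.

```python
import random

# category -> inclusive range of cigarettes/day; the cap of 60 on the
# open-ended '>34' group is an arbitrary illustrative choice
GROUPS = {"<15": (1, 14), "15-24": (15, 24), "25-34": (25, 34), ">34": (35, 60)}
# placeholder category probabilities (in the thesis these come from the
# Statistical Abstract of the US, 1980, conditional on age group)
PROBS = {"<15": 0.25, "15-24": 0.40, "25-34": 0.25, ">34": 0.10}

def draw_intensity(rng):
    # stage 1: draw the smoking-intensity category
    cat = rng.choices(list(PROBS), weights=list(PROBS.values()))[0]
    # stage 2: draw an integer uniformly within the selected group
    lo, hi = GROUPS[cat]
    return rng.randint(lo, hi)

rng = random.Random(42)
sample = [draw_intensity(rng) for _ in range(1_000)]
```

In the actual simulation the category probabilities vary by age group, so stage 1 conditions on the individual's previously drawn age.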
Following this procedure we simulate a large sample of NL=100,000 individuals
representative of the 1980 US reference population. This will be our simulated "true"
population for which predictions about lung cancer incidence are to be made
using the MSM. For the purposes of both model calibration and validation, two
sub-samples will be drawn from this simulated population. In particular, we randomly
draw two sub-samples of size n=5,000 each. The first one will be the input to the
MSM for the implementation of the two calibration methods. We refer to that sample
as the "calibration input" (smpl.C5000). The second one, referred to as the "validation
input", will be used for validating the calibration results (smpl.V5000). Furthermore,
other sub-samples will also be randomly drawn from the same NL=100,000
simulated population, to serve other purposes of the comparative analysis presented
in this chapter (e.g., samples to produce calibration plots, etc.).
Table 3.2 presents the age distributions of the samples used as input for the comparative
analysis of the two calibration methods. We denote by smpl100,000 the sample
of 100,000 people used for the calculations of the calibration targets (see section
Age (years)   US 1980 (smpl100,000)   Calibration (smpl.C5000)   Validation (smpl.V5000)
17-39                53672                    530                       554
40-44                 7078                     79                        63
45-49                 6827                     65                        70
50-54                 7122                     68                        80
55-59                 6699                     72                        53
60-64                 5876                     60                        54
65-69                 4833                     48                        55
70-74                 3503                     43                        25
75-79                 2233                     22                        23
80-84                 1284                      8                        16
>85                    873                      5                         7

Table 3.2: Age distributions of the samples (input data) used for the comparative analysis of the two calibration methods.
3.4.3), smpl.C5000 the sample of size n=5,000 that was used as input for both
calibration methods, as well as for the internal validation of the results, and smpl.V5000 the
sample used for the external validation of the calibrated models. All these samples
are representative of the US 1980 population, i.e., the age and smoking intensity
distributions of these samples resemble the corresponding observed data about male
current smokers reported in the 1980 US census and the Statistical Abstract of the
US from the same year.
3.4.2 MSM parameters to calibrate
The streamlined MSM for the natural history of lung cancer that we developed in the
first chapter involves numerous parameters describing different parts of the model.
In order to be able to run the procedures in plausible times, instead of performing an
exhaustive calibration we run a restricted one, focusing our interest on only
four MSM parameters. All the rest are kept fixed, according to known relationships
found in the literature or plausible assumptions made to simplify the calibration problem
(Chapter 2). In particular:
• we keep the MSM parameters pertaining to the onset of the first malignant
cell fixed to the quantities found in the literature about male current smokers
(Table 2.2);
• from the Gompertz(m,s) distribution for the tumor growth, we only calibrate
m, assuming s=31·m (section 2.3.1);
• from the log-Normal distributions of the disease progression part of the MSM,
we only calibrate the location parameters (i.e., m_diagn, m_reg, and m_dist), assuming
that location=scale (i.e., means = standard deviations);
• our prior beliefs (i.e., prior distributions and plausible intervals for the MSM
parameters to calibrate) are in accordance with findings in the literature on the
natural history of lung cancer (section 2.3.1).
3.4.3 Calibration Targets
In order to keep the calibration problem as simple as possible, we only calibrate our
model to lung cancer incidence by age group. As the reference point we use the observed
rates in the SEER 2002-2006 data, so as to reproduce plausible numbers (Table 3.3). The
calibration exercise relies on the strong assumption that the lung cancer incidence
rates, conditional on gender and smoking status, remain unchanged throughout the
26-year prediction period (from 1980 to 2006) and are close to the reported
SEER 2002-2006 rates. Another problem when calibrating the lung cancer natural
history model is the occurrence of rare events, especially at ages less than 55 years
old. To overcome this problem, we combine the eleven 5-year age groups presented
in the SEER data into three, i.e., <60, 60-80, and >80 years old. In this way we are able
to observe all the lung cancer incidence rates even when we use as input a sample
of people of moderate size (e.g., n=500). We assume that the lung cancer cases y_j
follow a Poisson(λ_j) distribution, where λ_j is the rate of the jth age group, expressed
as the number of cases per 100,000 person-years (PYs).
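The grouping and rate computation can be sketched as follows; the counts and person-years are invented illustrative numbers, not SEER or model output.

```python
def age_group(age):
    # the three coarse groups used for the calibration targets
    if age < 60:
        return "<60"
    return "[60,80)" if age < 80 else ">80"

# (age, lung cancer cases, person-years) -- invented illustrative numbers
records = [(45, 2, 30_000), (55, 3, 25_000), (65, 40, 20_000),
           (75, 45, 18_000), (85, 30, 8_000)]

cases, pys = {}, {}
for age, c, py in records:
    g = age_group(age)
    cases[g] = cases.get(g, 0) + c
    pys[g] = pys.get(g, 0) + py

# rates expressed as cases per 100,000 person-years
rates = {g: 100_000 * cases[g] / pys[g] for g in cases}
```

Collapsing the fine age groups in this way guarantees non-zero counts in every group even for moderate input samples.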
Age group   Observed   Observed (combined)    Predicted     Predicted (combined)
<40            5.2      41.9   (<60)           11 ± 4.48     41 ± 4.25   (<60)
[40, 45)      10.0                             15 ± 3.3
[45, 50)      26.3                             21 ± 2.83
[50, 55)      56.7                             49 ± 4.18
[55, 60)     111.3                            107 ± 6.46
[60, 65)     208.4     387.4  ([60, 80))      192 ± 9.01    391 ± 15.96 ([60, 80))
[65, 70)     329.3                            392 ± 16.32
[70, 75)     455.7                            481 ± 18.62
[75, 80)     556.2                            498 ± 19.89
[80, 85)     554.5     498.1  (>80)           504 ± 19.93   464 ± 19.36 (>80)
>85          441.7                            425 ± 18.79

Table 3.3: Observed (2002-2006 SEER data) and predicted (M=100, N=100,000, θfix) lung cancer incidence rates (cases/100,000 person·years) by age group.
We ran an ad hoc analysis to identify combinations of parameter values that give
plausible predictions, i.e., predictions close to the observed quantities (SEER data
2002-2006). For this purpose we implemented the MSM on the simulated US 1980
population of N=100,000 male current smokers. Given their simulated baseline
characteristics, i.e., age and smoking intensity, we predicted twenty-six years ahead,
that is, we predicted lung cancer incidence in 2006. We implemented the model
M=100 times in order to increase the accuracy of our predictions. At the end of the
prediction period, we combined the results (predicted lung cancer cases per 100,000
person-years) by age group. Following the results of this ad hoc analysis, we identified
a set of values θfix = [θ1^c, θ2^c, θ3^c, θ4^c]^T = [0.00038, 2, 1.1, 2.8]^T, for which the MSM
predicts lung cancer incidence rates per age group close to the observed quantities
(SEER data). Table 3.3 presents the predicted lung cancer incidence rates for these
fixed parameter values. We set these rates, Yclbr = [y1-clbr, y2-clbr, y3-clbr]^T = [41, 391,
464]^T, to be the calibration targets for each of the two calibration methods.
The reason for choosing Yclbr as the calibration targets, rather than the respective
observed rates in the SEER data, is that we wanted to control for the effect that the input
sample, as well as the structure of the model, would have on the MSM predictions.
In this way, any deviations of the predictions from the reference points can be
attributed to a greater extent to the real underlying differences between the two
calibration methods, rather than to other factors that are nuisance for the purposes
of this comparative analysis.
We repeated a similar procedure, running the model M=2,000 times in total for
θfix, using this time the "calibration input" sample. We again predicted the lung
cancer incidence twenty-six years ahead and combined the results by age group, thus
arriving at the vector of rates Yfix = [y1-fix, y2-fix, y3-fix]^T = [50, 353, 452]^T. We use
this vector later on to validate the results from the two calibration methods. The
reason is that the output of an MSM depends, to some extent, on the input sample;
therefore, even for the same θfix and the same total number of micro-simulations (i.e.,
M·N=10^7), the output can be slightly different (Yclbr vs Yfix).
Using the notation introduced in section 3.2.1, we define Yclbr = M100(θfix, smpl100,000)
and Yfix = M2000(θfix, smpl.C5000) to be the two reference points to which the predictions
from the two calibrated MSMs will be compared. Consequently, we have three reference
points when comparing the results from the two methods: θfix for the calibrated
parameters, as well as Yclbr and Yfix for the predicted lung cancer incidence rates.
3.4.4 Simulation Study
The ultimate goal of this chapter is the quantitative and qualitative comparison of the
two calibration methods for MSMs, the Bayesian and the Empirical one. To this end
we design a simulation study that allows for comparisons of multiple aspects of the
calibration procedure. The simulation study pertains to the implementation of both
methods to calibrate the parameters of the streamlined MSM for the natural history
of lung cancer. In particular, we calibrate all four (θ1, θ2, θ3, θ4) MSM parameters, and
compare the results from the two methods using both qualitative and quantitative
measures, as well as graphical means.
Methods comparability
In order to ensure comparability of the two methods, we have to calibrate the same
set of MSM parameters θ to the same calibration targets Yclbr, using the same input
data ("calibration sample"). In addition, the prior information about θ in the
Bayesian calibration method has to be consistent with the plausible intervals assumed
in the Empirical calibration method, while the estimation of the model outputs of
interest should be based on the same number of embedded micro-simulation runs
(simulation study size).
The results from each calibration method, i.e., point estimates for the MSM parameters
and predicted outputs, are influenced by the several sources of uncertainty (Chapter
1) inherent in the model. Failure to recognize this problem and take precautions to
control for it may produce misleading results and, consequently, erroneous conclusions
from the comparative analysis. Structural uncertainty cannot be examined in our
case, since both methods are implemented for the calibration of exactly the same
MSM. We account for selection uncertainty and sampling variability (both related
to the calibration data) by setting the same calibration targets. Moreover, we
account for the effect of simulation (Monte Carlo) variability by implementing
the MSM multiple times on the same input sample, and taking point estimates
(means, standard deviations) of the outputs of interest (calibration targets or individual
trajectories). Parameter uncertainty, on the other hand, is an integral part of
the calibration method itself, and is captured by the determination of distributions,
Characteristics            Bayesian                                  Empirical
Parameters to calibrate    θ=[θ1, θ2, θ3, θ4]^T                      θ=[θ1, θ2, θ3, θ4]^T
Calibration targets        Yclbr (relatively easy to combine         Yclbr (when there is more than one
                           more than one source of information)      calibration target, a rule to
                                                                     combine them must be specified)
GoF                        log-likelihood (inherent in the           Deviance
                           approximate MH algorithm)
Convergence criteria       Trace plots,                              χ² test (α=5%)
                           convergence diagnostics
Stopping rule              V sets of values for θ from the           V sets of values for θ from the
                           converged sets                            converged sets
Result                     Random draws from the joint               Random draws from the empirical
                           posterior distribution of the             joint distribution of the
                           model parameters θ                        "acceptable" values for θ

Table 3.4: Implementation of the two calibration methods on the MSM for lung cancer according to the seven-step approach presented in Vanni et al. (2011)
rather than point estimates, of the resulting calibrated MSM parameters. This characteristic
will provide an additional means of comparison between the two methods.
Furthermore, we use the same sample of baseline characteristics (calibration input)
for the implementation of the two calibration methods, in order to eliminate the
effect of population heterogeneity on the comparative analysis results.
Finally, the calibration results from the two methods are integrated following exactly
the same procedure (e.g., using percentiles to describe the distribution of the calibrated
parameters and the MSM outputs). Table 3.4 juxtaposes the implementation of the two
calibration methods from the point of view of the seven-step approach presented in Vanni
et al. (2011).
Simulation study size
The accuracy of the calibration results depends heavily on the total number of micro-simulations involved in the computations. As already mentioned, we focus our interest on the calibration of an MSM that describes the natural history of lung cancer for males who are current smokers. In order to account for the effect of simulation variability, we implement the MSM M times on the input data (calibration sample of baseline characteristics). Each time the model predicts n trajectories 26 years ahead, one for each person in the input sample. We summarize the results at the end of the prediction period, i.e., we calculate the lung cancer rates per age group. This procedure results in M predictions per age group. As a point estimate of the predicted lung cancer incidence rates we use the averages of the M predicted values by age group. The accuracy of the calibration results is thus highly related to the size of the simulation study, i.e., the total M·n micro-simulations involved in the calculations.
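This averaging scheme is easy to sketch. The snippet below uses a hypothetical Poisson stand-in for the MSM (run_msm_once, TRUE_RATES, and the seed are illustrative assumptions, not the dissertation's model) to show how the M runs are collapsed into a mean ± sd point estimate per age group, and why larger M·n shrinks the variability:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for the MSM: one run returns lung cancer incidence
# rates (cases/100,000 person-years) per age group for an input sample of
# size n. The "true" rates mimic the chapter's calibration targets.
TRUE_RATES = {"<60": 41, "[60,80)": 391, ">80": 464}

def run_msm_once(n):
    """One model run: Poisson counts rescaled to cases per 100,000."""
    return {g: rng.poisson(r * n / 100_000) * 100_000 / n
            for g, r in TRUE_RATES.items()}

def point_estimates(M, n):
    """Implement the model M times; summarize mean and sd per age group."""
    runs = [run_msm_once(n) for _ in range(M)]
    return {g: (float(np.mean([r[g] for r in runs])),
                float(np.std([r[g] for r in runs], ddof=1)))
            for g in TRUE_RATES}

# Larger M*n shrinks the sd of the point estimates, at higher running cost.
small, large = point_estimates(10, 500), point_estimates(10, 5000)
```

With the real model the per-run predictions come from the micro-simulation itself; only the averaging step is shown here.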
Age group      M=10           M=20           M=30           M=50           M=100
n=500
  <60          34 ± 38.37     35 ± 47.18     44 ± 68.96     56 ± 78.4      38 ± 57.71
  [60, 80)     370 ± 202.63   364 ± 208.72   382 ± 240.71   365 ± 218.9    361 ± 216.48
  >80          369 ± 316.64   434 ± 331.91   460 ± 259.59   466 ± 309.12   486 ± 315.44
n=1000
  <60          41 ± 35.32     43 ± 44.23     41 ± 41.53     43 ± 49.18     43 ± 45.3
  [60, 80)     415 ± 160.57   392 ± 171.92   401 ± 160.01   393 ± 156.06   405 ± 155.36
  >80          480 ± 218.8    430 ± 174.45   496 ± 195.51   464 ± 185.99   435 ± 172.29
n=2500
  <60          47 ± 29.72     41 ± 24.63     41 ± 25.41     40 ± 24.75     41 ± 24.78
  [60, 80)     386 ± 101.47   372 ± 103.92   406 ± 100.36   395 ± 103.44   390 ± 101.5
  >80          466 ± 115.75   472 ± 123.32   492 ± 123.07   465 ± 120.89   476 ± 121.27
n=5000
  <60          44 ± 23.72     40 ± 15.8      41 ± 19.01     43 ± 19.6      40 ± 19.16
  [60, 80)     398 ± 82.34    397 ± 67.61    393 ± 73.56    402 ± 69.93    409 ± 73.82
  >80          454 ± 71.26    456 ± 97.57    444 ± 79.38    478 ± 93.72    464 ± 89.67
M: total number of micro-simulations per individual
n: input sample size
Table 3.5: Predicted lung cancer incidence rates (cases/100,000 person·years) per age group, for different study sizes (M·n). Calibration targets: Yclbr=[y1−clbr, y2−clbr, y3−clbr]T=[41, 391, 464]T
A key issue in the study design is the choice of the M·n combination of total micro-simulations involved in the calculations of each calibration method. There is a trade-off between the achieved accuracy in the predictions and the required running time. Our goal was to identify a combination that provides accurate predictions within plausible running times. To this end, we investigated different M·n combinations in order to specify the one that best serves the purposes of the simulation study.
We randomly extracted sub-samples of size n=500, 1000, 2500, and 5000, from the
N=100,000 simulated 1980 US population. These samples were subsequently used as
input to predict lung cancer incidence rates 26 years ahead, implementing the model
M=10, 20, 30, 50, and 100 times respectively. Table 3.5 presents the predicted
lung cancer incidence rates (average ± sd) per age group for each scenario. Figure
3.3 provides a graphical representation of table 3.5. According to this table, the
combination of M=10 and n=5000 seems adequate to produce sufficiently accurate results in plausible times.⁴ When making this decision, the focus was both on model accuracy (bias and variability of MSM predictions) and on the total required running time.
⁴ The required running time for M·n=50,000 micro-simulations is close to 5 secs using 64 cores (8 nodes; 8 cpus).
Figure 3.3: Predicted (mean±sd) lung cancer incidence rates (cases/100,000 person·years) by age group, for different M·n combinations, given fixed MSM parameter values (θfix=[0.00038, 2, 1.1, 2.8]T).
Implementation
We use both the Bayesian (3.2.2) and the Empirical (3.2.3) method to calibrate all four MSM parameters θ=[θ1, θ2, θ3, θ4]T. We calibrate the MSM to three targets (Yclbr=[y1−clbr, y2−clbr, y3−clbr]T=[41, 391, 464]T), i.e., the predicted lung cancer incidence rates per age group for the fixed values of the MSM parameters θfix=[0.00038, 2, 1.1, 2.8]T (Table 3.3). For each θk we use a Truncated Normal distribution (TN(µθk, sdθk), with µθk=sdθk) to specify either the prior for the Bayesian method or the distribution of plausible parameter values for the Empirical method. In particular, we set:
• m = θ1 ∼ TN(µ(θ1) = sd(θ1) = 0.0008, L(θ1) = 0.00001, U(θ1) = 0.0016)
• mdiagn = θ2 ∼ TN(µ(θ2) = sd(θ2) = 4, L(θ2) = 0.0001, U(θ2) = 8)
• mreg = θ3 ∼ TN(µ(θ3) = sd(θ3) = 2.2, L(θ3) = 0.0001, U(θ3) = 4.4)
• mdist = θ4 ∼ TN(µ(θ4) = sd(θ4) = 5.6, L(θ4) = 0.0001, U(θ4) = 11.2)

Suppose that we apply the Bayesian method in order to calibrate only θ1=m and θ2=mdiagn, and produce a chain of length B=100,000 for each parameter. To implement the Gibbs sampler with the embedded approximate MH algorithm, we follow the steps:

1. Set θ1 = θ1^(0), θ2 = θ2^(0) (starting values), and keep θ3 = θ3^(c) = 1.1, θ4 = θ4^(c) = 2.8 (fixed). Denote by θ^(0) = [θ1^(0), θ2^(0), θ3^(c), θ4^(c)]^T the vector with the starting values for the MSM parameters.

   (a) Given θ^(0), run the micro-simulation model M(θ) on the calibration sample (n=1000) to predict individual trajectories 26 years ahead, and calculate the predicted lung cancer cases y_mj by age group j.

   (b) Repeat step (a) M=50 times (m=1, 2, ..., M), resulting in M predicted lung cancer incidence counts per age group. These y_mj counts are considered random draws from Poisson(λj) distributions.

   (c) Calculate the likelihood ∏_{j=1}^{J} fj(y_{j−clbr} | λj = gj(θ^(0))). The λj are functions of the MSM parameters. Due to the model's complexity the form g(·) is hard to derive; therefore we approximate these quantities using the respective MLEs (sample means), i.e., λj = gj(θ) = (1/M) Σ_{m=1}^{M} y_mj.

2. Propose a new value θ1*.

3. Repeat steps (a) through (c) for θ* = [θ1*, θ2^(0), θ3^(c), θ4^(c)]^T.

4. Calculate the ratio
   r1(θ1, θ1*) = [π(θ1*) ∏_{j=1}^{J} fj(yj | gj(θ*))] / [π(θ1^(0)) ∏_{j=1}^{J} fj(yj | gj(θ^(0)))]
   and accept θ1* with probability α(θ1^(0), θ1*) (section 3.2.2).

5. Set θ1^(0) = θ1* if we accept θ1*; otherwise keep θ1^(0).

6. Propose a new value θ2*.

7. Repeat steps (a) through (c) for θ* = [θ1^(0), θ2*, θ3^(c), θ4^(c)]^T.

8. Calculate the ratio
   r2(θ2, θ2*) = [π(θ2*) ∏_{j=1}^{J} fj(yj | gj(θ*))] / [π(θ2^(0)) ∏_{j=1}^{J} fj(yj | gj(θ^(0)))]
   and accept θ2* with probability α(θ2^(0), θ2*) (section 3.2.2).

9. Set θ2^(0) = θ2* if we accept θ2*; otherwise keep θ2^(0).

The resulting [θ1^(0), θ2^(0)] values from the aforementioned process constitute one update for the chains of the calibrated parameter values. We repeat steps (1) through (9) B=100,000 times. From the total of B=100,000 values, we collect for each chain V=1,000 values by selecting every 50th iteration from the last 50,000 values. The resulting V=1,000 vectors comprise a sample representative of the joint posterior distribution of the MSM parameters and together correspond to the ΘBayes matrix of calibrated parameter values of the MSM for lung cancer.
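The inner Metropolis step with the approximated Poisson likelihood can be sketched as follows for θ1 alone; cycling the same update over each free parameter in turn gives the Metropolis-within-Gibbs scan described above. The surrogate mapping in lambda_hat, the random-walk proposal scale, and the short chain length are illustrative assumptions, not the chapter's actual model or tuning:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
Y_CLBR = np.array([41, 391, 464])        # calibration targets y_{j-clbr}

# Truncated-normal prior for theta_1 = m: mu = sd = 0.0008 on [1e-5, 1.6e-3]
MU = SD = 0.0008
PRIOR = stats.truncnorm((0.00001 - MU) / SD, (0.0016 - MU) / SD, loc=MU, scale=SD)

def lambda_hat(theta1, M=50):
    """Steps (a)-(c): approximate lambda_j by the mean of M model runs.
    The mapping below is a toy surrogate, NOT the real micro-simulation."""
    base = Y_CLBR * np.sqrt(theta1 / 0.00038)
    return rng.poisson(base, size=(M, 3)).mean(axis=0)

def log_target(theta1):
    """Log prior plus approximate Poisson log likelihood."""
    if not (0.00001 <= theta1 <= 0.0016):
        return -np.inf                   # outside the TN support: auto-reject
    lam = np.maximum(lambda_hat(theta1), 1e-9)
    return PRIOR.logpdf(theta1) + stats.poisson.logpmf(Y_CLBR, lam).sum()

theta1, chain = 0.0008, []
for b in range(200):                     # B = 100,000 in the actual study
    prop = theta1 + rng.normal(0.0, 0.0001)   # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_target(prop) - log_target(theta1):
        theta1 = prop                    # accept theta_1*
    chain.append(theta1)
```

Because the likelihood is re-estimated from noisy model runs at every evaluation, the acceptance ratio is itself approximate, which is exactly the "approximate MH" flavor of the algorithm above.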
We follow an analogous procedure to calibrate one, or any combination of two, three
or all four parameters of the MSM. Figures 3.18 and 3.19 depict in flow charts the
implementation of the Bayesian method to calibrate θ1.
For the Empirical calibration method, we implement an LHS of moderate (NLHS=10)
size, L=10,000 times, thus resulting in Nemp= NLHS· L = 100,000 vectors of param-
eter values in total. For each vector of parameter values we implement the micro-
simulation model M=10 times, and we calculate the corresponding predicted lung
cancer incidence rates per age group. As in the Bayesian calibration method, we
assume that the calibration targets yj (lung cancer cases per age group j, j={1,
2, 3}) are count data from Poisson distributions, i.e., yj ∼ Poisson(λj=gj(θ)).
Since the form g(·) is hard to derive, we use the M=10 draws predicted by the model to calculate estimates of the parameters of these Poisson distributions, i.e., λj = gj(θ) = (1/M) Σ_{m=1}^{M} y_mj. Here, the deviance statistic follows a chi-square distribution with ν = 3 d.f.; hence, stipulating a 5% level of statistical significance, we select those sets satisfying Di < 7.81, thus resulting in Nemp “acceptable” sets of parameter values in total.
Among those we randomly extract V=1,000 vectors (with replacement if necessary, i.e., if the procedure results in fewer than 1,000 “acceptable” vectors). The resulting vectors comprise a sample representative of the joint distribution of the “acceptable” MSM parameters, according to the Empirical calibration criteria, and together correspond to the ΘEmp matrix of values for the calibrated parameters of the lung cancer MSM (section 3.2.3).
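A minimal sketch of the Empirical method's sampling-and-filtering loop, assuming scipy's qmc.LatinHypercube for the LHS draws and a toy surrogate (predict) in place of the real MSM; the surrogate, seed, and reduced L are illustrative assumptions:

```python
import numpy as np
from scipy import stats
from scipy.stats import qmc

rng = np.random.default_rng(1)
Y_CLBR = np.array([41, 391, 464])            # calibration targets

# Plausible-value distributions: TN with mu = sd, as specified above
def tn(mu, lo, hi):
    return stats.truncnorm((lo - mu) / mu, (hi - mu) / mu, loc=mu, scale=mu)

DISTS = [tn(0.0008, 0.00001, 0.0016), tn(4.0, 0.0001, 8.0),
         tn(2.2, 0.0001, 4.4), tn(5.6, 0.0001, 11.2)]

def predict(theta, M=10):
    """Mean of M noisy predicted counts per age group. Toy surrogate for
    the MSM; the real mapping g(.) is the simulation model itself."""
    base = Y_CLBR * np.sqrt(theta[0] / 0.00038)
    return rng.poisson(base, size=(M, 3)).mean(axis=0)

# LHS of size N_LHS = 10, repeated L = 100 times (L = 10,000 in the study),
# mapped to the parameter space through the TN quantile functions
sampler = qmc.LatinHypercube(d=4, seed=7)
u = np.vstack([sampler.random(10) for _ in range(100)])
Theta = np.column_stack([d.ppf(u[:, k]) for k, d in enumerate(DISTS)])

# Poisson deviance of each parameter set; keep sets with D_i < chi2_{0.95,3}
crit = stats.chi2.ppf(0.95, df=3)            # ~7.81
lam = np.array([np.maximum(predict(th), 1e-9) for th in Theta])
dev = 2 * (stats.poisson.logpmf(Y_CLBR, Y_CLBR)
           - stats.poisson.logpmf(Y_CLBR, lam)).sum(axis=1)
acceptable = Theta[dev < crit]               # the "acceptable" sets
```

With the real model, predict would be replaced by M=10 runs of the micro-simulation, and L raised to 10,000.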
Figures 3.1 and 3.2 provide a graphical representation of the mechanism for the
extraction of NLHS vectors of values from a two-dimensional parameter space, for
different NLHS sizes. Two sets of graphs are presented in each figure. In the left
graph the extracted value is the center of each selected interval, whilst in the right
one the value is randomly chosen from the respective interval.
3.4.5 Terms of comparison
The results from both methods are sets of values describing the joint distribution of
the calibrated MSM parameters. The resulting sets represent random draws from the
joint posterior distribution, or the empirical joint distribution of the values satisfying
the convergence criteria, for the Bayesian and the Empirical method respectively. For
the purposes of this comparative analysis, each method results in V=1000 vectors
from the multi-variate parameter space.
We use these results to make predictions for the quantities of interest, i.e., lung cancer
incidence rates by age group. In particular, for each vector of parameter values we
implement the MSM multiple (M=50) times, and we produce point estimates (means)
of the respective quantities. The resulting (V=1000) mean incidence rates for each
age group represent random draws from the posterior or the empirical predictive
distribution. Depending on the input sample, the predictions can be used for the
purposes of internal (smpl.C5000) or external validation (smpl.V5000).
We compare the two methods using qualitative and quantitative measures as well as
graphical representations of the results produced.
In particular we provide:
1. Density plots (parameters and predictions)
We compare the density plots of the marginal distributions of the calibrated MSM parameters, as well as the distributions of the predicted calibration data (lung cancer incidence rates by age group). We use the Kullback-Leibler distance (60) to assess the relative entropy between the probability distributions resulting from the two methods, with respect to either each calibrated parameter or the predictions by age group. Low values of this distance indicate similarity of the two distributions, provided that they do not present large differences in the overall shape (e.g., different skewness, and higher order moments in general). We also apply the Kolmogorov-Smirnov test to check whether results from the two methods come from the same underlying distribution. When the null hypothesis is not rejected (similar results from the two methods), we include in the graph the respective p-value. In the density plots for the calibrated MSM parameters we also include the respective prior distributions. For the predictions obtained from each calibrated MSM, we present two different sets of results, one for the internal and the other for the external validation of the calibrated MSM, using as input the calibration (smpl.C5000) and the validation (smpl.V5000) samples respectively.
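A sketch of these two comparisons on hypothetical draws (the normal samples below merely stand in for the V=1000 calibrated values from each method; the histogram-based KL estimator and its bin count are implementation choices, not the dissertation's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical V=1000 calibrated values of one parameter from each method
bayes = rng.normal(2.22, 1.09, 1000)
empirical = rng.normal(2.25, 1.11, 1000)

def symmetric_kl(x, y, bins=30):
    """Histogram estimate of the symmetric KL distance KL(p||q) + KL(q||p)."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    p = np.histogram(x, bins=bins, range=(lo, hi))[0] + 1e-9  # avoid log(0)
    q = np.histogram(y, bins=bins, range=(lo, hi))[0] + 1e-9
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

kl = symmetric_kl(bayes, empirical)
ks_stat, ks_p = stats.ks_2samp(bayes, empirical)
# A small kl and a non-significant ks_p suggest the two methods produced
# similar marginal distributions for this parameter.
```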
2. Correlation and contour plots (parameters)
We also provide correlation (scatter) plots of all the pairs of calibrated MSM parameters, as well as contour plots to identify high density points in the resulting bivariate distributions. The scatter plots are accompanied by the Pearson correlation coefficient.
3. Calibration and box plots (predictions)
In an attempt to provide additional means to compare the two methods, we
use the calibration results (sets of values for the calibrated MSM parameters)
to predict lung cancer incidence rates based on different samples of baseline
characteristics (input data). In particular, we extracted 20 different samples
in total, each of size n=5000, representative of the 1980 US population of
males, current smokers. Each sample includes individual level data on age and
smoking intensity. For each one of these 20 samples we apply the model M=50
times, to predict lung cancer incidence rates by age group. We use the sample
mean as a point estimate of the predicted quantity by age group. Repetition of this process for each set of values for the calibrated MSM parameters results in 1000 predicted rates for each age group.
Using these estimates we produce calibration and box plots to compare the two methods. In the calibration plots we plot the point estimates of the predictions from the Bayesian method versus the respective ones from the Empirical method, for each one of the 20 different samples that were used as input data. If the two methods produce similar results, the points in this plot should be scattered along the x=y line. We juxtapose the box-plots of the predictions from each calibration method, for each one of the 20 samples, by age group. The extent of overlap between the respective box-plots indicates the equivalence of the results produced by each method.
4. Discrepancy measures
We also provide four quantitative (two univariate and two multivariate) mea-
sures of discrepancy to compare the predictions from the two methods, namely
the mean absolute (MAD) and mean squared (MSD) deviations, as well as the
Euclidean, and the Mahalanobis distances.
The univariate measures of discrepancy are defined as:

MAD = (1/V) Σ_{v=1}^{V} |y_vj − y_j| / y_j    (3.4)

MSD = (1/V) Σ_{v=1}^{V} ((y_vj − y_j) / y_j)²    (3.5)
where y_vj is the point estimate for the lung cancer incidence rate of the jth age group given the vth vector of MSM parameters, and y_j is the jth component of the vector used as reference point.
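A sketch of equations (3.4)-(3.5) on hypothetical predictions (y_ref, the noise scales, and V are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
y_ref = np.array([41.0, 391.0, 464.0])       # reference point, e.g. Y_clbr

# Hypothetical V=1000 point estimates of the rates per age group
V = 1000
Y = y_ref + rng.normal(0.0, [9.0, 40.0, 42.0], size=(V, 3))

rel = (Y - y_ref) / y_ref                    # deviations weighted by y_j
MAD = np.abs(rel).mean(axis=0)               # eq. (3.4), per age group j
MSD = (rel ** 2).mean(axis=0)                # eq. (3.5), per age group j
overall_MAD = float(np.abs(rel).mean())
overall_MSD = float((rel ** 2).mean())
```

Dividing by y_j is what makes the deviations comparable across age groups with very different incidence levels.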
The multivariate distances, on the other hand, given M-dimensional vectors x and a constant vector c (center), are defined as:

D_M = sqrt( (x − c)^T · S^{−1} · (x − c) )    (3.6)

where c represents the center of the multidimensional space. In the Euclidean distance S is the identity matrix, while in the Mahalanobis distance S is the respective covariance matrix of the x vectors.
In the case of the calibrated parameters, this statistic measures the distance of
each x vector of MSM parameter values from the c=θfix vector of fixed values
assumed in the simulation study. When it comes to MSM predictions, these
distances measure the deviation of each vector of predictions from the vector
used as reference point (Yclbr or Yfix).
Multivariate distances are usefully applied in conjunction with the univariate ones, since they provide an idea of the combined deviation of the MSM predictions from the reference points (here, lung cancer incidence rates per age group). Furthermore, the Mahalanobis distance adds objectivity to the comparison of the results from the two MSMs, since it weighs the relevant deviation based on the underlying covariance matrix. For instance, the distance from c of a vector x with large variance, as well as the distances from c of two highly correlated vectors (x1 and x2), are downweighted (and vice versa). Hence, the final results are not distorted by potentially high correlations or different orders of magnitude between the involved quantities of interest.
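Equation (3.6) and the contrast between the two choices of S can be sketched as follows; the simulated parameter vectors, their scales, and the induced θ1-θ2 correlation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
theta_fix = np.array([0.00038, 2.0, 1.1, 2.8])     # center c

# Hypothetical V=1000 calibrated vectors: theta_1 is four orders of
# magnitude smaller than the rest and correlated with theta_2.
V = 1000
X = rng.normal([0.00034, 0.0, 2.2, 5.8], [4e-5, 1.0, 1.1, 2.7], size=(V, 4))
X[:, 1] = 2.9 + 1.5 * X[:, 1] + 2e4 * (X[:, 0] - 0.00034)

def dist(X, c, S):
    """sqrt((x-c)^T S^{-1} (x-c)) for every row x of X, as in eq. (3.6)."""
    d = X - c
    Sinv = np.linalg.inv(S)
    return np.sqrt(np.einsum("ij,jk,ik->i", d, Sinv, d))

euclid = dist(X, theta_fix, np.eye(4))               # S = identity
mahal = dist(X, theta_fix, np.cov(X, rowvar=False))  # S = covariance of x's
# The Euclidean distance is dominated by the large-magnitude coordinates,
# while the Mahalanobis distance rescales by the covariance, so theta_1's
# tiny scale and the theta_1-theta_2 correlation no longer distort it.
```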
Both mean deviations regarding the MSM predictions are weighted based on
the size of the respective lung cancer incidence rates by age group. We use
two reference points for these calculations. The first one is the set of calibration targets Yclbr = M100(θfix, smpl100,000), i.e., the set of MSM predictions for θfix, smpl100,000 and M=100. The second is the set of MSM predictions Yfix = M2000(θfix, smpl.C5000), i.e., the model's output again for θfix, but using the calibration input (smpl.C5000) and running the model M=2000 times in total.
As already mentioned (section 3.4.3), the reasoning behind the second comparison is that, even if the calibration procedure resulted in the vector θfix, there would be a deviation between the MSM's predictions and the calibration targets, even for the same number of total micro-simulations (10^7), if instead of smpl100,000 we used the calibration input (smpl.C5000). This deviation has to do with two sources of uncertainty inherent in the MSM (Chapter 1), namely, population heterogeneity (different composition of the two input samples) and stochastic uncertainty.
By comparing the MSM predictions resulting from each calibration method using the calibration input sample (smpl.C5000) with the model's output for θfix using exactly the same input sample, we control for the effect of the population heterogeneity on the final results. Therefore, any deviations between the MSM predictions and Yfix can be attributed, to a greater extent, to the real underlying differences between the two calibration methods, rather than being distorted by the population heterogeneity.
3.5 Results
3.5.1 Parameters
Table 3.6 and figure 3.4 compare the marginal distributions of the calibrated param-
eters from each method. There is a considerable overlap between the results from
the two calibration methods. This overlap is more prominent in the case of θ3=mreg
and θ4=mdist, where the Kolmogorov-Smirnov test cannot reject the null hypothesis
that the respective pairs of distributions represent the same underlying populations at α = 0.1%. In these two cases, noteworthy also is the proximity between the marginal distributions of the calibrated parameters and the assumed priors, indicating a potential identifiability problem for these two MSM parameters. Furthermore, the relative entropy, assessed by the (symmetric) Kullback-Leibler distance, is very close to 0.5 for all MSM parameters except θ1=m.
Both methods include the fixed values assumed in the simulation study (θfix=[0.00038, 2, 1.1, 2.8]T) within the range of the calibrated MSM parameters. This is an indication that both methods produce reasonable results. However, the marginal distributions are centered away from these fixed values. In most cases, the respective fixed parameter value lies outside the Interquartile Range (IQR), the only exceptions being θ1=m for the Empirical and θ2=mdiagn for the Bayesian method.
Contour plots (figures 3.6, 3.7) reveal bivariate associations between the calibrated
parameters. The underlying patterns are similar for the two methods. There is a
strong correlation between θ1 and θ2 in both methods. Furthermore, changes in the
θ3 and θ4 do not seem to considerably affect the respective θ1 values (points on the
respective plots are gathered around a conceivable line perpendicular to the θ1 axis).
The three parameters θ2, θ3, θ4 seem totally unrelated to each other.
Identifying highly correlated parameters can prove very helpful for the further devel-
opment and improvement of the MSM, as well as an extremely interesting discovery
for experts investigating the described phenomenon (here, lung cancer). With respect to the development of the MSM, strong correlations may indicate redundant parameters and suggest a more parsimonious version of the model by expressing some parameters as functions of others with which they are highly correlated. Regarding the true process described by the MSM, strong correlations may reveal relationships between the underlying mechanisms, previously unknown or disregarded by the experts, and hence advance the overall research on the phenomenon along new interesting paths.
The two multidimensional discrepancy measures lead to contradictory conclusions. According to the Euclidean distance, the values for the calibrated MSM parameters resulting from the Bayesian method are closer to the fixed values specified in the simulation study (θfix) than those from the Empirical method. Noteworthy, however, is the fact that, although the univariate (figure 3.4, table 3.6) and bivariate analyses (figures 3.6-3.7), as well as the Euclidean distance (figure 3.5), suggest that there are some discrepancies between the calibrated values, when considering the multidimensional parameter space centered at θfix, the Mahalanobis distances (figure 3.5) indicate quite similar results between the two methods. As already mentioned (section 3.4.5), conclusions based on the Euclidean distance may be misleading, since this measure can be distorted by several factors, e.g., high correlations or different orders of magnitude between the involved quantities of interest. In the case of the calibrated MSM parameters, there is a high correlation between two of them (θ1=m and θ2=mdiagn), while θ1 differs by almost four orders of magnitude from each of the other MSM parameters.
Figure 3.4: Density plots, Kullback-Leibler distance, and Kolmogorov-Smirnov p-value, comparing the marginal distributions of the calibrated MSM parameters between the two calibration methods.
Figure 3.5: Distributions of multidimensional distances of the calibrated MSM parameters from the fixed values assumed in the simulation study (θfix).
Method      Min         Q1          Median      Mean        Q3          Max         Fixed value   Deviation (±SD) (P*) (%)
θ1 = m
Bayesian    2.14·10^−4  3.15·10^−4  3.40·10^−4  3.38·10^−4  3.64·10^−4  4.60·10^−4  3.8·10^−4     4·10^−5 (3.9·10^−5) (88) (11)
Empirical   2.78·10^−4  3.71·10^−4  3.97·10^−4  3.97·10^−4  4.20·10^−4  5.00·10^−4  3.8·10^−4     −2·10^−5 (3.7·10^−5) (33) (4.47)
θ2 = mdiagn
Bayesian    1.49·10^−3  1.52        2.65        2.87        3.94        7.95        2             −0.87 (1.77) (36) (43.25)
Empirical   7.42·10^−3  2.77        4.25        4.36        6.10        7.98        2             −2.36 (2.02) (13) (117.8)
θ3 = mreg
Bayesian    0.019       1.37        2.16        2.22        3.06        4.40        1.1           −1.12 (1.09) (18) (101.5)
Empirical   0.013       1.33        2.21        2.25        3.18        4.39        1.1           −1.15 (1.11) (18) (104.7)
θ4 = mdist
Bayesian    0.071       3.59        5.62        5.76        8.02        11.18       2.8           −2.96 (2.72) (16) (105.5)
Empirical   0.0011      3.24        5.73        5.63        8.13        11.20       2.8           −2.83 (3.00) (22) (101.2)
* Percentile of the predictive distribution the fixed value corresponds to.
Table 3.6: Summary statistics of the calibrated MSM parameters.
Figure 3.6: Contour plots depicting the bivariate parameter distributions of the Bayesian calibrated MSM. Contours drawn at α=0.95, 0.5 and 0.05 of the bivariate distribution.
Figure 3.7: Contour plots depicting the bivariate parameter distributions of the Empirically calibrated MSM. Contours drawn at α=0.95, 0.5 and 0.05 of the bivariate distribution.
3.5.2 Predictions
The marginal distributions of the predicted lung cancer incidence rates from both
methods include the respective calibration targets in their range (figures 3.8, 3.9).
Moreover, predictions from the Bayesian MSM include the calibration targets in
their IQRs for both internal and external validation, the only exception being the “60-80yrs” age group (table 3.7). On the contrary, calibration targets lie outside the respective IQRs of the predictions from the empirically calibrated MSM, the only exception being the “>80yrs” age group in the external validation case. Consequently, although there is a large overlap between the two methods regarding the ranges of the predicted lung cancer incidence rates by age group (table 3.7), the respective distributions are very different (KS-test p-value<0.001 in every age group). Predicted values from the Bayesian calibrated model are more dispersed than those from the Empirical one, while the bias of the methods varies across the age groups and the type of validation. However, both calibrated models overall predict lung cancer incidence better in the “>80yrs” group, i.e., the group with more cases in it.
As already described in section 3.4.4, the predictions from each calibrated model
resulted from running the model M=50 times for each of the V=1000 calibrated
parameter vectors (Θ matrices), given a specific input sample SN . In the case of
the internal validation SN=smpl.C5000, i.e., the sample used in the calibration pro-
cedure, while in the external validation SN=smpl.V5000, i.e., another sample of the
same size N=5000. Both input samples are extracted from the simulated 1980 US
population (N=100,000). We calculated the MAD and MSD discrepancy measures for the calibrated MSMs under four different scenarios, depending on the input sample used (internal and external validation) as well as the reference point (Yclbr or Yfix, section 3.4.3). Table 3.8 depicts the predictions involved in the calculations of the MAD and MSD discrepancy measures presented in table 3.9. Note here that, when comparing the MSMs' results with Yclbr, the predictions involved in the calculations resulted from different MSM input samples. However, when Yfix is the reference point, in the internal validation the predictions refer to the same input sample (smpl.C5000), while in the external validation the predictions refer to samples of the same size (N=5000).
According to the overall MSD and MAD values (table 3.9), when comparing the predictions to the calibration targets (Yclbr), it is unclear which method outperforms the other. However, noteworthy is the fact that, when looking at the results by age group, the Bayesian calibrated MSM predicts lung cancer incidence better than the Empirically calibrated one for younger people (“<60yrs”), i.e., for the group with fewer observed cases in it. This finding holds for both internal and external validation, and indicates that the Bayesian method results in a set of values for the model parameters that, when used as MSM input, leads to better predictions of rare events.
When it comes to deviations from Yfix, the Empirically calibrated MSM overall
results in smaller discrepancies than the Bayesian one. This finding, in conjunction
with the note that predictions in this case refer to input samples that are either the
same (internal validation) or of the same size (external validation), suggests that the
Bayesian method is probably more robust to the sample of baseline characteristics
used as input in the calibration procedure.
To better understand this, recall that θfix is a vector of ad-hoc values for the model parameters, therefore independent of the input samples used for the predictions. The matrices ΘBAYES and ΘEMP, on the other hand, depend on the input sample (smpl.C5000) used in the calibration procedure. Furthermore, the predictions obtained by the MSM depend on the structure of the model, which remains unchanged, the parameter values, and the input sample used. According to Table 3.8, the predictions obtained from each model depend on the matrices of calibrated
values Θ. In addition, in the internal validation case, these predictions also depend
on the input sample (smpl.C5000) used in the calibration procedure, while in the ex-
ternal validation case they depend on a slightly different input sample of the same
size (N=5000), from the same reference population (smpl.V5000).
Therefore, the proximity between the MSM predictions and Yfix provides an indi-
cation of how strongly the results of each calibration method (ΘBAYES and ΘEMP)
depend on the input sample used in the calibration procedure. The stronger this
relationship is, namely the closer the MSM predictions are to the reference vector
Yfix, the less “robust” the method is to the input sample used in the calibration
procedure.
Looking at the multivariate version of the aforementioned four sets of comparisons
and the respective discrepancy measures (figure 3.10), we have a clearer idea of the
combined deviation of the MSM predictions from the reference vectors. According to
the Euclidean distance, predictions from the Empirically calibrated MSM are considerably closer to the reference vectors than those from the Bayesian model in all cases (internal and external validation). This finding was expected because, according to the respective univariate distributions (table 3.7, figures 3.8-3.9), predictions from the Bayesian MSM are much more dispersed than the ones from the Empirical model in the “60-80yrs” and “>80yrs” age groups. Furthermore, although predictions in the “<60yrs” group are less dispersed and centered around the calibration target, this is not reflected in the Euclidean distance, since this measure does not take into account the relative magnitudes of the quantities of interest.
The Mahalanobis distances change the overall conclusions considerably. According to this measure, the Bayesian calibrated MSM seems to perform equally well in all instances, and only marginally better when comparing predictions with Yfix in the external validation case, compared to the Empirically calibrated model. This finding essentially reflects the fact that the superiority of the Bayesian MSM in the “<60yrs” age group is offset by the better predictions of the Empirical MSM in the other two age groups, as indicated by the univariate discrepancy measures applied for the internal validation of the model (table 3.9). On the contrary, in accordance with the univariate analysis, the Mahalanobis distance suggests that the predictions from the Empirical MSM are closer to Yfix than the respective ones from the Bayesian model.
The calibration graphs (figure 3.11) plot the average predicted values by age group for each one of twenty different samples (of size N=5000 each) used as input to the MSM. As expected, these numbers lie on a straight line, denoting that the implementation of the two calibrated MSMs on the same input results in analogous outcomes.
The box-plots (figure 3.12) and the respective summary statistics (table 3.10) are in accordance with the conclusions from the density plots, i.e., they indicate that, overall, the Empirical method leads to less dispersed predictions. Noteworthy is also the fact that, looking at the medians, the predictions from the Empirical MSM are consistently higher than those from the Bayesian model. However, the Bayesian calibrated MSM tends to make more accurate predictions (medians closer to the respective calibration targets) for the “<60yrs” and “>80yrs” age groups.
Figure 3.8: INTERNAL VALIDATION: Density plots depicting the marginal distributions of the predicted lung cancer incidence rates (cases/100,000 person·years) by age group, compared to calibration targets Yclbr = M100(θfix, smpl100,000), and Yfix = M2000(θfix, smpl.C5000). [KL-dist: Kullback-Leibler distance]
Figure 3.9: EXTERNAL VALIDATION: Density plots depicting the marginal distributions of the predicted lung cancer incidence rates (cases/100,000 person·years) by age group, compared to calibration targets Yclbr = M100(θfix, smpl100,000), and Yfix = M2000(θfix, smpl.C5000). [KL-dist: Kullback-Leibler distance]
Summary          INTERNAL Validation          EXTERNAL Validation
statistics       Bayesian      Empirical      Bayesian      Empirical
<60 years old
Min              14.05         31.22          15.24         30.48
Q1               32.70         44.97          32.40         44.02
Median           39.87         49.8           39.55         48.93
Mean±Sd          39.36±9.19    49.8±6.61      38.8±9.05     49.0±6.42
Q3               45.94         54.7           45.35         53.7
Max              66.32         69.7           64.44         68.9
Target value     41
Bias (%)         1.64 (4)      −8.79 (21.4)   2.20 (5.4)    −8.00 (19.5)
60-80 years old
Min              208.9         313.6          212.5         307.1
Q1               308.1         358.1          301.2         350.5
Median           342.2         373.6          335.2         365.7
Mean±Sd          336.6±40.45   372.9±19.77    329.4±40.46   365.1±19.72
Q3               369.2         387.6          361.1         380.0
Max              426.1         423.1          425.1         415.1
Target value     391
Bias (%)         54.4 (13.9)   18.1 (4.63)    61.6 (15.8)   25.9 (6.6)
>80 years old
Min              370.6         383.8          361.6         389.3
Q1               433.4         458.9          423.2         447.9
Median           458.8         476.4          449.5         467.2
Mean±Sd          465.0±41.72   476.1±26.70    453.8±40.46   465.7±26.37
Q3               494.5         495.2          482.5         483.5
Max              622.9         562.0          568.0         556.3
Target value     464
Bias (%)         −1.0 (0.2)    −12.1 (2.6)    10.2 (2.2)    −1.7 (0.4)
Bias (%): deviation of the mean from the target value of the calibration procedure
Table 3.7: Summary statistics of the predicted lung cancer incidence rates by age group, by implementing the MSM on both the calibration and validation input samples.
Figure 3.10: Distributions of the Mahalanobis distances of the calibrated MSM predictions from Yclbr and Yfix (internal and external validation).
Figure 3.11: Calibration plots.
Figure 3.12: Box plots.
Internal Validation
Reference point                    Bayesian Calibration        Empirical Calibration
Yclbr = M100(θfix, smpl100,000)    M50(ΘBayes, smpl.C5000)     M50(ΘEmp, smpl.C5000)
Yfix = M2000(θfix, smpl.C5000)

External Validation
Reference point                    Bayesian Calibration        Empirical Calibration
Yclbr = M100(θfix, smpl100,000)    M50(ΘBayes, smpl.V5000)     M50(ΘEmp, smpl.V5000)
Yfix = M2000(θfix, smpl.C5000)

Table 3.8: Comparisons: predictions vs reference points involved in the calculations of the MAD and MSD discrepancy measures for the two calibrated MSMs (table 3.9).
3.6 Calibration Methods Refinement
Another very important finding is that, when applying the Pearson χ2 GoF test, only
34.5% and 59.7% of the predictions from the Bayesian calibrated MSM “pass” the
test at the α=95% and 99% levels, respectively. The corresponding percentages for the
Empirically calibrated MSM are much higher, i.e., 77.8% and 98.8%, respectively. Analogous
findings hold in the case of the external validation of the models, with the percentages
of predictions satisfying the GoF test being 31.4% and 54.3% for the Bayesian
method, and 73.1% and 96.8% for the Empirical one. This observation motivated
a complementary sub-analysis, based on N=100 random draws from
the sets of calibrated parameter values (along with their predictions) “passing” the
95% GoF test, from each method.
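A minimal sketch of this selection step, with hypothetical names throughout (the χ2 critical value shown, 5.991, is the 0.95 quantile for 2 degrees of freedom and is purely illustrative, as are the toy targets and parameter sets; the dissertation's own implementation is in R):

```python
import random

def pearson_chisq(predicted, targets):
    """Pearson X^2 GoF statistic between predicted and target rates."""
    return sum((o - e) ** 2 / e for o, e in zip(predicted, targets))

def gof_subsample(param_sets, predictions, targets, critical, n_draws=100, seed=1):
    """Keep the parameter vectors whose predictions 'pass' the GoF test,
    then draw n_draws of them (with replacement if fewer survive)."""
    passing = [(p, y) for p, y in zip(param_sets, predictions)
               if pearson_chisq(y, targets) <= critical]
    rng = random.Random(seed)
    if len(passing) >= n_draws:
        return rng.sample(passing, n_draws)
    return [rng.choice(passing) for _ in range(n_draws)]

# Toy illustration: three age-group incidence targets; the second parameter
# vector produces predictions far from the targets and is filtered out.
targets = [41.0, 391.0, 464.0]
preds = [[40.0, 390.0, 465.0], [60.0, 300.0, 600.0]]
params = [{"m": 3.8e-4}, {"m": 4.7e-4}]
kept = gof_subsample(params, preds, targets, critical=5.991, n_draws=5)
```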
The results from this supplementary analysis are somewhat different from those of the main
analysis. The most prominent differences, as expected, are related to the
performance of the Bayesian calibrated model. The distributions of the calibrated
parameters, as well as the predictions resulting from this model, are much less
dispersed compared to the full analysis. The posterior distribution of θ2=mdiagn is
                     Internal Validation
              Bayesian Calibration                    Empirical Calibration
              <60 yrs  60-80 yrs  >80 yrs  Overall    <60 yrs  60-80 yrs  >80 yrs  Overall
Yclbr  MAD    0.0400   0.1392     0.0220   0.0600     0.2145   0.0464     0.0261   0.0957
       MSD    0.0518   0.0306     0.0081   0.0300     0.0720   0.0047     0.0040   0.0269
Yfix   MAD    0.2128   0.0465     0.0288   0.0960     0.0041   0.0563     0.0534   0.0379
       MSD    0.0790   0.0153     0.0093   0.0345     0.0175   0.0063     0.0063   0.0100

                     External Validation
              Bayesian Calibration                    Empirical Calibration
              <60 yrs  60-80 yrs  >80 yrs  Overall    <60 yrs  60-80 yrs  >80 yrs  Overall
Yclbr  MAD    0.0563   0.1575     0.0221   0.0777     0.1951   0.0663     0.0037   0.0884
       MSD    0.0516   0.0355     0.0082   0.0318     0.0626   0.0069     0.0032   0.0242
Yfix   MAD    0.2240   0.0668     0.0039   0.0982     0.0200   0.0342     0.0303   0.0282
       MSD    0.0830   0.0176     0.0082   0.0362     0.0169   0.0043     0.0043   0.0085

Table 3.9: Measures of discrepancy to assess overall MSM predictive performance. MADs and MSDs of the model's predictions from Yclbr = M100(θfix, smpl100,000) and Yfix = M2000(θfix, smpl.C5000). [Bold numbers indicate the method with the smaller discrepancy.]
                       Summary statistics
Method        Min     Q1      Median   Q3      Max
< 60 yrs
  Bayesian    16.3    32.9    40.1     46.2    65.3
  Empirical   32.2    45.1    50.1     55.0    68.8
60-80 yrs
  Bayesian    215.7   297.7   330.3    355.9   420.5
  Empirical   304.5   345.6   360.1    374.7   410.7
> 80 yrs
  Bayesian    363.0   436.0   463.2    496.7   584.3
  Empirical   408.6   461.2   480.2    497.4   548.8

Table 3.10: Mean values of the main summary statistics (minimum, maximum and quartiles) of the predicted lung cancer incidence rates by age group for 20 different MSM input samples (figure 3.12).
now centered around the respective fixed value. Predictions are improved for the
“60-80 yrs” age group. The Bayesian calibrated MSM still performs better when it
comes to rare events, while the overall performance of this model is now better than
that of the Empirical one (table 3.13) when predictions are compared with the calibration
targets.
Figure 3.13: Sub-analysis: Density plots comparing the marginal distributions of the calibrated MSM parameters between the two calibration methods.
Method      Min        Q1         Median     Mean       Q3         Max        Fixed value  Deviation (±SD) (P*k) (%)

θ1 = m
 Bayesian   3.02·10−4  3.45·10−4  3.65·10−4  3.65·10−4  3.81·10−4  4.50·10−4  3.8·10−4     1.5·10−5 (2.64·10−5) (72) (3.9)
 Empirical  3.15·10−4  3.71·10−4  3.94·10−4  3.95·10−4  4.21·10−4  4.72·10−4  3.8·10−4     −1.51·10−5 (3.8·10−5) (30) (3.97)

θ2 = mdiagn
 Bayesian   0.1058     1.12       1.83       1.74       2.26       3.93       2            0.26 (0.82) (59) (13)
 Empirical  7.42·10−3  2.74       4.60       4.41       6.33       7.98       2            −2.41 (2.24) (16) (120.5)

θ3 = mreg
 Bayesian   0.402      1.41       2.06       2.16       2.92       4.27       1.1          −1.06 (0.96) (13) (96.4)
 Empirical  0.316      1.37       2.14       2.21       3.09       4.35       1.1          −1.11 (1.09) (16) (101)

θ4 = mdist
 Bayesian   0.157      3.76       5.70       5.58       7.68       10.7       2.8          −2.78 (2.74) (17) (99.3)
 Empirical  0.094      2.82       5.90       5.49       7.83       11.0       2.8          −2.69 (3.00) (25) (96.1)

* Percentile of the predictive distribution the fixed value corresponds to.

Table 3.11: Sub-analysis: Summary statistics of the calibrated MSM parameters.
Figure 3.14: Sub-analysis: Contour plots depicting the bivariate parameter distributions of the Bayesian calibrated MSM. Contours drawn at α=0.95, 0.5 and 0.05 of the bivariate distribution.

Figure 3.15: Sub-analysis: Contour plots depicting the bivariate parameter distributions of the Empirically calibrated MSM. Contours drawn at α=0.95, 0.5 and 0.05 of the bivariate distribution.

Figure 3.16: INTERNAL VALIDATION (sub-analysis): Density plots depicting the marginal distributions of the predicted lung cancer incidence rates (cases/100,000 person·years) by age group, compared to calibration targets Yclbr = M100(θfix, smpl100,000), and Yfix = M2000(θfix, smpl.C5000).

Figure 3.17: EXTERNAL VALIDATION (sub-analysis): Density plots depicting the marginal distributions of the predicted lung cancer incidence rates (cases/100,000 person·years) by age group, compared to calibration targets Yclbr = M100(θfix, smpl100,000), and Yfix = M2000(θfix, smpl.C5000).
                          INTERNAL Validation          EXTERNAL Validation
Summary statistics        Bayesian     Empirical       Bayesian     Empirical

< 60 years old
  Min                     37.24        37.24           34.73        37.11
  Q1                      42.96        45.31           41.99        44.73
  Median                  45.74        50.04           45.08        48.18
  Mean±Sd                 46.35±5.09   49.9±6.27       45.61±4.85   49.0±5.77
  Q3                      50.00        54.4            48.32        53.2
  Max                     59.83        61.9            61.06        60.7
  Target value: 41
  Bias (%)                -5.35 (13)   -8.9 (21.7)     -4.61 (11.2) -8.04 (19.6)

60-80 years old
  Min                     344.5        344.9           339.7        342.8
  Q1                      357.7        362.5           355.6        355.7
  Median                  367.2        374.9           362.6        366.3
  Mean±Sd                 368.3±13.71  364.6±13.4      329.4±40.46  368.1±15.53
  Q3                      379.0        387.8           372.4        379.8
  Max                     401.6        409.2           398.4        402.7
  Target value: 391
  Bias (%)                22.7 (5.8)   15.6 (3.99)     26.7 (6.8)   22.9 (5.9)

> 80 years old
  Min                     421.6        433.5           428.3        424.5
  Q1                      460.2        460.3           460.1        459.2
  Median                  481.6        480.0           480.0        472.1
  Mean±Sd                 480.1±24.67  476.8±20.77     478.3±23.68  472.0±17.22
  Q3                      498.2        495.0           496.0        482.6
  Max                     522.2        514.8           521.7        518.2
  Target value: 464
  Bias (%)                -16.1 (3.5)  -12.8 (2.8)     -14.3 (3.1)  -8.0 (1.7)

Bias(%): deviation of the mean from the target value of the calibration procedure.

Table 3.12: Sub-analysis: Summary statistics of the predicted lung cancer incidence rates by age group, obtained by implementing the MSM on both the calibration and validation input samples.
                     Internal Validation
              Bayesian Calibration                    Empirical Calibration
              <60 yrs  60-80 yrs  >80 yrs  Overall    <60 yrs  60-80 yrs  >80 yrs  Overall
yclbr  MAD    0.1305   0.0582     0.0347   0.0745     0.2158   0.0400     0.0276   0.0945
       MSD    0.0323   0.0046     0.0040   0.0136     0.0698   0.0034     0.0027   0.0253
yfix   MAD    0.0730   0.0432     0.0623   0.0595     0.0029   0.0633     0.0548   0.0403
       MSD    0.0156   0.0034     0.0068   0.0086     0.0156   0.0062     0.0051   0.0089

                     External Validation
              Bayesian Calibration                    Empirical Calibration
              <60 yrs  60-80 yrs  >80 yrs  Overall    <60 yrs  60-80 yrs  >80 yrs  Overall
yclbr  MAD    0.1125   0.0683     0.0307   0.0705     0.1961   0.0584     0.0173   0.0906
       MSD    0.0266   0.0058     0.0035   0.0119     0.0581   0.0049     0.0017   0.0216
yfix   MAD    0.0877   0.0320     0.0581   0.0593     0.0192   0.0429     0.0443   0.0355
       MSD    0.0170   0.0024     0.0061   0.0085     0.0136   0.0038     0.0034   0.0069

Table 3.13: Sub-analysis: Measures of discrepancy to assess overall MSM predictive performance. MADs and MSDs of the model's predictions from yclbr = MSM(smpl100,000, θfix) and yfix = MSM(smpl.C5000, θfix).
3.7 Discussion
In this chapter we presented a comparative analysis of two calibration methods for
micro-simulation modeling. We implemented both methods in the free statistical
software, R. We discussed the computational considerations and compared the results
of the two calibrated MSMs.
The comparative analysis showed that the Empirical calibration method is much
more efficient in terms of computational burden, since it can be orders of magnitude
faster than the Bayesian one. This finding also applies to the
comparison of undirected with any directed calibration method, due to the structural
similarities those methods respectively bear with the Empirical and the Bayesian
methods presented in this chapter. Furthermore, this chapter emphasizes the imperative
need for HPC techniques for calibrating any complicated predictive model,
including MSMs.
The two methods produced very similar results with respect to the distributions of
the calibrated MSM parameters, yielded analogous correlation structures, and
raised the same identifiability issues.
Predictions from the calibrated MSMs differ somewhat between the two methods.
The Bayesian MSM results in more dispersed predictions than the Empirical model,
although there are indications that it predicts rare events better. In addition, the
Bayesian method seems to be more robust to the input sample used in the calibration
procedure.
Finally, the supplementary analysis reveals a remarkable improvement in the results
from the Bayesian MSM. This finding is suggestive of two things. First, more
work should be done on the collection of the parameter vectors from the Bayesian
calibration method (e.g., length of converged chains, sampling rule for each one of
them, etc.). Second, the performance of the MSM can be considerably improved if
the Bayesian calibration method is followed by an additional step that further
refines the collection of the final sets of vectors for the calibrated parameters. As the
supplementary analysis has shown, such an improvement could be achieved if, for
example, we choose a subset of vectors for the MSM parameters that provide a good
fit of the model to observed data, according to some GoF criterion.
Future work will be directed towards a more detailed calibration of the streamlined MSM for
lung cancer developed in Chapter 2. We will aim at a complete calibration of the
MSM, so as to be able to predict individual trajectories for all possible combinations
of gender (male/female) and smoking status (never/former/current smokers). Furthermore,
we envisage the extension of the two calibration methods to account
for multiple calibration targets, i.e., to incorporate diverse information
from different stages of the natural history of lung cancer.
Figure 3.18: Flow chart of the implementation of the approximate MH algorithm of the Bayesian method to calibrate θ1.

Figure 3.19: Flow chart of the implementation of the Bayesian method to calibrate θ1. [A(θk) = π(θk) · \prod_{j=1}^{J} f_j(y_j | λ_j)]
Chapter 4
Assessing the predictive accuracy of MSMs
This chapter of the thesis is concerned with the assessment of the predictive
accuracy of MSMs, a quality characteristic that has not yet been studied in the literature.
The main outcome of interest for this assessment is the individual predicted time to
event; thus our approach is based on techniques applied in survival modeling. We
propose a set of available concordance indices, typically used for the assessment of
the predictive accuracy of survival models. In addition, we study the ability of
MSMs to predict times to events, and suggest the use of hypothesis testing to compare
observed with predicted survival distributions. We implement the suggested methods
in order to assess and compare the predictive accuracy of the two calibrated MSMs
resulting from the previous chapter, and we make recommendations on those that can
better capture the predictive quality of an MSM.
The chapter begins with background information on methods used for the assessment
of the predictive accuracy of complex models in general, as well as of survival models
in particular. It continues with a description of the methods suggested for the
assessment of the predictive accuracy of an MSM. We further describe the simulation
study conducted in order to compare the performance of the suggested methods. For
the purposes of this study, we applied the methods to each of the two calibrated
MSMs resulting from Chapter 3. A detailed analysis of the simulation results follows,
accompanied by suggestions on the most appropriate method to be used under
certain circumstances. The chapter concludes with future work in the field.
4.1 Background
4.1.1 Assessment of MSMs
An integral part of the development of a new MSM, as of any predictive model, is
the assessment of the model’s predictive accuracy (92; 105). After having discussed
in detail the two major building blocks in the development of an MSM, i.e., model
specification and calibration, this chapter is concerned with this property of the
model. Assessment of complex models in general encompasses the notions of model
validation (internal and external), sensitivity analysis, characterization of uncertainty,
and predictive accuracy (92; 105).

The development of an MSM is typically accompanied by a validation analysis. For
example, model validation may use empirical approaches (118; 3; 4; 65; 23), chi-square
(94; 18; 70) and likelihood statistics (94), as well as posterior estimates of
model parameters and posterior predictive distributions of model outcomes (90).
Validation has been discussed in detail in the previous chapter.
The assessment of uncertainty in MSMs, as in any other complex model, is also of
central concern, with a wide range of relevant references, from a brief introduction to
the problem of measuring uncertainty in complex decision analysis models (83), to the
development and implementation of complicated relevant methods. Such methods
include Bayesian approaches for characterizing uncertainty with emphasis on model
structure (12; 88), expression of patient heterogeneity and parameter uncertainty
(48; 55), applications of Probabilistic Sensitivity Analysis (PSA) (17; 7; 80; 81), etc.

In contrast to the assessment of uncertainty, the assessment of the predictive accuracy
of an MSM has not received systematic attention in the literature. However,
the assessment of this quality characteristic is essential, since, as noted subsequently,
one of the most important goals of MSMs is to accurately predict intervention
effects at the individual level, and, consequently, for homogeneous sub-groups of patients.
The study, implementation, and recommendation of statistical measures for assessing the
predictive accuracy of MSMs is the main objective of this chapter.
4.1.2 Predictive accuracy of MSMs
Micro-simulation models are broadly used to simulate entire populations with specific
characteristics and, often, under different hypothetical scenarios (interventions) (91).
The ultimate goal is to use these MSMs to make projections about the possible
evolution of the disease or even, when relevant, about the effect of an intervention
on the population, so as to inform health policy decisions (92).
However, there are also examples in the literature where individual-level data are
used to populate MSMs in order to test additional hypotheses or to enhance the
validity of the main findings of a study. McMahon et al. (2008), for instance,
populate the Lung Cancer Policy Model with individual-level data from the Mayo
CT screening single-arm trial, in order to simulate both the observed screening arm as
well as the missing control arm. They aimed in this way to compare the original findings
from the Mayo CT study with estimates of lung cancer incidence and mortality
from a hypothetical control arm with perfectly matched baseline characteristics.
Henderson et al. (2001), on the other hand, emphasize the importance of accurate
point estimates, especially of the predicted survival times, mentioning, among others,
the effect this accuracy may have on administering the most efficient treatment,
saving valuable resources, and guiding personal decisions regarding the remaining
lifespan of each individual. They also refer to other practical needs and
pressures imposed by the relevant health system, which can be vitally assisted by
informed decisions based on accurate survival time predictions. These arguments
coincide with one of the main goals of comparative effectiveness research (CER),
namely the development of adequate methodology to study differences in treatment
response between sub-groups of patients, as well as the enhancement of informed
medical decisions at the individual level (112; 25). Micro-simulation modeling is an essential
tool for predicting intervention effects on individuals, and, consequently, on homogeneous
subgroups; hence it can be an integral part of the conduct of CER studies.
The aforementioned examples of the use of MSMs to inform health decisions point
out the need for methods to assess the predictive accuracy of MSMs. Perhaps one
of the most important reasons for the lack of relevant research is that,
although very important, the prediction of accurate individual trajectories is a very
complicated task, the intricacy of which increases with the number of individual-level
characteristics involved. In this chapter we suggest methods from the literature that
could be used for the assessment of the predictive accuracy of this type of model.
The simulation study we conducted exemplifies the necessity of these methods in order
to compare two similar, “well” calibrated MSMs.
Predictive accuracy pertains to the ability of a model to correctly predict individual
outcomes. Steyerberg et al. (105) provide an overview of traditional and novel
measures for assessing the performance of prediction models in general. The authors
categorize methods into three broad categories, namely, measures of explained
variation (R2-statistics), other quadratic scores of the proximity between predictions
and actual outcomes (GoF statistics such as MSE, Deviance, Brier score, etc.), and
measures of the model’s discrimination ability (C-statistics, ROC curves).
Measures of explained variation (R2-statistics), although very interesting, are hard
to derive in the context of MSMs. Such an attempt would require systematic work on
identifying all sources of uncertainty inherent in an MSM, as well as on propagating
this uncertainty to the model outcomes. Research on this topic is part of the future
work related to this thesis. We also discussed GoF statistics in the previous chapter,
in the context of the calibration of an MSM. In that setting, we are mostly interested
in the comparison of the overall summary statistics predicted by the model with the
actual data (calibration data) found in the lung cancer literature, to determine a
“well” calibrated MSM.
In this chapter we focus on the accuracy of individual MSM predictions. The reason
is that it is possible for a “good” MSM, according to some overall GoF measures,
to perform poorly when it comes to individual predictions. The streamlined
MSM, for example, may predict a lung cancer incidence rate very close to the calibration
target for a specific age group. However, the individuals for whom the MSM
predicted lung cancer may differ considerably from those who actually did develop
lung cancer.
Depending on the outcome of interest (e.g., continuous, ordinal, binary or survival
data), as well as the type of the model’s predictions (e.g., prediction of the actual
outcome, risk score, survival probability, etc.), the predictive performance of an MSM
can be assessed using a variety of statistical measures. Since MSMs are designed to
predict individual patient trajectories, and in order to exploit the most comprehensive
predicted information, in this chapter we naturally consider MSMs as a special
type of survival model.
Assessing the predictive ability of survival models is a more complicated task than
for models for binary outcomes, such as logistic regression models. The complexity
in survival data analysis is due to the presence of censored observations,
for which the information about the event of interest is missing. The only thing
known for these observations is that, up to the censoring time, the subject had not
experienced the event of interest. The assessment of the performance of survival
models usually entails comparison of the predicted risks (rather than the predicted
survival times) with the observed outcomes, usually given a set of covariates. The reason
for this is that predicted survival times are not readily available from this type of
model.
Several measures for the assessment of the predictive accuracy of a survival model
have been suggested in the literature (46; 42; 100; 2; 9; 32; 93). An important class
of measures is that of concordance statistics (C-statistics), which focus on discrimination,
namely the desired property of the model to correctly classify subjects, given
a set of covariates, based on the predicted risk (57; 46).
The most widely used index, due to its simplicity, is the C-index proposed by Harrell
et al. (1996). Pencina and D’Agostino (2004) study the statistical properties of C
and show the relationship between this index and the modified Kendall’s τ. Similar
indices were studied by Gonen and Heller (2005), for the evaluation of Cox proportional
hazards models, and Uno et al. (2011). The latter is applicable to any type of
survival model that provides an explicit form of the predicted risk as a function of
the model parameters and covariates.
A common characteristic of the C-statistics proposed for the assessment of a survival
model is that they are all based on comparisons between the actual survival status and
a predicted risk score, a closed-form expression of which is obtained from the model.
The main reason is that actual predicted survival times are not readily available
from these commonly used survival models; they rather require some further
processing of the predicted risk, entailing a certain amount of subjectivity in the final
prediction. Furthermore, most of these models (proportional hazards and accelerated
failure time) imply a one-to-one correspondence between the predicted risk and the
expected survival times; therefore, these two quantities can be used interchangeably
to express a concordance relationship between observed and predicted outcomes.
Unlike most of the broadly used survival models, MSMs can predict times to events
and censoring status given the baseline characteristics of each individual, rather
than simple risk scores at specific time points. Therefore, assessing MSM predictive
accuracy should not solely involve concordance measures, because, in this way, a
significant portion of the predicted information (the actual predicted survival times)
is ignored. Investigators should rather use discrimination in conjunction with other
measures quantifying the proximity between predictions and actual outcomes on
an individual basis. Following this reasoning, we suggest here comparisons
between the predicted and the observed survival functions as a supplementary means
to concordance statistics for assessing the predictive accuracy of an MSM.
We have to note here that assessment of the predictive performance of commonly used
survival models (e.g., Cox proportional hazards) is also possible through comparison
of the observed with the predicted survival. However, a key issue in this assessment is
the methodology used for the estimation of the predicted survival from those models
(79; 78; 36; 45; 100; 32), especially when the model incorporates time-dependent
covariates. Since predictions are not readily available from these models,
the predicted survival is subject to additional assumptions (modeling mechanism)
beyond those stipulated in the model specification procedure. Therefore, assessment¹
of the predictive accuracy of such models depends not only on the model itself, but
also on the method used for obtaining the predicted survival. On the contrary, prediction
of survival times is usually an integral part of the outcome of an MSM (as is the case
with our streamlined MSM); therefore assessment of the predictive performance is
straightforward, and refers directly to the model itself and not to some other external
estimation procedure.

¹ A systematic review of methods used for the assessment of the predictive performance of risk prediction models can be found in Gerds et al. (2008)
A variety of statistics for comparing survival functions is available in the literature.
They include a set of tests based on the comparison of weighted Kaplan-Meier estimates
of the survival functions, such as the Log-Rank test (21), and tests based on
the weighted differences of the Nelson-Aalen estimates of the hazard rate, such as
the tests by Gehan (1965), Breslow (1970), and Tarone and Ware (1977). These tests,
although very popular, are not very powerful at detecting differences in crossing-hazards
situations. A class of statistics that has been proposed to amend this shortcoming
includes the Renyi-type and the Cramer-von Mises statistics. A detailed account of
the statistics used in this chapter for the comparison of two survival curves can
be found in the survival analysis book by Klein and Moeschberger (2003).
In the following sections we describe in detail the statistics proposed for the assessment
of the predictive accuracy of an MSM, as well as the simulation study conducted
for the comparison of those methods in an MSM setting.
4.2 Methods
4.2.1 Notation
In order to describe the statistics suggested in this chapter for the assessment of the
predictive accuracy of an MSM, we have to introduce some special notation.
Let X1, X2, ..., XN and X̂1, X̂2, ..., X̂N denote the observed and the predicted event
times, respectively, and Z1, Z2, ..., ZN the p×1 vectors of covariates in a sample of N
individuals. In our case, where the objective is to predict individual trajectories using
the MSM for lung cancer, the covariates comprise the age, gender and smoking history
of each individual. Let also Ti be the actual survival time and Di the corresponding
censoring variable, i.e., the time at which subject i is censored. We assume that D
is independent of T and Z. Let {(Ti, Zi, Di), i=1, ..., N} be N independent copies of
{(T, Z, D)}. For each individual i we only observe (Xi, Zi, ∆i), where Xi = min(Ti, Di) and

\Delta_i = \begin{cases} 1, & \text{if } X_i = T_i \\ 0, & \text{otherwise} \end{cases}
Furthermore, when comparing the survival between two samples, t1, t2, ..., tK denote
the distinct event times in the pooled sample, Ykj the number of individuals at risk,
and qkj the total number of events, observed in sample j (j=1,2) at time tk, where k=1,2,...,K.
In addition, Y_k = \sum_{j=1}^{2} Y_{kj} and q_k = \sum_{j=1}^{2} q_{kj} are the total number of individuals at risk
and the total number of events, respectively, in the pooled sample at time tk. Following
this notation, the Kaplan-Meier estimator of the survival function, for example in
the pooled sample, is:

\hat{S}(t) = \begin{cases} 1, & \text{if } t < t_1 \\ \prod_{t_k \le t} \left( 1 - \frac{q_k}{Y_k} \right), & \text{otherwise} \end{cases}    (4.1)

while the Nelson-Aalen estimator of the cumulative hazard is:

\hat{H}(t) = \begin{cases} 0, & \text{if } t < t_1 \\ \sum_{t_k \le t} \frac{q_k}{Y_k}, & \text{otherwise} \end{cases}    (4.2)
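The two estimators in eqs. 4.1 and 4.2 can be computed directly from right-censored data; a minimal sketch (function and variable names are ours, not the dissertation's R code):

```python
from collections import Counter

def km_na(times, events):
    """Kaplan-Meier survival (eq. 4.1) and Nelson-Aalen cumulative hazard
    (eq. 4.2) from right-censored data; events[i] = 1 if the event was
    observed at times[i], 0 if the subject was censored then."""
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    surv, haz = {}, {}
    s, h = 1.0, 0.0
    for tk in sorted(deaths):                      # distinct event times t_k
        y_k = sum(1 for t in times if t >= tk)     # number at risk Y_k
        q_k = deaths[tk]                           # number of events q_k
        s *= 1.0 - q_k / y_k                       # product-limit update
        h += q_k / y_k                             # cumulative hazard update
        surv[tk], haz[tk] = s, h
    return surv, haz

times  = [1, 2, 2, 3, 4, 5]     # observed times X_i (toy data)
events = [1, 1, 0, 1, 0, 1]     # event indicators Delta_i
S, H = km_na(times, events)
```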
4.2.2 Concordance statistics
Definition. Let (X1, T1), ..., (XN, TN) be a sample of bivariate, continuous observations.
The concordance (C) index for a pair of them, say (X1, T1) and (X2, T2),
is defined in general as (84):

C = pr(T_1 > T_2 | X_1 > X_2)    (4.3)
The concordance index has been widely used for the assessment of the predictive
accuracy of regression models for survival data. In this setting the C-index can take
either of the following two forms:

C = pr(g(Z_1) > g(Z_2) | T_1 < T_2)    (4.4)

or

C = pr(T_1 < T_2 | g(Z_1) > g(Z_2))    (4.5)

where Ti denotes the actual survival time and g(Zi) is some expression of the risk for the
ith individual as a function of the set Z of covariates.
In the first case (eq. 4.4), the concordance probability is defined conditionally on the
true value and can be considered an expression of the model’s sensitivity (i.e., the
probability that the model correctly classifies the observations given the ”truth”). The
second form of the concordance probability is defined conditionally on
the test value and is analogous to the predictive value of a diagnostic test, in that
it expresses the probability of having a certain ordering in the observed times given
what the model predicts for these specific data. Most of the C-statistics for survival
models are developed to estimate the conditional probability presented in equation
4.4 (39; 113), while estimates of the other conditional probability are also discussed
in the literature (34).
The concordance index can be used to quantify one of the key aspects of predictive
accuracy, namely the discrimination ability of a statistical model (105). It takes
values between 0.5 and 1. A C-index equal to 1 indicates perfect discrimination
ability, while values of the index closer to 0.5 indicate poor discrimination ability of
the model.
Harrell’s index

Perhaps the most well-known, easy to compute and, therefore, broadly used measure
of the discrimination ability of a survival model is Harrell’s C-statistic (39). Let us
consider all different pairs of subjects (i, j), i < j. A pair is said to be concordant if
(Xi < Xj and X̂i < X̂j) or (Xi > Xj and X̂i > X̂j). The overall C index suggested
by Harrell et al. (39) is defined as the proportion of concordant pairs among all usable
pairs in the sample. Every pair of subjects, at least one of whom has experienced the event
of interest, is usable. This index provides an estimate of the concordance probability
(eq. 4.4) as:

C_H = \frac{\sum_{i \neq j} \Delta_i I(X_i < X_j) I(\hat{X}_i < \hat{X}_j)}{\sum_{i \neq j} \Delta_i I(X_i < X_j)}    (4.6)
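Since the MSM supplies predicted event times, eq. 4.6 can be evaluated with the predicted times supplying the ordering; an illustrative sketch (names are ours, and ties are not handled specially):

```python
def harrell_c(obs_times, pred_times, deltas):
    """Harrell's C (eq. 4.6): among usable pairs (anchored on an observed
    event i with X_i < X_j), the fraction whose predicted times are ordered
    the same way as the observed times."""
    num = den = 0
    n = len(obs_times)
    for i in range(n):
        if deltas[i] != 1:
            continue                  # pairs are usable only through events
        for j in range(n):
            if i == j or not obs_times[i] < obs_times[j]:
                continue
            den += 1                  # usable pair
            if pred_times[i] < pred_times[j]:
                num += 1              # concordant pair
    return num / den

obs   = [2, 4, 6, 8]      # observed times X_i (toy data)
pred  = [1, 3, 7, 5]      # predicted times
delta = [1, 1, 1, 0]      # event indicators; the last subject is censored
c = harrell_c(obs, pred, delta)   # 5 of 6 usable pairs are concordant
```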
Uno’s index

Uno et al. (113) focus on the estimation of a truncated version of the concordance
probability (eq. 4.4), i.e.:

C = pr(g(Z_1) > g(Z_2) | T_1 < T_2, T_1 < \tau)    (4.7)

where τ is a pre-specified time point, the only restriction on which is that it should
be greater than the shortest censoring time observed. The truncation is introduced
to address the problem of the unstable estimation of the tail part of the survival
function.

Uno et al. employ an ”inverse probability weighting” technique (10), and propose a
non-parametric estimate of the concordance probability. The most important feature
of Uno’s C-statistic is that, unlike Harrell’s index, it does not depend on
the study-specific censoring distribution. Using a simulation study, Uno et al. (2011)
show that this index is in general robust to the choice of τ and that it performs better
than, or at least as well as, Harrell’s index most of the time.
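The weighting idea can be sketched as follows; this is a simplified illustration in the spirit of Uno et al. (2011), not their exact estimator (tie handling and the evaluation point of the censoring survival estimate are glossed over, and all names are ours). With no censoring it reduces to a truncated Harrell-type index:

```python
def censor_km(times, deltas, t):
    """Kaplan-Meier estimate G(t) of the censoring survival function:
    censorings (delta = 0) play the role of events."""
    g = 1.0
    for tk in sorted({x for x, d in zip(times, deltas) if d == 0}):
        if tk > t:
            break
        at_risk = sum(1 for x in times if x >= tk)
        cens = sum(1 for x, d in zip(times, deltas) if x == tk and d == 0)
        g *= 1.0 - cens / at_risk
    return g

def uno_c(times, deltas, pred_times, tau):
    """Truncated, inverse-probability-weighted concordance: pairs are
    anchored on events before tau and weighted by 1/G(X_i)^2, so the
    estimate does not lean on the study-specific censoring distribution."""
    num = den = 0.0
    n = len(times)
    for i in range(n):
        if deltas[i] != 1 or times[i] >= tau:
            continue
        w = censor_km(times, deltas, times[i]) ** -2
        for j in range(n):
            if i == j or not times[i] < times[j]:
                continue
            den += w
            # concordant: the earlier observed event also has the earlier
            # predicted event time (i.e., the higher predicted risk)
            if pred_times[i] < pred_times[j]:
                num += w
    return num / den

# With no censoring all weights are 1 and the index matches Harrell's C.
obs, pred, delta = [2, 4, 6, 8], [1, 3, 7, 5], [1, 1, 1, 1]
c_uno = uno_c(obs, delta, pred, tau=10)
```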
4.2.3 Hypothesis testing

The second set of methods proposed in this chapter for the assessment of the predictive
accuracy of an MSM comprises statistical tests for the comparison of the
predicted with the observed survival curve. In particular, we compute the log-rank
statistic, a Renyi-type statistic, and two different versions of a Cramer-von Mises
type statistic. Each of these statistics is used to test the null hypothesis H0 that
there is no difference in the survival distributions between the two samples (observed
versus predicted data).
Log-Rank statistic

We first apply the well-known and broadly used log-rank test (85), which, following
the notation previously introduced, employs the statistic:

Z = \frac{\sum_{k=1}^{K} \left( q_{k1} - Y_{k1} \frac{q_k}{Y_k} \right)}{\sqrt{\sum_{k=1}^{K} \frac{Y_{k1}}{Y_k} \left( 1 - \frac{Y_{k1}}{Y_k} \right) \left( \frac{Y_k - q_k}{Y_k - 1} \right) q_k}}    (4.8)

which under H0 has a standard normal distribution. The main limitation of this
test is that it does not perform very well in crossing-hazards situations.
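Eq. 4.8 translates almost line-for-line into code; a minimal sketch (names are ours):

```python
import math

def logrank_z(times1, deltas1, times2, deltas2):
    """Two-sample log-rank statistic (eq. 4.8); Z ~ N(0,1) under H0."""
    pooled = list(zip(times1, deltas1, [1] * len(times1))) + \
             list(zip(times2, deltas2, [2] * len(times2)))
    event_times = sorted({t for t, d, _ in pooled if d == 1})
    num = var = 0.0
    for tk in event_times:
        yk  = sum(1 for t, _, _ in pooled if t >= tk)             # Y_k
        yk1 = sum(1 for t, _, g in pooled if t >= tk and g == 1)  # Y_k1
        qk  = sum(1 for t, d, _ in pooled if t == tk and d == 1)  # q_k
        qk1 = sum(1 for t, d, g in pooled if t == tk and d == 1 and g == 1)
        num += qk1 - yk1 * qk / yk          # observed minus expected events
        if yk > 1:
            var += (yk1 / yk) * (1 - yk1 / yk) * ((yk - qk) / (yk - 1)) * qk
    return num / math.sqrt(var)

# Well-separated groups give a large |Z|; identical groups give Z = 0.
z_sep = logrank_z([1, 2], [1, 1], [3, 4], [1, 1])
z_same = logrank_z([1, 2, 3, 4], [1, 1, 1, 0], [1, 2, 3, 4], [1, 1, 1, 0])
```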
Renyi type tests

The Renyi-type statistics aim at comparing two (or more) survival distributions in
a way analogous to the Kolmogorov-Smirnov test for uncensored data (54). These
statistics are more powerful at detecting differences in crossing-hazards situations. In
our case, we implement the “log-rank” version of this test. The statistic used for
testing the null hypothesis is:

Q = \frac{\sup \{ |Z(t)|, t \le \tau \}}{\sigma(\tau)}    (4.9)

with

Z(t_\alpha) = \sum_{t_k \le t_\alpha} \left[ q_{k1} - Y_{k1} \left( \frac{q_k}{Y_k} \right) \right], \quad \alpha = 1, ..., K    (4.10)

and

\sigma^2(\tau) = \sum_{t_k \le \tau} \left( \frac{Y_{k1}}{Y_k} \right) \left( \frac{Y_{k2}}{Y_k} \right) \left( \frac{Y_k - q_k}{Y_k - 1} \right) q_k    (4.11)

where τ is the largest tk for which Yk1, Yk2 > 0.

The statistic Q under the null hypothesis can be approximated by the distribution
of sup{|B(x)|, 0 ≤ x ≤ 1}, where B is a standard Brownian motion process. Critical
values of Q can be found in the relevant tables. Taking the supremum of the absolute deviations
makes the test more powerful than the simple log-rank test at detecting
(existing) differences between two crossing survival curves.
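A sketch of eqs. 4.9-4.11, reusing the same bookkeeping as the log-rank statistic but tracking the running supremum of the numerator (names are ours):

```python
import math

def renyi_q(times1, deltas1, times2, deltas2):
    """Renyi-type (sup log-rank) statistic of eqs. 4.9-4.11: the running
    log-rank numerator Z(t) is tracked over time and its largest absolute
    value is scaled by sigma(tau)."""
    pooled = list(zip(times1, deltas1, [1] * len(times1))) + \
             list(zip(times2, deltas2, [2] * len(times2)))
    event_times = sorted({t for t, d, _ in pooled if d == 1})
    z = var = sup_z = 0.0
    for tk in event_times:
        yk1 = sum(1 for t, _, g in pooled if t >= tk and g == 1)
        yk2 = sum(1 for t, _, g in pooled if t >= tk and g == 2)
        if yk1 == 0 or yk2 == 0:
            break                      # tau: last t_k with both at risk
        yk = yk1 + yk2
        qk = sum(1 for t, d, _ in pooled if t == tk and d == 1)
        qk1 = sum(1 for t, d, g in pooled if t == tk and d == 1 and g == 1)
        z += qk1 - yk1 * qk / yk       # running Z(t), eq. 4.10
        sup_z = max(sup_z, abs(z))
        if yk > 1:
            var += (yk1 / yk) * (yk2 / yk) * ((yk - qk) / (yk - 1)) * qk
    return sup_z / math.sqrt(var)

q_same = renyi_q([1, 2, 3, 4], [1, 1, 1, 0], [1, 2, 3, 4], [1, 1, 1, 0])
q_sep = renyi_q([1, 2], [1, 1], [3, 4], [1, 1])
```

The resulting Q would then be compared against the tabulated critical values of sup|B(x)| mentioned above, which are not reproduced here.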
Cramer-von Mises tests
The last two statistics used for the comparison between observed and predicted survival belong to the Cramer-von Mises family, which is also analogous to the Kolmogorov-Smirnov test for comparing two cumulative distribution functions (54). Both statistics depend on weighted squared differences between the Nelson-Aalen estimates of the respective cumulative hazard functions. The first statistic used for this type of test is defined as:
Q_1 = \left(\frac{1}{\sigma^2(\tau)}\right) \sum_{t_k \le \tau} \left[H_1(t_k) - H_2(t_k)\right]^2 \left[\sigma^2(t_k) - \sigma^2(t_{k-1})\right] \qquad (4.12)
with t0 = 0, and the summation calculated over the distinct death times up to time τ, the largest tk for which Yk1, Yk2 > 0, i.e., the last death time at which there are still subjects at risk in both samples. Furthermore, Hj(tk) (j = 1, 2 for the two samples, observed and predicted) is the Nelson-Aalen estimator of the cumulative hazard function (section 4.2.1), with estimated variance:
\sigma_j^2(t) = \sum_{t_k \le t} \frac{q_{kj}}{Y_{kj}\,(Y_{kj} - 1)}, \qquad j = 1, 2 \qquad (4.13)
The Q1 statistic is based on the difference between H1(t) and H2(t), the variance of
which is estimated as:
\sigma^2(t) = \sigma_1^2(t) + \sigma_2^2(t) \qquad (4.14)
The alternative version of the Cramer-von Mises test applied in this chapter uses the statistic:

Q_2 = n \sum_{t_k \le \tau} \left[\frac{H_1(t_k) - H_2(t_k)}{1 + n\,\sigma^2(t_k)}\right]^2 \left[A(t_k) - A(t_{k-1})\right] \qquad (4.15)

where

A(t) = \frac{n\,\sigma^2(t)}{1 + n\,\sigma^2(t)}
Under the null hypothesis, Q_1 approximately follows the distribution of R_1 = \int_0^1 [B(x)]^2\,dx, where B(x) is a standard Brownian motion process, and Q_2 that of R_2 = \int_0^{A(\tau)} [B^0(x)]^2\,dx, where B^0(x) is a Brownian bridge process. Critical values of these two limiting distributions are also provided in published tables.
Note that there is some loss of power when using either of the two Cramer-von Mises tests compared to the log-rank test (97). However, Q1 performs almost equally well when the hazard rates of the two samples are proportional, while Q2 performs better than the other tests in the case of large early differences when the hazard rates cross.
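To make the construction of Q1 concrete, the following Python sketch accumulates the Nelson-Aalen estimates H1, H2 and the variance components of Eq. (4.13) over the pooled event times. It is an illustrative re-implementation under our own naming, not the dissertation's R code; Q2 follows the same pattern with the weights of Eq. (4.15).

```python
def cvm_q1(times1, events1, times2, events2):
    """Cramer-von Mises statistic Q1 of Eq. (4.12), built from Nelson-Aalen
    cumulative-hazard estimates H_j and their variances (Eq. 4.13)."""
    event_times = sorted({t for t, d in zip(times1 + times2, events1 + events2) if d})
    h1 = h2 = s1 = s2 = 0.0        # Nelson-Aalen estimates and variance components
    prev_sigma2, sigma2_tau, total = 0.0, 0.0, 0.0
    for tk in event_times:
        y1 = sum(t >= tk for t in times1)
        y2 = sum(t >= tk for t in times2)
        if y1 == 0 or y2 == 0:
            break                  # tau: last event time with both samples at risk
        q1 = sum(t == tk and d for t, d in zip(times1, events1))
        q2 = sum(t == tk and d for t, d in zip(times2, events2))
        h1 += q1 / y1              # Nelson-Aalen increments
        h2 += q2 / y2
        if y1 > 1:
            s1 += q1 / (y1 * (y1 - 1))      # Eq. (4.13), sample 1
        if y2 > 1:
            s2 += q2 / (y2 * (y2 - 1))      # Eq. (4.13), sample 2
        sigma2 = s1 + s2           # Eq. (4.14)
        total += (h1 - h2) ** 2 * (sigma2 - prev_sigma2)  # summand of Eq. (4.12)
        prev_sigma2 = sigma2_tau = sigma2
    return total / sigma2_tau
```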
4.2.4 Simulation Study
The purpose of the simulation study conducted in this chapter is to implement and compare the alternative approaches, highlight their differences, and make suggestions about the most suitable ones for assessing the predictive accuracy of an MSM. To this end, the methods were used to assess and compare the predictive accuracy of the two calibrated MSMs obtained in Chapter 3. These two MSMs have exactly the same structure and were calibrated to the same targets using two different calibration methods, a Bayesian and an Empirical one. The two methods resulted in different MSMs with respect to the set of values for the calibrated parameters.
As input we used a sample of N=5000 men (smpl.15000), current smokers, randomly drawn from the 1980 US population (smpl100,000, Chapter 3). Note that this sample is different from the one used for the implementation of the two calibration methods (smpl.C5000). As in Chapter 3, the baseline characteristics taken into account for predicting trajectories are age and smoking intensity, expressed as the average number of cigarettes smoked per day for each individual.
For the assessment of the predictive accuracy of the MSM we need to know the truth, namely if and when each person developed lung cancer. In the absence of real data on the time of development of lung cancer in the group used in the simulation study, we simulated the truth. Specifically, we use two simplified “toy” models which, given only age, predict time to death and time to lung cancer diagnosis for each individual. The first simplified model (truth model 1, toy.1) uses exponential distributions to predict these two time points, while the second one (truth model 2, toy.2) uses Gumbel distributions. The simulated truth about the censoring status is obtained by comparing the two predicted times for each individual. For instance, if the predicted time to death is larger than the predicted age at lung cancer diagnosis, the person is recorded as having had the event; otherwise the observation is censored at the age of death.
The parameters of the exponential and Gumbel distributions involved in these simulations were chosen ad hoc, so that the overall lung cancer incidence rates by age group (i.e., <60, 60-80, and >80 years old) approximate those reported in the 2002-2006 SEER data.
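The construction of the simulated truth described above can be sketched as follows. This Python sketch is purely illustrative: the distribution parameters are placeholders (the actual ad hoc, SEER-matched estimates are not reproduced here), and all names are ours.

```python
import math
import random

def gumbel_draw(rng, mu, beta):
    """Inverse-CDF draw from a Gumbel(mu, beta) distribution."""
    return mu - beta * math.log(-math.log(rng.random()))

def simulate_truth(ages, model="toy.1", seed=42):
    """Simulate the 'truth' for each individual: draw waiting times to lung
    cancer diagnosis and to death, and derive the event/censoring status.
    Distribution parameters below are illustrative placeholders, NOT the
    dissertation's SEER-matched estimates."""
    rng = random.Random(seed)
    records = []
    for age in ages:
        if model == "toy.1":                     # exponential waiting times
            t_lc = rng.expovariate(1 / 30.0)     # years to lung cancer diagnosis
            t_death = rng.expovariate(1 / 25.0)  # years to death
        else:                                    # toy.2: Gumbel waiting times
            t_lc = max(gumbel_draw(rng, 28.0, 8.0), 0.01)    # clamp to stay positive
            t_death = max(gumbel_draw(rng, 24.0, 8.0), 0.01)
        records.append({"age": age,
                        "time": min(t_lc, t_death),
                        "event": int(t_lc < t_death)})  # 1 = diagnosed before death
    return records
```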
We apply these two “toy” models to the input data (smpl.11000) in order to simulate the truth about the age at development of lung cancer for each individual. Subsequently, the same sample is used as input to each of the two calibrated MSMs resulting from Chapter 3, in order to also predict lung cancer incidence. The comparisons between the predictions and the simulated “truth” provide an indication of the adequacy of each proposed method for assessing the predictive performance of an MSM.
As indicated in section 3.2.4, the result of each calibration method is a set of V=1000 vectors for the four MSM parameters calibrated in the previous chapter. A single run of the MSM pertains to one implementation of the model, making predictions (one trajectory for each individual) for the input sample of interest, given one vector of parameter values. In the tables we present summary results of the model’s performance for different numbers V of parameter vectors (i.e., V=200, 400, 600, 800, and all 1000). In this way we are also able to investigate the effect the total number of micro-simulations has on the final conclusions from the applied statistics.
4.3 Results
4.3.1 Single run of the MSM
For V=1 we present Kaplan-Meier curves of the predicted against the observed sur-
vival functions. We also provide estimates of the suggested measures for assessing
the MSM’s predictive accuracy. Test statistics are accompanied by the respective
p-values.
The results from the implementation of the assessment methods on the MSMs, using only one vector of calibrated parameter values, indicate that the simulated “observed” lung cancer survival from the first toy model (toy.1, exponential distributions) is very close to the predictions from both models (Figures 4.1 and 4.2), although the survival function resulting from the predictions of the Bayesian calibrated MSM crosses the observed survival curve.
Table 4.1: Assessment of the predictive accuracy of the two calibrated MSMs: Predicted versus simulated (from toy.1 model) survival.

                                          Calibrated MSM
Method                              Bayesian          Empirical
Harrell's index                     0.779             0.754
Uno's index (τ = 100)               0.641             0.568
Uno's index (τ = 80)                0.733             0.710
Log-Rank χ² (p-value)               7.313 (0.00685)   3.013 (0.0826)
Renyi test Q (p-value)              4.03 (< 0.01)     2.11 (0.06)
Cramer-von Mises Q1 (p-value)       0.654 (> 0.01)    2.26 (< 0.025)
Cramer-von Mises Q2 (p-value)       1.66 (< 0.02)     0.326 (> 0.1)
Figure 4.1: Kaplan-Meier curves of the predicted versus the observed (simulated by the first toy model) survival.
The proximity between the predicted and the observed survival is also verified by most of the statistics applied for the assessment of the model (Table 4.1). The C-statistics are similar for the two models, with slightly higher values for the Bayesian model. The log-rank, Renyi type, and Cramer-von Mises (Q2) tests all reject the null hypothesis for the predictions from the Bayesian model but do not reject it for those from the Empirically calibrated MSM at α = 5%. However, we draw the opposite conclusions when looking at the Q1 statistic, according to which the observed survival is similar to the one predicted by the Bayesian model but differs from the one predicted by the Empirical MSM. The reason is probably that, as already mentioned, Q2 performs better than the other tests in cases like this, namely when the hazard rates cross and we observe relatively large, early differences between them.
When it comes to the comparison of the predictions with the simulated truth from the second toy model (Figure 4.2), the observed survival is very close to the one predicted by the Bayesian model, but differs considerably from the survival predicted by the Empirically calibrated MSM. This difference apparently cannot be captured by either of the C-statistics applied, since the respective estimates are very close for the two models (Table 4.2). In contrast, the difference is reflected in the results from all the statistical tests (log-rank, Renyi type, and Cramer-von Mises). None of these tests rejects the null hypothesis for the Bayesian model, while they all reject it for the Empirically calibrated model, at least at the α = 5% significance level.
Figure 4.2: Kaplan-Meier curves of the predicted versus the observed (simulated by the second toy model) survival.
Table 4.2: Assessment of the predictive accuracy of the two calibrated MSMs: Predicted versus simulated (from toy.2 model) survival.

                                          Calibrated MSM
Method                              Bayesian          Empirical
Harrell's index                     0.799             0.796
Uno's index (τ = 100)               0.762             0.719
Uno's index (τ = 80)                0.807             0.790
Log-Rank χ² (p-value)               0.027 (0.869)     18.52 (< 0.0001)
Renyi test Q (p-value)              1.894 (0.110)     4.317 (< 0.01)
Cramer-von Mises Q1 (p-value)       0.724 (> 0.01)    2.853 (< 0.01)
Cramer-von Mises Q2 (p-value)       0.325 (> 0.01)    1.318 (< 0.02)
4.3.2 Multiple runs of the MSM
We also assessed the predictive accuracy by running each of the two calibrated MSMs multiple times, i.e., for multiple vectors of values for the calibrated parameters. In particular, we run each MSM for five different cases, namely for V=200, 400, 600, 800, and 1000 vectors of parameter values, in order to also investigate the effect the total number of MSM runs has on the results of this assessment. We compare predictions with the simulated truth from both toy models. For each case we provide Kaplan-Meier estimates of the predicted versus the observed survival probabilities. We further provide summary statistics describing the results from the application of each statistical method for the assessment of the predictive accuracy of the model. In particular, we report means and standard deviations of the concordance statistics (Harrell’s and Uno’s index) from the V implementations of each of these measures on the MSM predictions. Furthermore, for the statistics comparing the observed with the predicted survival we report the percentage of times the test did not reject H0 at α = 5%, i.e., the hypothesis that the predicted survival is the same as the “observed” (simulated) one.
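The summaries reported for multiple runs can be sketched as follows; the function name and dictionary keys are illustrative choices of ours (the actual analysis was carried out in R):

```python
import statistics

def summarize_runs(c_indices, p_values, alpha=0.05):
    """Summaries of the kind reported in Tables 4.3-4.4: mean and sd of a
    C-statistic across the V runs, and the percentage of runs in which a
    hypothesis test did NOT reject H0 at level alpha."""
    return {
        "c_mean": statistics.mean(c_indices),
        "c_sd": statistics.stdev(c_indices),
        "pct_not_rejected": 100.0 * sum(p >= alpha for p in p_values) / len(p_values),
    }
```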
According to the produced graphs (Kaplan-Meier curves in Figures 4.3 to 4.12), as well as the respective tables with the summary statistics (Tables 4.3 and 4.4) from the implementation of the methods suggested in this chapter for the assessment of the predictive accuracy of an MSM, the total number V of MSM runs does not appear to affect the final conclusions. Even V=400 appears adequate to draw safe conclusions about the predictive accuracy of the two MSMs calibrated in the previous chapters.
The true survival, simulated by the first toy model, lies within the range of the predictions from both MSMs, for all five cases (i.e., for V=200, 400, 600, 800, and 1000). This means that, overall, the individual predictions from the two models are very close to the observed survival resulting from the first toy model. This proximity between the two survival curves is reflected in the summary statistics of all the methods suggested in this chapter (Table 4.3).
The estimates of Harrell’s and Uno’s indices are almost identical for the two models. The results from the applied tests are also very close for the two MSMs, with a small difference in the non-rejection rate of H0 in favor of the Bayesian calibrated MSM according to the first three tests. However, when looking at the Cramer-von Mises Q2 test, the difference between the non-rejection rates is larger and reversed, namely in favor of the Empirically calibrated MSM. This finding is in line with the characteristics of this specific test. As already mentioned, Q2 performs well when there is a large early difference in the hazard rates. The Kaplan-Meier plots reveal much more dispersed predicted survival curves at earlier time points for the Bayesian than for the Empirically calibrated MSM; consequently, the difference between predicted and observed survival is larger at those points for the Bayesian MSM. This difference is reflected in the results of the Q2 test.
Figure 4.3: Kaplan-Meier curves of the predicted (for V=200 vectors of the calibrated MSM parameters) versus the observed (simulated by the first toy model) survival.
Figure 4.4: Kaplan-Meier curves of the predicted (for V=400 vectors of the calibrated MSM parameters) versus the observed (simulated by the first toy model) survival.
Figure 4.5: Kaplan-Meier curves of the predicted (for V=600 vectors of the calibrated MSM parameters) versus the observed (simulated by the first toy model) survival.
Figure 4.6: Kaplan-Meier curves of the predicted (for V=800 vectors of the calibrated MSM parameters) versus the observed (simulated by the first toy model) survival.
Figure 4.7: Kaplan-Meier curves of the predicted (for V=1000 vectors of the calibrated MSM parameters) versus the observed (simulated by the first toy model) survival.
In the second example we compare the predictions with the “true” survival simulated using the second toy model. According to Figures 4.8 to 4.12, the observed survival curve lies, although marginally, within the range of the predicted survival curves from the Bayesian calibrated MSM. This is not the case for the Empirically calibrated MSM, for which a considerable part of the observed survival curve lies above the range of the predicted ones. This is an example of a possible scenario where two “well” calibrated MSMs, i.e., two MSMs almost equivalent according to some overall GoF measures, differ considerably when it comes to the individual predicted trajectories.
The estimates of both C-statistics are almost identical for the two models, indicating that a concordance index cannot adequately capture the differences between the predicted and the observed survival noted in the Kaplan-Meier curves. In contrast, the results from all the statistical tests of the two survival functions are very different between the two models, indicating that the Bayesian calibrated MSM
is more accurate than the Empirically calibrated one. The difference between the two models is more prominent in the results from the log-rank test, and smaller in the results from the Cramer-von Mises Q1 test.

Table 4.3: Assessment of the predictive accuracy of the two calibrated MSMs compared to the simulated truth from toy model 1: Summary statistics of the estimates of six different predictive accuracy measures.

Bayesian Calibrated MSM

        C-statistic (mean ± sd)*            Test (%)**
V       Harrell         Uno                 Log-Rank   Renyi   Cramer-von Mises
                                            (Z)        (Q)     (Q1)     (Q2)
200     0.7808±0.0099   0.6746±0.0605       80.50      80.00   49.50    79.00
400     0.7806±0.0095   0.6740±0.0560       79.75      82.75   52.75    83.00
600     0.7804±0.0096   0.6740±0.0559       80.33      82.17   52.50    82.83
800     0.7801±0.0096   0.6740±0.0555       80.50      83.25   50.88    84.88
1000    0.7802±0.0095   0.6741±0.0557       80.00      82.40   53.60    84.10

Empirically Calibrated MSM

        C-statistic (mean ± sd)*            Test (%)**
V       Harrell         Uno                 Log-Rank   Renyi   Cramer-von Mises
                                            (Z)        (Q)     (Q1)     (Q2)
200     0.7804±0.0092   0.6683±0.0587       73.50      73.00   49.00    98.50
400     0.7794±0.0090   0.6730±0.0567       71.25      71.00   48.00    98.50
600     0.7791±0.0089   0.6729±0.0555       71.50      73.17   45.50    99.00
800     0.7787±0.0090   0.6722±0.0546       71.13      73.25   45.13    99.13
1000    0.7787±0.0090   0.6718±0.0548       71.30      73.50   45.00    98.60

* Means and standard deviations of the C-index estimates from the V implementations.
** Percentage of times, in the V implementations, that the test did not reject H0 at α = 5%.
Figure 4.8: Kaplan-Meier curves of the predicted (for V=200 vectors of the calibrated MSM parameters) versus the observed (simulated by the second toy model) survival.
Figure 4.9: Kaplan-Meier curves of the predicted (for V=400 vectors of the calibrated MSM parameters) versus the observed (simulated by the second toy model) survival.
Figure 4.10: Kaplan-Meier curves of the predicted (for V=600 vectors of the calibrated MSM parameters) versus the observed (simulated by the second toy model) survival.
Figure 4.11: Kaplan-Meier curves of the predicted (for V=800 vectors of the calibrated MSM parameters) versus the observed (simulated by the second toy model) survival.
Figure 4.12: Kaplan-Meier curves of the predicted (for V=1000 vectors of the calibrated MSM parameters) versus the observed (simulated by the second toy model) survival.
4.4 Discussion
Given that MSMs can usually predict, among other outcomes, the actual survival time and censoring status for each individual, we consider them a special type of predictive survival model. In this chapter we implement two concordance indices broadly used for assessing the predictive accuracy of survival models. Furthermore, we suggest and implement four different hypothesis tests, the log-rank test, a Renyi type test, and two Cramer-von Mises tests, as alternative methods for assessing the predictive accuracy of an MSM. These tests compare the observed with the predicted survival curve.
It is important to note here that the suggested hypothesis testing methods account
for the effect of censoring in a competing risks setting, as is the case in the prediction
of lung cancer incidence and mortality given smoking. The MSM takes into account
the presence of competing risks when modeling mortality, and consequently the KM curve of the predicted survival times is adjusted accordingly.

Table 4.4: Assessment of the predictive accuracy of the two calibrated MSMs compared to the simulated truth from toy model 2: Summary statistics of the estimates of six different predictive accuracy measures.

Bayesian Calibrated MSM

        C-statistic (mean ± sd)*            Test (%)**
V       Harrell         Uno                 Log-Rank   Renyi   Cramer-von Mises
                                            (Z)        (Q)     (Q1)     (Q2)
200     0.7943±0.0083   0.7298±0.0307       29.50      27.00   29.00    62.50
400     0.7946±0.0079   0.7295±0.0306       26.00      24.25   25.25    58.00
600     0.7944±0.0078   0.7304±0.0308       26.33      24.00   26.17    58.17
800     0.7942±0.0076   0.7301±0.0307       24.75      22.50   24.25    57.25
1000    0.7943±0.0077   0.7305±0.0308       26.10      24.10   25.50    59.50

Empirically Calibrated MSM

        C-statistic (mean ± sd)*            Test (%)**
V       Harrell         Uno                 Log-Rank   Renyi   Cramer-von Mises
                                            (Z)        (Q)     (Q1)     (Q2)
200     0.7932±0.0079   0.7308±0.0296       3.00       9.00    12.50    29.00
400     0.7927±0.0081   0.7300±0.0323       3.50       8.00    15.75    27.75
600     0.7928±0.0081   0.7292±0.0322       2.50       7.50    15.83    27.50
800     0.7930±0.0080   0.7293±0.0323       2.38       7.63    16.13    28.63
1000    0.7928±0.0080   0.7292±0.0319       2.00       7.10    15.90    28.90

* Means and standard deviations of the C-index estimates from the V implementations.
** Percentage of times, in the V implementations, that the test did not reject H0 at α = 5%.

In the simulation study
we compared the predictions obtained by the MSM with the simulated truth, namely
a hypothetical observed KM curve that has been adjusted for the competing risks
problem. In practice, when implementing the hypothesis tests, it is advisable to
adjust the observed survival in order to account for the presence of competing risks,
so as to avoid bias in the survival estimates of the event of interest (54).
Summarizing the main findings from the simulation study, we first note that a single implementation of the MSM, for a randomly selected vector of parameter values (V=1), is not sufficient for comparing the predictive accuracy of two similar MSMs. Furthermore, as already indicated in section 3.2.4, MSM outputs based on more than one set of calibrated values for the model parameters allow parameter uncertainty to be conveyed to the final results. For these reasons multiple runs of the model are recommended instead. Based on the results presented in this chapter, V=400 runs of the model are deemed adequate to draw safe conclusions about the relative predictive accuracy of the two models.
In addition, concordance indices, although useful for measuring the overall discrimination ability of a model, may not be able to capture differences between distinct observed and predicted survival times. The reason is that concordance indices are based on the relative ranks of the observed and the predicted values rather than their actual magnitudes. The estimates of the two C-statistics applied in the simulation study are almost identical for the two models in all cases, and thus uninformative about the discrepancies observed, especially between the MSM predictions and the simulated “truth” from the second toy model. In the context of micro-simulation modeling, other statistical measures, such as estimates of the mean squared error of the individual predictions, are preferable for capturing this characteristic of an MSM.
In this chapter we also investigated the performance of several hypothesis tests for survival data. These tests aim at comparing observed and predicted survival distributions, and can provide an indication of the predictive accuracy of the model with respect to the overall survival estimates for the event of interest.
The simulation study showed that the hypothesis tests lead to the same conclusions when there are relatively large differences between the observed and the predicted survival, as in the comparisons with the simulated truth from the second toy model, where all tests indicated the same MSM to be more accurate. However, for less prominent differences the tests may lead to contradictory conclusions. The reason lies in the specifics of each test, namely which differences (earlier or later) each test weighs more heavily in the calculations, as well as whether or not it performs well in crossing-hazards situations. In such a case it is unclear whether the individual predictions from one MSM are more accurate than the respective ones from the other. Further investigation is therefore required, and the final conclusions will also depend on the type of differences we are most interested in detecting.
Furthermore, the log-rank and Renyi type tests lead to similar results about the predictive accuracy of the two models. However, the log-rank test proved slightly more sensitive than the Renyi type test in detecting the more prominent differences between the observed and the predicted survival curves.
A high-priority item for future work is to apply the suggested methods to assess the predictive accuracy of the two calibrated MSMs using real data from the National Lung Screening Trial (NLST) (28; 1). This is a large-scale, randomized, multicenter study aimed at comparing the effect of two different screening tests, i.e., low-dose helical computed tomography (CT) and chest radiography, on the lung cancer mortality of current and former heavy smokers. Another very interesting application will be the comparison of two structurally different, yet comparable, MSMs using the methods suggested in this chapter. Special attention and additional work are required on the correct incorporation of between-subject variability in the assessment, as well as on the expansion of the methods to base assessment results on multiple outcomes of interest. An additional objective for further research is the consideration of censoring in MSM individual predictions, as well as in the assessment of the predictive performance of this type of model.
Finally, another very interesting objective for further research is the construction and use of a predictive accuracy measure focused on the predictions obtained for each specific individual. Such a measure would be based on the mean squared differences between the individual predictions and the observed data (MSEP) (35; 36). These squared differences could refer to estimates of predicted versus observed survival probabilities, or to times to events, for each individual.
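In its simplest form, the MSEP idea sketched above reduces to the mean of squared individual differences; a minimal illustrative sketch (names are ours):

```python
def msep(predicted, observed):
    """Mean squared error of prediction: average squared difference between
    individual predicted and observed quantities (e.g., survival probabilities
    or times to event)."""
    assert len(predicted) == len(observed)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
```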
Chapter 5
Conclusions
The main objective of this thesis was to study statistical methods for the develop-
ment and evaluation of micro-simulation models. In this chapter we summarize the
findings, as well as future work related to this research.
We began the work for this dissertation by developing an MSM that describes the
natural history of lung cancer. This model was then used as a tool for the implemen-
tation and comparison of a Bayesian and an Empirical calibration method, aimed
at specifying sets of MSM parameter values that provide good fit to the observed
quantities of interest. Finally, we have adapted tools from survival data analysis to
evaluate the predictive accuracy of a calibrated MSM.
The streamlined MSM developed in Chapter 2 combines some of the best practices followed in modeling the natural history of lung cancer and can be used for valid predictions about the course of the disease. The development of this MSM in open-source statistical software (R 3.0.1) enhances the transparency of the model, facilitates research on the statistical properties of MSMs in general, and promotes the improvement and expansion of the model to describe the course of lung cancer in more detail, in collaboration with scientists from several fields.
The comparative analysis presented in Chapter 3 showed that both calibration methods produce extensively overlapping results, with respect both to the sets of values for the calibrated parameters and to the predictions obtained by each model. However, only the Bayesian calibration method provides a sound theoretical framework for the incorporation of prior beliefs in the model and for the interpretation of the results of the procedure. The ultimate goal of this method is to draw values from the joint posterior distribution of the MSM parameters.
Furthermore, the Bayesian method results in an MSM that performs better in the
prediction of rare events compared to the Empirical one. The predictions from the
Empirically calibrated MSM, on the other hand, are less dispersed. In addition, the
Empirical method is more efficient with respect to the computational time required
for the entire calibration.
The Bayesian approach, when focused on estimation, may not serve the purpose of model calibration. In fact, the performance of the Bayesian calibration method can be considerably improved by adding a “refinement” step to the procedure, aimed at selecting the subset of parameter values that provides a better fit of the MSM to the observed data, according to some pre-specified GoF measure.
Finally, Chapter 3 emphasizes the imperative need for High Performance Computing techniques in order to undertake a rather complicated task, such as the calibration of an MSM, in R. This is because the implementation of a calibration procedure involves multitudinous independent micro-simulations, which can be carried out in parallel, thus reducing the total running time required. R facilitates parallel processing via specially designed libraries that can set up and distribute the task to large computer clusters.
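The pattern of mapping independent micro-simulations over a pool of workers can be sketched as follows. This Python illustration uses threads for portability; an actual calibration would distribute the runs over processes or cluster nodes (e.g., via R's parallel libraries), and both function names here are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def run_msm_once(params):
    """Stand-in for one micro-simulation run with one calibrated parameter
    vector; a real run would simulate one trajectory per individual."""
    return sum(params)

def run_all(param_vectors, workers=4):
    """Map independent runs over a worker pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(run_msm_once, param_vectors))
```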
According to the simulation study conducted in Chapter 4, concordance statistics, although useful for assessing the overall discrimination ability of an MSM, may not capture differences between observed and predicted survival. The accuracy of an MSM, with respect to overall predicted survival, can be better assessed by applying hypothesis tests used in survival analysis to compare observed with predicted survival curves. These tests account for the effect of censoring in a competing risks setting, as in the case of the survival estimates of lung cancer incidence and mortality given smoking. All the tests suggested in that chapter lead to the same conclusion when the predictions obtained by the MSM are very different from the respective actual observations. Furthermore, the log-rank test proved more sensitive than the other tests in detecting the more prominent differences.
We intend to continue and extend our work in a number of important directions. First, we plan to extend the original MSM to incorporate more detailed information, as well as screening and treatment components, thus making it comparable to existing models of lung cancer. We also plan the publication of the MSM as a library in the CRAN package repository of the R statistical software.
We used the two methods presented in Chapter 3 to calibrate the MSM on data about male current smokers. We plan to perform a complete calibration of this MSM, that is, to calibrate the parameters so that the model will be able to predict individual trajectories within narrower subgroups defined by covariates beyond gender and smoking status. Furthermore, we will expand the methods so as to account for multiple calibration targets.
We also intend to apply the methods suggested in Chapter 4 to the assessment of the predictive accuracy of the MSM using actual data from the NLST study. It would also be informative to study how measures of predictive performance can be used to compare two completely different models, such as two structurally different MSMs for lung cancer. More research is also required to expand the methods so as to account for multiple outcomes of interest, as well as to incorporate between-subject variability in the calculations.
Another very interesting topic for further consideration would be the construction of a predictive accuracy measure focusing on discrepancies of the individual predictions from the observed data. This measure would be an estimate of the mean squared error of the MSM predictions (MSEP). The quantities involved in the calculation could be estimates of the survival probabilities for each particular individual, as well as times to event or censoring.
Bibliography
[1] Aberle, D. R., Adams, A. M., Berg, C. D., Black, W. C., Clapp, J. D., Fagerstrom, R. M., Gareen, I. F., Gatsonis, C., Marcus, P. M., Sicks, J. D., and the National Lung Screening Trial Research Team (2011), “Reduced Lung-Cancer Mortality with Low-Dose Computed Tomographic Screening,” New England Journal of Medicine, 365, 395–409.
[2] Antolini, L., Boracchi, P., and Biganzoli, E. (2005), “A time-dependent dis-
crimination index for survival data,” Statistics in Medicine, 24, 3927–3944.
[3] Baker, R. (1998), “Use of a mathematical model to evaluate breast cancer screening policy,” Health Care Management Science, 1, 103–113.
[4] Berry, D. A., Inoue, L., Shen, Y., Venier, J., Cohen, D., Bondy, M., Theriault,
R., and Munsell, M. F. (2006), “Chapter 6: Modeling the Impact of Treatment
and Screening on U.S. Breast Cancer Mortality: A Bayesian Approach,” JNCI
Monographs, 2006, 30–36.
[5] Blower, S. M. and Dowlatabadi, H. (1994), “Sensitivity and Uncertainty Anal-
ysis of Complex Models of Disease Transmission: An HIV Model, as an Ex-
ample,” International Statistical Review / Revue Internationale de Statistique,
62, 229–243.
[6] Breslow, N. (1970), “Generalized Kruskal-Wallis Test for Comparing K Samples Subject to Unequal Patterns of Censorship,” Biometrika, 57, 579–594.
[7] Briggs, A. H., O’Brien, B. J., and Blackhouse, G. (2002), “Thinking outside
the box: Recent advances in the analysis and presentation of uncertainty in
cost-effectiveness studies,” Annual Review of Public Health, 23, 377–401.
[8] Campbell, K. (2006), “Statistical calibration of computer simulations,” Relia-
bility Engineering & System Safety, 91, 1358–1363.
[9] Chen, H. C., Kodell, R. L., Cheng, K. F., and Chen, J. J. (2012), “Assessment of performance of survival prediction models for cancer prognosis,” BMC Medical Research Methodology, 12.
[10] Cheng, S. C., Wei, L. J., and Ying, Z. (1995), “Analysis of transformation
models with censored data,” Biometrika, 82, 835–845.
[11] Chia, Y. L., Salzman, P., Plevritis, S. K., and Glynn, P. W. (2004),
“Simulation-based parameter estimation for complex models: a breast cancer
natural history modelling illustration,” Statistical Methods in Medical Research,
13, 507–524.
[12] Clyde, M. and George, E. I. (2004), “Model uncertainty,” Statistical Science,
19, 81–94.
[13] Cronin, K. A., Legler, J. M., and Etzioni, R. D. (1998), “Assessing uncertainty
in microsimulation modelling with application to cancer screening interventions,”
Statistics in Medicine, 17, 2509–2523.
[14] De Angelis, D., Sweeting, M., Ades, A. E., Hickman, M., Hope, V., and
Ramsay, M. (2009), “An evidence synthesis approach to estimating Hepatitis C
Prevalence in England and Wales.”
[15] Detterbeck, F. C. and Gibson, C. J. (2008), “Turning gray: The natural history
of lung cancer over time,” Journal of Thoracic Oncology, 3, 781–792.
[16] Deutsch, J. L. and Deutsch, C. V. (2012), “Latin hypercube sampling with
multidimensional uniformity,” Journal of Statistical Planning and Inference,
142, 763–772.
[17] Doubilet, P., Begg, C. B., Weinstein, M. C., Braun, P., and McNeil, B. J.
(1985), “Probabilistic Sensitivity Analysis Using Monte Carlo Simulation,”
Medical Decision Making, 5, 157–177.
[18] Draisma, G., Boer, R., Otto, S. J., van der Cruijsen, I. W., Damhuis, R. A. M.,
Schröder, F. H., and de Koning, H. J. (2003), “Lead Times and Overdetection
Due to Prostate-Specific Antigen Screening: Estimates From the European
Randomized Study of Screening for Prostate Cancer,” Journal of the National
Cancer Institute, 95, 868–878.
[19] Eddelbuettel, D. (2013), “CRAN Task View: High-Performance and Parallel
Computing with R,” http://cran.r-project.org/web/views/HighPerformanceComputing.html,
[Online; Retrieved: 15-March-2013].
[20] Fine, J. P. and Gray, R. J. (1999), “A proportional hazards model for the
subdistribution of a competing risk,” J Am Stat Assoc, 94, 496–509.
[21] Fleming, T. R. and Harrington, D. P. (1981), “A Class of Hypothesis Tests
for One and Two Sample Censored Survival Data,” Communications in Statistics,
Part A: Theory and Methods, 10, 763–794.
[22] Foy, M., Spitz, M. R., Kimmel, M., and Gorlova, O. Y. (2011), “A smoking-based
carcinogenesis model for lung cancer risk prediction,” International Journal
of Cancer.
[23] Fryback, D. G., Stout, N. K., Rosenberg, M. A., Trentham-Dietz, A., Kuru-
chittham, V., and Remington, P. L. (2006), “Chapter 7: The Wisconsin Breast
Cancer Epidemiology Simulation Model,” JNCI Monographs, 2006, 37–47.
[24] Gampe, J. and Zinn, S. (2009), “The Microsimulation tool of the MicMac project,”
2nd General Conference of the International Microsimulation Association, (Ottawa,
Canada).
[25] Garber, A. M. and Tunis, S. R. (2009), “Does Comparative-Effectiveness Re-
search Threaten Personalized Medicine?.” New England Journal of Medicine,
360, 1925–1927.
[26] Garg, M. L., Rao, B. R., and Redmond, C. K. (1970), “Maximum-Likelihood
Estimation of the Parameters of the Gompertz Survival Function,” Journal of
the Royal Statistical Society. Series C (Applied Statistics), 19, 152–159.
[27] Gatsonis, C. (2010), “The promise and realities of comparative effectiveness
research,” Statistics in Medicine, 29, 1977–1981.
[28] Gatsonis, C. A. and the National Lung Screening Trial Research Team (2011),
“The National Lung Screening Trial: Overview and Study Design,” Radiology, 258,
243–253.
[29] Geddes, D. M. (1979), “The natural history of lung cancer: a review based on
rates of tumour growth,” Br J Dis Chest, 73, 1–17.
[30] Gehan, E. A. (1965), “A Generalized Wilcoxon Test for Comparing Arbitrarily
Singly-Censored Samples,” Biometrika, 52, 203–223.
[31] Gerds, T. A., Cai, T. X., and Schumacher, M. (2008), “The performance of
risk prediction models,” Biometrical Journal, 50, 457–479.
[32] Gerds, T. A., Kattan, M. W., Schumacher, M., and Yu, C. (2013), “Estimating
a time-dependent concordance index for survival prediction models with
covariate dependent censoring,” Statistics in Medicine, 32, 2173–2184.
[33] Goldwasser, D. L. (2009), “Parameter estimation in mathematical models of
lung cancer,” Ph.D. thesis.
[34] Gonen, M. and Heller, G. (2005), “Concordance probability and discriminatory
power in proportional hazards regression,” Biometrika, 92, 965–970.
[35] Gorfine, M., Hsu, L., Zucker, D. M., and Parmigiani, G. (2013), “Calibrated
predictions for multivariate competing risks models,” Lifetime Data Anal.
[36] Graf, E., Schmoor, C., Sauerbrei, W., and Schumacher, M. (1999), “Assess-
ment and comparison of prognostic classification schemes for survival data,”
Statistics in Medicine, 18, 2529–2545.
[37] Gray, R. J. (1988), “A Class of K-Sample Tests for Comparing the Cumulative
Incidence of a Competing Risk,” Annals of Statistics, 16, 1141–1154.
[38] Habbema, J. D. F., van Oortmarssen, G. J., Lubbe, J. T. N., and van der
Maas, P. J. (1985), “The MISCAN simulation program for the evaluation of
screening for disease,” Computer Methods and Programs in Biomedicine, 20,
79–93.
[39] Harrell, F. E., Lee, K. L., and Mark, D. B. (1996), “Multivariable prognostic
models: Issues in developing models, evaluating assumptions and adequacy,
and measuring and reducing errors,” Statistics in Medicine, 15, 361–387.
[40] Hazelton, W. D., Clements, M. S., and Moolgavkar, S. H. (2005), “Multistage
carcinogenesis and lung cancer mortality in three cohorts,” Cancer Epidemiol-
ogy Biomarkers & Prevention, 14, 1171–1181.
[41] Hazelton, W. D., Luebeck, E. G., Heidenreich, W. F., and Moolgavkar, S. H.
(2001), “Analysis of a historical cohort of Chinese tin miners with arsenic,
radon, cigarette smoke, and pipe smoke exposures using the biologically based
two-stage clonal expansion model,” Radiation Research, 156, 78–94.
[42] Heagerty, P. J. and Zheng, Y. Y. (2005), “Survival model predictive accuracy
and ROC curves,” Biometrics, 61, 92–105.
[43] Heidenreich, W. F., Jacob, P., and Paretzke, H. G. (1997), “Exact solutions
of the clonal expansion model and their application to the incidence of solid
tumors of atomic bomb survivors,” Radiation and Environmental Biophysics,
36, 45–58.
[44] Heidenreich, W. F., Luebeck, E. G., and Moolgavkar, S. H. (1997), “Some
properties of the hazard function of the two-mutation clonal expansion model,”
Risk Analysis, 17, 391–399.
[45] Henderson, R., Jones, M., and Stare, J. (2001), “Accuracy of point predictions
in survival analysis,” Statistics in Medicine, 20, 3083–3096.
[46] Hielscher, T., Zucknick, M., Werft, W., and Benner, A. (2010), “On the prog-
nostic value of survival models with application to gene expression signatures,”
Statistics in Medicine, 29, 818–29.
[47] Howlader, N., Noone, A., Krapcho, M., Neyman, N., Aminou, R., Waldron,
W., Altekruse, S. F., Kosary, C., Ruhl, J., Tatalovich, Z., Cho, H., Mariotto,
A., Eisner, M., Lewis, D., Chen, H., Feuer, E., and Cronin, K. (2012), “SEER
Cancer Statistics Review, 1975–2009 (Vintage 2009 Populations),” National Cancer
Institute, Bethesda, MD; posted to the SEER web site, April 2012.
[48] Hunink, M. G. M., Koerkamp, B. G., Weinstein, M. C., Stijnen, T., and
Heijenbrok-Kal, M. H. (2010), “Uncertainty and Patient Heterogeneity in Med-
ical Decision Models,” Medical Decision Making, 30, 194–205.
[49] Jit, M., Choi, Y. H., and Edmunds, W. J. (2008), “Economic evaluation of
human papillomavirus vaccination in the United Kingdom,” BMJ (Clinical
research ed.), 337, a769.
[50] Karnon, J., Goyder, E., et al. (2007), “A review and critique of modelling
in prioritising and designing screening programmes,” Health Technology
Assessment, 11.
[51] Kennedy, M. C. and O’Hagan, A. (2001), “Bayesian calibration of computer
models,” Journal of the Royal Statistical Society Series B-Statistical Method-
ology, 63, 425–450.
[52] Kim, J. J., Kuntz, K. M., Stout, N. K., Mahmud, S., Villa, L. L., Franco,
E. L., and Goldie, S. J. (2007), “Multiparameter Calibration of a Natural
History Model of Cervical Cancer,” American Journal of Epidemiology, 166,
137–150.
[53] Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983), “Optimization by
Simulated Annealing,” Science, 220, 671–680.
[54] Klein, J. P. and Moeschberger, M. L. (2003), Survival Analysis: Techniques for
Censored and Truncated Data, New York: Springer.
[55] Koerkamp, B. G., Stijnen, T., Weinstein, M. C., and Hunink, M. G. M. (2011),
“The Combined Analysis of Uncertainty and Patient Heterogeneity in Medical
Decision Models,” Medical Decision Making, 31, 650–661.
[56] Kopec, J. A., Fines, P., Manuel, D. G., Buckeridge, D. L., Flanagan, W. M.,
Oderkirk, J., Abrahamowicz, M., Harper, S., Sharif, B., Okhmatovskaia, A.,
Sayre, E. C., Rahman, M. M., and Wolfson, M. C. (2010), “Validation of
population-based disease simulation models: a review of concepts and
methods,” BMC Public Health, 10.
[57] Korn, E. L. and Simon, R. (1990), “Measures of explained variation for survival
data,” Statistics in Medicine, 9, 487–503.
[58] Koscielny, S., Tubiana, M., Le, M. G., Valleron, A. J., Mouriesse, H., Contesso,
G., and Sarrazin, D. (1984), “Breast Cancer: Relationship between the Size of
the Primary Tumor and the Probability of Metastatic Dissemination,” British
Journal of Cancer, 49, 709–715.
[59] Koscielny, S., Tubiana, M., and Valleron, A. J. (1985), “A simulation model of
the natural history of human breast cancer,” Br J Cancer, 52, 515–524.
[60] Kullback, S. and Leibler, R. A. (1951), “On Information and Sufficiency,”
Annals of Mathematical Statistics, 22, 79–86.
[61] Laird, A. K. (1964), “Dynamics of Tumor Growth,” British Journal of Cancer,
18, 490–502.
[62] L’Ecuyer, P., Simard, R., Chen, E. J., and Kelton, W. D. (2002), “An object-oriented
random-number package with many long streams and substreams,”
Operations Research, 50, 1073–1075.
[63] L’Ecuyer, P. and Leydold, J. (2005), “rstream: Streams of Random Numbers for
Stochastic Simulation,” R News, 5, 16–20.
[64] Luebeck, E. G., Heidenreich, W. F., Hazelton, W. D., Paretzke, H. G., and
Moolgavkar, S. H. (1999), “Biologically based analysis of the data for the Col-
orado uranium miners cohort: Age, dose and dose-rate effects,” Radiation
Research, 152, 339–351.
[65] Mandelblatt, J., Schechter, C. B., Lawrence, W., Yi, B., and Cullen, J. (2006),
“Chapter 8: The SPECTRUM Population Model of the Impact of Screening
and Treatment on U.S. Breast Cancer Trends From 1975 to 2000: Principles
and Practice of the Model Methods,” JNCI Monographs, 2006, 47–55.
[66] Mannion, O., Lay-Yee, R., Wrapson, W., Davis, P., and Pearson, J. (2012),
“JAMSIM: a Microsimulation Modelling Policy Tool,” JASSS: The Journal of
Artificial Societies and Social Simulation, 15.
[67] Matloff, N. (2013), “Programming on Parallel Machines,”
http://heather.cs.ucdavis.edu/~matloff/158/PLN/ParProcBook.pdf, [Online;
Retrieved: 13-March-2013].
[68] McCallum, Q. E. and Weston, S. (2012), Parallel R, O’Reilly.
[69] McKay, M. D., Beckman, R. J., and Conover, W. J. (2000), “A Comparison
of Three Methods for Selecting Values of Input Variables in the Analysis of
Output from a Computer Code,” Technometrics, 42, 55–61.
[70] McMahon, P. M. (2005), “Policy assessment of medical imaging utilization:
methods and applications [doctoral thesis],” Ph.D. thesis.
[71] McMahon, P. M., Kong, C. Y., Johnson, B. E., Weinstein, M. C., Weeks, J. C.,
Kuntz, K. M., Shepard, J. A. O., Swensen, S. J., and Gazelle, G. S. (2008),
“Estimating long-term effectiveness of lung cancer screening in the Mayo CT
screening study,” Radiology, 248, 278–287.
[72] Meza, R., Hazelton, W. D., Colditz, G. A., and Moolgavkar, S. H. (2008),
“Analysis of lung cancer incidence in the nurses’ health and the health pro-
fessionals’ follow-up studies using a multistage carcinogenesis model,” Cancer
Causes & Control, 19, 317–328.
[73] Moeschberger, M. L. and Klein, J. P. (1995), “Statistical methods for depen-
dent competing risks,” Lifetime Data Analysis, 1, 195–204.
[74] Moolgavkar, S. H. and Luebeck, E. G. (2003), “Multistage carcinogenesis and
the incidence of human cancer,” Genes Chromosomes Cancer, 38, 302–6.
[75] Moolgavkar, S. H. and Luebeck, G. (1990), “Two-Event Model for Carcinogenesis:
Biological, Mathematical, and Statistical Considerations,” Risk Analysis,
10, 323–341.
[76] Mountain, C. F. (1997), “Revisions in the International System for Staging
Lung Cancer,” Chest, 111, 1710–1717.
[77] Nelder, J. A. and Mead, R. (1965), “A Simplex Method for Function Mini-
mization,” The Computer Journal, 7, 308–313.
[78] Nielsen, B. (1997), “Expected survival in the Cox model,” Scandinavian Jour-
nal of Statistics, 24, 275–287.
[79] Nieto, F. J. and Coresh, J. (1996), “Adjusting survival curves for confounders:
A review and a new method,” American Journal of Epidemiology, 143, 1059–
1068.
[80] Oakley, J. E. and O’Hagan, A. (2004), “Probabilistic sensitivity analysis of
complex models: a Bayesian approach,” Journal of the Royal Statistical Society
Series B-Statistical Methodology, 66, 751–769.
[81] O’Hagan, A., Stevenson, M., and Madan, J. (2007), “Monte Carlo probabilistic
sensitivity analysis for patient level simulation models: Efficient estimation of
mean and variance using ANOVA,” Health Economics, 16, 1009–1023.
[82] Orcutt, G. H. (1957), “A New Type of Socio-Economic System,” Review of
Economics and Statistics, 39, 116–123.
[83] Parmigiani, G. (2002), “Measuring uncertainty in complex decision analysis
models,” Statistical Methods in Medical Research, 11, 513–537.
[84] Pencina, M. J. and D’Agostino, R. B. (2004), “Overall C as a measure of dis-
crimination in survival analysis: model specific population value and confidence
interval estimation,” Statistics in Medicine, 23, 2109–2123.
[85] Peto, R. and Peto, J. (1972), “Asymptotically Efficient Rank Invariant Test
Procedures,” Journal of the Royal Statistical Society, Series A (General), 135,
185–207.
[86] Plevritis, S. K., Salzman, P., Sigal, B. M., and Glynn, P. W. (2007), “A nat-
ural history model of stage progression applied to breast cancer,” Statistics in
Medicine, 26, 581–595.
[87] Plevritis, S. K., Sigal, B. M., Salzman, P., Rosenberg, J., and Glynn, P. (2006),
“Chapter 12: A Stochastic Simulation Model of U.S. Breast Cancer Mortality
Trends From 1975 to 2000,” JNCI Monographs, 2006, 86–95.
[88] Poole, D. and Raftery, A. E. (2000), “Inference for deterministic simulation
models: The Bayesian melding approach,” J Am Stat Assoc, 95, 1244–1255.
[89] Rossini, A. J., Tierney, L., and Li, N. (2007), “Simple parallel statistical com-
puting in R,” Journal of Computational and Graphical Statistics, 16, 399–420.
[90] Rutter, C. M., Miglioretti, D. L., and Savarino, J. E. (2009), “Bayesian Cali-
bration of Microsimulation Models,” J Am Stat Assoc, 104, 1338–1350.
[91] Rutter, C. M. and Savarino, J. E. (2010), “An Evidence-Based Microsimulation
Model for Colorectal Cancer: Validation and Application,” Cancer Epidemiol-
ogy Biomarkers and Prevention, 19, 1992–2002.
[92] Rutter, C. M., Zaslavsky, A. M., and Feuer, E. J. (2011), “Dynamic Microsim-
ulation Models for Health Outcomes,” Medical Decision Making, 31, 10–18.
[93] Saha-Chaudhuri, P. and Heagerty, P. J. (2013), “Non-parametric estimation of
a time-dependent predictive accuracy curve,” Biostatistics, 14, 42–59.
[94] Salomon, J. A., Weinstein, M. C., Hammitt, J. K., and Goldie, S. J. (2002),
“Empirically calibrated model of hepatitis C virus infection in the United
States,” American Journal of Epidemiology, 156, 761–773.
[95] Santner, T. J., Williams, B. J., and Notz, W. (2003), The Design and analysis
of computer experiments, Springer series in statistics, New York: Springer.
[96] Schmidberger, M., Morgan, M., Eddelbuettel, D., Yu, H., Tierney, L., and
Mansmann, U. (2009), “State of the Art in Parallel Computing with R,” Jour-
nal of Statistical Software, 31, 1–27.
[97] Schumacher, M. (1984), “Two-Sample Tests of Cramér-von Mises Type and
Kolmogorov-Smirnov Type for Randomly Censored Data,” International
Statistical Review, 52, 263–281.
[98] Shi, L., Tian, H., McCarthy, W., Berman, B., Wu, S., and Boer, R. (2011),
“Exploring the uncertainties of early detection results: model-based
interpretation of the Mayo Lung Project,” BMC Cancer, 11, 92.
[99] Siegel, R., Naishadham, D., and Jemal, A. (2012), “Cancer statistics, 2012,”
CA Cancer J Clin, 62, 10–29.
[100] Simon, R. M., Subramanian, J., Li, M. C., and Menezes, S. (2011), “Using
cross-validation to evaluate predictive accuracy of survival risk classifiers based
on high-dimensional data,” Briefings in Bioinformatics, 12, 203–214.
[101] Sonnenberg, F. A. and Beck, J. R. (1993), “Markov Models in Medical
Decision Making: A Practical Guide,” Medical Decision Making, 13, 322–338.
[102] Spratt, J. S. and Spratt, T. L. (1964), “Rates of Growth of Pulmonary Metas-
tases and Host Survival,” Annals of Surgery, 159, 161–171.
[103] Steel, G. G. (1977), Growth kinetics of tumours : cell population kinetics in
relation to the growth and treatment of cancer, Oxford: Clarendon Press.
[104] Stein, M. (1987), “Large Sample Properties of Simulations Using Latin Hyper-
cube Sampling,” Technometrics, 29, 143–151.
[105] Steyerberg, E. W., Vickers, A. J., Cook, N. R., Gerds, T., Gonen, M.,
Obuchowski, N., Pencina, M. J., and Kattan, M. W. (2010), “Assessing the
Performance of Prediction Models: A Framework for Traditional and Novel Measures,”
Epidemiology, 21, 128–138.
[106] Stout, N. K., Knudsen, A. B., Kong, C. Y., McMahon, P. M., and Gazelle,
G. S. (2009), “Calibration Methods Used in Cancer Simulation Models and
Suggested Reporting Guidelines,” Pharmacoeconomics, 27, 533–545.
[107] Tan, S. Y. G. L., van Oortmarssen, G. J., de Koning, H. J., Boer, R., and
Habbema, J. D. F. (2006), “Chapter 9: The MISCAN-Fadia Continuous Tumor
Growth Model for Breast Cancer,” JNCI Monographs, 2006, 56–65.
[108] Tarone, R. E. and Ware, J. (1977), “Distribution-Free Tests for Equality of
Survival Distributions,” Biometrika, 64, 156–160.
[109] Department of Health and Human Services (2009), “Draft definition of Com-
parative Effectiveness Research for the Federal Coordinating Council,”
http://www.hhs.gov/recovery/programs/cer/draftdefinition.html.
[110] Thames, H. D., Buchholz, T. A., and Smith, C. D. (1999), “Frequency of first
metastatic events in breast cancer: Implications for sequencing of systemic and
local-regional treatment,” Journal of Clinical Oncology, 17, 2649–2658.
[111] Tierney, L. (2008), Implicit and Explicit Parallel Computing in R, Physica-
Verlag HD, chap. 4, pp. 43–51.
[112] Tunis, S. R., Benner, J., and McClellan, M. (2010), “Comparative effectiveness
research: Policy context, methods development and research infrastructure,”
Statistics in Medicine, 29, 1963–1976.
[113] Uno, H., Cai, T., Pencina, M. J., D’Agostino, R. B., and Wei, L. J. (2011), “On
the C-statistics for evaluating overall adequacy of risk prediction procedures
with censored survival data,” Statistics in Medicine, 30, 1105–1117.
[114] Vanni, T., Karnon, J., Madan, J., White, R. G., Edmunds, W. J., Foss, A. M.,
and Legood, R. (2011), “Calibrating models in economic evaluation: a seven-
step approach,” Pharmacoeconomics, 29, 35–49.
[115] Vanni, T., Legood, R., Franco, E. L., Villa, L. L., Luz, P. M., and Schwarts-
mann, G. (2011), “Economic evaluation of strategies for managing women with
equivocal cytological results in Brazil,” International Journal of Cancer, 129,
671–679.
[116] Wakelee, H. A., Chang, E. T., Gomez, S. L., Keegan, T. H., Feskanich, D.,
Clarke, C. A., Holmberg, L., Yong, L. C., Kolonel, L. N., Gould, M. K., and
West, D. W. (2007), “Lung cancer incidence in never smokers,” Journal of
Clinical Oncology, 25, 472–478.
[117] Welton, N. J. and Ades, A. E. (2005), “A model of toxoplasmosis incidence in
the UK: evidence synthesis and consistency of evidence,” Journal of the Royal
Statistical Society: Series C (Applied Statistics), 54, 385–404.
[118] Yamaguchi, N., Tamura, Y., Sobue, T., Akiba, S., Ohtaki, M., Baba, Y.,
Mizuno, S., and Watanabe, S. (1991), “Evaluation of Cancer Prevention Strate-
gies by Computerized Simulation Model: An Approach to Lung Cancer,” Can-
cer Causes & Control, 2, 147–155.