Session 15
Modelling net survival
Paul W Dickman¹ and Paul C Lambert¹,²
¹Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
²Department of Health Sciences, University of Leicester, UK
Cancer survival: principles, methods and analysis, LSHTM
July 2014
Overview
Outcome can be expressed as either a survival proportion or a mortality rate (hazard)
The concept of adjusting for ‘time’; splitting time
Modelling cause-specific mortality
Cox proportional hazards model
Poisson regression
Flexible parametric model
Strong similarity of the approaches
Modelling excess mortality (cannot use Cox regression)
Poisson regression
Flexible parametric model
The proportional hazards assumption
Dickman and Lambert Population-Based Cancer Survival LSHTM, July 2014 2
Teaching style
I’ll focus on the key concepts explained via examples, using graphs where possible.
There are more slides than I can cover.
I chose to distribute slides with mathematical detail and extra information for reference.
I’ll discuss the exercises at the end of the lecture.
Recap: The survivor function S(t) and the hazard function h(t)
In survival analysis we can express the outcome in terms of either the survival proportion (the proportion who do not experience the event) or the event rate.
We are assuming you are familiar with the basic concepts of the survivor function, S(t), and the hazard function, h(t).
We will nevertheless take some time to discuss these functions, how they are related, and their relevance for the topic of this course.
Relation between the survivor and hazard functions
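$$
\begin{aligned}
h(t) &= \lim_{\Delta t \to 0} \frac{\Pr(\text{event in } (t, t+\Delta t] \mid \text{alive at } t)}{\Delta t} \\
&= \lim_{\Delta t \to 0} \frac{F(t+\Delta t) - F(t)}{S(t) \times \Delta t} \qquad \text{where } F(t) = 1 - S(t) \\
&= \lim_{\Delta t \to 0} \frac{S(t+\Delta t) - S(t)}{\Delta t} \times \frac{-1}{S(t)} \\
&= \frac{dS(t)}{dt} \times \frac{-1}{S(t)} \qquad \text{by definition of a derivative} \\
&= -\frac{d \ln S(t)}{dt} \qquad \text{since } \frac{d}{dx}\ln f(x) = \frac{f'(x)}{f(x)}
\end{aligned}
$$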
What does this mean in practice?
$h(t) = -\frac{d}{dt} \ln S(t)$
In practical terms, this means that the event rate is proportional to the rate at which the survival function decreases.
That is, if the survival function is decreasing sharply with time then the mortality rate is high (and vice versa).
If the survival function is flat then the hazard is zero (and vice versa).
The derivative of a function at a point is the slope of the [tangent to the] curve at that point. A curve that is decreasing (like the survival function) has a negative slope, hence the negative sign in the formula above.
Strictly, the hazard is the (negative) rate of change of ln S(t), that is −S′(t)/S(t), but we can think of it as being proportional to the rate of change of S(t).
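As a numerical check of $h(t) = -\frac{d}{dt}\ln S(t)$, the sketch below takes a hypothetical Weibull survivor function (the Weibull form and the parameter values LAM and GAM are assumptions chosen only for illustration), differentiates ln S(t) by central differences, and compares the result with the known Weibull hazard. It also prints the slope −dS/dt of the survival curve itself.

```python
import math

# Hypothetical Weibull survivor function S(t) = exp(-(LAM*t)**GAM); values are illustrative only.
LAM, GAM = 0.5, 1.5


def survival(t):
    return math.exp(-((LAM * t) ** GAM))


def hazard_analytic(t):
    # For this Weibull, h(t) = GAM * LAM**GAM * t**(GAM - 1).
    return GAM * LAM**GAM * t ** (GAM - 1)


def hazard_numeric(t, dt=1e-6):
    # Central-difference approximation to -d ln S(t) / dt.
    return -(math.log(survival(t + dt)) - math.log(survival(t - dt))) / (2 * dt)


def slope_of_survival(t, dt=1e-6):
    # -dS(t)/dt: the rate at which the survival curve itself falls.
    return -(survival(t + dt) - survival(t - dt)) / (2 * dt)


for t in (0.5, 1.0, 2.0, 4.0):
    print(f"t={t}: S={survival(t):.3f}  -dS/dt={slope_of_survival(t):.4f}  "
          f"h numeric={hazard_numeric(t):.4f}  h exact={hazard_analytic(t):.4f}")
```

The −dS/dt column equals h(t) × S(t), which is why the hazard is only proportional to the slope of S(t) at a given time, with the factor 1/S(t) changing as follow-up proceeds.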
Other relationships (for completeness)
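$$
-\frac{d \log S(t)}{dt} = h(t) \quad \Longleftrightarrow \quad S(t) = \exp\left(-\int_0^t h(u)\,du\right) = \exp(-H(t))
$$
$H(t) = \int_0^t h(u)\,du$ is called the integrated hazard or cumulative hazard.
$$
h(t) = -\frac{d \log S(t)}{dt} = -\frac{S'(t)}{S(t)} = \frac{F'(t)}{1 - F(t)} = \frac{f(t)}{S(t)}
$$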
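The identity $S(t) = \exp(-H(t))$ can be checked the other way round: integrate a hazard numerically and exponentiate. The sketch below reuses a hypothetical Weibull hazard (an assumption, for illustration only) and the composite Simpson rule; any reasonable quadrature rule would do.

```python
import math

LAM, GAM = 0.5, 1.5  # hypothetical Weibull parameters, for illustration only


def hazard(u):
    return GAM * LAM**GAM * u ** (GAM - 1)


def cumulative_hazard(t, n=2000):
    """H(t) = integral of h(u) from 0 to t, by the composite Simpson rule (n even)."""
    step = t / n
    total = hazard(0.0) + hazard(t)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * hazard(i * step)
    return total * step / 3


def survival_from_hazard(t):
    return math.exp(-cumulative_hazard(t))


for t in (1.0, 2.0, 5.0):
    exact = math.exp(-((LAM * t) ** GAM))
    print(f"t={t}: H={cumulative_hazard(t):.5f}  exp(-H)={survival_from_hazard(t):.5f}  S exact={exact:.5f}")
```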
Which treatment (A or C) has the best survival?
[Figure: survival function (0.0 to 1.0) against time since treatment (years, 0 to 1) for Treatment A and Treatment C]
Which treatment is associated with the best survival?
Which treatment (A or C) has the best survival?
[Figure: survival function (0.0 to 1.0) against time since treatment (years, 0 to 5) for Treatment A and Treatment C]
Which treatment is associated with the best survival?
Now plot the (approximate) hazard for A
[Figure: axes for hazard rate against years since diagnosis (0 to 5), legend Treatment A and Treatment C]
Don't worry about the scale; pay attention to if (and where) the lines cross.
Draw the (approximate) hazard function for treatment A.
The two hazard functions
[Figure: Hazard function for each treatment group. Hazard rate against years since diagnosis (0 to 5) for Treatment A and Treatment C]
What about if we further extend the follow-up?
[Figure: survival function (0.0 to 1.0) against time since treatment (years, 0 to 15) for Treatment A and Treatment C]
Which treatment is associated with the best survival?
Hazard ratio for A vs C
[Figure: Hazard ratio (A vs C) against years since diagnosis (0 to 15), showing the true hazard ratio and the HR from a PH model; y-axis from 0.25 to 20]
How is this relevant for cancer patient survival?
When studying cancer patient survival, hazards are very likely to be non-proportional.
In general, if we can identify a factor associated with poor survival, the effect is usually greater early in the follow-up.
Jatoi et al. (2011) [1] discuss the presence of non-proportional hazards for breast cancer (see next slide for hazards by ER status).
They write: ‘Similar nonproportional hazard rates are evident for large versus small tumors, positive versus negative lymph nodes, high versus low tumor grade, the intrinsic molecular breast cancer subtypes and the molecular prognostic signatures Oncotype DX and Mammaprint.’
Hazards by ER status, Jatoi et al. (2011) [1]
Breast Cancer Adjuvant Therapy: Time to Consider Its Time-Dependent Effects
Ismail Jatoi, University of Texas Health Science Center, San Antonio, TX
William F. Anderson, National Cancer Institute, Bethesda, MD
Jong-Hyeon Jeong and Carol K. Redmond, University of Pittsburgh, Pittsburgh, PA
Breast cancer is a chronic and heterogeneous disease that may recur many years after initial diagnosis and treatment.¹ This has important implications for the practicing oncologist. For instance, an early effect of adjuvant treatment may diminish over time after cessation of therapy, or, alternatively, there may exist a lag time before some treatment effects become pronounced. Indeed, the risk of breast cancer recurrence and death (hazard rate) varies over time (ie, is nonproportional) according to prognostic and predictive factors (Figs 1 and 2; Table 1).⁶,¹³ The hazard curve for breast cancer death peaks between 2 and 3 years after initial diagnosis and then declines sharply, suggesting that the biologic mechanisms responsible for early and late cancer-specific events are fundamentally different. Thus the early and late effects of adjuvant therapy may vary accordingly.
For example, Figure 1 shows the annual hazard rates for breast cancer deaths (percent per year) after initial diagnosis among women in the National Cancer Institute’s Surveillance, Epidemiology, and End Results 13 Registries database.² The average annual rate of breast cancer deaths is nonproportional overall and by estrogen receptor (ER) expression.¹⁴ Thus the annual hazard rate for all cases peaks near 3% per year between the second and third years after diagnosis and then declines to 1% to 2% per year by the sixth through eighth years. The hazard rates for ER-negative and ER-positive tumors peak at approximately 6.5% and 2% per year, respectively, between the first and third years (ie, ≈ three-fold difference). Notably, ER-negative to ER-positive hazard rates cross between the seventh and eighth years, after which women with ER-negative tumors have a lower rate of breast cancer death than those with ER-positive tumors. Table 1 further shows the fold difference for ER-negative compared with ER-positive tumors over time. ER-negative to ER-positive hazard ratios (HR) were more than 1.0 before the eighth year, after which HRs were less than 1.0.
Similar nonproportional hazard rates are evident for large versus small tumors, positive versus negative lymph nodes, high versus low tumor grade,¹³ the intrinsic molecular breast cancer subtypes,⁶,⁸ and the molecular prognostic signatures Oncotype DX¹² and Mammaprint⁹⁻¹¹ (Fig 2). Thus hazard rates for relapse among high-risk tumors (eg, nonluminal A, Mammaprint poor signature, and Oncotype high-risk score) show a sharp peak soon after initial diagnosis, similar to ER-negative cancers (Fig 1). Conversely, hazard rates for low-risk tumors (eg, luminal A, Mammaprint good signature, and Oncotype low- and intermediate-risk score) lack a sharp peak, similar to ER-positive tumors. These hazard curves suggest that the biologic mechanisms responsible for early and late breast cancer events differ and may therefore respond differently to the same treatment.
[Figure 1 (Jatoi et al. 2011): annual hazard rate for breast cancer death (%) against time after initial breast cancer diagnosis (years, 0 to 12), for all cases, ER negative and ER positive]
Fig 1. Annual hazard rates for breast cancer death and ER-negative to ER-positive hazard ratios (Table 1) using the National Cancer Institute’s Surveillance, Epidemiology, and End Results 13 Registries Databases (1992 to 2007) for invasive female breast cancer.² Annual hazard rates for breast cancer death overall (all cases combined, n = 401,693), estrogen receptor (ER)-negative (n = 74,567), and ER-positive (n = 257,426) breast cancers. The annual hazard rate for cancer-specific death describes the instantaneous rate of dying from cancer in a specified time interval after initial cancer diagnosis. Hazard rate curves were modeled using cubic splines with join-points selected by Akaike’s information criteria³,⁴; 95% CIs were applied with bootstrap resampling techniques.⁵
Under the null hypothesis of no interaction over time, annual hazard rates for ER-positive and ER-negative breast cancers would be proportional (or similar) with follow-up after initial breast cancer diagnosis. The overall rate of breast cancer death for all cases peaks near 3% per year between the second and third years after initial breast cancer diagnosis and then declines to 1% to 2% per year by the sixth through eighth years. The annual hazard rates for women with ER-negative and ER-positive tumors demonstrate peaks of approximately 6.5% and 2% near the first through third years after initial breast cancer diagnosis, respectively (≈ three-fold difference). An ER-negative to ER-positive hazard rate cross-over occurs between the seventh and eighth years after breast cancer diagnosis, and then women with ER-negative tumors had a somewhat paradoxically lower rate of breast cancer death than those with ER-positive breast cancers.
JOURNAL OF CLINICAL ONCOLOGY COMMENTS AND CONTROVERSIES
VOLUME 29 • NUMBER 17 • JUNE 10 2011
© 2011 by American Society of Clinical Oncology. Journal of Clinical Oncology, Vol 29, No 17 (June 10), 2011: pp 2301-2304
‘The hazards of hazard ratios’ (Hernan 2010)
Hernan 2010 [2] presents an interesting discussion on ‘the hazards of hazard ratios’, where he argues that hazard ratios ‘have a built-in selection bias’.
Hernan argues that our population will comprise both susceptible and non-susceptible individuals.
When exposed, the susceptible individuals will experience the event sooner than if they were unexposed, resulting in a lower proportion of susceptible individuals remaining at risk in the exposed relative to the unexposed group as time progresses (and a decreasing hazard ratio).
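Hernan's argument can be illustrated with a hypothetical frailty set-up (our own construction, not taken from the paper): in each group a fraction pi is susceptible; susceptible unexposed individuals have constant hazard h, susceptible exposed individuals have hazard k × h, and non-susceptible individuals never have the event. The hazard ratio among the susceptible is k at all times, yet the population (marginal) hazard ratio starts at k and then falls, because the susceptible are depleted faster in the exposed group.

```python
import math


def marginal_hazard_ratio(t, k=2.0, h=0.3, pi=0.4):
    """Population hazard ratio (exposed vs unexposed) at time t.

    Hypothetical assumptions: a fraction pi of each group is susceptible;
    susceptible unexposed have hazard h, susceptible exposed have hazard k*h;
    non-susceptible individuals never experience the event.
    """
    def susceptible_share(rate):
        # Proportion of those still event-free at time t who are susceptible.
        s = pi * math.exp(-rate * t)
        return s / (s + 1 - pi)

    hazard_exposed = k * h * susceptible_share(k * h)
    hazard_unexposed = h * susceptible_share(h)
    return hazard_exposed / hazard_unexposed


for t in (0, 1, 2, 5, 10):
    print(t, round(marginal_hazard_ratio(t), 3))
```

With these illustrative values the marginal hazard ratio eventually drops below 1, even though exposure never reduces anyone's hazard.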
‘The hazards of hazard ratios’ (Hernan 2010)
The Women’s Health Initiative [3] followed over 16,000 women for an average of 5.2 years to study the association between hormone replacement therapy (HRT) and coronary heart disease (CHD). The trial was halted due to safety concerns.
‘Combined hormone therapy was associated with a hazard ratio for CHD of 1.24.’ [from the article abstract]
HRs during each year of follow-up: 1.81, 1.34, 1.27, 1.25, 1.45, and 0.70 for years 1, 2, 3, 4, 5, and 6+, respectively. [Table 2]
The average HR in the WHI would have been 1.8 if the study had been halted after 1 year of follow-up, 1.7 after 2 years, and 1.2 after 5 years.
The 24% increase in the rate of coronary heart disease that many researchers and journalists consider to be the effect of combined hormone therapy is the result of the arbitrary choice of an average follow-up period of 5.2 years.
Overview of approaches to modelling prognosis
Modelling cause-specific mortality
Cox proportional hazards model
Poisson regression
Parametric survival models
Flexible parametric models
Similarity of these approaches
Modelling excess mortality (cannot use Cox regression)
Poisson regression
Flexible parametric models
Analogues to the Cox model [4, 5]
Example: survival of patients diagnosed with colon carcinoma in Finland
Patients diagnosed with colon carcinoma in Finland 1984–95. Potential follow-up to end of 1995; censored after 10 years.
Outcome is death due to colon carcinoma.
Interest is in the effect of clinical stage at diagnosis (distant metastases vs no distant metastases).
How might we specify a statistical model for these data?
[Figure: Smoothed empirical hazards (cancer-specific mortality rates). Empirical hazard against years since diagnosis (0 to 10) for Not distant and Distant]
sts graph, by(distant) hazard kernel(epan2)
[Figure: Smoothed empirical hazards (with default smoother). Empirical hazard against years since diagnosis (0 to 10) for Not distant and Distant]
sts graph, by(distant) hazard
The Cox proportional hazards model
The ‘intercept’ in the Cox model [6], the hazard (event rate) for individuals with all covariates x at the reference level, can be thought of as an arbitrary function of time¹, often called the baseline hazard and denoted by $h_0(t)$.
The hazard at time t for individuals with other covariate values is a multiple of the baseline:
$h(t \mid x) = h_0(t)\exp(x\beta)$.
Alternatively, $\ln[h(t \mid x)] = \ln[h_0(t)] + x\beta$.
The Cox model does not explicitly estimate $h_0(t)$ when estimating the log hazard ratios ($\beta$).
¹ Time t can be defined in many ways, e.g., attained age, time-on-study, calendar time, etc.
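To see why $h_0(t)$ is not needed, the sketch below maximises the Cox partial likelihood for a single covariate by Newton-Raphson, using the Breslow approximation for tied event times (Stata's default, as in the output that follows). The tiny data set is hypothetical. Only the ordering of event and censoring times enters the calculation, so any monotone transformation of time (here, a square root) gives exactly the same estimate.

```python
import math


def partial_likelihood_parts(beta, data):
    """Breslow log partial likelihood, score and information for one covariate.

    data: list of (time, event, x) tuples; event is 1 for an event, 0 if censored.
    """
    loglik = score = info = 0.0
    for tj in sorted({t for t, d, _ in data if d == 1}):
        x_events = [x for t, d, x in data if d == 1 and t == tj]
        x_risk = [x for t, _, x in data if t >= tj]  # risk set: still followed at tj
        w = [math.exp(beta * x) for x in x_risk]
        s0 = sum(w)
        s1 = sum(wi * x for wi, x in zip(w, x_risk))
        s2 = sum(wi * x * x for wi, x in zip(w, x_risk))
        dj = len(x_events)  # tied events share the same risk-set denominator
        loglik += beta * sum(x_events) - dj * math.log(s0)
        score += sum(x_events) - dj * s1 / s0
        info += dj * (s2 / s0 - (s1 / s0) ** 2)
    return loglik, score, info


def fit_cox(data, tol=1e-10, max_iter=50):
    """Newton-Raphson on the partial likelihood, starting from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        _, u, i = partial_likelihood_parts(beta, data)
        step = u / i
        beta += step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data: (time in years, event indicator, exposed 0/1)
data = [(1.0, 1, 1), (1.5, 1, 0), (2.0, 1, 1), (2.0, 1, 0), (2.5, 0, 1),
        (3.0, 1, 0), (3.5, 1, 1), (4.0, 0, 0), (5.0, 1, 0), (6.0, 0, 0)]

beta_hat = fit_cox(data)
print("log hazard ratio:", round(beta_hat, 4), " hazard ratio:", round(math.exp(beta_hat), 3))
```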
[Figure: Smoothed empirical hazards on log scale. Empirical hazard (log scale) against years since diagnosis (0 to 10) for Not distant and Distant]
sts graph, by(distant) hazard kernel(epan2) yscale(log)
Fit a Cox model to estimate the mortality rate ratio
. stcox distant
failure _d: status == 1
analysis time _t: (exit-origin)/365.25
origin: time dx
note: time>10 trimmed
Cox regression -- Breslow method for ties
No. of subjects = 13208 Number of obs = 13208
No. of failures = 7122
Time at risk = 44013.26215
LR chi2(1) = 5544.65
Log likelihood = -61651.446 Prob > chi2 = 0.0000
--------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% C.I.]
--------+-----------------------------------------------------
distant | 6.557777 .1689328 73.00 0.000 6.235 6.897
--------------------------------------------------------------
Hazard ratio: 6.56
[Figure: Fitted hazards from Cox model. Fitted hazard against years since diagnosis (0 to 10) for Not distant and Distant]
stcurve, hazard at1(distant=0) at2(distant=1) kernel(epan2)
[Figure: Fitted hazards (on log scale) from Cox model. Fitted hazard (log scale) against years since diagnosis (0 to 10) for Not distant and Distant]
stcurve, hazard at1(distant=0) at2(distant=1) kernel(epan2) yscale(log)
[Figure: Smoothed empirical hazards by age and stage. Empirical hazard against years since diagnosis (0 to 10) for Young, Not distant; Young, Distant; Old, Not distant; Old, Distant]
sts graph, by(agestage) hazard kernel(epan2)
Fit a Cox model adjusted for age at diagnosis
. stcox distant old
failure _d: status == 1
analysis time _t: (exit-origin)/365.25
origin: time dx
note: time>10 trimmed
Cox regression -- Breslow method for ties
No. of subjects = 13208 Number of obs = 13208
No. of failures = 7122
Time at risk = 44013.26215
LR chi2(2) = 5778.91
Log likelihood = -61534.317 Prob > chi2 = 0.0000
----------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
--------+-------------------------------------------------------------
distant | 6.65287 .1716121 73.47 0.000 6.324877 6.997871
old | 1.463653 .0358098 15.57 0.000 1.395124 1.535549
----------------------------------------------------------------------
[Figure: Fitted hazards from Cox model. Fitted hazard against years since diagnosis (0 to 10) for Young, Not distant; Young, Distant; Old, Not distant; Old, Distant]
Hazard ratio: 6.56
Using default (Breslow) method for ties
[Figure: Fitted hazards from Cox model. Hazard against years since diagnosis (0 to 10) for Not distant and Distant]
Hazard ratio: 6.64
stcox distant, efron
[Figure: Fitted hazards from Cox model with Efron method for ties. Hazard against years since diagnosis (0 to 10) for Not distant and Distant]
Hazard ratio: 10.04
Hazard ratios
Cox: 6.64
Exponential: 10.04
[Figure: Fitted hazards from parametric survival model (exponential). Hazard against years since diagnosis (0 to 10) for Not distant and Distant]
Hazard ratio: 7.41
Hazard ratios
Cox: 6.64
Exponential: 10.04
Weibull: 7.41
[Figure: Fitted hazards from parametric survival model (Weibull). Hazard against years since diagnosis (0 to 10) for Not distant and Distant]
[Figure: Fitted hazards from parametric survival model (Weibull). Hazard against years since diagnosis for Not distant and Distant]
[Figure: Fitted cumulative hazards from Weibull model. Cumulative hazard against years since diagnosis for Not distant and Distant]
Hazard ratio: 6.89
Hazard ratios
Cox: 6.64
Exponential: 10.04
Weibull: 7.41
Poisson (annual): 6.89
[Figure: Fitted hazards from Poisson model (yearly intervals). Hazard against years since diagnosis (0 to 10) for Not distant and Distant]
Time as a confounder
When the rate changes with time, time may confound the effect of exposure.
We will, for the moment, assume that the rates are constant within broad time bands but can change from band to band.
This approach (categorising a metric variable and assuming the effect is constant within each category) is standard in epidemiology.
We often categorise metric variables; the only difference here is that the variable is ‘time’.
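A small hypothetical example (the counts are invented for illustration) shows how time can confound. Within each time band the exposed group has twice the rate of the unexposed group, but the exposed contribute relatively more person-time to the early, high-mortality band, so the crude rate ratio is badly inflated. Stratifying on time band, here with a Mantel-Haenszel rate ratio, recovers the within-band ratio; including time-band terms in a Poisson model serves the same purpose.

```python
# Hypothetical data by time band:
# (deaths exposed, person-years exposed, deaths unexposed, person-years unexposed)
bands = {
    "0-1 years": (80, 100.0, 40, 100.0),  # rates 0.80 vs 0.40
    "1+ years": (5, 50.0, 20, 400.0),     # rates 0.10 vs 0.05
}

# Crude rate ratio: ignores time altogether.
d1 = sum(b[0] for b in bands.values())
y1 = sum(b[1] for b in bands.values())
d0 = sum(b[2] for b in bands.values())
y0 = sum(b[3] for b in bands.values())
crude_rr = (d1 / y1) / (d0 / y0)

# Mantel-Haenszel rate ratio: compares exposed and unexposed within each time band.
numerator = sum(d1b * y0b / (y1b + y0b) for d1b, y1b, d0b, y0b in bands.values())
denominator = sum(d0b * y1b / (y1b + y0b) for d1b, y1b, d0b, y0b in bands.values())
mh_rr = numerator / denominator

for name, (d1b, y1b, d0b, y0b) in bands.items():
    print(f"{name}: rate ratio {(d1b / y1b) / (d0b / y0b):.2f}")
print(f"crude rate ratio {crude_rr:.2f}, time-adjusted (MH) rate ratio {mh_rr:.2f}")
```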
What are the failure rates for each band?
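Consider a group of subjects with rates $\lambda_1$ during band 1, $\lambda_2$ during band 2, etc.
[Diagram: follow-up of three subjects on a time axis from 0 to 15 years, split into bands at 5 and 10 years. Subject 1: 3 years in band 0-5, ending in an event. Subject 2: 5 years in band 0-5 and 4 years in band 5-10, censored. Subject 3: 5 years in band 0-5, 5 years in band 5-10 and 2 years in band 10-15, ending in an event.]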
Splitting the records by follow-up time
In software, we split the observation for each subject into one observation for each timeband.
subject  timeband  follow-up  failure
      1       0-5          3        1
      2       0-5          5        0
      2      5-10          4        0
      3       0-5          5        0
      3      5-10          5        0
      3     10-15          2        1
The rate for timeband 0-5 is then 1/(3+5+5) = 1/13 ≈ 0.077 events per person-year, and so on for the other time bands.
This method can be used whether rates are varying simply as a function of time or in response to some time-varying exposure.
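The splitting step can be sketched in a few lines of Python. This is a minimal illustration, not how Stata's stsplit is implemented; it assumes follow-up starts at time 0 and that each record holds a subject identifier, the exit time and a 0/1 event indicator.

```python
def split_person_time(records, cuts):
    """Split (subject, exit_time, event) records into one row per time band.

    Follow-up is assumed to start at time 0. An event is counted only in the
    band in which follow-up ends; earlier bands are censored at the cut point.
    Each output row is (subject, band lower limit, time at risk, failed).
    """
    rows = []
    for subject, exit_time, event in records:
        for lower, upper in zip(cuts, cuts[1:]):
            if exit_time <= lower:
                break  # follow-up ended before this band started
            end = min(exit_time, upper)
            failed = int(event == 1 and exit_time <= upper)
            rows.append((subject, lower, end - lower, failed))
    return rows


def rates_by_band(rows):
    """Events divided by person-time within each band (keyed by the band's lower limit)."""
    totals = {}
    for _, band, time_at_risk, failed in rows:
        d, y = totals.get(band, (0, 0.0))
        totals[band] = (d + failed, y + time_at_risk)
    return {band: d / y for band, (d, y) in sorted(totals.items())}


records = [(1, 3, 1), (2, 9, 0), (3, 12, 1)]  # the three subjects in the table above
rows = split_person_time(records, cuts=[0, 5, 10, 15])
for row in rows:
    print(row)
print(rates_by_band(rows))
```

The printed rates match the hand calculation: 1/13 in band 0-5, 0/9 in band 5-10 and 1/2 in band 10-15.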
Let’s see how we split person-time using Stata
The original data.
. list
+----------------------------+
| subject survtime event |
|----------------------------|
1. | 1 3 1 |
2. | 2 9 0 |
3. | 3 12 1 |
+----------------------------+
Stata internal variables created by stset
. stset survtime, fail(event) id(subject)
. list subject survtime event _t0 _t _d
+--------------------------------------------+
| subject survtime event _t0 _t _d |
|--------------------------------------------|
1. | 1 3 1 0 3 1 |
2. | 2 9 0 0 9 0 |
3. | 3 12 1 0 12 1 |
+--------------------------------------------+
stset creates the following internal variables:
_t0 time at entry
_t time at exit
_d failure indicator
_st inclusion indicator
Now we split person-time with stsplit
. stsplit timeband, at(0(5)15)
(3 observations (episodes) created)
. list subject timeband survtime event _t0 _t _d
+-------------------------------------------------------+
| subject   timeband   survtime   event   _t0   _t   _d |
|-------------------------------------------------------|
|       1          0          3       1     0    3    1 |
|       2          0          5       .     0    5    0 |
|       2          5          9       0     5    9    0 |
|       3          0          5       .     0    5    0 |
|       3          5         10       .     5   10    0 |
|       3         10         12       1    10   12    1 |
+-------------------------------------------------------+
We can now tabulate rates by timeband
. strate timeband
failure _d: event
analysis time _t: survtime
id: subject
Estimated rates and 95% confidence intervals
(6 records included in the analysis)
+---------------------------------------------------------+
| timeband D Y Rate Lower Upper |
|---------------------------------------------------------|
| 0 1 13.0000 0.076923 0.010836 0.546082 |
| 5 0 9.0000 0.000000 . . |
| 10 1 2.0000 0.500000 0.070432 3.549536 |
+---------------------------------------------------------+
Splitting is a very powerful tool
Not just for Poisson regression. Splitting is used together with, for example, Cox regression for:
Multiple timescales
Time-varying covariates
Modelling non-proportional hazards
Splitting is used when applying multi-state models or competing risks analysis.
Splitting in SAS and R
Stata is the only software we are aware of in which this powerful tool (i.e., splitting person-time) comes standard.
We can also split at, for example, dates of intervention or dates at which exposure otherwise changes.
Several user-written SAS macros exist; I use the lexis macro written by Bendix Carstensen (http://staff.pubhealth.ku.dk/~bxc/Lexis/). This macro has been tried and tested over 10 years.
In R, use the Epi package.
Splitting on time since diagnosis for the colon data
. stsplit fu, at(0(1)10)
(37458 observations (episodes) created)
The variable fu (follow-up) will be created. We can name this anything we like.
We now have a separate observation for each individual for each year of follow-up.
The split dataset
. list id fu _t0 _t _d in 1/10, sepby(id)
+----------------------------+
| id fu _t0 _t _d |
|----------------------------|
1. | 1 0 0 1 0 |
2. | 1 1 1 1.375 1 |
|----------------------------|
3. | 2 0 0 1 0 |
4. | 2 1 1 2 0 |
5. | 2 2 2 3 0 |
6. | 2 3 3 4 0 |
7. | 2 4 4 5 0 |
8. | 2 5 5 6 0 |
9. | 2 6 6 6.875 0 |
|----------------------------|
10. | 3 0 0 .125 1 |
+----------------------------+
Rates for each time band
. strate fu, per(1000)
Estimated rates (per 1000) and lower/upper bounds of 95% CI
+------------------------------------------------------+
| fu D Y Rate Lower Upper |
|------------------------------------------------------|
| 0 4223 10.3339 408.6533 396.5122 421.1662 |
| 1 1444 7.4190 194.6351 184.8507 204.9373 |
| 2 597 5.7934 103.0487 95.1054 111.6555 |
| 3 342 4.7379 72.1834 64.9246 80.2537 |
| 4 227 3.9301 57.7599 50.7143 65.7844 |
|------------------------------------------------------|
| 5 130 3.2741 39.7051 33.4342 47.1522 |
| 6 78 2.7282 28.5906 22.9004 35.6946 |
| 7 33 2.2848 14.4430 10.2679 20.3158 |
| 8 22 1.9273 11.4152 7.5163 17.3365 |
| 9 26 1.5845 16.4085 11.1721 24.0993 |
+------------------------------------------------------+
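Each rate in the strate output is D/Y. The sketch below computes a rate and a 95% confidence interval on the log scale with SE(log rate) = 1/√D. These limits agree with both strate outputs shown in these slides to the printed precision, but that agreement is our own check, not a statement of strate's documented method.

```python
import math
from statistics import NormalDist


def rate_with_ci(events, person_time, per=1.0, level=0.95):
    """Rate = events / person_time, with a confidence interval on the log scale.

    Assumes SE(log rate) = 1 / sqrt(events); no interval is returned when
    there are no events (strate also shows missing limits in that case).
    """
    rate = per * events / person_time
    if events == 0:
        return rate, None, None
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z / math.sqrt(events)
    return rate, rate * math.exp(-half_width), rate * math.exp(half_width)


# Toy data, timeband 0: 1 event in 13 person-years
print(rate_with_ci(1, 13))
# Colon data, first year: 4223 deaths in 10.3339 thousand person-years, rate per 1000
print(rate_with_ci(4223, 10333.9, per=1000))
```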
Now estimate the effect of distant metastases while controlling for time since diagnosis
. streg i.fu distant, dist(exp)
-------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
----------+--------------------------------------------------------------
fu 0 | 1 (base)
1 | .6636731 .0204396 -13.31 0.000 .6247973 .7049677
2 | .4041937 .0178801 -20.48 0.000 .3706255 .4408022
3 | .3008889 .0170795 -21.16 0.000 .2692087 .3362972
4 | .251201 .0172525 -20.12 0.000 .2195639 .2873967
5 | .1754712 .015704 -19.45 0.000 .1472403 .2091151
6 | .1267095 .014524 -18.02 0.000 .101214 .1586273
7 | .0635113 .0111133 -15.75 0.000 .0450719 .0894943
8 | .0506048 .0108267 -13.95 0.000 .0332721 .0769667
9 | .0732248 .0144203 -13.27 0.000 .049777 .1077177
|
distant | 6.890536 .1758401 75.64 0.000 6.554372 7.24394
_cons | .1523781 .0036926 -77.64 0.000 .1453099 .1597901
-------------------------------------------------------------------------
What does the estimate labelled _cons represent?
Now that we have adjusted for time
Now that we have adjusted for time since diagnosis, the estimated rate ratio is similar to that obtained from Cox regression.
Cox and Poisson regression are extremely similar; the only difference is that with Poisson regression we categorise time into pre-specified intervals and model the effect of time as a step function (see next slide), whereas in Cox regression we effectively model time as a continuous function.
There is an analogy with the actuarial and Kaplan-Meier methods for estimating the survivor function: the actuarial approach uses pre-specified intervals, whereas the Kaplan-Meier method treats time as continuous.
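The fitted hazards from the time-adjusted Poisson model can be reconstructed by hand from the streg output above, which makes the step-function form explicit. The sketch simply multiplies the printed exponentiated estimates; it assumes analysis time is in years, so the rates are per person-year. The ratio of the two groups' fitted rates is 6.89 in every band, which is the proportional hazards assumption built into this model.

```python
# Exponentiated estimates copied from the streg output above.
BASELINE_RATE = 0.1523781  # _cons: rate per person-year, first year, no distant metastases
FU_HR = [1.0, 0.6636731, 0.4041937, 0.3008889, 0.251201,
         0.1754712, 0.1267095, 0.0635113, 0.0506048, 0.0732248]  # fu = 0, 1, ..., 9
DISTANT_HR = 6.890536


def fitted_rate(fu, distant):
    """Step-function hazard implied by the Poisson model: constant within each year."""
    return BASELINE_RATE * FU_HR[fu] * (DISTANT_HR if distant else 1.0)


for fu in range(10):
    print(f"year {fu}-{fu + 1}: not distant {fitted_rate(fu, False):.4f}, "
          f"distant {fitted_rate(fu, True):.4f}")
```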
Fitted values for the model adjusted for time
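Hazard ratio: 6.89
Annotated fitted rates (per person-year): 0.152 [exp(_cons)] for Not distant in the first year; 0.152*6.89 for Distant in the first year; 0.152*6.89*0.66 for Distant in the second year
[Figure: Fitted hazards from the Poisson model adjusted for time since diagnosis. Hazard against years since diagnosis (0 to 10) for Not distant and Distant]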
We can make Poisson regression more similar, and even equivalent, to Cox regression
The actuarial method with time classified as narrowly as possible is equivalent to the Kaplan-Meier method (in the absence of times where both events and censoring occur).
Similarly, we can make Poisson regression more similar to Cox regression by using a larger number of smaller intervals.
If we split at each event time, then the estimates from Poisson regression are equivalent to Cox regression.
Demography and epidemiology: Practical use of the Lexis diagram in the computer age.
or:
Who needs the Cox-model anyway?
Annual meeting of Finnish Statistical Society, 23–24 May 2005. Revised December 2005.
Bendix Carstensen, Steno Diabetes Center, Gentofte, Denmark & Department of Biostatistics, University of Copenhagen
www.biostat.ku.dk/~bxc
The contents of this paper were presented at the meeting of the Finnish Statistical Society in May 2005 in Oulu. The slides presented can be found on my homepage as http://staff.pubhealth.ku.dk/~bxc/Talks/Oulu.pdf.
Hazard ratio: 6.65
Hazard ratios
Cox: 6.64
Exponential: 10.04
Weibull: 7.41
Poisson (annual): 6.89
Poisson (quarter): 6.65
[Figure: Fitted hazards from Poisson model (3-month intervals). Hazard against years since diagnosis (0 to 10) for Not distant and Distant]
Hazard ratio: 6.64
Hazard ratios
Cox: 6.64
Exponential: 10.04
Weibull: 7.41
Poisson (annual): 6.89
Poisson (quarter): 6.65
Poisson (months): 6.64
[Figure: Fitted hazards from Poisson model (monthly intervals). Hazard against years since diagnosis (0 to 10) for Not distant and Distant]
Modelling continuous variables (intro to splines)
If we are not happy with modelling, for example, age as a linear effect, we might create the variable age2=age^2 and include both variables age and age2 in the model.
We would then be modelling age as a quadratic polynomial and using 2 degrees of freedom (df).
We could add age3=age^3 to model age as a cubic polynomial.
Alternatively, we might create dummy variables to model age as a step function.
Either way, we are creating a series of variables with which to model the effect of age.
Modelling with splines also involves creating a series of variables.
What are splines?
Flexible mathematical functions defined by piecewise polynomials.
The points at which the polynomials join are called knots.
Constraints ensure the function is smooth.
The most common splines used in practice are cubic splines.
However, splines can be of any degree, n.
For cubic splines, the function is forced to have continuous 0th, 1st and 2nd derivatives.
Regression splines can be incorporated into any regression model with a linear predictor.
Using splines to estimate non-linear functions.
[Figure: Mortality rate (per 1000 person-years) against years from diagnosis (0 to 5), with rates estimated in intervals of 1 week]
No continuity constraints
[Figure: Piecewise polynomial fit with no constraints. Mortality rate (per 1000 person-years) against years from diagnosis (0 to 5)]
Function forced to join at knots
[Figure: Piecewise polynomial fit forced to join at knots. Mortality rate (per 1000 person-years) against years from diagnosis (0 to 5)]
Continuous first derivative
[Figure: Piecewise polynomial fit with continuous 1st derivatives. Mortality rate (per 1000 person-years) against years from diagnosis (0 to 5)]
Continuous second derivative
[Figure: Piecewise polynomial fit with continuous 2nd derivatives. Mortality rate (per 1000 person-years) against years from diagnosis (0 to 5)]
Restricted cubic splines
Cubic splines can behave poorly in the tails.
An extension is restricted cubic splines [7].
Forced to be linear before the first knot and after the final knot.
This is where there is often less data and standard cubic splines tend to be sensitive to a few extreme values.
For the same number of knots, needs 4 fewer parameters than cubic splines.
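A minimal sketch of one common way to build restricted cubic spline variables, Harrell's truncated-power form (unscaled). This parameterisation is our choice for illustration; Stata's rcsgen, used later in these slides, may use a different but equivalent basis, and its orthog option orthogonalises the variables, which we do not attempt here. The knot positions are hypothetical.

```python
def _cube_plus(u):
    """(u)_+^3: zero for u <= 0, u**3 otherwise."""
    return u**3 if u > 0 else 0.0


def rcs_basis(x, knots):
    """Restricted cubic spline basis at x, truncated-power form (Harrell's, unscaled).

    Returns [x, X_1, ..., X_{K-2}] for K knots: K - 1 variables in total.
    Every X_j is zero below the first knot and linear above the last knot.
    """
    k = sorted(knots)
    if len(k) < 3:
        raise ValueError("need at least 3 knots")
    last, penultimate = k[-1], k[-2]
    basis = [x]
    for kj in k[:-2]:
        basis.append(
            _cube_plus(x - kj)
            - _cube_plus(x - penultimate) * (last - kj) / (last - penultimate)
            + _cube_plus(x - last) * (penultimate - kj) / (last - penultimate)
        )
    return basis


knots = [0.5, 1.0, 2.0, 3.0, 4.0]  # hypothetical knot positions (e.g. years)
for x in (0.25, 1.5, 3.5, 5.0, 6.0):
    print(x, [round(v, 4) for v in rcs_basis(x, knots)])
```

With K knots this gives K − 1 variables, whereas an unrestricted cubic spline in truncated-power form needs x, x², x³ plus one term per knot (K + 3 variables): the difference of 4 quoted above.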
Hazard ratio: 6.65
Hazard ratios
Cox: 6.64
Exponential: 10.04
Weibull: 7.41
Poisson (annual): 6.89
Poisson (quarter): 6.65
Poisson (months): 6.64
Poisson (spline): 6.65
[Figure: Fitted hazards from Poisson model (rcs 5df). Hazard against years since diagnosis (0 to 10) for Not distant and Distant]
Fitted hazards from flexible parametric model (5df)
Hazard ratio: 6.63
Hazard Ratios
Cox: 6.64
Exponential: 10.04
Weibull: 7.41
Poisson (annual): 6.89
Poisson (quarter): 6.65
Poisson (months): 6.64
Poisson (spline): 6.65
Flexible parametric: 6.63
[Figure: fitted hazard against years since diagnosis (0 to 10), not distant vs distant]
Fine splitting example:
England and Wales Breast Cancer
stset survtime, failure(dead==1) exit(time 5) id(ident)
stsplit sp_time, every(`=1/52.18')
generate risktime = _t - _t0
collapse (min) start=_t0 (max) end=_t (count) n=_d ///
(sum) risktime _d, by(dep5 sp_time)
Leads to about 2.25 million rows before collapsing.
522 rows after collapsing.
We will compare mortality among the most deprived (category 5) and the least deprived (category 1), with the other categories excluded.
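The split-and-collapse step can be mimicked outside Stata to see what each collapsed row contains. The Python sketch below works under simplifying assumptions (everyone enters at time 0, no by-groups); `split_and_collapse` is a hypothetical helper, not a re-implementation of stsplit or collapse. It cuts each (exit time, died) record into intervals of fixed width, censors follow-up at `max_time` as exit(time 5) does, and sums person-time and deaths per interval.

```python
from collections import defaultdict


def split_and_collapse(records, width, max_time):
    """Split (exit_time, died) records into intervals of length `width`
    and return {interval_index: [person_time, deaths]}.

    Follow-up is censored at max_time (like exit(time 5) in stset).
    A death is counted in the interval (start, stop] containing the exit time.
    """
    table = defaultdict(lambda: [0.0, 0])
    for exit_time, died in records:
        stop_time = min(exit_time, max_time)
        event = died and exit_time <= max_time  # deaths after max_time are censored
        k = 0
        while k * width < stop_time:
            start = k * width
            stop = min(start + width, stop_time)
            table[k][0] += stop - start  # person-time in this interval
            k += 1
        if event and k > 0:
            table[k - 1][1] += 1  # the last interval entered holds the death
    return dict(table)


# Two hypothetical patients: a death at 0.5 years, a censoring at 1.2 years.
print(split_and_collapse([(0.5, True), (1.2, False)], width=1.0, max_time=5.0))
```

With width=1/52.18 and max_time=5 this gives about 261 weekly intervals per group, consistent with 522 collapsed rows for the two deprivation groups.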
Fitting a Poisson model with splines
Poisson model with restricted cubic splines
. gen midtime = (start + end)/2
. gen lntime = ln(midtime)
. rcsgen lntime, df(3) gen(rcs) fw(_d) orthog
Variables rcs1 to rcs3 were created
. glm _d rcs* dep5, family(poisson) lnoffset(risktime) nolog eform

Generalized linear models                         No. of obs      =        522
Optimization     : ML                             Residual df     =        517
                                                  Scale parameter =          1
Deviance         =  589.4507401                   (1/df) Deviance =   1.140137
Pearson          =  565.6616566                   (1/df) Pearson  =   1.094123

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   4.513367
Log likelihood   = -1172.988905                   BIC             =  -2645.763

------------------------------------------------------------------------------
             |                 OIM
          _d |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        rcs1 |   1.033444   .0203234     1.67   0.094     .9943687    1.074055
        rcs2 |   1.066464   .0202456     3.39   0.001     1.027513    1.106892
        rcs3 |   1.174232   .0229925     8.20   0.000     1.130021    1.220172
        dep5 |   1.309601   .0513388     6.88   0.000     1.212747    1.414189
    risktime |  (exposure)
------------------------------------------------------------------------------
. estimates store pois_rcs_ph
Interaction with time
Poisson model with time-dependent effects
. glm _d i.dep5##c.rcs* , family(poisson) lnoffset(risktime) nolog

Generalized linear models                         No. of obs      =        522
Optimization     : ML                             Residual df     =        514
                                                  Scale parameter =          1
Deviance         =  571.3748157                   (1/df) Deviance =   1.111624
Pearson          =  549.7399366                   (1/df) Pearson  =   1.069533

Variance function: V(u) = u                       [Poisson]
Link function    : g(u) = ln(u)                   [Log]

                                                  AIC             =   4.490233
Log likelihood   = -1163.950943                   BIC             =  -2645.066

------------------------------------------------------------------------------
             |                 OIM
          _d |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.dep5 |   .2460437   .0407188     6.04   0.000     .1662363     .325851
        rcs1 |   .0974977   .0258872     3.77   0.000     .0467597    .1482358
        rcs2 |   .0568394   .0252797     2.25   0.025     .0072921    .1063867
        rcs3 |   .1486538    .024408     6.09   0.000      .100815    .1964926
             |
 dep5#c.rcs1 |
          1  |  -.1711919   .0407214    -4.20   0.000    -.2510044   -.0913795
             |
 dep5#c.rcs2 |
          1  |   .0377089   .0392366     0.96   0.337    -.0391935    .1146113
             |
 dep5#c.rcs3 |
          1  |   .0389086   .0413411     0.94   0.347    -.0421186    .1199357
             |
       _cons |   -2.76561   .0237546  -116.42   0.000    -2.812168   -2.719052
    risktime |  (exposure)
------------------------------------------------------------------------------
. estimates store rcs_tvc_df3
Predicted hazard (mortality) rates
[Figure: predicted mortality rate (per 1000 person-years) against time from diagnosis (0 to 5 years); lines: Least Deprived (PH), Least Deprived (TD), Most Deprived (PH), Most Deprived (TD)]
Flexible Parametric Survival Models [8, 10, 11]
First introduced by Royston and Parmar (2002) [8].
Parametric estimate of the baseline hazard without the usual restrictions on its shape (i.e., flexible).
Applicable for ‘standard’ and relative survival models.
Can fit relative survival cure models (Andersson 2011) [9].
Once we have a parametric expression for the baseline hazard, we can derive other quantities of interest (e.g., survival, hazard ratios, hazard differences, expectation of life).
The Cox model [6]
$$h_i(t \mid x_i, \beta) = h_0(t)\exp(x_i\beta)$$
Advantage: The baseline hazard, h_0(t), is not directly estimated from a Cox model.
Disadvantage: The baseline hazard, h_0(t), is not directly estimated from a Cox model.
Quote from Sir David Cox (Reid 1994 [12])
Reid “What do you think of the cottage industry that’s grown up around [the Cox model]?”
Cox “In the light of further results one knows since, I think I would normally want to tackle the problem parametrically. . . . I’m not keen on non-parametric formulations normally.”
Reid “So if you had a set of censored survival data today, you might rather fit a parametric model, even though there was a feeling among the medical statisticians that that wasn’t quite right.”
Cox “That’s right, but since then various people have shown that the answers are very insensitive to the parametric formulation of the underlying distribution. And if you want to do things like predict the outcome for a particular patient, it’s much more convenient to do that parametrically.”
Flexible Parametric Models: Basic Idea
Consider a Weibull survival curve.
$$S(t) = \exp(-\lambda t^{\gamma})$$
Transforming to the log cumulative hazard scale gives
$$\ln[H(t)] = \ln[-\ln(S(t))]$$
$$\ln[H(t)] = \ln(\lambda) + \gamma\ln(t)$$
This is a linear function of ln(t). Introducing covariates gives
$$\ln[H(t \mid x_i)] = \ln(\lambda) + \gamma\ln(t) + x_i\beta$$
Rather than assuming linearity with ln(t), flexible parametric models use restricted cubic splines for ln(t).
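A quick numerical check of the Weibull argument, with hypothetical values of λ, γ and β: on the log cumulative hazard scale the curve is a straight line in ln(t) with slope γ, and adding x_iβ shifts it by a constant, which multiplies the hazard h(t) = λγt^(γ−1)exp(x_iβ) by exp(β) at every t. The parameter values below are made up for illustration.

```python
import math

lam, gamma, beta = 0.2, 0.8, math.log(1.3)  # hypothetical Weibull parameters and log HR


def log_cum_hazard(t, x):
    # ln H(t|x) = ln(lambda) + gamma * ln(t) + x * beta
    return math.log(lam) + gamma * math.log(t) + x * beta


def hazard(t, x):
    # h(t|x) = lambda * gamma * t^(gamma - 1) * exp(x * beta)
    return lam * gamma * t ** (gamma - 1) * math.exp(x * beta)


def survival(t, x):
    # S(t|x) = exp(-H(t|x)) = exp(-exp(ln H(t|x)))
    return math.exp(-math.exp(log_cum_hazard(t, x)))


for t in (0.5, 1.0, 2.0, 5.0):
    print(t, round(hazard(t, 1) / hazard(t, 0), 6))  # constant ratio exp(beta) = 1.3
```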
Fitted cumulative hazards from Weibull model
[Figure: cumulative hazard against years since diagnosis (0 to 10), not distant vs distant]
Flexible Parametric Models: Incorporating Splines
We thus model on the log cumulative hazard scale:
$$\ln[H(t \mid x_i)] = \ln[H_0(t)] + x_i\beta$$
This is a proportional hazards model. Restricted cubic splines with knots k_0 are used to model the log baseline cumulative hazard:
$$\ln[H(t \mid x_i)] = \eta_i = s(\ln(t) \mid \gamma, k_0) + x_i\beta$$
For example, with 4 knots we can write
$$\ln[H(t \mid x_i)] = \eta_i = \underbrace{\gamma_0 + \gamma_1 z_{1i} + \gamma_2 z_{2i} + \gamma_3 z_{3i}}_{\text{log baseline cumulative hazard}} + \underbrace{x_i\beta}_{\text{log hazard ratios}}$$
where z_{1i}, z_{2i} and z_{3i} are the derived spline variables, functions of ln(t).
We are fitting a linear predictor on the log cumulative hazard scale.
Survival and Hazard Functions
We can transform to the survival scale:
$$S(t \mid x_i) = \exp(-\exp(\eta_i))$$
The hazard function is a bit more complex:
$$h(t \mid x_i) = \frac{d\,s(\ln(t) \mid \gamma, k_0)}{dt}\exp(\eta_i) = \frac{1}{t}\,\frac{d\,s(\ln(t) \mid \gamma, k_0)}{d\ln(t)}\exp(\eta_i)$$
This involves the derivatives of the restricted cubic spline functions, although these are relatively easy to calculate.
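The chain-rule step can be checked numerically. In the Python sketch below, a hypothetical smooth function s(u) of u = ln(t) stands in for the fitted spline (its coefficients and the value of x_iβ are made up); the hazard computed as (1/t)·s′(ln t)·exp(η) is compared with a finite-difference derivative of H(t) = exp(η).

```python
import math

XB = 0.3  # hypothetical linear predictor x_i * beta


def s(u):
    # stand-in for the baseline spline s(ln t | gamma, k0); any smooth increasing function works
    return -1.5 + 0.9 * u + 0.05 * u ** 3


def ds_du(u):
    # derivative of s with respect to u = ln(t)
    return 0.9 + 0.15 * u ** 2


def eta(t):
    return s(math.log(t)) + XB


def survival(t):
    return math.exp(-math.exp(eta(t)))  # S(t|x) = exp(-exp(eta))


def hazard(t):
    # h(t|x) = ds/dt * exp(eta) = (1/t) * ds/d(ln t) * exp(eta)
    return ds_du(math.log(t)) / t * math.exp(eta(t))


t, dt = 2.0, 1e-6
numeric = (math.exp(eta(t + dt)) - math.exp(eta(t - dt))) / (2 * dt)  # dH/dt
print(hazard(t), numeric)
```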
Sensitivity to choice of knots: simulation study by Rutherford et al. [13]
‘Through the use of simulation, we show that provided a sufficient number of knots are used, the approximated hazard functions given by restricted cubic splines fit closely to the true function for a range of complex hazard shapes.’
‘The simulation results also highlight the insensitivity of the estimated relative effects (hazard ratios) to the correct specification of the baseline hazard.’
Simulation Study (Rutherford et al.) [13]
Generate data assuming a mixture Weibull distribution.
[Figure: four panels (Scenarios 1 to 4), each showing a hazard rate (0 to 2.5) against time since diagnosis (0 to 10 years)]
Fit models using restricted cubic splines.
Scenario 3 comparison of Log Hazard Ratios
[Figure: scatter plot of estimated log hazard ratios from the Cox model (vertical axis) against those from the flexible parametric model (horizontal axis), both axes from -0.6 to -0.4]
Choice of knots: Scenario 3
[Figure: two panels, the survival function S(t) (0 to 1) and the hazard function h(t) (0 to 1.6), against time since diagnosis (0 to 10 years); model with 8 knots (7 df)]
Model Selection
Estimated hazard and survival functions are fairly insensitive to knot location.
AIC and BIC can be used as rough guides to choose models.
The choice of knots is not crucial (within reason) to inference based on the model.
We often present a sensitivity analysis to show this.
One could treat the number of knots and their locations as unknowns.
However, this is an area where more work is still required.
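As a worked example of information criteria as rough guides, the Python snippet below recomputes the AIC and BIC printed by glm for the two Poisson spline models shown earlier in this section (proportional hazards, and with time-dependent effects), plus the likelihood ratio test between them. The formulas are our reading of the printed values rather than a documented definition: glm's AIC appears to be (−2 ln L + 2k)/N and its BIC appears to be the deviance minus the residual df times ln N; both reproduce the output.

```python
import math

N = 522  # rows after collapsing
models = {
    # name: (log likelihood, parameters incl. constant, deviance, residual df)
    "PH": (-1172.988905, 5, 589.4507401, 517),
    "time-dependent": (-1163.950943, 8, 571.3748157, 514),
}

for name, (ll, k, dev, df_resid) in models.items():
    aic = (-2 * ll + 2 * k) / N          # per-observation AIC, as printed by glm
    bic = dev - df_resid * math.log(N)   # deviance-based BIC, as printed by glm
    print(f"{name}: AIC = {aic:.6f}, BIC = {bic:.3f}")

# Likelihood ratio test for the 3 interaction terms (chi-squared on 3 df)
lr = 2 * (models["time-dependent"][0] - models["PH"][0])
# chi-squared survival function for 3 df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
p = math.erfc(math.sqrt(lr / 2)) + math.sqrt(2 * lr / math.pi) * math.exp(-lr / 2)
print(f"LR = {lr:.2f} on 3 df, p = {p:.1e}")
```

Here AIC and the likelihood ratio test favour the time-dependent model, whereas BIC, which penalises extra parameters more heavily, marginally favours the proportional hazards model; this is why the criteria are only rough guides.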
Implementation in Stata [10]
stpm2 is available from SSC
. ssc install stpm2
All-cause survival
. stpm2 eng, scale(hazard) df(5)
Relative survival
. stpm2 eng, scale(hazard) df(5) bhazard(rate)
Time-dependent effects
. stpm2 eng, scale(hazard) df(5) bhazard(rate) tvc(eng) dftvc(3)
Cure model
. stpm2 eng, scale(hazard) df(5) bhazard(rate) tvc(eng) dftvc(3) cure
Fitting a proportional hazards model
Example: 24,889 women aged under 50 diagnosed with breast cancer in England and Wales 1986-1990.
Compare five deprivation groups from most affluent to most deprived.
No information on cause of death, but given their age, most women who die will die of their breast cancer.
Proportional hazards models
. stcox dep2-dep5,
. stpm2 dep2-dep5, df(5) scale(hazard) eform
The df(5) option implies using 4 internal knots and 2 boundary knots at their default locations.
The scale(hazard) option requests that the model be fitted on the log cumulative hazard scale.
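To make "default locations" concrete: our understanding of the stpm2 documentation is that, for df(5), the interior knots are placed at the 20th, 40th, 60th and 80th centiles of the uncensored log event times, with boundary knots at their minimum and maximum. The Python sketch below computes knots that way for hypothetical event times; `default_knots` is a hypothetical helper, and Stata's centile definition may differ slightly from the linear interpolation used here, so treat the values as approximate.

```python
import math


def default_knots(event_times, df):
    """Knots on the log-time scale (assumed stpm2-style default): boundaries at the
    min/max uncensored log event times, df - 1 interior knots at equally spaced centiles."""
    logs = sorted(math.log(t) for t in event_times)
    n = len(logs)

    def centile(p):
        # linear interpolation between order statistics (one of several conventions)
        pos = p / 100 * (n - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)
        return logs[lo] + (pos - lo) * (logs[hi] - logs[lo])

    interior = [centile(100 * j / df) for j in range(1, df)]
    return [logs[0]] + interior + [logs[-1]]


# Hypothetical uncensored event times (years)
times = [0.1, 0.3, 0.4, 0.8, 1.1, 1.5, 2.0, 2.6, 3.3, 4.1, 4.9]
knots = default_knots(times, df=5)
print(len(knots))  # 6 knots: 4 interior + 2 boundary
print([round(k, 3) for k in knots])
```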
Cox Model
Cox proportional hazards model
. stcox dep2-dep5,

         failure _d:  dead == 1
   analysis time _t:  survtime
  exit on or before:  time 5

Iteration 0:   log likelihood = -73334.091
Iteration 1:   log likelihood = -73303.081
Iteration 2:   log likelihood = -73302.997
Iteration 3:   log likelihood = -73302.997
Refining estimates:
Iteration 0:   log likelihood = -73302.997

Cox regression -- Breslow method for ties

No. of subjects =        24889                  Number of obs    =      24889
No. of failures =         7366
Time at risk    =   104638.953
                                                LR chi2(4)       =      62.19
Log likelihood  =   -73302.997                  Prob > chi2      =     0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        dep2 |   1.048716   .0353999     1.41   0.159     .9815786    1.120445
        dep3 |    1.10618   .0383344     2.91   0.004      1.03354    1.183924
        dep4 |   1.212892   .0437501     5.35   0.000     1.130104    1.301744
        dep5 |   1.309478   .0513313     6.88   0.000     1.212638    1.414051
------------------------------------------------------------------------------
Flexible parametric proportional hazards model
Flexible Parametric Proportional Hazards Model
. stpm2 dep2-dep5, df(5) scale(hazard) eform

Iteration 0:   log likelihood = -22507.096
Iteration 1:   log likelihood = -22502.639
Iteration 2:   log likelihood = -22502.633
Iteration 3:   log likelihood = -22502.633

Log likelihood = -22502.633                     Number of obs   =      24889

------------------------------------------------------------------------------
             |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
        dep2 |   1.048752   .0354011     1.41   0.158     .9816125    1.120483
        dep3 |    1.10615   .0383334     2.91   0.004     1.033513    1.183893
        dep4 |   1.212872   .0437493     5.35   0.000     1.130085    1.301722
        dep5 |   1.309479   .0513313     6.88   0.000     1.212639    1.414052
       _rcs1 |   2.126897   .0203615    78.83   0.000     2.087361    2.167182
       _rcs2 |   .9812977   .0074041    -2.50   0.012     .9668927    .9959173
       _rcs3 |   1.057255   .0043746    13.46   0.000     1.048715    1.065863
       _rcs4 |   1.005372   .0020877     2.58   0.010     1.001288    1.009472
       _rcs5 |   1.002216   .0010203     2.17   0.030     1.000218    1.004218
------------------------------------------------------------------------------
Proportional hazards models
The hazard ratios and 95% confidence intervals are very similar.
I have yet to find an example of a proportional hazards model where there is a large difference between the hazard ratios estimated by the Cox model and by the flexible parametric model.
If you are just interested in hazard ratios in a proportional hazards model, then you can get away with poor modelling of the baseline hazard.
One important exception is when the follow-up time differs between groups.
It is of course better to model the baseline hazard well!
Simple predictions
To predict the survival and hazard functions, use the following:
The predict command
. predict survpred, survival
. predict hazpred, hazard
To estimate confidence intervals, use the ci option.
To predict for particular covariate patterns, use the at() option.
The at() option
. predict haz_male_age50, hazard ci at(male 1 age 50)
Simple predictions 2
The zeros option sets the values of all covariates, other than those specified in the at() option, to zero. For example, the baseline survival function can be estimated using:
The zeros option
. predict surv_baseline, survival ci zeros
Log cumulative hazard
[Figure: predicted log cumulative hazard (-8 to 0) against time from diagnosis (0 to 5 years), by deprivation group (Least Deprived, 2, 3, 4, Most Deprived)]
Log Cumulative Hazard vs log(time)
[Figure: predicted log cumulative hazard (-8 to 0) against time from diagnosis on a log scale (1 to 5 years), by deprivation group (Least Deprived, 2, 3, 4, Most Deprived)]
Survival Function
[Figure: proportion alive (0.6 to 1) against time from diagnosis (0 to 5 years), by deprivation group (Least Deprived, 2, 3, 4, Most Deprived)]
Hazard Function ×1000
[Figure: predicted mortality rate (per 1000 person-years, 0 to 150) against time from diagnosis (0 to 5 years), by deprivation group (Least Deprived, 2, 3, 4, Most Deprived)]
Useful predictions
A key advantage of using a parametric model over the Cox model is that we can transform the model parameters to express differences between groups in different ways.
The hazard ratio is a relative measure; a greater understanding of the impact of an exposure can be obtained by also looking at absolute differences.
For two covariate patterns, x_1 and x_2, we can obtain
Differences in hazard rates
$$h(t \mid x_1) - h(t \mid x_2)$$
Differences in survival functions
$$S(t \mid x_1) - S(t \mid x_2)$$
Use the delta method to calculate confidence intervals.
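A sketch of what the delta method does for a survival difference, assuming a hypothetical Weibull proportional hazards model on the log cumulative hazard scale, ln H(t|x) = a + g ln(t) + xβ, with made-up estimates and a made-up covariance matrix. The gradient is taken numerically and the standard error is sqrt(∇g′ V ∇g). This illustrates the idea only and is not how stpm2 implements it; a hazard difference would be handled the same way.

```python
import math

# Hypothetical estimates (a, g, beta) of ln H(t|x) = a + g*ln(t) + x*beta
theta = [-1.8, 0.9, math.log(1.3)]
# Hypothetical covariance matrix of the estimates (positive definite)
V = [[0.0040, -0.0010, -0.0008],
     [-0.0010, 0.0020, 0.0001],
     [-0.0008, 0.0001, 0.0016]]


def surv(t, x, th):
    a, g, b = th
    return math.exp(-math.exp(a + g * math.log(t) + x * b))


def sdiff(t, th):
    # S(t | x=1) - S(t | x=0)
    return surv(t, 1, th) - surv(t, 0, th)


def delta_se(f, th, cov, eps=1e-6):
    """Delta-method SE of f(th): sqrt(grad' V grad), gradient by central differences."""
    grad = []
    for i in range(len(th)):
        up = list(th)
        down = list(th)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    var = sum(grad[i] * cov[i][j] * grad[j]
              for i in range(len(th)) for j in range(len(th)))
    return math.sqrt(var)


t = 5.0
est = sdiff(t, theta)
se = delta_se(lambda th: sdiff(t, th), theta, V)
print(f"S(5|x=1) - S(5|x=0) = {est:.4f}, "
      f"95% CI {est - 1.96 * se:.4f} to {est + 1.96 * se:.4f}")
```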
Difference in hazard functions
. predict hdiff, hdiff1(dep5 1) hdiff2(dep5 0) ci
[Figure: difference in mortality rate (per 1000 person-years, 0 to 200) against time from diagnosis (0 to 5 years)]
Predicted survival functions
[Figure: proportion alive (0.6 to 1.0) against time from diagnosis (0 to 5 years), Least Deprived vs Most Deprived]
Difference in survival proportions
. predict sdiff, sdiff1(dep5 1) sdiff2(dep5 0) ci
[Figure: difference in survival curves (-0.10 to 0.02) against time from diagnosis (0 to 5 years)]
Modelling excess mortality (relative survival)
Instead of cause-specific mortality, we estimate excess mortality: the difference between observed (all-cause) and expected mortality.
excess mortality = observed mortality − expected mortality
Relative survival is the survival analogue of excess mortality.
Both cause-specific survival and relative survival estimate (under assumptions) the same underlying quantity (net survival), and the estimates should be similar.
Modelling excess mortality using a step function for the effect of time
The hazard at time since diagnosis t for persons diagnosed with cancer, h(t), is modelled as the sum of the known baseline hazard, h*(t), and the excess hazard due to a diagnosis of cancer, λ(t) [14, 15, 16, 17, 18].
$$h(t) = h^*(t) + \lambda(t)$$
Follow-up time is partitioned into bands corresponding to life table intervals, and indicator variables for these bands are included in the design matrix. The model is written as
$$h(x) = h^*(x) + \exp(x\beta) \qquad (1)$$
or
$$\ln[h(x) - h^*(x)] = x\beta.$$
The proportional excess hazards model
$$\ln[h(x) - h^*(x)] = x\beta.$$
The excess hazard is additive to the expected hazard, but we assume the excess component is a multiplicative function of covariates (i.e., proportional excess hazards).
Non-proportional excess hazards are common but can be incorporated by introducing follow-up time by covariate interaction terms.
We note that h(x) − h*(x) is excess mortality. We might be tempted to calculate the number of excess deaths (trivial, since d and d_star are both saved in grouped.dta) and use it as the outcome (eliminating the need for a special link). The problem is that the variance is driven by the number of observed deaths, not the number of excess deaths.
Interpreting the parameter estimates
The exponentiated parameter estimates have an interpretation as excess hazard ratios, also known as relative excess risks.
An excess hazard ratio of, for example, 1.5 for males compared to females implies that the excess hazard associated with a diagnosis of cancer is 50% higher for males than females.
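A small numerical illustration of why exp(β) is a ratio of excess hazards only. With made-up expected (population) hazards for males and females and a made-up excess hazard for females, an excess hazard ratio of 1.5 makes the male excess hazard 50% higher, while the ratio of the total (observed) hazards also depends on the expected hazards and is not 1.5. All numbers are hypothetical.

```python
import math

beta_male = math.log(1.5)                   # excess hazard ratio 1.5 (males vs females)
excess_female = 0.10                        # hypothetical excess hazard, females (per year)
expected = {"female": 0.02, "male": 0.04}   # hypothetical population hazards h*(t)

excess_male = excess_female * math.exp(beta_male)    # from ln[h - h*] = x * beta
total = {"female": expected["female"] + excess_female,
         "male": expected["male"] + excess_male}     # h = h* + excess

print(round(excess_male / excess_female, 3))      # 1.5: ratio of excess hazards
print(round(total["male"] / total["female"], 3))  # ratio of total hazards, not 1.5
```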
Modelling excess mortality using Poisson regression
The model assumes piecewise constant hazards, which implies a Poisson process for the number of deaths in each interval. We can therefore estimate the model as a GLM.
We assume that the total number of deaths, d_j, for observation j can be described by a Poisson distribution, d_j ∼ Poisson(µ_j), where µ_j = h_j y_j, h_j is the total (observed) hazard and y_j is person-time at risk for the observation. Equation 1 is then written as
$$\ln(\mu_j - d^*_j) = \ln(y_j) + x\beta, \qquad (2)$$
where d*_j is the expected number of deaths (due to causes other than the cancer of interest).
This implies a generalised linear model with outcome d_j, Poisson error structure, link ln(µ_j − d*_j), and offset ln(y_j). This is not a standard link function, so the link is defined in rs.ado.
Poisson regression for the colon carcinoma data
When we stset the data we specify all deaths as events.
. stset exit, fail(status==1 2) origin(dx) scale(365.25) id(id)
We use strs to estimate relative survival for each combination of relevant predictor variables and save the results to a file.
. strs using popmort, br(0(1)10) mergeby(_year sex _age)
> by(sex distant agegrp year8594) notables save(replace)
The save(replace) option requests strs to save two data files using default names (grouped.dta and individ.dta) and to replace existing copies of these two files if they exist.
Partial contents of grouped.dta (output by strs)
. use grouped, clear
. list start end n d d_star y ///
> if distant==1 & sex==1 & agegrp==1 & year8594==1
+------------------------------------------+
| start end n d d_star y |
|------------------------------------------|
191. | 0 1 251 140 1.8 168.3 |
192. | 1 2 111 48 0.9 77.8 |
193. | 2 3 56 15 0.6 47.0 |
194. | 3 4 37 4 0.5 33.7 |
195. | 4 5 30 8 0.3 21.6 |
|------------------------------------------|
196. | 5 6 16 2 0.2 14.2 |
197. | 6 7 11 1 0.1 9.0 |
198. | 7 8 7 1 0.1 5.6 |
199. | 8 9 5 0 0.1 4.8 |
200. | 9 10 4 1 0.1 2.7 |
+------------------------------------------+
Now fit the model
We now fit the Poisson regression model to the data in grouped.dta (which contains the observed (d) and expected (d_star) numbers of deaths for each life table interval, along with person-time at risk (y)).
. use grouped, clear
. glm d i.end distant, fam(pois) l(rs d_star) lnoffset(y) eform
Estimated excess hazard ratios
---------------------------------------------------------------------
d | exp(b) Std. Err. z P>|z| [95% Conf. Interval]
--------+------------------------------------------------------------
end 1 | 1 (base)
2 | .6363348 .0219954 -13.08 0.000 .5946525 .6809388
3 | .3500617 .0202011 -18.19 0.000 .3126254 .391981
4 | .226715 .0201962 -16.66 0.000 .1903941 .2699648
5 | .192835 .0214866 -14.77 0.000 .1550032 .2399004
6 | .118558 .0228721 -11.05 0.000 .0812303 .1730388
7 | .0771904 .0222247 -8.90 0.000 .043902 .1357194
8 | .0482283 .0212467 -6.88 0.000 .0203381 .1143652
9 | .0466583 .0223545 -6.40 0.000 .0182435 .1193299
10 | .0575149 .026689 -6.15 0.000 .0231628 .1428133
distant | 8.187736 .2666006 64.58 0.000 7.681533 8.727298
_cons | .131474 .0040362 -66.09 0.000 .1237964 .1396277
---------------------------------------------------------------------
We estimate that excess mortality is 8.2 times higher for patients with distant metastases at diagnosis compared to patients without distant metastases at diagnosis.
Fitted values from the excess mortality model
[Figure: fitted excess hazard (per year, 0 to 1.1) against years since diagnosis (0 to 10), not distant vs distant. Annotations: HR: 8.19; 0.131 [exp(_cons)]; 0.131*8.19; 0.131*8.19*0.64; 0.131*8.19*0.35]
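The plotted fitted values are products of the exponentiated estimates in the glm output above: exp(_cons) is the excess hazard in the first interval for patients without distant metastases, and the excess hazard ratios multiply it. A short Python check using the printed estimates (only the first three intervals are included):

```python
# Exponentiated estimates from the glm output (excess hazard ratios)
base = 0.131474          # exp(_cons): interval 0-1, no distant metastases
ehr_distant = 8.187736   # distant metastases at diagnosis
ehr_interval = {1: 1.0, 2: 0.6363348, 3: 0.3500617}  # i.end, first three intervals

for end, ehr in ehr_interval.items():
    not_distant = base * ehr
    distant = base * ehr_distant * ehr
    print(f"interval ending at year {end}: "
          f"not distant {not_distant:.3f}, distant {distant:.3f} per year")
```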
Can adjust for additional covariates
. glm d i.end distant sex year8594 i.agegrp, fam(pois) ///
link(rs d_star) lnoffset(y) eform
----------------------------------------------------------------------
| OIM
d | exp(b) Std. Err. z P>|z| [95% Conf. Interval]
---------+------------------------------------------------------------
end 1 | 1 (base)
2 | .6582263 .022388 -12.30 0.000 .6157772 .7036016
[output omitted]
10 | .07477 .0243443 -7.97 0.000 .0394989 .1415367
|
distant | 8.008541 .2490144 66.91 0.000 7.535056 8.511779
sex | .9878062 .0272241 -0.45 0.656 .9358634 1.042632
year8594 | .8909376 .0238832 -4.31 0.000 .8453358 .9389994
|
agegrp |
0 | 1 (base)
1 | 1.046824 .0680002 0.70 0.481 .9216818 1.188959
2 | 1.17649 .070505 2.71 0.007 1.046109 1.32312
3 | 1.549778 .0950402 7.14 0.000 1.374262 1.74771
|
_cons | .1149154 .0087708 -28.35 0.000 .0989489 .1334582
----------------------------------------------------------------------
Interpretation
The variable year8594 is coded as 1 for patients diagnosed 1985–1994 and 0 for patients diagnosed 1975–1984.
We see that patients diagnosed in the recent period are estimated to experience 11% lower excess mortality (excess hazard ratio 0.89) compared to those diagnosed in the earlier period.
There is evidence that excess mortality decreases with follow-up time, evidence of higher excess mortality in the older age groups, and no evidence of a difference between males and females.
There is no evidence that the effect of distant metastases at diagnosis is confounded by sex, age at diagnosis, or period of diagnosis.
Later we will show that time can be modelled using a smooth function rather than a step function.
Flexible parametric relative survival models
From a practical point of view, fitting flexible parametric relative survival models is simple.
Relative Survival in stpm2
. stpm2 agegrp2-agegrp4, scale(hazard) df(5) bhazard(rate)
We just add the bhazard(rate) option, where rate is the expected mortality rate at the event times.
Most modelling issues are similar (or identical) to those for cause-specific models.
For example, time-dependent effects (non-proportional excess hazards) are fitted in the same way.
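Why the expected rate is needed only at the exit time can be seen from an individual's contribution to the standard additive relative survival likelihood, where h(t) = h*(t) + λ(t): ln L_i = d_i ln[h*(t_i) + λ(t_i)] − Λ(t_i), plus a term ln S*(t_i) that does not involve the model parameters and can be dropped. The Python sketch below evaluates this for a hypothetical Weibull excess hazard (stpm2 uses splines instead); `rate` plays the role of the bhazard() variable, and the data and parameter values are made up.

```python
import math


def loglik_contribution(t, died, rate, lam, gamma):
    """Relative survival log-likelihood contribution (constant ln S* term dropped).

    Hypothetical Weibull excess hazard: cumulative excess Lambda(t) = lam * t**gamma,
    excess hazard lambda(t) = lam * gamma * t**(gamma - 1).
    `rate` is the expected (population) mortality rate at the exit time t.
    """
    cum_excess = lam * t ** gamma
    excess_hazard = lam * gamma * t ** (gamma - 1)
    # the expected rate enters only through the term for deaths (died = 1)
    return died * math.log(rate + excess_hazard) - cum_excess


# Hypothetical patients: (exit time in years, died, expected rate at exit)
data = [(0.8, 1, 0.012), (3.5, 0, 0.015), (1.9, 1, 0.030)]
total = sum(loglik_contribution(t, d, r, lam=0.25, gamma=0.7) for t, d, r in data)
print(round(total, 4))
```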
Merging in expected mortality
The expected mortality at the time of death is required.
Make use of stset information to obtain attained age and calendar year.
Merging in the population mortality file
. use ew_breast
(England and Wales Breast Cancer: All ages)
. stset survtime, failure(dead==1) exit(time 5) id(ident)
(output omitted)
. gen age = int(min(agediag + _t,99))
. gen year = int(year(datediag) + _t)
. merge m:1 sex region dep year age using popmort_UK ///
> ,keepusing(rate) keep(match)

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                           115,331  (_merge==3)
    -----------------------------------------
Proportional excess hazards model
. stpm2 agegrp2-agegrp5, scale(hazard) df(7) eform bhazard(rate) nolog

Log likelihood = -133767.97                     Number of obs   =     115331

------------------------------------------------------------------------------
             |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
     agegrp2 |   1.051629   .0182862     2.90   0.004     1.016393    1.088087
     agegrp3 |   1.072436    .018162     4.13   0.000     1.037424    1.108631
     agegrp4 |   1.410387   .0250455    19.36   0.000     1.362143     1.46034
     agegrp5 |   2.649869   .0510512    50.58   0.000     2.551676    2.751841
       _rcs1 |   2.343311   .0111576   178.85   0.000     2.321544    2.365282
       _rcs2 |   .9680121   .0032421    -9.71   0.000     .9616784    .9743875
       _rcs3 |   .9520213   .0018722   -25.00   0.000     .9483589    .9556979
       _rcs4 |   1.024994   .0013508    18.73   0.000      1.02235    1.027645
       _rcs5 |   1.004471   .0008377     5.35   0.000     1.002831    1.006115
       _rcs6 |   1.002511   .0005577     4.51   0.000     1.001419    1.003605
       _rcs7 |   1.000378   .0003745     1.01   0.312     .9996448    1.001113
------------------------------------------------------------------------------
. estimates store rs_hazard
Predict excess hazards
range temptime 0.003 5 200
predict eh_ph1, hazard per(1000) zeros timevar(temptime)
forvalues i = 2/5 {
    predict eh_ph`i', hazard per(1000) at(agegrp`i' 1) timevar(temptime) zeros
}
Note that the predict command is the same as for a cause-specific model.
The prediction is the excess hazard (mortality) rate, as this is a relative survival model.
The timevar() option saves time with large datasets (here the prediction is for 200 observations rather than 115,331).
Predicted excess mortality rate
[Figure: predicted excess mortality rate (per 1000 person-years, log scale, 50 to 1500) against years from diagnosis (0 to 5), by age group (<50, 50−59, 60−69, 70−79, 80+)]
Time-dependent effects
Fitting time-dependent effects is the same as before.
stpm2 agegrp2-agegrp5, scale(hazard) df(7) bhazard(rate) ///
    tvc(agegrp2-agegrp5) dftvc(3) nolog
As are the predictions:
predict eh_tvc1, hazard per(1000) timevar(temptime) zeros
predict rs_tvc1, survival timevar(temptime) zeros
forvalues i = 2/5 {
    predict eh_tvc`i', hazard per(1000) at(agegrp`i' 1) ///
        timevar(temptime) zeros
    predict rs_tvc`i', survival at(agegrp`i' 1) ///
        timevar(temptime) zeros
}
Predicted excess mortality rate
[Figure: predicted excess mortality rate (per 1000 person-years, log scale, 50 to 1500) against years from diagnosis (0 to 5), by age group (<50, 50−59, 60−69, 70−79, 80+)]
Predicted relative survival
[Figure: predicted relative survival (0 to 1) against years from diagnosis (0 to 5), by age group (<50, 50−59, 60−69, 70−79, 80+)]
The dots are Ederer II estimates.
Quantifying differences
Excess mortality rate ratios, excess mortality rate differences and differences in relative survival can be easily estimated.
forvalues i = 2/5 {
    predict ehr`i'_tvc, hrnum(agegrp`i' 1) timevar(temptime) ci
    predict ehdiff`i'_tvc, hdiff1(agegrp`i' 1) timevar(temptime) ///
        per(1000) ci
    predict rsdiff`i'_tvc, sdiff1(agegrp`i' 1) timevar(temptime) ci
}
Excess mortality rate ratios
[Figure: four panels (50-59, 60-69, 70-79, 80+) showing the excess mortality rate ratio (log scale, 1 to 50) against years from diagnosis (0 to 5)]
Excess mortality rate differences
[Figure: four panels (50-59, 60-69, 70-79, 80+) showing the difference in excess mortality rates (per 1000 person-years, 0 to 600) against years from diagnosis (0 to 5)]
Due to very high initial differences, the estimated functions for the 70-79 and 80+ age groups are not plotted for the first month.
Differences in relative survival
[Figure: four panels (50-59, 60-69, 70-79, 80+) showing the difference in relative survival (-0.25 to 0) against years from diagnosis (0 to 5)]
Estimating relative survival using Stata
As an example, we will use data on patients diagnosed with colon carcinoma in Finland 1975–94, with potential follow-up to the end of 1995.
variable   storage  value     variable label
name       type     label
sex        byte     sex       Sex
age        byte               Age at diagnosis
stage      byte     stage     Clinical stage at diagnosis
mmdx       byte               Month of diagnosis
yydx       int                Year of diagnosis
surv_mm    float              Survival time in months
surv_yy    float              Survival time in years
status     byte     status    Vital status at last contact
subsite    byte     colonsub  Anatomical subsite of tumour
year8594   byte     year8594  Year of diagnosis 1985-94
agegrp     byte     agegrp    Age in 4 categories
dx         int                Date of diagnosis
exit       int                Date of exit
id         float              Unique ID
Coding of vital status
. use colon
. codebook status
--------------------------------------------------
status Vital status at last contact
--------------------------------------------------
range: [0,4] units: 1
unique values: 4 missing .: 0/15564
Freq. Numeric Label
4642 0 Alive
8369 1 Dead: cancer
2549 2 Dead: other
4 4 Lost to follow-up
The population mortality file (popmort.dta)
. list
+----------------------------------------+
| sex _year _age prob rate |
|----------------------------------------|
1. | 1 1951 0 .96429 .0363632 |
2. | 1 1951 1 .99639 .0036165 |
3. | 1 1951 2 .99783 .0021724 |
4. | 1 1951 3 .99842 .0015812 |
5. | 1 1951 4 .99882 .0011807 |
|----------------------------------------|
6. | 1 1951 5 .99893 .0010706 |
7. | 1 1951 6 .99913 .0008704 |
8. | 1 1951 7 .99905 .0009504 |
The strs command for estimating and modelling relative survival using Stata
Estimating relative survival:
cohort, period, or hybrid approach
choice of three methods for estimating expected survival (Ederer I, Ederer II, Hakulinen); Pohar Perme estimator
estimation in the presence of competing risks (Cronin and Feuer (2000) [19])
estimates can be standardised (by age, for example)
saves estimates for subsequent modelling (or presentation in tables or graphs)
Modelling excess mortality (relative survival):
several alternative approaches to estimating the model
See Dickman et al. [20] for details.
An example: localised colon carcinoma
. use colon if stage==1, clear
. stset surv_mm, fail(status==1 2) id(id) scale(12)
. strs using popmort, br(0 0.5 1(1)9) mergeby(_year sex _age) by(sex)
-> sex = Male
+------------------------------------------------------------------------+
|interval n d w p p_star r cp cp_e2 cr_e2 |
|------------------------------------------------------------------------|
| 0 .5 2620 229 0 0.9126 0.9728 0.9381 0.9126 0.9728 0.9381 |
| .5 1 2391 99 0 0.9586 0.9749 0.9833 0.8748 0.9484 0.9224 |
| 1 2 2292 229 166 0.8963 0.9483 0.9452 0.7841 0.8993 0.8719 |
| 2 3 1897 180 139 0.9015 0.9470 0.9519 0.7069 0.8517 0.8300 |
| 3 4 1578 140 119 0.9078 0.9449 0.9607 0.6417 0.8048 0.7974 |
|------------------------------------------------------------------------|
| 4 5 1319 113 104 0.9108 0.9428 0.9660 0.5845 0.7588 0.7703 |
| 5 6 1102 102 81 0.9039 0.9414 0.9601 0.5283 0.7143 0.7396 |
| 6 7 919 71 71 0.9196 0.9409 0.9774 0.4859 0.6721 0.7229 |
| 7 8 777 59 72 0.9204 0.9391 0.9800 0.4472 0.6312 0.7084 |
| 8 9 646 49 62 0.9203 0.9380 0.9811 0.4115 0.5921 0.6950 |
+------------------------------------------------------------------------+
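The interval-specific columns in this table can be reproduced by hand with the actuarial life table: the effective number at risk is n' = n - w/2 (withdrawals are assumed to be at risk for half the interval), p = 1 - d/n', r = p/p_star, and the cumulative columns are running products. The sketch below recomputes the first three male rows. It takes p_star from the printed (rounded) output rather than recomputing it from popmort, so agreement is to about the fourth decimal.

```python
# First three intervals for males, copied from the strs output above:
# (start, end, n, d, w, p_star)
ROWS = [
    (0.0, 0.5, 2620, 229, 0, 0.9728),
    (0.5, 1.0, 2391, 99, 0, 0.9749),
    (1.0, 2.0, 2292, 229, 166, 0.9483),
]


def actuarial_life_table(rows):
    """Interval and cumulative observed, expected and relative survival."""
    out = []
    cp = cp_e2 = cr_e2 = 1.0
    for start, end, n, d, w, p_star in rows:
        n_prime = n - w / 2.0    # withdrawals counted as at risk for half the interval
        p = 1.0 - d / n_prime    # interval-specific observed survival
        r = p / p_star           # interval-specific relative survival
        cp *= p                  # cumulative observed survival
        cp_e2 *= p_star          # cumulative expected survival
        cr_e2 *= r               # cumulative relative survival
        out.append({"start": start, "end": end, "n_prime": n_prime, "p": p,
                    "r": r, "cp": cp, "cp_e2": cp_e2, "cr_e2": cr_e2})
    return out


for row in actuarial_life_table(ROWS):
    print({k: round(v, 4) for k, v in row.items()})
```

Note also that n in each interval equals n - d - w from the previous one (for example 2292 - 229 - 166 = 1897).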
Syntax of the strs command
strs using filename [if] [in] [iweight=varname], breaks(numlist ascending) mergeby(varlist)
    [by(varlist) diagage(varname) diagyear(varname) attage(newvarname) attyear(newvarname)
    survprob(varname) maxage(int 99) standstrata(varname) brenner list(varlist)
    potfu(varname) format(%fmt) pohar ederer1 notables level(int)
    save[(replace)] savind(filename[, replace]) savgroup(filename[, replace])]
the patient data file must be stset, using the id() option and with time since entry (in years) as the timescale, before strs is used
using filename specifies a file containing general population survival probabilities, sorted by the variables specified in mergeby().
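Conceptually, mergeby() attaches to each piece of follow-up the general population survival probability for the matching sex, calendar year and attained age. As we understand the maxage() option, ages above the maximum age in the population file reuse the probabilities for that maximum age. The Python sketch below is a hypothetical illustration of that lookup using the popmort rows listed earlier; it is not strs source code, and `expected_prob` is an invented name. A dictionary lookup does not need sorted data, whereas a Stata merge does.

```python
# Hypothetical sketch of the mergeby(_year sex _age) lookup; not strs source code.
POPMORT = {  # (sex, _year, _age) -> prob, first rows of the popmort listing
    (1, 1951, 0): 0.96429, (1, 1951, 1): 0.99639, (1, 1951, 2): 0.99783,
    (1, 1951, 3): 0.99842, (1, 1951, 4): 0.99882, (1, 1951, 5): 0.99893,
    (1, 1951, 6): 0.99913, (1, 1951, 7): 0.99905,
}


def expected_prob(popmort, sex, attained_year, attained_age, maxage=99):
    """One-year expected survival for a sex, calendar year and attained age.

    Ages above maxage reuse the maxage row (assumed to mirror strs' maxage()).
    """
    age = min(int(attained_age), maxage)  # completed years of age, capped
    return popmort[(sex, int(attained_year), age)]


print(expected_prob(POPMORT, 1, 1951, 3.7))            # uses _age 3
print(expected_prob(POPMORT, 1, 1951, 9.2, maxage=7))  # capped at _age 7 in this toy table
```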
Life table quantities calculated by strs
start Start of life table interval
end End of life table interval
n Number alive at start
d Number of deaths during the interval
d_star Expected number of deaths
ns Number of survivors
w Withdrawals (censorings) during the interval
n_prime Effective number at risk
y Person-time at risk
p Interval-specific observed survival
se_p Standard error of P
lo_p Lower 95% CI for P
hi_p Upper 95% CI for P
p_star Interval-specific expected survival (Ederer II)
r Interval-specific relative survival (Ederer II)
se_r Standard error of R
lo_r Lower 95% CI for R
hi_r Upper 95% CI for R
cp Cumulative observed survival
se_cp Standard error of CP
lo_cp Lower 95% CI for CP
Life table quantities calculated by strs (continued)
hi_cp Upper 95% CI for CP
nu Estimated excess mortality rate, (d-d_star)/y
cp_e1 Cumulative expected survival (Ederer I)
cr_e1 Cumulative relative survival (Ederer I)
lo_cr_e1 Lower 95% CI for CR (Ederer I)
hi_cr_e1 Upper 95% CI for CR (Ederer I)
cp_e2 Cumulative expected survival (Ederer II)
cr_e2 Cumulative relative survival (Ederer II)
lo_cr_e2 Lower 95% CI for CR (Ederer II)
hi_cr_e2 Upper 95% CI for CR (Ederer II)
cp_hak Cumulative expected survival (Hakulinen)
cr_hak Cumulative relative survival (Hakulinen)
lo_cr_hak Lower 95% CI for CR (Hakulinen)
hi_cr_hak Upper 95% CI for CR (Hakulinen)
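The excess mortality rate nu listed above is observed deaths minus expected deaths, divided by person-time at risk. It can be negative when fewer deaths are observed than expected. A minimal Python sketch of the formula follows; the interval values are hypothetical and chosen only to illustrate the arithmetic.

```python
def excess_rate(d: float, d_star: float, y: float) -> float:
    """Estimated excess mortality rate nu = (d - d_star) / y."""
    if y <= 0:
        raise ValueError("person-time y must be positive")
    return (d - d_star) / y


# Hypothetical interval: 50 observed deaths, 12.5 expected, 400 person-years.
print(excess_rate(50, 12.5, 400))  # 0.09375 excess deaths per person-year
```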
Estimates can be saved to a file
. use colon if stage==1, clear
. stset surv_mm, fail(status==1 2) id(id) scale(12)
. strs using popmort, br(0(1)10) mergeby(_year sex _age) by(sex agegrp) save
. use grouped, clear
. gen n0=n[_n-4]
. list sex agegrp n0 cp cr_e2 lo_cr_e2 hi_cr_e2 if end==5, sepby(sex) noobs
+----------------------------------------------------------------+
| sex agegrp n0 cp cr_e2 lo_cr_e2 hi_cr_e2 |
|----------------------------------------------------------------|
| Male 0-44 161 0.7737 0.7881 0.7102 0.8486 |
| Male 45-59 462 0.7686 0.8233 0.7766 0.8636 |
| Male 60-74 1228 0.5945 0.7512 0.7128 0.7878 |
| Male 75+ 769 0.4131 0.7777 0.7067 0.8479 |
|----------------------------------------------------------------|
| Female 0-44 136 0.7657 0.7709 0.6866 0.8358 |
| Female 45-59 531 0.7765 0.7953 0.7536 0.8314 |
| Female 60-74 1488 0.6993 0.7873 0.7588 0.8141 |
| Female 75+ 1499 0.4854 0.7816 0.7374 0.8249 |
+----------------------------------------------------------------+
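With one-year intervals, gen n0=n[_n-4] copies onto the row with end==5 the value of n from four rows earlier, which is the 0-1 year interval of the same sex and age group, i.e. the number of patients at diagnosis. This relies on the saved file being sorted by sex, agegrp and interval. A by-group version such as bysort sex agegrp (end): gen n0 = n[1] should give the same result without depending on row offsets. The Python sketch below contrasts the two ideas on hypothetical toy data; `lag` and `first_interval_n` are illustrative names only.

```python
from itertools import groupby

# Hypothetical grouped file: one row per (sex, agegrp, interval), end = 1..5.
ROWS = [
    {"sex": s, "agegrp": a, "end": e, "n": n}
    for (s, a), ns in {(1, 1): [100, 90, 82, 75, 70], (2, 1): [80, 76, 70, 66, 60]}.items()
    for e, n in enumerate(ns, start=1)
]


def lag(values, k):
    """Mimic Stata's x[_n-k]: the value k rows earlier, or None (missing) if none."""
    return [values[i - k] if i >= k else None for i in range(len(values))]


def first_interval_n(rows):
    """Map each (sex, agegrp) group to n in its first interval, whatever the row order."""
    ordered = sorted(rows, key=lambda r: (r["sex"], r["agegrp"], r["end"]))
    return {
        key: next(group)["n"]
        for key, group in groupby(ordered, key=lambda r: (r["sex"], r["agegrp"]))
    }


n0_lag = lag([r["n"] for r in ROWS], 4)
at_five_years = {(r["sex"], r["agegrp"]): n0 for r, n0 in zip(ROWS, n0_lag) if r["end"] == 5}
print(at_five_years, first_interval_n(ROWS))
```

The lag only works because the rows are already in order; the group-based lookup still gives the right answer if they are not.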
References
[1] Jatoi I, Anderson WF, Jeong JH, Redmond CK. Breast cancer adjuvant therapy: time to consider its time-dependent effects. J Clin Oncol 2011;29:2301–2304.
[2] Hernan MA. The hazards of hazard ratios. Epidemiology 2010;21:13–15.
[3] Manson JE, Hsia J, Johnson KC, Rossouw JE, Assaf AR, Lasser NL, et al. Estrogen plus progestin and the risk of coronary heart disease. N Engl J Med 2003;349:523–534.
[4] Sasieni PD. Proportional excess hazards. Biometrika 1996;83:127–141.
[5] Pohar Perme M, Henderson R, Stare J. An approach to estimation in relative survival regression. Biostatistics 2009;10:136–146.
[6] Cox DR. Regression models and life-tables (with discussion). JRSSB 1972;34:187–220.
[7] Durrleman S, Simon R. Flexible regression models with cubic splines. Statistics in Medicine 1989;8:551–561.
[8] Royston P, Parmar MKB. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 2002;21:2175–2197.
[9] Andersson TML, Dickman PW, Eloranta S, Lambert PC. Estimating and modelling cure in population-based cancer studies within the framework of flexible parametric survival models. BMC Med Res Methodol 2011;11:96.
References (continued)
[10] Lambert PC, Royston P. Further development of flexible parametric models for survival analysis. The Stata Journal 2009;9:265–290.
[11] Royston P, Lambert PC. Flexible parametric survival analysis using Stata: Beyond the Cox model. Stata Press, 2011.
[12] Reid N. A conversation with Sir David Cox. Statistical Science 1994;9:439–455.
[13] Rutherford MJ, Hinchliffe SR, Abel GA, Lyratzopoulos G, Lambert PC, Greenberg DC. How much of the deprivation gap in cancer survival can be explained by variation in stage at diagnosis: an example from breast cancer in the East of England. International Journal of Cancer 2013.
[14] Dickman PW, Sloggett A, Hills M, Hakulinen T. Regression models for relative survival. Statistics in Medicine 2004;23:51–64.
[15] Esteve J, Benhamou E, Croasdale M, Raymond L. Relative survival and the estimation of net survival: elements for further discussion. Statistics in Medicine 1990;9:529–538.
[16] Hakulinen T, Tenkanen L, Abeywickrama K, Paivarinta L. Testing equality of relative survival patterns based on aggregated data. Biometrics 1987;43:313–325.
[17] Berry G. The analysis of mortality by the subject-years method. Biometrics 1983;39:173–184.
References (continued)
[18] Pocock S, Gore S, Kerr G. Long term survival analysis: the curability of breast cancer. Statistics in Medicine 1982;1:93–104.
[19] Cronin KA, Feuer EJ. Cumulative cause-specific mortality for cancer patients in the presence of other causes: a crude analogue of relative survival. Statistics in Medicine 2000;19:1729–1740.
[20] Dickman PW, Coviello E, Hills M. Estimating and modelling relative survival. The Stata Journal 2012 (in press). http://pauldickman.com/survival/strs.pdf.