
Session 15

Modelling net survival

Paul W Dickman (1) and Paul C Lambert (1,2)

(1) Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden

(2) Department of Health Sciences, University of Leicester, UK

Cancer survival: principles, methods and analysis, LSHTM

July 2014


Overview

Outcome can be expressed as either a survival proportion or mortality rate (hazard)

The concept of adjusting for ‘time’; splitting time

Modelling cause-specific mortality:

Cox proportional hazards model

Poisson regression

Flexible parametric model

Strong similarity of the approaches

Modelling excess mortality (cannot use Cox regression):

Poisson regression

Flexible parametric model

The proportional hazards assumption

Dickman and Lambert Population-Based Cancer Survival LSHTM, July 2014 2

Teaching style

I’ll focus on the key concepts explained via examples, using graphs where possible.

There are more slides than I can cover.

I chose to distribute slides with mathematical detail and extra information for reference.

I’ll discuss the exercises at the end of the lecture.

Recap: The survivor function S(t) and the hazard function h(t)

In survival analysis we can express the outcome in terms of either the survival proportion (the proportion who do not experience the event) or the event rate.

We are assuming you are familiar with the basic concepts of the survivor function, S(t), and the hazard function, h(t).

We will nevertheless take some time to discuss these functions, how they are related, and their relevance for the topic of this course.


Relation between the survivor and hazard functions

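$$
\begin{aligned}
h(t) &= \lim_{\Delta t \to 0} \frac{\Pr(\text{event in } (t, t+\Delta t] \mid \text{alive at } t)}{\Delta t} \\
&= \lim_{\Delta t \to 0} \frac{F(t+\Delta t) - F(t)}{S(t) \times \Delta t} \quad \text{where } F(t) = 1 - S(t) \\
&= \lim_{\Delta t \to 0} \frac{S(t+\Delta t) - S(t)}{\Delta t} \times \frac{-1}{S(t)} \\
&= \frac{dS(t)}{dt} \times \frac{-1}{S(t)} \quad \text{by definition of a derivative} \\
&= -\frac{d \ln S(t)}{dt} \quad \text{since } \frac{d}{dx} \ln(f(x)) = \frac{f'(x)}{f(x)}
\end{aligned}
$$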

What does this mean in practice?

$$
h(t) = -\frac{d}{dt} \ln S(t)
$$

In practical terms, this means that the event rate is proportional to the rate at which the survival function decreases.

That is, if the survival function is decreasing sharply with time then the mortality rate is high (and vice versa).

If the survival function is flat then the hazard is zero (and vice versa).

The derivative of a function at a point is the slope of the [tangent to the] curve at that point. A curve that is decreasing (like the survival function) has a negative slope, hence the negative sign in the formula above.

Strictly, the hazard is the rate of change of ln S(t) but we can think of it as being proportional to the rate of change of S(t).
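The relation h(t) = −d/dt ln S(t) can be checked numerically. The sketch below is illustrative only: it assumes a Weibull survivor function with made-up parameters (LAM and GAMMA are not from the course data) and compares the analytical hazard with a finite-difference estimate of the slope of ln S(t).

```python
import math

# Hypothetical Weibull parameters, chosen only for illustration.
LAM, GAMMA = 0.5, 0.8

def survival(t):
    # Weibull survivor function S(t) = exp(-lambda * t^gamma).
    return math.exp(-LAM * t ** GAMMA)

def hazard_exact(t):
    # Analytical Weibull hazard: lambda * gamma * t^(gamma - 1).
    return LAM * GAMMA * t ** (GAMMA - 1)

def hazard_numeric(t, dt=1e-6):
    # Central-difference estimate of -d/dt ln S(t).
    return -(math.log(survival(t + dt)) - math.log(survival(t - dt))) / (2 * dt)

for t in (0.5, 1.0, 2.0, 5.0):
    print(t, round(hazard_exact(t), 4), round(hazard_numeric(t), 4))
```

The two columns should agree closely at every time point; with GAMMA below 1 the hazard falls over time, which corresponds to a survival curve that drops steeply at first and then flattens.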

Other relationships (for completeness)

$$
-\frac{d \log S(t)}{dt} = h(t)
\quad \Longleftrightarrow \quad
S(t) = \exp\left(-\int_0^t h(u)\,du\right) = \exp(-H(t))
$$

$H(t) = \int_0^t h(u)\,du$ is called the integrated hazard or cumulative hazard.

$$
h(t) = -\frac{d \log(S(t))}{dt} = -\frac{S'(t)}{S(t)} = \frac{F'(t)}{1 - F(t)} = \frac{f(t)}{S(t)}
$$
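As a small worked illustration of S(t) = exp(−H(t)), the sketch below assumes a hypothetical piecewise-constant hazard (the three rates in BANDS are invented for the example) and accumulates the hazard to obtain the survival proportion.

```python
import math

# Hypothetical piecewise-constant hazard: (start, end, rate per year).
BANDS = [(0.0, 1.0, 0.4), (1.0, 2.0, 0.2), (2.0, math.inf, 0.1)]

def cumulative_hazard(t):
    # H(t) is the integral of h(u) from 0 to t; for a step function this is
    # the sum over bands of rate * time spent in the band.
    total = 0.0
    for start, end, rate in BANDS:
        if t > start:
            total += rate * (min(t, end) - start)
    return total

def survival(t):
    # S(t) = exp(-H(t))
    return math.exp(-cumulative_hazard(t))

for t in (0.0, 1.0, 2.0, 5.0):
    print(t, round(cumulative_hazard(t), 3), round(survival(t), 3))
```

Because the cumulative hazard can only increase, the survival proportion can only decrease; it decreases fastest in the band where the rate is highest.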

Which treatment (A or C) has the best survival?

[Figure: survival function (0.0 to 1.0) against time since treatment (0 to 1 years), for Treatment A and Treatment C.]

Which treatment is associated with the best survival?

Which treatment (A or C) has the best survival?

[Figure: survival function (0.0 to 1.0) against time since treatment (0 to 5 years), for Treatment A and Treatment C.]

Which treatment is associated with the best survival?

Now plot the (approximate) hazard for A

[Figure: hazard rate against years since diagnosis (0 to 5), with legend entries for Treatment A and Treatment C.]

Draw the (approximate) hazard function for treatment A. Don't worry about the scale; pay attention to whether (and where) the lines cross.


The two hazard functions

[Figure: hazard function for each treatment group. Hazard rate (axis ticks from 0 to 6) against years since diagnosis (0 to 5), for Treatment A and Treatment C.]

What about if we further extend the follow-up?

[Figure: survival function (0.0 to 1.0) against time since treatment (0 to 15 years), for Treatment A and Treatment C.]

Which treatment is associated with the best survival?

Hazard ratio for A vs C

[Figure: hazard ratio (A vs C), axis ticks from 0.25 to 20, against years since diagnosis (0 to 15). Lines: true hazard ratio and HR from PH model.]

How is this relevant for cancer patient survival?

When studying cancer patient survival, hazards are very likely to be non-proportional.

In general, if we can identify a factor associated with poor survival, the effect is usually greater early in the follow-up.

Jatoi et al. (2011) [1] discuss the presence of non-proportional hazards for breast cancer (see next slide for hazards by ER status).

They write: ‘Similar nonproportional hazard rates are evident for large versus small tumors, positive versus negative lymph nodes, high versus low tumor grade, the intrinsic molecular breast cancer subtypes and the molecular prognostic signatures Oncotype DX and Mammaprint.’

Hazards by ER status, Jatoi et al. (2011) [1]

Breast Cancer Adjuvant Therapy: Time to Consider Its Time-Dependent Effects

Ismail Jatoi, University of Texas Health Science Center, San Antonio, TX

William F. Anderson, National Cancer Institute, Bethesda, MD

Jong-Hyeon Jeong and Carol K. Redmond, University of Pittsburgh, Pittsburgh, PA

Breast cancer is a chronic and heterogeneous disease that may recur many years after initial diagnosis and treatment.1 This has important implications for the practicing oncologist. For instance, an early effect of adjuvant treatment may diminish over time after cessation of therapy, or, alternatively, there may exist a lag time before some treatment effects become pronounced. Indeed, the risk of breast cancer recurrence and death (hazard rate) varies over time (ie, is nonproportional) according to prognostic and predictive factors (Figs 1 and 2; Table 1).6,13 The hazard curve for breast cancer death peaks between 2 and 3 years after initial diagnosis and then declines sharply, suggesting that the biologic mechanisms responsible for early and late cancer-specific events are fundamentally different. Thus the early and late effects of adjuvant therapy may vary accordingly.

For example, Figure 1 shows the annual hazard rates for breast cancer deaths (percent per year) after initial diagnosis among women in the National Cancer Institute’s Surveillance, Epidemiology, and End Results 13 Registries database.2 The average annual rate of breast cancer deaths is nonproportional overall and by estrogen receptor (ER) expression.14 Thus the annual hazard rate for all cases peaks near 3% per year between the second and third years after diagnosis and then declines to 1% to 2% per year by the sixth through eighth years. The hazard rates for ER-negative and ER-positive tumors peak at approximately 6.5% and 2% per year, respectively, between the first and third years (ie, approximately three-fold difference). Notably, ER-negative to ER-positive hazard rates cross between the seventh and eighth years, after which women with ER-negative tumors have a lower rate of breast cancer death than those with ER-positive tumors. Table 1 further shows the fold difference for ER-negative compared with ER-positive tumors over time. ER-negative to ER-positive hazard ratios (HR) were more than 1.0 before the eighth year, after which HRs were less than 1.0.

Similar nonproportional hazard rates are evident for large versus small tumors, positive versus negative lymph nodes, high versus low tumor grade,13 the intrinsic molecular breast cancer subtypes,6,8 and the molecular prognostic signatures Oncotype DX12 and Mammaprint9-11 (Fig 2). Thus hazard rates for relapse among high-risk tumors (eg, nonluminal A, Mammaprint poor signature, and Oncotype high-risk score) show a sharp peak soon after initial diagnosis, similar to ER-negative cancers (Fig 1). Conversely, hazard rates for low-risk tumors (eg, luminal A, Mammaprint good signature, and Oncotype low- and intermediate-risk score) lack a sharp peak, similar to ER-positive tumors. These hazard curves suggest that the biologic mechanisms responsible for early and late breast cancer events differ and may therefore respond differently to the same treatment.

[Figure: annual hazard rate for breast cancer death (%, 0 to 7) against time after initial breast cancer diagnosis (years, up to 12), with curves for all cases, ER negative and ER positive.]

Fig 1. Annual hazard rates for breast cancer death and ER-negative to ER-positive hazard ratios (Table 1) using the National Cancer Institute’s Surveillance, Epidemiology, and End Results 13 Registries Databases (1992 to 2007) for invasive female breast cancer.2 Annual hazard rates for breast cancer death overall (all cases combined, n = 401,693), estrogen receptor (ER)-negative (n = 74,567), and ER-positive (n = 257,426) breast cancers. The annual hazard rate for cancer-specific death describes the instantaneous rate of dying from cancer in a specified time interval after initial cancer diagnosis. Hazard rate curves were modeled using cubic splines with join-points selected by Akaike’s information criteria3,4; 95% CIs were applied with bootstrap resampling techniques.5 Under the null hypothesis of no interaction over time, annual hazard rates for ER-positive and ER-negative breast cancers would be proportional (or similar) with follow-up after initial breast cancer diagnosis. The overall rate of breast cancer death for all cases peaks near 3% per year between the second and third years after initial breast cancer diagnosis and then declines to 1% to 2% per year by the sixth through eighth years. The annual hazard rates for women with ER-negative and ER-positive tumors demonstrate peaks of approximately 6.5% and 2% near the first through third years after initial breast cancer diagnosis, respectively (approximately three-fold difference). An ER-negative to ER-positive hazard rate cross-over occurs between the seventh and eighth years after breast cancer diagnosis, and then women with ER-negative tumors had a somewhat paradoxically lower rate of breast cancer death than those with ER-positive breast cancers.

Journal of Clinical Oncology, Comments and Controversies, Vol 29, No 17 (June 10), 2011: pp 2301-2304. © 2011 by American Society of Clinical Oncology.

‘The hazards of hazard ratios’ (Hernan 2010)

Hernan 2010 [2] presents an interesting discussion on ‘the hazards of hazard ratios’ where he argues that hazard ratios ‘have a built-in selection bias’.

Hernan argues that our population will comprise both susceptible and non-susceptible individuals.

When exposed, the susceptible individuals will experience the event sooner than if they were unexposed, resulting in a lower proportion of susceptible individuals remaining at risk in the exposed relative to the unexposed group as time progresses (and a decreasing hazard ratio).

‘The hazards of hazard ratios’ (Hernan 2010)

The Women’s Health Initiative [3] followed over 16,000 women for an average of 5.2 years to study the association between HRT and CHD. The study was halted due to safety concerns.

‘Combined hormone therapy was associated with a hazard ratio for CHD of 1.24.’ [from the article abstract]

HRs during each year of follow-up: 1.81, 1.34, 1.27, 1.25, 1.45, and 0.70 for years 1, 2, 3, 4, 5, and 6+, respectively. [Table 2]

The average HR in the WHI would have been 1.8 if the study had been halted after 1 year of follow-up, 1.7 after 2 years, and 1.2 after 5 years.

The 24% increase in the rate of coronary heart disease that many researchers and journalists consider as the effect of combined hormone therapy is the result of the arbitrary choice of an average follow-up period of 5.2 years.

Overview of approaches to modelling prognosis

Modelling cause-specific mortality:

Cox proportional hazards model

Poisson regression

Parametric survival models

Flexible parametric models

Similarity of these approaches

Modelling excess mortality (cannot use Cox regression):

Poisson regression

Flexible parametric models

Analogues to the Cox model [4, 5]

Example: survival of patients diagnosed with colon carcinoma in Finland

Patients diagnosed with colon carcinoma in Finland 1984–95. Potential follow-up to end of 1995; censored after 10 years.

Outcome is death due to colon carcinoma.

Interest is in the effect of clinical stage at diagnosis (distant metastases vs no distant metastases).

How might we specify a statistical model for these data?

[Figure: smoothed empirical hazards (cancer-specific mortality rates). Empirical hazard (0 to 1.6) against years since diagnosis (0 to 10), for Not distant and Distant.]

Stata command: sts graph, by(distant) hazard kernel(epan2)

[Figure: smoothed empirical hazards (with default smoother). Empirical hazard (0 to 1.6) against years since diagnosis (0 to 10), for Not distant and Distant.]

Stata command: sts graph, by(distant) hazard

The Cox proportional hazards model

The ‘intercept’ in the Cox model [6], the hazard (event rate) for individuals with all covariates x at the reference level, can be thought of as an arbitrary function of time (1), often called the baseline hazard and denoted by h0(t).

The hazard at time t for an individual with other covariate values is a multiple of the baseline:

h(t|x) = h0(t) exp(xβ).

Alternatively,

ln[h(t|x)] = ln[h0(t)] + xβ.

The Cox model estimates the log hazard ratios (β) without explicitly estimating h0(t).

(1) Time t can be defined in many ways, e.g., attained age, time-on-study, calendar time, etc.
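The proportional hazards form h(t|x) = h0(t) exp(xβ) implies that the hazard ratio is the same at every time point, so the hazards are parallel on the log scale. The following sketch illustrates this with an invented baseline hazard and an invented β (neither is estimated from the colon data).

```python
import math

BETA = math.log(2.0)  # hypothetical log hazard ratio

def baseline_hazard(t):
    # Hypothetical h0(t); any non-negative function of time would do.
    return 0.3 * math.exp(-0.5 * t) + 0.05

def hazard(t, x):
    # Proportional hazards: h(t|x) = h0(t) * exp(x * beta)
    return baseline_hazard(t) * math.exp(x * BETA)

for t in (0.5, 1, 2, 5, 10):
    ratio = hazard(t, 1) / hazard(t, 0)
    log_gap = math.log(hazard(t, 1)) - math.log(hazard(t, 0))
    print(t, round(ratio, 6), round(log_gap, 6))
```

Whatever shape is chosen for the baseline, the ratio stays at exp(β) and the gap between the log hazards stays at β; this is the assumption to check against the empirical hazards plotted on the log scale.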

[Figure: smoothed empirical hazards on log scale. Empirical hazard (log scale, 0.05 to 1.6) against years since diagnosis (0 to 10), for Not distant and Distant.]

Stata command: sts graph, by(distant) hazard kernel(epan2) yscale(log)

Fit a Cox model to estimate the mortality rate ratio

. stcox distant

failure _d: status == 1

analysis time _t: (exit-origin)/365.25

origin: time dx

note: time>10 trimmed

Cox regression -- Breslow method for ties

No. of subjects = 13208 Number of obs = 13208

No. of failures = 7122

Time at risk = 44013.26215

LR chi2(1) = 5544.65

Log likelihood = -61651.446 Prob > chi2 = 0.0000

--------------------------------------------------------------

_t | Haz. Ratio Std. Err. z P>|z| [95% C.I.]

--------+-----------------------------------------------------

distant | 6.557777 .1689328 73.00 0.000 6.235 6.897

--------------------------------------------------------------


Hazard ratio: 6.56

[Figure: fitted hazards from Cox model. Fitted hazard (0 to 1.6) against years since diagnosis (0 to 10), for Not distant and Distant.]

Stata command: stcurve, hazard at1(distant=0) at2(distant=1) kernel(epan2)

[Figure: fitted hazards (on log scale) from Cox model. Fitted hazard (log scale, 0.05 to 1.6) against years since diagnosis (0 to 10), for Not distant and Distant.]

Stata command: stcurve, hazard at1(distant=0) at2(distant=1) kernel(epan2) yscale(log)

[Figure: smoothed empirical hazards by age and stage. Empirical hazard (0 to 2) against years since diagnosis (0 to 10), for Young, Not distant; Young, Distant; Old, Not distant; Old, Distant.]

Stata command: sts graph, by(agestage) hazard kernel(epan2)

Fit a Cox model adjusted for age at diagnosis

. stcox distant old

failure _d: status == 1

analysis time _t: (exit-origin)/365.25

origin: time dx

note: time>10 trimmed

Cox regression -- Breslow method for ties

No. of subjects = 13208 Number of obs = 13208

No. of failures = 7122

Time at risk = 44013.26215

LR chi2(2) = 5778.91

Log likelihood = -61534.317 Prob > chi2 = 0.0000

----------------------------------------------------------------------

_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]

--------+-------------------------------------------------------------

distant | 6.65287 .1716121 73.47 0.000 6.324877 6.997871

old | 1.463653 .0358098 15.57 0.000 1.395124 1.535549

----------------------------------------------------------------------


[Figure: fitted hazards from Cox model. Fitted hazard (0 to 2) against years since diagnosis (0 to 10), for Young, Not distant; Young, Distant; Old, Not distant; Old, Distant.]

Hazard ratio: 6.56

Using default (Breslow) method for ties

[Figure: fitted hazards from Cox model. Hazard (0 to 1.6) against years since diagnosis (0 to 10), for Not distant and Distant.]

Hazard ratio: 6.64

stcox distant, efron

[Figure: fitted hazards from Cox model with Efron method for ties. Hazard (0 to 1.6) against years since diagnosis (0 to 10), for Not distant and Distant.]

Hazard ratio: 10.04

Hazard ratios:

Cox: 6.64

Exponential: 10.04

[Figure: fitted hazards from parametric survival model (exponential). Hazard (0 to 1.6) against years since diagnosis (0 to 10), for Not distant and Distant.]

Hazard ratio: 7.41

Hazard ratios:

Cox: 6.64

Exponential: 10.04

Weibull: 7.41

[Figure: fitted hazards from parametric survival model (Weibull). Hazard (0 to 1.6) against years since diagnosis (0 to 10), for Not distant and Distant.]

[Figure: fitted hazards from parametric survival model (Weibull). Hazard (0.5 to 2) against years since diagnosis (2 to 10), for Not distant and Distant.]

[Figure: fitted cumulative hazards from Weibull model. Cumulative hazard (1 to 4) against years since diagnosis (2 to 10), for Not distant and Distant.]

Hazard ratio: 6.89

Hazard ratios:

Cox: 6.64

Exponential: 10.04

Weibull: 7.41

Poisson (annual): 6.89

[Figure: fitted hazards from Poisson model (yearly intervals). Hazard (0 to 1.6) against years since diagnosis (0 to 10), for Not distant and Distant.]

Time as a confounder

When the rate changes with time then time may confound the effect of exposure.

We will, for the moment, assume that the rates are constant within broad time bands but can change from band to band.

This approach (categorising a metric variable and assuming the effect is constant within each category) is standard in epidemiology.

We often categorise metric variables; the only difference here is that the variable is ‘time’.

What are the failure rates for each band?

Consider a group of subjects with rates λ1 during band 1, λ2 during band 2, etc.

[Figure: follow-up of Subjects 1 to 3 on a time axis from 0 to 15 years, divided into 5-year bands and labelled with the time at risk in each band (5, 5 and 2 years; 5 and 4 years; 3 years).]

Splitting the records by follow-up time

In software, we split the observation for each subject into one observation for each timeband.

subject   timeband   follow-up   failure
1         0-5        3           1
2         0-5        5           0
2         5-10       4           0
3         0-5        5           0
3         5-10       5           0
3         10-15      2           1

The rate for timeband 0-5 is then 1/(3+5+5), and so on for other time bands.

This method can be used whether rates are varying simply as a function of time or in response to some time-varying exposure.
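The splitting step can be sketched in a few lines of general-purpose code. This is only an illustration of the idea, not how Stata implements it; the function names split_follow_up and rates_by_band are invented for the example, and the data are the three subjects from the table above.

```python
def split_follow_up(subjects, cuts):
    """Split each (subject, survtime, event) record at the cut points.

    Follow-up beyond the last cut point is ignored in this sketch.
    """
    rows = []
    for subject, survtime, event in subjects:
        for start, end in zip(cuts[:-1], cuts[1:]):
            if survtime <= start:
                break  # follow-up ended before this band starts
            exit_time = min(survtime, end)
            # The event is attributed to the band in which follow-up ends.
            failed = int(event == 1 and survtime <= end)
            rows.append((subject, start, exit_time - start, failed))
    return rows

def rates_by_band(rows):
    """Return {band start: (events, person-time, rate)}."""
    totals = {}
    for _, band, time_at_risk, failed in rows:
        d, y = totals.get(band, (0, 0.0))
        totals[band] = (d + failed, y + time_at_risk)
    return {band: (d, y, d / y) for band, (d, y) in totals.items()}

subjects = [(1, 3, 1), (2, 9, 0), (3, 12, 1)]
rows = split_follow_up(subjects, [0, 5, 10, 15])
rates = rates_by_band(rows)
for band in sorted(rates):
    print(band, rates[band])
```

The six rows produced correspond to the six rows of the table, and the rate for the first band is 1 event over 3+5+5 = 13 person-years.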

Let’s see how we split person-time using Stata

The original data.

. list

+----------------------------+

| subject survtime event |

|----------------------------|

1. | 1 3 1 |

2. | 2 9 0 |

3. | 3 12 1 |

+----------------------------+


Stata internal variables created by stset

. stset survtime, fail(event) id(subject)

. list subject survtime event _t0 _t _d

+--------------------------------------------+

| subject survtime event _t0 _t _d |

|--------------------------------------------|

1. | 1 3 1 0 3 1 |

2. | 2 9 0 0 9 0 |

3. | 3 12 1 0 12 1 |

+--------------------------------------------+

stset creates the following internal variables:

_t0   time at entry
_t    time at exit
_d    failure indicator
_st   inclusion indicator

Now we split person-time with stsplit

. stsplit timeband, at(0(5)15)

(3 observations (episodes) created)

. list subject timeband survtime event _t0 _t _d

+-------------------------------------------------------+
| subject   timeband   survtime   event   _t0   _t   _d |
|-------------------------------------------------------|
|       1          0          3       1     0    3    1 |
|       2          0          5       .     0    5    0 |
|       2          5          9       0     5    9    0 |
|       3          0          5       .     0    5    0 |
|       3          5         10       .     5   10    0 |
|-------------------------------------------------------|
|       3         10         12       1    10   12    1 |
+-------------------------------------------------------+

We can now tabulate rates by timeband

. strate timeband

failure _d: event

analysis time _t: survtime

id: subject

Estimated rates and 95% confidence intervals

(6 records included in the analysis)

+---------------------------------------------------------+

| timeband D Y Rate Lower Upper |

|---------------------------------------------------------|

| 0 1 13.0000 0.076923 0.010836 0.546082 |

| 5 0 9.0000 0.000000 . . |

| 10 1 2.0000 0.500000 0.070432 3.549536 |

+---------------------------------------------------------+
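The rate and interval columns can be reproduced by hand. The sketch below assumes the confidence limits are formed on the log scale as rate × exp(±1.96/√D); that formula is an assumption inferred from the numbers, which it appears to match, rather than something stated on the slide. The function name rate_with_ci is invented for the example.

```python
import math

Z = 1.959964  # 97.5th percentile of the standard normal distribution

def rate_with_ci(d, y):
    """Rate d/y with a 95% CI computed on the log scale (requires d > 0)."""
    rate = d / y
    factor = math.exp(Z / math.sqrt(d))
    return rate, rate / factor, rate * factor

print(rate_with_ci(1, 13.0))  # timeband 0
print(rate_with_ci(1, 2.0))   # timeband 10
```

With no events in a band (timeband 5) the log-scale interval is undefined, which is consistent with the missing limits shown in the table.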


Splitting is a very powerful tool

Not just for Poisson regression. Splitting is used together with, for example, Cox regression for:

Multiple timescales

Time-varying covariates

Modelling non-proportional hazards

Splitting is used when applying multi-state models or competing risks analysis.

Splitting in SAS and R

Stata is the only software we are aware of in which this powerful tool (i.e., splitting person-time) comes standard.

Can also split at, for example, dates of intervention or dates at which exposure otherwise changes.

Several user-written SAS macros exist; I use the lexis macro written by Bendix Carstensen (http://staff.pubhealth.ku.dk/~bxc/Lexis/). This macro has been tried and tested over 10 years.

Use the Epi package in R.

Splitting on time since diagnosis for the colon data

. stsplit fu, at(0(1)10)

(37458 observations (episodes) created)

The variable fu (follow-up) will be created. We can name this anything we like.

We now have a separate observation for each individual for each year of follow-up.

The split dataset

. list id fu _t0 _t _d in 1/10, sepby(id)

+----------------------------+

| id fu _t0 _t _d |

|----------------------------|

1. | 1 0 0 1 0 |

2. | 1 1 1 1.375 1 |

|----------------------------|

3. | 2 0 0 1 0 |

4. | 2 1 1 2 0 |

5. | 2 2 2 3 0 |

6. | 2 3 3 4 0 |

7. | 2 4 4 5 0 |

8. | 2 5 5 6 0 |

9. | 2 6 6 6.875 0 |

|----------------------------|

10. | 3 0 0 .125 1 |

+----------------------------+


Rates for each time band

. strate fu, per(1000)

Estimated rates (per 1000) and lower/upper bounds of 95% CI

+------------------------------------------------------+

| fu D Y Rate Lower Upper |

|------------------------------------------------------|

| 0 4223 10.3339 408.6533 396.5122 421.1662 |

| 1 1444 7.4190 194.6351 184.8507 204.9373 |

| 2 597 5.7934 103.0487 95.1054 111.6555 |

| 3 342 4.7379 72.1834 64.9246 80.2537 |

| 4 227 3.9301 57.7599 50.7143 65.7844 |

|------------------------------------------------------|

| 5 130 3.2741 39.7051 33.4342 47.1522 |

| 6 78 2.7282 28.5906 22.9004 35.6946 |

| 7 33 2.2848 14.4430 10.2679 20.3158 |

| 8 22 1.9273 11.4152 7.5163 17.3365 |

| 9 26 1.5845 16.4085 11.1721 24.0993 |

+------------------------------------------------------+


Now estimate the effect of distant metastases while controlling for time since diagnosis

. streg i.fu distant, dist(exp)

-------------------------------------------------------------------------

_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]

----------+--------------------------------------------------------------

fu 0 | 1 (base)

1 | .6636731 .0204396 -13.31 0.000 .6247973 .7049677

2 | .4041937 .0178801 -20.48 0.000 .3706255 .4408022

3 | .3008889 .0170795 -21.16 0.000 .2692087 .3362972

4 | .251201 .0172525 -20.12 0.000 .2195639 .2873967

5 | .1754712 .015704 -19.45 0.000 .1472403 .2091151

6 | .1267095 .014524 -18.02 0.000 .101214 .1586273

7 | .0635113 .0111133 -15.75 0.000 .0450719 .0894943

8 | .0506048 .0108267 -13.95 0.000 .0332721 .0769667

9 | .0732248 .0144203 -13.27 0.000 .049777 .1077177

|

distant | 6.890536 .1758401 75.64 0.000 6.554372 7.24394

_cons | .1523781 .0036926 -77.64 0.000 .1453099 .1597901

-------------------------------------------------------------------------

What does the estimate labelled _cons represent?

Now that we have adjusted for time

Now that we have adjusted for time since diagnosis, the estimated rate ratio (6.89) is similar to that obtained from Cox regression (6.56 with the Breslow method for ties, 6.64 with the Efron method).

Cox and Poisson regression are extremely similar. The only difference is that with Poisson regression we categorise time into pre-specified intervals and model the effect of time as a step function (see next slide), whereas in Cox regression we effectively model time as a continuous function.

This is analogous to the actuarial and Kaplan-Meier methods for estimating the survivor function: the actuarial approach uses pre-specified intervals whereas the Kaplan-Meier method treats time as continuous.

Fitted values for the model adjusted for time

Hazard ratio: 6.89

[Figure: fitted hazards from the model adjusted for time, drawn as step functions of years since diagnosis (0 to 10), hazard axis from 0 to 1.1, for Not distant and Distant. Annotated levels: 0.152 [exp(_cons)]; 0.152*6.89; 0.152*6.89*0.66.]
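The fitted values are simply products of the estimates in the streg output: the baseline rate (_cons, the rate for patients without distant metastases in the first year) multiplied by the hazard ratio for the follow-up year and, where relevant, the hazard ratio for distant metastases. The sketch below copies the estimates from that output; the function name fitted_hazard is invented for the example.

```python
# Estimates copied from the streg output (hazard-ratio scale).
BASELINE_RATE = 0.1523781  # _cons: rate per person-year, not distant, first year
HR_DISTANT = 6.890536
HR_FU = {0: 1.0, 1: 0.6636731, 2: 0.4041937, 3: 0.3008889, 4: 0.251201,
         5: 0.1754712, 6: 0.1267095, 7: 0.0635113, 8: 0.0506048, 9: 0.0732248}

def fitted_hazard(fu, distant):
    """Fitted rate for follow-up year `fu` (0-9) and stage (distant: bool)."""
    rate = BASELINE_RATE * HR_FU[fu]
    return rate * HR_DISTANT if distant else rate

for fu in (0, 1, 2):
    print(fu, round(fitted_hazard(fu, False), 3), round(fitted_hazard(fu, True), 3))
```

Because the model contains no interaction between stage and follow-up time, the ratio of the two fitted rates is 6.89 in every year; the two step functions differ only by this constant factor.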

We can make Poisson regression more similar to, and even equivalent to, Cox regression

The actuarial method with time classified as narrowly as possible is equivalent to the Kaplan-Meier method (in the absence of times where both events and censoring occur).

Similarly, we can make Poisson regression more similar to Cox regression by using a larger number of smaller intervals.

If we split at each event time then the estimates from Poisson regression are equivalent to Cox regression.

Demography and epidemiology: Practical use of the Lexis diagram in the computer age.

or:

Who needs the Cox-model anyway?

Annual meeting of Finnish Statistical Society, 23–24 May 2005. Revised December 2005.

Bendix Carstensen, Steno Diabetes Center, Gentofte, Denmark & Department of Biostatistics, University of Copenhagen

www.biostat.ku.dk/~bxc

The contents of this paper was presented at the meeting of the Finnish Statistical Society in May 2005 in Oulu. The slides presented can be found on my homepage as http://staff.pubhealth.ku.dk/~bxc/Talks/Oulu.pdf.

[Figure: fitted hazards from Poisson model (3-month intervals); hazard against years since diagnosis (0 to 10) for "Not distant" and "Distant". Hazard ratio: 6.65.]

Hazard ratios
Cox: 6.64
Exponential: 10.04
Weibull: 7.41
Poisson (annual): 6.89
Poisson (quarter): 6.65

[Figure: fitted hazards from Poisson model (monthly intervals); hazard against years since diagnosis (0 to 10) for "Not distant" and "Distant". Hazard ratio: 6.64.]

Hazard ratios
Cox: 6.64
Exponential: 10.04
Weibull: 7.41
Poisson (annual): 6.89
Poisson (quarter): 6.65
Poisson (months): 6.64

Modelling continuous variables (intro to splines)

If we are not happy with modelling, for example, age as a linear effect we might create the variable age2=age^2 and include both variables age and age2 in the model.

We would then be modelling age as a quadratic polynomial and using 2 degrees of freedom (df).

Could add age3=age^3 to model age as a cubic polynomial.

Alternatively, we might create dummy variables to model age as a step function.

Either way, we are creating a series of variables with which to model the effect of age.

Modelling with splines also involves creating a series of variables.

What are splines?

Flexible mathematical functions defined by piecewise polynomials.

The points at which the polynomials join are called knots.

Constraints ensure the function is smooth.

The most common splines used in practice are cubic splines.

However, splines can be of any degree, n.

Function is forced to have continuous 0th, 1st and 2nd derivatives.

Regression splines can be incorporated into any regression model with a linear predictor.

Using splines to estimate non-linear functions.

[Figure: mortality rate (per 1000 person-years) against years from diagnosis (0 to 5). Interval length: 1 week.]

No continuity corrections

[Figure: mortality rate (per 1000 person-years) against years from diagnosis (0 to 5). Fitted function: no constraints.]

Function forced to join at knots

[Figure: mortality rate (per 1000 person-years) against years from diagnosis (0 to 5). Fitted function: forced to join at knots.]

Continuous first derivative

[Figure: mortality rate (per 1000 person-years) against years from diagnosis (0 to 5). Fitted function: continuous 1st derivatives.]

Continuous second derivative

[Figure: mortality rate (per 1000 person-years) against years from diagnosis (0 to 5). Fitted function: continuous 2nd derivatives.]

Restricted cubic splines

Cubic splines can behave poorly in the tails.

An extension is restricted cubic splines [7].

These are forced to be linear before the first knot and after the final knot.

This is where there is often less data and standard cubic splines tend to be sensitive to a few extreme values.

For the same number of knots, restricted cubic splines need 4 fewer parameters than cubic splines.
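To make the linearity restriction concrete, here is a minimal Python sketch of one standard truncated-power formulation of the restricted cubic spline basis. It is an illustration under stated assumptions, not the code used for the slides: the function name `rcs_basis` is hypothetical, and software may additionally rescale or orthogonalise the columns.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis at x.

    knots must be in increasing order; the first and last are the boundary
    knots. With K knots the function returns K - 1 columns (z_1, ..., z_{K-1}),
    so a model also including an intercept has K parameters.
    """
    k_min, k_max = knots[0], knots[-1]

    def pos_cube(u):
        # Truncated power function (u)_+^3
        return u ** 3 if u > 0 else 0.0

    basis = [x]  # z_1 is the linear term
    for k_j in knots[1:-1]:
        lam = (k_max - k_j) / (k_max - k_min)
        # The two correction terms cancel the cubic and quadratic parts
        # beyond the last knot, leaving a straight line there.
        basis.append(pos_cube(x - k_j)
                     - lam * pos_cube(x - k_min)
                     - (1 - lam) * pos_cube(x - k_max))
    return basis


example_knots = [0.0, 1.0, 2.0, 4.0]
print(rcs_basis(1.5, example_knots))
```

Before the first knot every non-linear column is zero, and after the last knot each column is linear in x, which is the behaviour described above. An unrestricted cubic spline with the same K knots would need K + 4 parameters (intercept, x, x^2, x^3 and one truncated cubic per knot), hence the saving of 4.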

[Figure: fitted hazards from Poisson model (restricted cubic splines, 5 df); hazard against years since diagnosis (0 to 10) for "Not distant" and "Distant". Hazard ratio: 6.65.]

Hazard ratios
Cox: 6.64
Exponential: 10.04
Weibull: 7.41
Poisson (annual): 6.89
Poisson (quarter): 6.65
Poisson (months): 6.64
Poisson (spline): 6.65

[Figure: fitted hazards from flexible parametric model (5 df); hazard against years since diagnosis (0 to 10) for "Not distant" and "Distant". Hazard ratio: 6.63.]

Hazard ratios
Cox: 6.64
Exponential: 10.04
Weibull: 7.41
Poisson (annual): 6.89
Poisson (quarter): 6.65
Poisson (months): 6.64
Poisson (spline): 6.65
Flexible parametric: 6.63

Fine splitting example: England and Wales Breast Cancer

stset survtime, failure(dead==1) exit(time 5) id(ident)
stsplit sp_time, every(`=1/52.18')
generate risktime = _t - _t0
collapse (min) start=_t0 (max) end=_t (count) n=_d ///
    (sum) risktime _d, by(dep5 sp_time)

Leads to about 2.25 million rows before collapsing.

522 rows after collapsing.

We will compare mortality among the most deprived (category 5) to the least deprived (category 1) with the other categories excluded.
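The split-then-collapse step can be sketched outside Stata as follows. This is a simplified Python illustration, not a reimplementation of `stsplit` or `collapse`: the function name `split_and_collapse` and the tuple layout of the input are assumptions made for the example.

```python
from collections import defaultdict


def split_and_collapse(subjects, width, exit_time):
    """Split each subject's follow-up into bands and sum within cells.

    subjects: iterable of (group, survtime, died) tuples (hypothetical layout).
    Returns {(group, band index): [person-time, deaths]}.
    """
    table = defaultdict(lambda: [0.0, 0])
    for group, survtime, died in subjects:
        t_end = min(survtime, exit_time)            # administrative censoring
        event = died and survtime <= exit_time      # deaths after exit are censored
        j = 0
        while j * width < t_end:
            start = j * width
            stop = min((j + 1) * width, t_end)
            cell = table[(group, j)]
            cell[0] += stop - start                 # person-time in this band
            if event and stop == t_end:
                cell[1] += 1                        # death falls in the last band
            j += 1
    return dict(table)


demo = split_and_collapse(
    [("least", 2.5, True), ("least", 7.0, True), ("most", 0.5, False)],
    width=1.0, exit_time=5.0)
for key in sorted(demo):
    print(key, demo[key])
```

Total person-time and the number of deaths before the exit time are unchanged by the collapse; only the row count shrinks, which is why 2.25 million rows reduce to 522.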

Fitting a Poisson model with splines

Poisson model with restricted cubic splines

. gen midtime = (start + end)/2

. gen lntime = ln(midtime)

. rcsgen lntime, df(3) gen(rcs) fw(_d) orthog
Variables rcs1 to rcs3 were created

. glm _d rcs* dep5, family(poisson) lnoffset(risktime) nolog eform

Generalized linear models                 No. of obs      = 522
Optimization     : ML                     Residual df     = 517
                                          Scale parameter = 1
Deviance         = 589.4507401            (1/df) Deviance = 1.140137
Pearson          = 565.6616566            (1/df) Pearson  = 1.094123
Variance function: V(u) = u               [Poisson]
Link function    : g(u) = ln(u)           [Log]
                                          AIC             = 4.513367
Log likelihood   = -1172.988905           BIC             = -2645.763

---------------------------------------------------------------------
         |                 OIM
      _d |      IRR   Std. Err.      z    P>|z|  [95% Conf. Interval]
---------+-----------------------------------------------------------
    rcs1 | 1.033444   .0203234    1.67   0.094   .9943687   1.074055
    rcs2 | 1.066464   .0202456    3.39   0.001   1.027513   1.106892
    rcs3 | 1.174232   .0229925    8.20   0.000   1.130021   1.220172
    dep5 | 1.309601   .0513388    6.88   0.000   1.212747   1.414189
risktime | (exposure)
---------------------------------------------------------------------

. estimates store pois_rcs_ph

Interaction with time

Poisson model with time-dependent effects

. glm _d i.dep5##c.rcs* , family(poisson) lnoffset(risktime) nolog

Generalized linear models                 No. of obs      = 522
Optimization     : ML                     Residual df     = 514
                                          Scale parameter = 1
Deviance         = 571.3748157            (1/df) Deviance = 1.111624
Pearson          = 549.7399366            (1/df) Pearson  = 1.069533
Variance function: V(u) = u               [Poisson]
Link function    : g(u) = ln(u)           [Log]
                                          AIC             = 4.490233
Log likelihood   = -1163.950943           BIC             = -2645.066

--------------------------------------------------------------------------
            |                 OIM
         _d |     Coef.   Std. Err.       z    P>|z|  [95% Conf. Interval]
------------+-------------------------------------------------------------
     1.dep5 |  .2460437   .0407188     6.04   0.000   .1662363    .325851
       rcs1 |  .0974977   .0258872     3.77   0.000   .0467597   .1482358
       rcs2 |  .0568394   .0252797     2.25   0.025   .0072921   .1063867
       rcs3 |  .1486538    .024408     6.09   0.000    .100815   .1964926
            |
dep5#c.rcs1 |
          1 | -.1711919   .0407214    -4.20   0.000  -.2510044  -.0913795
            |
dep5#c.rcs2 |
          1 |  .0377089   .0392366     0.96   0.337  -.0391935   .1146113
            |
dep5#c.rcs3 |
          1 |  .0389086   .0413411     0.94   0.347  -.0421186   .1199357
            |
      _cons |  -2.76561   .0237546  -116.42   0.000  -2.812168  -2.719052
   risktime | (exposure)
--------------------------------------------------------------------------

. estimates store rcs_tvc_df3

Predicted hazard (mortality) rates

[Figure: predicted mortality rate (per 1000 person-years) against time from diagnosis (0 to 5 years) for Least Deprived (PH), Least Deprived (TD), Most Deprived (PH) and Most Deprived (TD).]

Flexible Parametric Survival Models [8, 10, 11]

First introduced by Royston and Parmar (2002) [8].

Parametric estimate of the baseline hazard without the usual restrictions on the shape (i.e., flexible).

Applicable for ‘standard’ and relative survival models.

Can fit relative survival cure models (Andersson 2011) [9].

Once we have a parametric expression for the baseline hazard we derive other quantities of interest (e.g., survival, hazard ratio, hazard differences, expectation of life).

The Cox model[6]

$h_i(t \mid x_i, \beta) = h_0(t) \exp(x_i \beta)$

Advantage: The baseline hazard, $h_0(t)$, is not directly estimated from a Cox model.

Disadvantage: The baseline hazard, $h_0(t)$, is not directly estimated from a Cox model.

Quote from Sir David Cox (Reid 1994 [12])

Reid “What do you think of the cottage industry that’s grown up around [the Cox model]?”

Cox “In the light of further results one knows since, I think I would normally want to tackle the problem parametrically. . . . I’m not keen on non-parametric formulations normally.”

Reid “So if you had a set of censored survival data today, you might rather fit a parametric model, even though there was a feeling among the medical statisticians that that wasn’t quite right.”

Cox “That’s right, but since then various people have shown that the answers are very insensitive to the parametric formulation of the underlying distribution. And if you want to do things like predict the outcome for a particular patient, it’s much more convenient to do that parametrically.”

Flexible Parametric Models: Basic Idea

Consider a Weibull survival curve.

$S(t) = \exp(-\lambda t^{\gamma})$

If we transform to the log cumulative hazard scale,

$\ln[H(t)] = \ln[-\ln(S(t))]$

$\ln[H(t)] = \ln(\lambda) + \gamma \ln(t)$

This is a linear function of $\ln(t)$.

Introducing covariates gives

$\ln[H(t \mid x_i)] = \ln(\lambda) + \gamma \ln(t) + x_i \beta$

Rather than assuming linearity with $\ln(t)$, flexible parametric models use restricted cubic splines for $\ln(t)$.

[Figure: fitted cumulative hazards from Weibull model; cumulative hazard against years since diagnosis for "Not distant" and "Distant".]

Flexible Parametric Models: Incorporating Splines

We thus model on the log cumulative hazard scale.

$\ln[H(t \mid x_i)] = \ln[H_0(t)] + x_i \beta$

This is a proportional hazards model.

Restricted cubic splines with knots, $k_0$, are used to model the log baseline cumulative hazard.

$\ln[H(t \mid x_i)] = \eta_i = s(\ln(t) \mid \gamma, k_0) + x_i \beta$

For example, with 4 knots we can write

$\ln[H(t \mid x_i)] = \eta_i = \underbrace{\gamma_0 + \gamma_1 z_{1i} + \gamma_2 z_{2i} + \gamma_3 z_{3i}}_{\text{log baseline cumulative hazard}} + \underbrace{x_i \beta}_{\text{log hazard ratios}}$

We are fitting a linear predictor on the log cumulative hazard scale.

Survival and Hazard Functions

We can transform to the survival scale

$S(t \mid x_i) = \exp(-\exp(\eta_i))$

The hazard function is a bit more complex.

$h(t \mid x_i) = \dfrac{d\, s(\ln(t) \mid \gamma, k_0)}{dt} \exp(\eta_i)$

This involves the derivatives of the restricted cubic spline functions, although these are relatively easy to calculate.
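These two transformations can be checked in the Weibull special case, where the spline function is simply $s(\ln t) = \ln(\lambda) + \gamma \ln(t)$ and its derivative with respect to $t$ is $\gamma / t$. The Python sketch below is an illustration only; the function names and parameter values are assumptions chosen for the example.

```python
import math


def log_cum_hazard(t, lam, gamma, xb=0.0):
    # eta = ln(lambda) + gamma * ln(t) + x*beta (Weibull special case)
    return math.log(lam) + gamma * math.log(t) + xb


def survival(t, lam, gamma, xb=0.0):
    # S(t|x) = exp(-exp(eta))
    return math.exp(-math.exp(log_cum_hazard(t, lam, gamma, xb)))


def hazard(t, lam, gamma, xb=0.0):
    # h(t|x) = d s(ln t)/dt * exp(eta); here d s(ln t)/dt = gamma / t
    return (gamma / t) * math.exp(log_cum_hazard(t, lam, gamma, xb))


lam, gamma, t = 0.2, 1.5, 2.0
print(survival(t, lam, gamma), math.exp(-lam * t ** gamma))
print(hazard(t, lam, gamma), lam * gamma * t ** (gamma - 1))
```

The printed pairs agree with the closed-form Weibull survival and hazard functions, and the ratio of hazards for two values of the linear predictor is constant in $t$, which is the proportional hazards property.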


Sensitivity to choice of knots; Simulation study by Rutherford et al. [13]

‘Through the use of simulation, we show that provided a sufficient number of knots are used, the approximated hazard functions given by restricted cubic splines fit closely to the true function for a range of complex hazard shapes.’

‘The simulation results also highlight the insensitivity of the estimated relative effects (hazard ratios) to the correct specification of the baseline hazard.’

Simulation Study (Rutherford et al.) [13]

Generate data assuming a mixture Weibull distribution.

[Figure: four panels (Scenario 1 to Scenario 4), each showing the hazard rate against time since diagnosis (0 to 10 years).]

Fit models using restricted cubic splines.

Scenario 3 comparison of Log Hazard Ratios

[Figure: log hazard ratios from the Cox model (vertical axis) plotted against those from the flexible parametric model (horizontal axis); both axes run from -0.6 to -0.4.]

Choice of knots: Scenario 3

[Figure: two panels for 8 knots (7 df), showing the survival function S(t) and the hazard function h(t) against time since diagnosis (0 to 10 years).]

Model Selection

Estimated hazard and survival functions fairly insensitive to knot location.

AIC and BIC can be used as rough guides to choose models.

Not crucial (within reason) to inference based on the model.

We often present a sensitivity analysis to show this.

Could treat number of knots and their locations as unknowns.

However, it is an area where more work is still required.

Implementation in Stata [10]

stpm2 is available from SSC
. ssc install stpm2

All cause survival
. stpm2 eng, scale(hazard) df(5)

Relative survival
. stpm2 eng, scale(hazard) df(5) bhazard(rate)

Time-dependent effects
. stpm2 eng, scale(hazard) df(5) bhazard(rate) tvc(eng) dftvc(3)

Cure model
. stpm2 eng, scale(hazard) df(5) bhazard(rate) tvc(eng) dftvc(3) cure

Fitting a proportional hazards model

Example: 24,889 women aged under 50 diagnosed with breast cancer in England and Wales 1986-1990.

Compare five deprivation groups from most affluent to most deprived.

No information on cause of death, but given their age, most women who die will die of their breast cancer.

Proportional hazards models
. stcox dep2-dep5,
. stpm2 dep2-dep5, df(5) scale(hazard) eform

The df(5) option implies using 4 internal knots and 2 boundary knots at their default locations.

The scale(hazard) option requests the model to be fitted on the log cumulative hazard scale.

Cox Model

Cox proportional hazards model

. stcox dep2-dep5,

         failure _d:  dead == 1
   analysis time _t:  survtime
  exit on or before:  time 5

Iteration 0: log likelihood = -73334.091
Iteration 1: log likelihood = -73303.081
Iteration 2: log likelihood = -73302.997
Iteration 3: log likelihood = -73302.997
Refining estimates:
Iteration 0: log likelihood = -73302.997

Cox regression -- Breslow method for ties

No. of subjects = 24889                   Number of obs = 24889
No. of failures = 7366
Time at risk    = 104638.953
                                          LR chi2(4)    = 62.19
Log likelihood  = -73302.997              Prob > chi2   = 0.0000

-------------------------------------------------------------------
    _t | Haz. Ratio  Std. Err.     z    P>|z|  [95% Conf. Interval]
-------+-----------------------------------------------------------
  dep2 |   1.048716  .0353999   1.41   0.159   .9815786   1.120445
  dep3 |    1.10618  .0383344   2.91   0.004    1.03354   1.183924
  dep4 |   1.212892  .0437501   5.35   0.000   1.130104   1.301744
  dep5 |   1.309478  .0513313   6.88   0.000   1.212638   1.414051
-------------------------------------------------------------------

Flexible parametric proportional hazards model

Flexible Parametric Proportional Hazards Model

. stpm2 dep2-dep5, df(5) scale(hazard) eform

Iteration 0: log likelihood = -22507.096
Iteration 1: log likelihood = -22502.639
Iteration 2: log likelihood = -22502.633
Iteration 3: log likelihood = -22502.633

Log likelihood = -22502.633               Number of obs = 24889

-------------------------------------------------------------------
       |    exp(b)   Std. Err.     z    P>|z|  [95% Conf. Interval]
-------+-----------------------------------------------------------
xb     |
  dep2 |  1.048752   .0354011   1.41   0.158   .9816125   1.120483
  dep3 |   1.10615   .0383334   2.91   0.004   1.033513   1.183893
  dep4 |  1.212872   .0437493   5.35   0.000   1.130085   1.301722
  dep5 |  1.309479   .0513313   6.88   0.000   1.212639   1.414052
 _rcs1 |  2.126897   .0203615  78.83   0.000   2.087361   2.167182
 _rcs2 |  .9812977   .0074041  -2.50   0.012   .9668927   .9959173
 _rcs3 |  1.057255   .0043746  13.46   0.000   1.048715   1.065863
 _rcs4 |  1.005372   .0020877   2.58   0.010   1.001288   1.009472
 _rcs5 |  1.002216   .0010203   2.17   0.030   1.000218   1.004218
-------------------------------------------------------------------

Proportional hazards models

The hazard ratios and 95% confidence intervals are very similar.

I have yet to find an example of a proportional hazards model where there is a large difference in the estimated hazard ratios.

If you are just interested in hazard ratios in a proportional hazards model, then you can get away with poor modelling of the baseline hazard.

One important exception is when the follow-up time differs between groups.

It is of course better to model the baseline hazard well!

Simple predictions

To predict the survival and hazard functions use the following.

The predict command
. predict survpred, survival
. predict hazpred, hazard

To estimate confidence intervals use the ci option.

To predict for particular covariate patterns use the at() option.

The at() option
. predict haz_male_age50, hazard ci at(male 1 age 50)

Simple predictions 2

The zeros option sets the values of all covariates, other than those specified in the at() option, to zero. For example, the baseline survival function can be estimated using

The zeros option
. predict surv_baseline, survival ci zeros

Log cumulative hazard

[Figure: predicted log cumulative hazard against time from diagnosis (0 to 5 years) by deprivation group (Least Deprived, 2, 3, 4, Most Deprived).]

Log Cumulative Hazard vs log(time)

[Figure: predicted log cumulative hazard against time from diagnosis (1 to 5 years, log scale) by deprivation group (Least Deprived, 2, 3, 4, Most Deprived).]

Survival Function

[Figure: proportion alive (0.6 to 1) against time from diagnosis (0 to 5 years) by deprivation group (Least Deprived, 2, 3, 4, Most Deprived).]

Hazard Function ×1000

[Figure: predicted mortality rate (per 1000 person-years, 0 to 150) against time from diagnosis (0 to 5 years) by deprivation group (Least Deprived, 2, 3, 4, Most Deprived).]

Useful predictions

A key advantage of using a parametric model over the Cox model is that we can transform the model parameters to express differences between groups in different ways.

The hazard ratio is a relative measure and a greater understanding of the impact of an exposure can be obtained by also looking at absolute differences.

For two covariate patterns, $x_1$ and $x_2$, we can obtain

Differences in hazard rates

$h(t \mid x_1) - h(t \mid x_2)$

Differences in survival functions

$S(t \mid x_1) - S(t \mid x_2)$

Use the delta-method to calculate confidence intervals.

Difference in hazard functions

. predict hdiff, hdiff1(dep5 1) hdiff2(dep5 0) ci

[Figure: difference in mortality rate (per 1000 person-years, 0 to 200) against time from diagnosis (0 to 5 years).]

Predicted survival functions

[Figure: proportion alive (0.6 to 1.0) against time from diagnosis (0 to 5 years) for Least Deprived and Most Deprived.]

Difference in survival proportions

. predict sdiff, sdiff1(dep5 1) sdiff2(dep5 0) ci

[Figure: difference in survival curves (-0.10 to 0.02) against time from diagnosis (0 to 5 years).]

Modelling excess mortality (relative survival)

Instead of cause-specific mortality we estimate excess mortality: the difference between observed (all-cause) and expected mortality.

excess mortality = observed mortality − expected mortality

Relative survival is the survival analog of excess mortality.

Both cause-specific survival and relative survival estimate (under assumptions) the same underlying quantity (net survival) and the estimates should be similar.

Modelling excess mortality using a step function for the effect of time

The hazard at time since diagnosis $t$ for persons diagnosed with cancer, $h(t)$, is modelled as the sum of the known baseline hazard, $h^*(t)$, and the excess hazard due to a diagnosis of cancer, $\lambda(t)$ [14, 15, 16, 17, 18].

$h(t) = h^*(t) + \lambda(t)$

Follow-up time is partitioned into bands corresponding to life table intervals and indicator variables included in the design matrix. The model is written as

$h(x) = h^*(x) + \exp(x\beta) \qquad (1)$

or

$\ln[h(x) - h^*(x)] = x\beta.$

The proportional excess hazards model

$\ln[h(x) - h^*(x)] = x\beta.$

The excess hazard is additive to the expected hazard, but we assume the excess component is a multiplicative function of covariates (i.e., proportional excess hazards).

Non-proportional excess hazards are common but can be incorporated by introducing follow-up time by covariate interaction terms.

We note that $h(x) - h^*(x)$ is excess mortality. We might be tempted to calculate the number of excess deaths (trivial since d and d_star are both saved in grouped.dta) and use it as the outcome (eliminating the need for a special link). The problem is that the variance is driven by the number of observed deaths.

Interpreting the parameter estimates

The exponentiated parameter estimates have an interpretation as excess hazard ratios, also known as relative excess risks.

An excess hazard ratio of, for example, 1.5 for males compared to females implies that the excess hazard associated with a diagnosis of cancer is 50% higher for males than females.

Modelling excess mortality using Poisson regression

The model assumes piecewise constant hazards which implies a Poisson process for the number of deaths in each interval. We can therefore estimate the model as a GLM.

We assume that the total number of deaths, $d_j$, for observation $j$ can be described by a Poisson distribution, $d_j \sim \text{Poisson}(\mu_j)$, where $\mu_j = \lambda_j y_j$ and $y_j$ is person-time at risk for the observation. Equation 1 is then written as

$\ln(\mu_j - d^*_j) = \ln(y_j) + x\beta, \qquad (2)$

where $d^*_j$ is the expected number of deaths (due to causes other than the cancer of interest).

This implies a generalised linear model with outcome $d_j$, Poisson error structure, link $\ln(\mu_j - d^*_j)$, and offset $\ln(y_j)$. This is not a standard link function so the link is defined in rs.ado.
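The non-standard link in Equation 2 and its inverse can be written out directly. The Python sketch below only illustrates the algebra; it does not reproduce rs.ado, and the function names and the example excess rate of 0.8 per person-year are assumptions.

```python
import math


def rs_link(mu, d_star):
    """Link: eta = ln(mu - d_star). Requires mu > d_star."""
    return math.log(mu - d_star)


def rs_inverse_link(eta, d_star):
    """Inverse link: mu = d_star + exp(eta)."""
    return d_star + math.exp(eta)


def expected_total_deaths(xb, y, d_star):
    """Mean number of deaths when eta = ln(y) + x*beta (offset ln(y))."""
    return rs_inverse_link(math.log(y) + xb, d_star)


# Illustrative values: 168.3 person-years, 1.8 expected deaths,
# and an assumed excess hazard of 0.8 per person-year.
mu = expected_total_deaths(math.log(0.8), y=168.3, d_star=1.8)
print(mu)
```

Written this way, the mean is the expected deaths plus person-time multiplied by the excess hazard, $\mu_j = d^*_j + y_j \exp(x\beta)$, so covariates act multiplicatively on the excess component only.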

Poisson regression for the colon carcinoma data

When we stset the data we specify all deaths as events.

. stset exit, fail(status==1 2) origin(dx) scale(365.25) id(id)

We use strs to estimate relative survival for each combination of relevant predictor variables and save the results to a file.

. strs using popmort, br(0(1)10) mergeby(_year sex _age)
> by(sex distant agegrp year8594) notables save(replace)

The save(replace) option requests strs to save two data files using default names (grouped.dta and individ.dta) and replace existing copies of these two files if they exist.

Partial contents of grouped.dta (output by strs)

. use grouped, clear

. list start end n d d_star y ///

> if distant==1 & sex==1 & agegrp==1 & year8594==1

+------------------------------------------+

| start end n d d_star y |

|------------------------------------------|

191. | 0 1 251 140 1.8 168.3 |

192. | 1 2 111 48 0.9 77.8 |

193. | 2 3 56 15 0.6 47.0 |

194. | 3 4 37 4 0.5 33.7 |

195. | 4 5 30 8 0.3 21.6 |

|------------------------------------------|

196. | 5 6 16 2 0.2 14.2 |

197. | 6 7 11 1 0.1 9.0 |

198. | 7 8 7 1 0.1 5.6 |

199. | 8 9 5 0 0.1 4.8 |

200. | 9 10 4 1 0.1 2.7 |

+------------------------------------------+
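As a purely descriptive check on the listing above, the crude observed, expected and excess rates per person-year can be computed row by row. This Python sketch is not the model fitted on the following slides and the function name `crude_rates` is illustrative; it uses only the first three rows shown.

```python
# (start, end, d, d_star, y) copied from the first three rows of the listing
rows = [
    (0, 1, 140, 1.8, 168.3),
    (1, 2, 48, 0.9, 77.8),
    (2, 3, 15, 0.6, 47.0),
]


def crude_rates(d, d_star, y):
    """Return (observed, expected, excess) rates per person-year."""
    observed = d / y
    expected = d_star / y
    return observed, expected, observed - expected


for start, end, d, d_star, y in rows:
    obs, exp_, exc = crude_rates(d, d_star, y)
    print(f"[{start},{end}) observed={obs:.3f} expected={exp_:.3f} excess={exc:.3f}")
```

In this subgroup almost all of the observed mortality is excess mortality, because expected deaths are small relative to observed deaths.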


Now fit the model

We now fit the Poisson regression model to the data in grouped.dta, which contains the observed (d) and expected (d_star) numbers of deaths for each life table interval along with person-time at risk (y).

. use grouped, clear

. glm d i.end distant, fam(pois) l(rs d_star) lnoffset(y) eform

Estimated excess hazard ratios

---------------------------------------------------------------------

d | exp(b) Std. Err. z P>|z| [95% Conf. Interval]

--------+------------------------------------------------------------

end 1 | 1 (base)

2 | .6363348 .0219954 -13.08 0.000 .5946525 .6809388

3 | .3500617 .0202011 -18.19 0.000 .3126254 .391981

4 | .226715 .0201962 -16.66 0.000 .1903941 .2699648

5 | .192835 .0214866 -14.77 0.000 .1550032 .2399004

6 | .118558 .0228721 -11.05 0.000 .0812303 .1730388

7 | .0771904 .0222247 -8.90 0.000 .043902 .1357194

8 | .0482283 .0212467 -6.88 0.000 .0203381 .1143652

9 | .0466583 .0223545 -6.40 0.000 .0182435 .1193299

10 | .0575149 .026689 -6.15 0.000 .0231628 .1428133

distant | 8.187736 .2666006 64.58 0.000 7.681533 8.727298

_cons | .131474 .0040362 -66.09 0.000 .1237964 .1396277

---------------------------------------------------------------------

We estimate that excess mortality is 8.2 times higher for patients with distant metastases at diagnosis compared to patients without distant metastases at diagnosis.

Fitted values from the excess mortality model

[Figure: excess hazard (per year) against years since diagnosis (0 to 10) for "Not distant" and "Distant". HR: 8.19. Annotated fitted values: 0.131 [exp(_cons)], 0.131*8.19, 0.131*8.19*0.64, and 0.131*8.19*0.35.]

Can adjust for additional covariates

. glm d i.end distant sex year8594 i.agegrp, fam(pois) ///

link(rs d_star) lnoffset(y) eform

----------------------------------------------------------------------

| OIM

d | exp(b) Std. Err. z P>|z| [95% Conf. Interval]

---------+------------------------------------------------------------

end 1 | 1 (base)

2 | .6582263 .022388 -12.30 0.000 .6157772 .7036016

[output omitted]

10 | .07477 .0243443 -7.97 0.000 .0394989 .1415367

|

distant | 8.008541 .2490144 66.91 0.000 7.535056 8.511779

sex | .9878062 .0272241 -0.45 0.656 .9358634 1.042632

year8594 | .8909376 .0238832 -4.31 0.000 .8453358 .9389994

|

agegrp |

0 | 1 (base)

1 | 1.046824 .0680002 0.70 0.481 .9216818 1.188959

2 | 1.17649 .070505 2.71 0.007 1.046109 1.32312

3 | 1.549778 .0950402 7.14 0.000 1.374262 1.74771

|

_cons | .1149154 .0087708 -28.35 0.000 .0989489 .1334582

----------------------------------------------------------------------


Interpretation

The variable year8594 is coded as 1 for patients diagnosed 1985–1994 and 0 for patients diagnosed 1975–1984.

We see that patients diagnosed in the recent period are estimated to experience 11% lower excess mortality than those diagnosed in the earlier period (excess hazard ratio 0.89).

There is evidence that excess mortality decreases with follow-up time and is higher in the older age groups, and no evidence of a difference between males and females.

There is no evidence that the effect of distant metastases at diagnosis is confounded by sex, age at diagnosis, or period of diagnosis.

Later we will show that time can be modelled using a smooth function rather than a step function.
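The 11% figure and the statement about confounding can both be read off the estimates. A minimal check with display, using the point estimate and confidence limits for year8594 from the adjusted model, and comparing the adjusted excess hazard ratio for distant (8.01) with the unadjusted value of 8.19 shown earlier:

```stata
* Percentage reduction in excess mortality, 1985-1994 versus 1975-1984
display 100*(1 - .8909376)
* Corresponding 95% confidence limits for the reduction
display 100*(1 - .9389994)
display 100*(1 - .8453358)
* Percentage change in the estimate for distant after adjustment
display 100*(8.008541/8.19 - 1)
```

This gives a reduction of about 10.9%, with a 95% confidence interval of roughly 6.1% to 15.5%. The estimate for distant changes by only about 2% on adjustment, which is why there is little suggestion of confounding.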


Flexible parametric relative survival models

From a practical point of view, fitting flexible parametric relative survival models is simple.

Relative Survival in stpm2

. stpm2 agegrp2-agegrp4, scale(hazard) df(5) bhazard(rate)

We just add the bhazard(rate) option, where rate is the expected mortality rate at the event times.

Most modelling issues are similar to (or the same as) those for cause-specific models.

For example, time-dependent effects (non-proportional excess hazards) are fitted in the same way.
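For reference, the model fitted when bhazard() is specified can be written as follows. This is a sketch of the standard relative survival formulation for flexible parametric models on the log cumulative excess hazard scale (see Royston and Lambert [11]); the symbols below are notation introduced here for illustration and are not defined on this slide.

```latex
\begin{align*}
h_i(t) &= h_i^{*}(t) + \lambda_i(t), \\
\ln \Lambda_i(t) &= s(\ln t;\, \boldsymbol{\gamma}) + \mathbf{x}_i \boldsymbol{\beta}, \\
\ln L_i &= d_i \ln\!\left\{ h_i^{*}(t_i) + \lambda_i(t_i) \right\}
           + \ln R_i(t_i) + \ln S_i^{*}(t_i).
\end{align*}
```

Here $h_i^{*}(t)$ is the expected (population) mortality rate, $\lambda_i(t)$ the excess mortality rate with cumulative version $\Lambda_i(t)$, $R_i(t) = \exp\{-\Lambda_i(t)\}$ the relative survival, $S_i^{*}(t)$ the expected survival, $d_i$ the death indicator and $s(\cdot)$ a restricted cubic spline. The term $\ln S_i^{*}(t_i)$ does not involve the model parameters and can be dropped, which is why only the expected rate at the event times has to be supplied.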


Merging in expected mortality

The expected mortality at the time of death is required.

Make use of stset information to obtain attained age and calendar year.

Merging in population mortality file

. use ew_breast
(England and Wales Breast Cancer: All ages)
. stset survtime, failure(dead==1) exit(time 5) id(ident)
  (output omitted)
. gen age = int(min(agediag + _t,99))
. gen year = int(year(datediag) + _t)
. merge m:1 sex region dep year age using popmort_UK ///
>       , keepusing(rate) keep(match)

    Result                      # of obs.
    not matched                         0
    matched                       115,331  (_merge==3)
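To make the attained age and calendar year calculation concrete, consider a hypothetical patient (values invented for illustration) diagnosed at age 62.4 in 1990 whose follow-up ends 3.7 years after diagnosis. The same expressions as in the gen commands above identify the row of the population mortality file to match:

```stata
* Attained age at exit, capped at 99 as in the gen age command
display int(min(62.4 + 3.7, 99))
* Attained calendar year at exit
display int(1990 + 3.7)
```

These evaluate to age 66 and year 1993, so this patient would be matched to the expected rate for a 66-year-old in 1993 with the same sex, region and deprivation group.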


Proportional excess hazards model

. stpm2 agegrp2-agegrp5, scale(hazard) df(7) eform bhazard(rate) nolog

Log likelihood = -133767.97                     Number of obs = 115331

         |    exp(b)   Std. Err.       z    P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------
xb       |
 agegrp2 |  1.051629   .0182862     2.90   0.004    1.016393   1.088087
 agegrp3 |  1.072436    .018162     4.13   0.000    1.037424   1.108631
 agegrp4 |  1.410387   .0250455    19.36   0.000    1.362143    1.46034
 agegrp5 |  2.649869   .0510512    50.58   0.000    2.551676   2.751841
   _rcs1 |  2.343311   .0111576   178.85   0.000    2.321544   2.365282
   _rcs2 |  .9680121   .0032421    -9.71   0.000    .9616784   .9743875
   _rcs3 |  .9520213   .0018722   -25.00   0.000    .9483589   .9556979
   _rcs4 |  1.024994   .0013508    18.73   0.000     1.02235   1.027645
   _rcs5 |  1.004471   .0008377     5.35   0.000    1.002831   1.006115
   _rcs6 |  1.002511   .0005577     4.51   0.000    1.001419   1.003605
   _rcs7 |  1.000378   .0003745     1.01   0.312    .9996448   1.001113

. estimates store rs_hazard
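Under the proportional excess hazards assumption, any two age groups can be compared by taking the ratio of their excess hazard ratios. As a small illustration using the estimates printed above (and assuming that agegrp4 and agegrp5 denote the 70-79 and 80+ groups, as the age-group labels on the plots in this session suggest):

```stata
* Excess hazard ratio, agegrp5 versus agegrp4
display 2.649869/1.410387
```

The ratio is about 1.88, so under this model the oldest group is estimated to have nearly twice the excess mortality of the next oldest group throughout follow-up.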


Predict excess hazards

range temptime 0.003 5 200
predict eh_ph1, hazard per(1000) zeros timevar(temptime)
forvalues i = 2/5 {
    predict eh_ph`i', hazard per(1000) at(agegrp`i' 1) timevar(temptime) zeros
}

Note that the predict command is the same as for a cause-specific model.

The prediction is the excess hazard (mortality) rate, as this is a relative survival model.

Using the timevar() option saves time with large datasets (here the prediction is for 200 observations rather than 115,331).
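The predictions can then be plotted against temptime. The following is one possible sketch, not the authors' own plotting code. It assumes the predicted variables are named eh_ph1 to eh_ph5, that age groups 1 to 5 correspond to <50, 50-59, 60-69, 70-79 and 80+, and that a log scale is wanted for the y-axis.

```stata
* Plot the five predicted excess mortality rates over the 200 time points
twoway (line eh_ph1 eh_ph2 eh_ph3 eh_ph4 eh_ph5 temptime, sort),      ///
    yscale(log) ylabel(50 100 200 500 750 1000 1500, angle(h))        ///
    xtitle("Years from Diagnosis")                                    ///
    ytitle("Excess Mortality Rate (per 1000 person years)")           ///
    legend(order(1 "<50" 2 "50-59" 3 "60-69" 4 "70-79" 5 "80+"))
```

On a log scale, proportional excess hazards appear as parallel curves, which makes departures from proportionality easy to see once time-dependent effects are added.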


Predicted excess mortality rate

[Figure: predicted excess mortality rate (per 1000 person years, y-axis ticks at 50, 100, 200, 500, 750, 1000 and 1500) against years from diagnosis (0 to 5), one curve per age group: <50, 50-59, 60-69, 70-79, 80+.]


Time-dependent effects

Fitting time-dependent effects is the same as before.

stpm2 agegrp2-agegrp5, scale(hazard) df(7) bhazard(rate) ///
    tvc(agegrp2-agegrp5) dftvc(3) nolog

As are predictions...

predict eh_tvc1, hazard per(1000) timevar(temptime) zeros
predict rs_tvc1, survival timevar(temptime) zeros
forvalues i = 2/5 {
    predict eh_tvc`i', hazard per(1000) at(agegrp`i' 1) ///
        timevar(temptime) zeros
    predict rs_tvc`i', survival at(agegrp`i' 1) ///
        timevar(temptime) zeros
}
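Whether the time-dependent effects are needed can also be assessed formally. One common approach, sketched here as an assumption rather than something shown on the slides, is a likelihood ratio test against the proportional excess hazards model stored earlier as rs_hazard. The name rs_tvc is hypothetical.

```stata
* Store the model with time-dependent effects (run after the stpm2 fit above)
estimates store rs_tvc
* Compare with the proportional excess hazards model; the tvc() terms add
* 4 covariates x 3 df = 12 parameters
lrtest rs_hazard rs_tvc
```

The two models are nested because they share the same covariates and baseline spline, so a small p-value would indicate that the excess hazard ratios for age vary over follow-up.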


Predicted excess mortality rate

[Figure: predicted excess mortality rate (per 1000 person years, y-axis ticks at 50, 100, 200, 500, 750, 1000 and 1500) against years from diagnosis (0 to 5), one curve per age group: <50, 50-59, 60-69, 70-79, 80+.]


Predicted relative survival

[Figure: predicted relative survival (y-axis from 0.0 to 1.0) against years from diagnosis (0 to 5), one curve per age group: <50, 50-59, 60-69, 70-79, 80+.]

The dots are Ederer II estimates.

Quantifying differences

Excess mortality rate ratios, excess mortality rate differences and differences in relative survival can be easily estimated.

forvalues i = 2/5 {
    predict ehr`i'_tvc, hrnum(agegrp`i' 1) timevar(temptime) ci
    predict ehdiff`i'_tvc, hdiff1(agegrp`i' 1) timevar(temptime) ///
        per(1000) ci
    predict rsdiff`i'_tvc, sdiff1(agegrp`i' 1) timevar(temptime) ci
}
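In terms of the fitted model, the three quantities predicted in the loop for age group $i$, each compared with the reference group (all covariates set to zero), can be written as below. The notation $\lambda_i(t)$ for the excess mortality rate and $R_i(t)$ for relative survival is introduced here for illustration.

```latex
\begin{align*}
\text{excess mortality rate ratio:} \quad & \frac{\lambda_i(t)}{\lambda_1(t)}, \\
\text{excess mortality rate difference (per 1000 person years):} \quad & 1000 \times \{\lambda_i(t) - \lambda_1(t)\}, \\
\text{difference in relative survival:} \quad & R_i(t) - R_1(t).
\end{align*}
```

A ratio of 1 or a difference of 0 corresponds to no difference from the reference group. Because the model includes time-dependent effects, all three quantities are functions of time since diagnosis.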


Excess mortality rate ratios

[Figure: four panels, one per age group (50-59, 60-69, 70-79, 80+), each showing the excess mortality rate ratio (y-axis ticks at 1, 2, 4, 10, 20, 30 and 50) against years from diagnosis (0 to 5).]


Excess mortality rate differences

[Figure: four panels, one per age group (50-59, 60-69, 70-79, 80+), each showing the difference in excess mortality rates (per 1000 person years, y-axis from 0 to 600) against years from diagnosis (0 to 5).]

Due to very high initial differences, the estimated functions for the 70-79 and 80+ age groups are not plotted for the first month.


Differences in relative survival

[Figure: four panels, one per age group (50-59, 60-69, 70-79, 80+), each showing the difference in relative survival (y-axis from -0.25 to 0.00) against years from diagnosis (0 to 5).]


Estimating relative survival using Stata

As an example, we will use data on patients diagnosed with colon carcinoma in Finland 1975–94. Potential follow-up to end of 1995.

variable   type    value label   variable label
sex        byte    sex           Sex
age        byte                  Age at diagnosis
stage      byte    stage         Clinical stage at diagnosis
mmdx       byte                  Month of diagnosis
yydx       int                   Year of diagnosis
surv_mm    float                 Survival time in months
surv_yy    float                 Survival time in years
status     byte    status        Vital status at last contact
subsite    byte    colonsub      Anatomical subsite of tumour
year8594   byte    year8594      Year of diagnosis 1985-94
agegrp     byte    agegrp        Age in 4 categories
dx         int                   Date of diagnosis
exit       int                   Date of exit
id         float                 Unique ID


Coding of vital status

. use colon
. codebook status

--------------------------------------------------
status               Vital status at last contact
--------------------------------------------------
          range:  [0,4]          units:  1
  unique values:  4          missing .:  0/15564

          Freq.   Numeric   Label
          4642          0   Alive
          8369          1   Dead: cancer
          2549          2   Dead: other
             4          4   Lost to follow-up


The population mortality file (popmort.dta)

. list

     +----------------------------------------+
     | sex   _year   _age     prob       rate |
     |----------------------------------------|
  1. |   1    1951      0   .96429   .0363632 |
  2. |   1    1951      1   .99639   .0036165 |
  3. |   1    1951      2   .99783   .0021724 |
  4. |   1    1951      3   .99842   .0015812 |
  5. |   1    1951      4   .99882   .0011807 |
     |----------------------------------------|
  6. |   1    1951      5   .99893   .0010706 |
  7. |   1    1951      6   .99913   .0008704 |
  8. |   1    1951      7   .99905   .0009504 |
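In the rows listed, rate appears to be the annual mortality rate implied by the one-year survival probability prob, that is rate = -ln(prob). This relationship is inferred from the numbers rather than stated on the slide; a quick check for the first two rows:

```stata
* Row 1: prob = .96429, listed rate = .0363632
display -ln(.96429)
* Row 2: prob = .99639, listed rate = .0036165
display -ln(.99639)
```

Both results agree closely with the listed rates. The prob column is what strs uses for expected survival, while the rate column is the kind of quantity supplied to bhazard() in stpm2.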


The strs command for estimating and modelling relative survival using Stata

Estimating relative survival.

cohort, period, or hybrid approach

choice of three methods for estimating expected survival (Ederer I, Ederer II, Hakulinen); Pohar Perme estimator

estimation in the presence of competing risks (Cronin and Feuer (2000) [19])

estimates can be standardised (by age for example)

saves estimates for subsequent modelling (or presentation in tables or graphs)

Modelling excess mortality (relative survival)

several alternative approaches to estimating the model

See Dickman et al. [20] for details.


An example: localised colon carcinoma

. use colon if stage==1, clear
. stset surv_mm, fail(status==1 2) id(id) scale(12)
. strs using popmort, br(0 0.5 1(1)9) mergeby(_year sex _age) by(sex)

-> sex = Male
  +------------------------------------------------------------------------+
  |interval     n    d    w       p  p_star       r      cp   cp_e2   cr_e2 |
  |------------------------------------------------------------------------|
  | 0  .5    2620  229    0  0.9126  0.9728  0.9381  0.9126  0.9728  0.9381 |
  | .5  1    2391   99    0  0.9586  0.9749  0.9833  0.8748  0.9484  0.9224 |
  | 1   2    2292  229  166  0.8963  0.9483  0.9452  0.7841  0.8993  0.8719 |
  | 2   3    1897  180  139  0.9015  0.9470  0.9519  0.7069  0.8517  0.8300 |
  | 3   4    1578  140  119  0.9078  0.9449  0.9607  0.6417  0.8048  0.7974 |
  |------------------------------------------------------------------------|
  | 4   5    1319  113  104  0.9108  0.9428  0.9660  0.5845  0.7588  0.7703 |
  | 5   6    1102  102   81  0.9039  0.9414  0.9601  0.5283  0.7143  0.7396 |
  | 6   7     919   71   71  0.9196  0.9409  0.9774  0.4859  0.6721  0.7229 |
  | 7   8     777   59   72  0.9204  0.9391  0.9800  0.4472  0.6312  0.7084 |
  | 8   9     646   49   62  0.9203  0.9380  0.9811  0.4115  0.5921  0.6950 |
  +------------------------------------------------------------------------+
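The columns of this life table are linked by simple arithmetic, which can be verified from the first two rows of the output for males. Interval-specific relative survival r is p/p_star, cumulative observed survival cp is the product of the interval-specific p, and cumulative relative survival cr_e2 is cp/cp_e2.

```stata
* First interval (0 to 0.5 years): r = p/p_star
display 0.9126/0.9728
* Cumulative observed survival at 1 year: product of the first two p
display 0.9126*0.9586
* Cumulative relative survival at 1 year: cr_e2 = cp/cp_e2
display 0.8748/0.9484
```

To four decimal places these reproduce the tabulated values 0.9381, 0.8748 and 0.9224. Small discrepancies in later rows are possible because the displayed inputs are themselves rounded.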


Syntax of the strs command

strs using filename [if] [in] [iweight=varname], breaks(numlist ascending) mergeby(varlist)
    [by(varlist) diagage(varname) diagyear(varname) attage(newvarname) attyear(newvarname)
    survprob(varname) maxage(int 99) standstrata(varname) brenner list(varlist)
    potfu(varname) format(%fmt) pohar ederer1 notables level(int) save[(replace)]
    savind(filename[, replace]) savgroup(filename[, replace])]

The patient data file must be stset using the id() option, with time since entry in years as the timescale, before using strs.

using filename specifies a file containing general population survival probabilities sorted by the variables specified in mergeby().


Life table quantities calculated by strs

start Start of life table interval

end End of life table interval

n Number alive at start

d Number of deaths during the interval

d_star Expected number of deaths

ns Number of survivors

w Withdrawals (censorings) during the interval

n_prime Effective number at risk

y Person-time at risk

p Interval-specific observed survival

se_p Standard error of P

lo_p Lower 95% CI for P

hi_p Upper 95% CI for P

p_star Interval-specific expected survival (Ederer II)

r Interval-specific relative survival (Ederer II)

se_r Standard error of R

lo_r Lower 95% CI for R

hi_r Upper 95% CI for R

cp Cumulative observed survival

se_cp Standard error of CP

lo_cp Lower 95% CI for CP


Life table quantities calculated by strs 2

hi_cp Upper 95% CI for CP

nu Estimated excess mortality rate, (d-d_star)/y

cp_e1 Cumulative expected survival (Ederer I)

cr_e1 Cumulative relative survival (Ederer I)

lo_cr_e1 Lower 95% CI for CR (Ederer I)

hi_cr_e1 Upper 95% CI for CR (Ederer I)

cp_e2 Cumulative expected survival (Ederer II)

cr_e2 Cumulative relative survival (Ederer II)

lo_cr_e2 Lower 95% CI for CR (Ederer II)

hi_cr_e2 Upper 95% CI for CR (Ederer II)

cp_hak Cumulative expected survival (Hakulinen)

cr_hak Cumulative relative survival (Hakulinen)

lo_cr_hak Lower 95% CI for CR (Hakulinen)

hi_cr_hak Upper 95% CI for CR (Hakulinen)
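Several of these quantities follow the usual actuarial life-table formulas. Assuming n_prime = n - w/2 and p = 1 - d/n_prime (the standard actuarial definitions, which are not spelled out on the slides), the interval from 1 to 2 years in the earlier example for males with localised colon carcinoma (n = 2292, d = 229, w = 166) can be checked by hand:

```stata
* Effective number at risk: n_prime = n - w/2
display 2292 - 166/2
* Interval-specific observed survival: p = 1 - d/n_prime
display 1 - 229/(2292 - 166/2)
```

This gives an effective number at risk of 2209 and p of about 0.8963, matching the tabulated value for that interval.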


Estimates can be saved to a file

. use colon if stage==1, clear
. stset surv_mm, fail(status==1 2) id(id) scale(12)
. strs using popmort, br(0(1)10) mergeby(_year sex _age) by(sex agegrp) save
. use grouped, clear
. gen n0=n[_n-4]
. list sex agegrp n0 cp cr_e2 lo_cr_e2 hi_cr_e2 if end==5, sepby(sex) noobs

  +----------------------------------------------------------------+
  |    sex   agegrp     n0       cp    cr_e2   lo_cr_e2   hi_cr_e2 |
  |----------------------------------------------------------------|
  |   Male     0-44    161   0.7737   0.7881     0.7102     0.8486 |
  |   Male    45-59    462   0.7686   0.8233     0.7766     0.8636 |
  |   Male    60-74   1228   0.5945   0.7512     0.7128     0.7878 |
  |   Male      75+    769   0.4131   0.7777     0.7067     0.8479 |
  |----------------------------------------------------------------|
  | Female     0-44    136   0.7657   0.7709     0.6866     0.8358 |
  | Female    45-59    531   0.7765   0.7953     0.7536     0.8314 |
  | Female    60-74   1488   0.6993   0.7873     0.7588     0.8141 |
  | Female      75+   1499   0.4854   0.7816     0.7374     0.8249 |
  +----------------------------------------------------------------+
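The command gen n0=n[_n-4] relies on the rows of grouped.dta being ordered so that, within each stratum, the interval ending at 5 years is exactly four rows after the first interval. A sketch of an alternative that does not depend on row offsets, assuming the saved file contains the variables sex, agegrp, end and n as used above:

```stata
use grouped, clear
* Within each sex/age stratum, sorted by interval end, take n from the first interval
bysort sex agegrp (end): gen n0 = n[1]
list sex agegrp n0 cp cr_e2 lo_cr_e2 hi_cr_e2 if end==5, sepby(sex) noobs
```

With annual intervals from 0 to 10 this should give the same n0 as the original command, and it keeps working if the breaks are changed.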


References

[1] Jatoi I, Anderson WF, Jeong JH, Redmond CK. Breast cancer adjuvant therapy: time to consider its time-dependent effects. J Clin Oncol 2011;29:2301–2304.

[2] Hernan MA. The hazards of hazard ratios. Epidemiology 2010;21:13–15.

[3] Manson JE, Hsia J, Johnson KC, Rossouw JE, Assaf AR, Lasser NL, et al. Estrogen plus progestin and the risk of coronary heart disease. N Engl J Med 2003;349:523–534.

[4] Sasieni PD. Proportional excess hazards. Biometrika 1996;83:127–141.

[5] Pohar Perme M, Henderson R, Stare J. An approach to estimation in relative survival regression. Biostatistics 2009;10:136–146.

[6] Cox DR. Regression models and life-tables (with discussion). JRSSB 1972;34:187–220.

[7] Durrleman S, Simon R. Flexible regression models with cubic splines. Statistics in Medicine 1989;8:551–561.

[8] Royston P, Parmar MKB. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 2002;21:2175–2197.

[9] Andersson TML, Dickman PW, Eloranta S, Lambert PC. Estimating and modelling cure in population-based cancer studies within the framework of flexible parametric survival models. BMC Med Res Methodol 2011;11:96.


References 2

[10] Lambert PC, Royston P. Further development of flexible parametric models for survival analysis. The Stata Journal 2009;9:265–290.

[11] Royston P, Lambert PC. Flexible parametric survival analysis in Stata: Beyond the Cox model. Stata Press, 2011.

[12] Reid N. A conversation with Sir David Cox. Statistical Science 1994;9:439–455.

[13] Rutherford MJ, Hinchliffe SR, Abel GA, Lyratzopoulos G, Lambert PC, Greenberg DC. How much of the deprivation gap in cancer survival can be explained by variation in stage at diagnosis: An example from breast cancer in the East of England. International Journal of Cancer 2013;.

[14] Dickman PW, Sloggett A, Hills M, Hakulinen T. Regression models for relative survival. Stat Med 2004;23:51–64.

[15] Esteve J, Benhamou E, Croasdale M, Raymond L. Relative survival and the estimation of net survival: elements for further discussion. Statistics in Medicine 1990;9:529–538.

[16] Hakulinen T, Tenkanen L, Abeywickrama K, Paivarinta L. Testing equality of relative survival patterns based on aggregated data. Biometrics 1987;43:313–325.

[17] Berry G. The analysis of mortality by the subject-years method. Biometrics 1983;39:173–184.


References 3

[18] Pocock S, Gore S, Kerr G. Long term survival analysis: the curability of breast cancer. Stat Med 1982;1:93–104.

[19] Cronin KA, Feuer EJ. Cumulative cause-specific mortality for cancer patients in the presence of other causes: a crude analogue of relative survival. Statistics in Medicine 2000;19:1729–1740.

[20] Dickman PW, Coviello E, Hills M. Estimating and modelling relative survival. The Stata Journal 2012; (in press). http://pauldickman.com/survival/strs.pdf.
