Math 659: Survival Analysis
Chapter 4: Nonparametric Estimation of Basic Quantities (II)

Wenge Guo

September 14, 2011


Simultaneous Confidence Bands

- Pointwise confidence intervals are valid only for a single fixed time at which the inference is to be made

- In some applications, it is desirable to find a confidence band that, for a given confidence level, contains the survival function for all $t$ in some interval

- That is, we want to find two random functions $L(t)$ and $U(t)$ such that

$$1 - \alpha = \Pr[L(t) \le S(t) \le U(t) \text{ for all } t_L \le t \le t_U]$$

- $[L(t), U(t)]$ is called a $100(1-\alpha)\%$ simultaneous confidence band (SCB) for $S(t)$

- Two approaches for constructing an SCB: equal probability (EP) bands and Hall-Wellner bands


EP Band

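- The bounds of the EP band are proportional to the pointwise confidence intervals (Nair, 1984)

- The confidence band (CB) is constructed for $t_L \le t \le t_U$

- Denote $a_L = \dfrac{n\sigma_S^2(t_L)}{1 + n\sigma_S^2(t_L)}$ and $a_U = \dfrac{n\sigma_S^2(t_U)}{1 + n\sigma_S^2(t_U)}$

- The key step is to compute the confidence coefficient $c_\alpha(a_L, a_U)$

- The critical values are computed from functions related to the Brownian bridge

- Linear EP CB:

$$\left[\hat S(t) - c_\alpha(a_L, a_U)\,\sigma_S(t)\,\hat S(t),\ \ \hat S(t) + c_\alpha(a_L, a_U)\,\sigma_S(t)\,\hat S(t)\right]$$

- The log-log transformed EP CB:

$$\left[\hat S(t)^{1/\theta},\ \hat S(t)^{\theta}\right], \quad \text{where } \theta = \exp\left\{\frac{c_\alpha(a_L, a_U)\,\sigma_S(t)}{\ln[\hat S(t)]}\right\}$$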


Hall-Wellner (HW) Band

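- Suggested by Hall and Wellner (1980)

- The bounds of the HW band are not proportional to the pointwise confidence intervals

- The critical value is $k_\alpha(a_L, a_U)$

- Linear HW CB:

$$\left[\hat S(t) - \frac{k_\alpha(a_L, a_U)\,[1 + n\sigma_S^2(t)]}{n^{1/2}}\,\hat S(t),\ \ \hat S(t) + \frac{k_\alpha(a_L, a_U)\,[1 + n\sigma_S^2(t)]}{n^{1/2}}\,\hat S(t)\right]$$

- The log-log transformed HW CB:

$$\left[\hat S(t)^{1/\theta},\ \hat S(t)^{\theta}\right], \quad \text{where } \theta = \exp\left\{\frac{k_\alpha(a_L, a_U)\,[1 + n\sigma_S^2(t)]}{n^{1/2}\,\ln[\hat S(t)]}\right\}$$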
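A minimal R sketch of how the EP and HW band formulas could be evaluated once the critical value has been looked up. The function names `a.limit` and `scb`, the input values, and the two critical values are hypothetical placeholders, not tabulated values. `sigma2` stands for $\sigma_S^2(t)$ from the pointwise intervals, and the Hall-Wellner half-width is written with $\sigma_S^2(t)$, following Klein and Moeschberger's formula.

```r
# a_L or a_U: n * sigma_S^2(t) / (1 + n * sigma_S^2(t))
a.limit <- function(n, sigma2) n * sigma2 / (1 + n * sigma2)

# Linear and log-log bands at a vector of times inside [t_L, t_U].
# surv   : Kaplan-Meier estimates, assumed strictly between 0 and 1
# sigma2 : sigma_S^2(t) at the same times
# crit   : c_alpha(a_L, a_U) for "ep", k_alpha(a_L, a_U) for "hw" (from tables)
scb <- function(surv, sigma2, n, crit, type = c("ep", "hw")) {
  type <- match.arg(type)
  # w * surv is the half-width of the linear band
  w <- if (type == "ep") {
    crit * sqrt(sigma2)
  } else {
    crit * (1 + n * sigma2) / sqrt(n)
  }
  theta <- exp(w / log(surv))          # log(surv) < 0, so theta < 1
  data.frame(lin.lower = surv - w * surv, lin.upper = surv + w * surv,
             log.lower = surv^(1 / theta), log.upper = surv^theta)
}

# Hypothetical inputs, for illustration only
surv   <- c(0.90, 0.80, 0.60)
sigma2 <- c(0.001, 0.002, 0.005)
n      <- 100
a.limit(n, sigma2[c(1, 3)])            # a_L and a_U used for the table lookup
ep <- scb(surv, sigma2, n, crit = 2.8, type = "ep")  # 2.8 is a placeholder
hw <- scb(surv, sigma2, n, crit = 1.3, type = "hw")  # 1.3 is a placeholder
```

The linear bounds are not clipped to [0, 1] here; the log-log bounds stay inside (0, 1) by construction.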


Example and Computing

- Suppose we want to construct confidence bands for $S(t)$ over the range $100 \le t \le 600$

- $S(t)$ is the disease-free survival function for ALL patients

- R code, using the km.ci package:

    library(km.ci)
    # fit.km.bmt.all is assumed to be the Kaplan-Meier fit created earlier in the course
    fit.km.bmt.all.ep    = km.ci(fit.km.bmt.all, tl=100, tu=600, method="epband")
    fit.km.bmt.all.eplog = km.ci(fit.km.bmt.all, tl=100, tu=600, method="logep")
    fit.km.bmt.all.hw    = km.ci(fit.km.bmt.all, tl=100, tu=600, method="hall-wellner")
    fit.km.bmt.all.hwlog = km.ci(fit.km.bmt.all, tl=100, tu=600, method="loghall")


Plot for SCB


Plot for SCB Using the log-log Transformation


Remarks

- For the EP bands, the linear confidence band has been shown to perform very poorly when the sample size is small (≤ 200); its coverage probability is considerably smaller than the nominal level

- For the log-log transformed EP band, the coverage probability is approximately correct for smaller sample sizes; it gives reasonable results for samples with as few as 20 observed events

- For the Hall-Wellner bands, all bands for $S(t)$ have been shown to perform reasonably well for samples with as few as 20 observed events

- Confidence bands for the cumulative hazard rate can also be constructed by either the EP or the Hall-Wellner method


S(t) for Left Truncated and Right Censored Data

- Modify the KM estimator to handle left-truncated and right-censored data

- Suppose individual $j$ enters the study at a random age $L_j$ and leaves it at time $T_j$, at which he or she either dies or is censored

- Define $t_1 < t_2 < \cdots < t_D$ as the distinct death times

- Let $d_i$ be the number of individuals who experience the event of interest at time $t_i$

- Define $Y_i$ as the number of individuals who entered the study prior to time $t_i$ and who have a study time of at least $t_i$; that is, $Y_i$ is the number of individuals with $L_j \le t_i \le T_j$

- For right-censored data, $Y_i$ was the number of individuals on study at time 0 with a study time of at least $t_i$
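The definition of the risk set under left truncation can be illustrated with a small R sketch. The five subjects below are made-up toy data, and the variable names are assumptions for illustration. Unlike the right-censored case, the number at risk need not decrease over time, because subjects join the risk set only after their entry age.

```r
# Hypothetical toy data: entry age L_j, exit age T_j, death indicator (1 = death)
entry <- c(60, 62, 65, 70, 71)
exit  <- c(68, 75, 72, 80, 74)
death <- c(1, 0, 1, 1, 1)

# Distinct death times t_1 < ... < t_D
death.times <- sort(unique(exit[death == 1]))

# d_i: deaths at t_i;  Y_i: subjects with L_j <= t_i <= T_j
d <- sapply(death.times, function(t) sum(exit == t & death == 1))
Y <- sapply(death.times, function(t) sum(entry <= t & t <= exit))

data.frame(time = death.times, d = d, Y = Y)
```

In this toy sample the risk set grows from 3 at age 68 to 4 at age 72, since two subjects enter in between.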


The Risk Set and the Problem

- The KM estimator of the survival function (sf) at time $t$ is now an estimator of the probability of survival beyond $t$, conditional on survival to the smallest of the entry times $L$; that is, $\Pr[X > t \mid X \ge L] = S(t)/S(L)$

- For left-truncated data, $Y_i$ can be quite small for small values of $t_i$

- If $Y_i$ and $d_i$ are equal at some $t_i$, the KM estimator will be zero for all $t$ beyond this point, even though we observe survivors and deaths beyond this point

- It is common to estimate the sf conditional on survival to a time where this will not happen, by considering only those death times beyond this point

- Example: consider the Channing House data described in Section 1.16


Risk Set as a Function of Time


The Estimator and Interpretation

- Rather than estimating the unconditional survival function, we estimate the conditional probability of surviving beyond age $t$, given survival to age $a$

- We estimate it by considering only those deaths (or events) that occur after age $a$; that is,

$$\hat S_a(t) = \prod_{a \le t_i \le t}\left[1 - \frac{d_i}{Y_i}\right], \quad t \ge a$$

- The estimates have to be interpreted as estimated conditional survival probabilities

- Similarly, for Greenwood's formula, only deaths beyond $a$ are considered

- Example: estimate the probability of surviving beyond age $t$, given survival to 68 or 80 years, for both males and females
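The conditional product-limit estimator can be sketched in base R as follows. The function name `cond.km` and the toy data are hypothetical. The sketch assumes at least one death occurs at or after age `a`.

```r
# Conditional survival estimate: product over death times t_i with a <= t_i <= t
# of (1 - d_i / Y_i), where Y_i counts subjects with entry <= t_i <= exit.
cond.km <- function(entry, exit, death, a) {
  times <- sort(unique(exit[death == 1 & exit >= a]))   # death times t_i >= a
  d <- sapply(times, function(t) sum(exit == t & death == 1))
  Y <- sapply(times, function(t) sum(entry <= t & t <= exit))
  data.frame(time = times, d = d, Y = Y, surv = cumprod(1 - d / Y))
}

# Hypothetical toy data: entry age, exit age, death indicator (1 = death)
entry <- c(60, 62, 65, 70, 71)
exit  <- c(68, 75, 72, 80, 74)
death <- c(1, 0, 1, 1, 1)

s60 <- cond.km(entry, exit, death, a = 60)   # conditional on survival to age 60
s70 <- cond.km(entry, exit, death, a = 70)   # conditional on survival to age 70
```

Raising `a` from 60 to 70 drops the death at age 68 from the product, so the two curves are estimates of different conditional probabilities and should not be compared as if they were the same quantity.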


Estimated Conditional Survival Function


Computing

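- Load the data:

    > library(survival)
    > library(KMsurv)
    > data(channing)
    > channing
      obs death ageentry  age time gender
    1   1     1     1042 1172  130      2
    2   2     1      921 1040  119      2
    3   3     1      885 1003  118      2
    ......

- Fit the KM estimator for males whose entry age is at least 68 years (ages are recorded in months):

    > # keep only subjects whose entry age is strictly less than their exit age
    > channing=channing[channing[,"ageentry"]<channing[,"age"],]
    > # survfit needs a formula, so the right-hand side ~1 is added
    > fit=survfit(Surv(ageentry,age,death)~1,data=channing,
    +             subset=(gender==1 & ageentry>=12*68))
    > plot(fit)
    > plot(fit,xlim=c(68*12,1200))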


KM Curve

[Figure: KM curve for the fit above; horizontal axis from 850 to 1200, vertical axis from 0.0 to 1.0]


Estimation of sf for Left, Double, and Interval Censoring

- Chapter 5 discusses how to estimate the sf for other sampling schemes: left, double, and interval censoring, right truncation, and grouped data

- Each sampling scheme provides different information about the survival function and requires a different technique for estimation

- We focus on estimation of the sf for left, double, and interval censoring

- Left censoring: censored individuals provide only the information that the event occurred prior to entry into the study

- Double censoring: the sample includes some individuals who are left-censored and some who are right-censored

- Interval censoring: individual event times are known only to occur within an interval


Self Consistency Algorithm

- Estimate the sf for right-censored data (details in Theoretical Note 3, page 102)

- If there are no censored observations, the estimator of the sf at time $t$ is the proportion of observations that are larger than $t$

- That is,

$$\hat S(t) = \frac{1}{n}\sum_{i=1}^{n}\phi(X_i), \quad \text{where } \phi(y) = \begin{cases} 1 & \text{if } y > t \\ 0 & \text{if } y \le t \end{cases}$$


The Idea

- For right-censored data, construct the estimator in a similar manner by redefining the scoring function $\phi(X)$

- Let $T_1, T_2, \ldots, T_n$ be the observed times on study

- If $T_i$ is a death time, we know with certainty whether $T_i$ is smaller or greater than $t$

- If $T_i$ is a censored time greater than or equal to $t$, we know that the true death time must be larger than $t$

- For a censored observation less than $t$, we do not know whether the corresponding death time is greater than $t$, because it could fall between $T_i$ and $t$

- If we knew $S(t)$, we could estimate the probability that the death time of this censored observation is larger than $t$ by $S(t)/S(T_i)$


The Algorithm

- An estimator $\hat S(t)$ is self-consistent if

$$\hat S(t) = \frac{1}{n}\left[\sum_{T_i > t}\phi(T_i) + \sum_{\delta_i = 0,\ T_i \le t}\frac{\hat S(t)}{\hat S(T_i)}\right] \qquad (1)$$

  where $\delta_i = 0$ indicates that $T_i$ is a censored time

- To find $\hat S(t)$:

  - start with any estimate of $S$ and substitute it into the right-hand side of (1) to get an updated estimate of $S$

  - use this new estimate of $S(t)$ in the next step to obtain a revised estimate

  - continue until convergence

- Efron (1967) shows that the final estimate of $S$ is exactly the Product-Limit estimator for $t$ less than the largest observed time
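The iteration on equation (1) can be tried out with a short base-R sketch. The function names `km` and `self.consistent`, the starting estimate, and the toy sample are assumptions made for illustration. The sketch also assumes the largest observed time is a death, so that no ratio of the form 0/0 arises.

```r
# Kaplan-Meier (Product-Limit) estimate on the grid of distinct observed times
km <- function(time, status) {
  grid <- sort(unique(time))
  d <- sapply(grid, function(t) sum(time == t & status == 1))
  Y <- sapply(grid, function(t) sum(time >= t))
  data.frame(time = grid, surv = cumprod(1 - d / Y))
}

# Iterate equation (1); status is 1 for a death and 0 for a censored time.
self.consistent <- function(time, status, tol = 1e-10, max.iter = 1000) {
  n <- length(time)
  grid <- sort(unique(time))
  cens <- time[status == 0]
  # Starting estimate: empirical survival with every time treated as a death
  S <- sapply(grid, function(t) mean(time > t))
  for (iter in seq_len(max.iter)) {
    S.at <- function(u) S[match(u, grid)]        # look up the current S(u)
    S.new <- sapply(grid, function(t) {
      earlier <- cens[cens <= t]                 # censored times T_i <= t
      (sum(time > t) + sum(S.at(t) / S.at(earlier))) / n
    })
    converged <- max(abs(S.new - S)) < tol
    S <- S.new
    if (converged) break
  }
  data.frame(time = grid, surv = S)
}

# Hypothetical toy sample: deaths at 2, 5, 8; censored at 3 and 7
time   <- c(2, 3, 5, 7, 8)
status <- c(1, 0, 1, 0, 1)
sc <- self.consistent(time, status)
pl <- km(time, status)
cbind(sc, km = pl$surv)
```

On this sample the iterated estimate agrees with the Product-Limit estimate at every grid point, in line with Efron's result.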


Left and Right Censoring Case

- Use a modified Product-Limit estimator, suggested by Turnbull (1974)

- This estimator is based on an iterative procedure that extends the notion of a self-consistent estimator

- Notation

  - Denote by $0 < t_0 < t_1 < t_2 < \cdots < t_m$ a grid of time points at which subjects are observed

  - Note that the $t_i$'s are not all event times

  - $d_i$ is the number of deaths at time $t_i$

  - $d_i$ may be zero for some points

  - $r_i$ is the number of individuals right-censored at time $t_i$

  - $c_i$ is the number of left-censored observations at $t_i$


The Idea

- The only information the left-censored observations at $t_i$ give us is that the event of interest has occurred at some $t_j \le t_i$

- The self-consistent estimator estimates the probability that this event occurred at each possible $t_j$ no later than $t_i$, based on an initial estimate of the survival function

- Using this estimate, we compute an expected number of deaths at $t_j$, which is then used to update the estimate of the survival function

- The procedure is repeated until the estimated survival function stabilizes


The Algorithm

- Step 0: Produce an initial estimate of the survival function at each $t_j$, $S_0(t_j)$

  - Turnbull suggested using the Product-Limit estimate obtained by ignoring the left-censored observations

- Step 1: Using the current estimate $S_K$, estimate, for $j \le i$,

$$p_{ij} = \Pr[t_{j-1} < X \le t_j \mid X \le t_i] = \frac{S_K(t_{j-1}) - S_K(t_j)}{1 - S_K(t_i)}$$

- Step 2: Using the results of the previous step, estimate the number of events at time $t_j$ by $\hat d_j = d_j + \sum_{i=j}^{m} c_i\,p_{ij}$

- Step 3: Compute the usual Product-Limit estimator based on the estimated right-censored data with $\hat d_j$ events and $r_j$ right-censored observations at $t_j$, ignoring the left-censored data

- If this estimate, $S_{K+1}(t)$, is close to $S_K(t)$ for all $t_i$, stop the procedure; if not, go to Step 1
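Steps 0 to 3 can be sketched in base R as below. This is only an illustration under stated assumptions: the function names `pl.grid` and `turnbull.lr`, the convergence tolerance, and the counts are made up, survival is taken to be 1 before the first grid point, and the code assumes $S(t_i) < 1$ wherever there are left-censored observations.

```r
# Product-Limit estimate on a grid with d deaths and r right-censored per point
pl.grid <- function(d, r) {
  Y <- rev(cumsum(rev(d + r)))          # number at risk at each grid point
  cumprod(1 - d / Y)
}

# d, r, lc: deaths, right-censored and left-censored counts at t_0, ..., t_m.
# Assumes d + r > 0 at the last grid point.
turnbull.lr <- function(d, r, lc, tol = 1e-8, max.iter = 500) {
  S <- pl.grid(d, r)                    # Step 0: ignore left-censored data
  d.hat <- d
  for (iter in seq_len(max.iter)) {
    jump <- -diff(c(1, S))              # S(t_{j-1}) - S(t_j), with S = 1 at the start
    d.hat <- d
    for (i in which(lc > 0)) {
      p <- jump[1:i] / (1 - S[i])       # Step 1: p_ij for j <= i
      d.hat[1:i] <- d.hat[1:i] + lc[i] * p   # Step 2: expected deaths
    }
    S.new <- pl.grid(d.hat, r)          # Step 3: Product-Limit on pseudo data
    converged <- max(abs(S.new - S)) < tol
    S <- S.new
    if (converged) break
  }
  list(surv = S, d.hat = d.hat)
}

# Hypothetical counts on a grid of four time points
d  <- c(2, 3, 4, 3)
r  <- c(0, 1, 2, 5)
lc <- c(0, 1, 2, 1)
fit <- turnbull.lr(d, r, lc)
fit$surv
```

Because the $p_{ij}$ sum to one over $j \le i$, each left-censored observation is spread over the earlier grid points, and the expected deaths add up to the observed deaths plus the left-censored count.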


Example

- To illustrate Turnbull's algorithm, consider the data in Section 1.17 on the time at which California high school boys first smoked marijuana

- Here left censoring occurs when boys respond that they have used marijuana but cannot recall the age of first use

- Right-censored observations occur when boys have never used marijuana


First-Use-of-Marijuana Dataset


Computing of pij ’s in Step 1


First Step of the Self-Consistency Algorithm


Interval Censoring

- For interval-censored data, the only information we have for each individual is that the event time falls in an interval $(L_i, R_i]$, $i = 1, \ldots, n$, but the exact time is unknown

- An estimate of the sf can be found by a modification of the above iterative procedure, as proposed by Turnbull (1976)

- Let $0 = \tau_0 < \tau_1 < \cdots < \tau_m$ be a grid of time points that includes all the points $L_i, R_i$ for $i = 1, \ldots, n$

- For the $i$th observation, define a weight $\alpha_{ij}$ to be 1 if the interval $(\tau_{j-1}, \tau_j]$ is contained in the interval $(L_i, R_i]$, and 0 otherwise

- $\alpha_{ij}$ is an indicator of whether the event that occurs in the interval $(L_i, R_i]$ could have occurred at $\tau_j$


The Algorithm

- An initial guess at $S(\tau_j)$ is made

- Step 1: Compute the probability of an event occurring at time $\tau_j$, $p_j = S(\tau_{j-1}) - S(\tau_j)$, $j = 1, \ldots, m$

- Step 2: Estimate the number of events that occurred at $\tau_j$ by

$$d_j = \sum_{i=1}^{n}\frac{\alpha_{ij}\,p_j}{\sum_k \alpha_{ik}\,p_k}$$

- Note that the denominator is the total probability assigned to possible event times in the interval $(L_i, R_i]$

- Step 3: Compute the estimated number at risk at time $\tau_j$ by $Y_j = \sum_{k=j}^{m} d_k$

- Step 4: Compute the updated Product-Limit estimator using the pseudo data found in Steps 2 and 3

- Stop if the updated estimate of $S$ is close to the old version of $S$ for all $\tau_j$'s
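A base-R sketch of Steps 1 to 4 follows. The function name `turnbull.ic`, the uniform starting guess, the tolerance, and the five intervals are illustrative assumptions. All right endpoints are taken to be finite here; right-censored observations would need extra handling that is not shown.

```r
# L, R: interval endpoints, event time of subject i lies in (L[i], R[i]]
turnbull.ic <- function(L, R, tol = 1e-8, max.iter = 1000) {
  tau <- sort(unique(c(0, L, R)))            # grid tau_0 = 0 < ... < tau_m
  m <- length(tau) - 1
  # alpha[i, j] is TRUE if (tau_{j-1}, tau_j] lies inside (L_i, R_i]
  alpha <- outer(L, tau[-(m + 1)], "<=") & outer(R, tau[-1], ">=")
  S <- 1 - seq_len(m) / m                    # initial guess: equal mass per interval
  d <- rep(NA_real_, m)
  for (iter in seq_len(max.iter)) {
    p <- -diff(c(1, S))                      # Step 1: p_j = S(tau_{j-1}) - S(tau_j)
    w <- sweep(alpha, 2, p, "*")             # alpha_ij * p_j
    d <- colSums(w / rowSums(w))             # Step 2: expected events at tau_j
    Y <- rev(cumsum(rev(d)))                 # Step 3: expected number at risk
    S.new <- cumprod(1 - ifelse(Y > 0, d / Y, 0))   # Step 4: Product-Limit
    converged <- max(abs(S.new - S)) < tol
    S <- S.new
    if (converged) break
  }
  list(tau = tau[-1], surv = S, d = d)
}

# Hypothetical intervals for five subjects
L <- c(0, 0, 4, 5, 7)
R <- c(5, 7, 8, 10, 10)
fit <- turnbull.ic(L, R)
data.frame(tau = fit$tau, surv = fit$surv)
```

Each subject contributes a total weight of one across the grid intervals compatible with $(L_i, R_i]$, so the expected event counts sum to $n$.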


Example

- Consider the data on time to cosmetic deterioration for early breast cancer patients (details in Section 1.18)

- The goal of the study is to compare the cosmetic effects of radiotherapy alone versus radiotherapy plus adjuvant chemotherapy on women with early breast cancer

- There were 46 patients in the radiation-only group and 48 in the radiation plus chemotherapy group

- At each visit, the clinician recorded a measure of breast retraction on a three-point scale (none, moderate, severe)

- The event of interest was the first appearance of moderate or severe breast retraction


The Data


Estimation of Survival Curves Using Interval-Censored Data


Remarks

- For the case of combined left and right censoring, Turnbull (1974) gives an estimator of the variance-covariance matrix of $\hat S(t)$ (details in Practical Note 2, page 146 of [KM])

- The estimators of the survival function based on Turnbull's algorithms, for combined left and right censoring or for interval censoring, are generalized maximum likelihood estimators

- They can be derived by a self-consistency argument or by using a modified EM algorithm
