9. Estimating Survival Distribution for a PH Model
Objective:
• Estimate the survival distribution for individuals with a certain combination of covariates.
PH model assumption:
• $\lambda(t \mid z) = \lambda_0(t)\exp(\beta^T z)$
• Given $Z = z^*$, it is easy to derive the relationship between $S(t \mid z^*)$ and the covariates:
$$S(t \mid z^*) = \exp\left(-\int_0^t \lambda(u \mid z^*)\,du\right) = \exp\left(-\int_0^t \lambda_0(u)\exp(\beta^T z^*)\,du\right) = \exp\left(-\exp(\beta^T z^*)\,\Lambda_0(t)\right)$$
This means that, to estimate $S(t \mid Z = z^*)$, we only need to estimate $\beta$ and $\Lambda_0(t)$, where $\beta$ can be estimated by the maximum partial likelihood estimator (MPLE).
Another Goal of the Cox Model
Estimating $\Lambda_0(t)$
• The same logic used to derive the Nelson-Aalen estimate of the cumulative hazard function in the one-sample problem is used here.
• Nelson-Aalen estimate of $\Lambda(t)$: $\hat\Lambda(t) = \sum_{x<t} \frac{dN(x)}{Y(x)}$
• In the one-sample problem, all individuals in the sample have the same hazard of failing (and hence the same cause-specific hazard). In a proportional hazards model, however, the individuals in the sample do not share a common hazard of failing at time $x$; each has a hazard that depends on their covariate values. That is, the $i$-th individual, with covariate vector $Z_i = (Z_{i1}, \ldots, Z_{ip})^T$, has hazard
$$\lambda_i(t) = \lambda_0(t)\exp(\beta^T Z_i)$$
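Estimating $\Lambda_0(t)$
• $dN_i(x) \mid F(x) \sim \mathrm{Bin}\left(Y_i(x), \pi_i(x)\right)$, where $\pi_i(x) \approx \lambda_i(x)\,\Delta x$. Here $F(x)$ denotes the history up to time $x$ and $Y_i(x)$ indicates whether individual $i$ is at risk at time $x$.
$$\Rightarrow E[dN_i(x) \mid F(x)] = Y_i(x)\,\lambda_i(x)\,\Delta x = \lambda_0(x)\exp(\beta^T Z_i)\,Y_i(x)\,\Delta x$$
• $dN(x) = \sum_{i=1}^n dN_i(x)$
$$\begin{aligned}
\Rightarrow E[dN(x) \mid F(x)] &= E\left[\sum_{i=1}^n dN_i(x) \,\middle|\, F(x)\right] \\
&= \sum_{i=1}^n E[dN_i(x) \mid F(x)] \\
&= \sum_{i=1}^n \lambda_0(x)\exp(\beta^T Z_i)\,Y_i(x)\,\Delta x \\
&= \lambda_0(x)\,\Delta x \sum_{i=1}^n \exp(\beta^T Z_i)\,Y_i(x)
\end{aligned}$$
• Therefore we estimate $\lambda_0(x)\,\Delta x$ by
$$\frac{dN(x)}{\sum_{i=1}^n \exp(\beta^T Z_i)\,Y_i(x)}$$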
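Estimating $\Lambda_0(t)$
• $\Lambda_0(t) \approx \sum_{x<t} \lambda_0(x)\,\Delta x$
$$\Rightarrow \hat\Lambda_0(t) = \sum_{x<t} \frac{dN(x)}{\sum_{i=1}^n \exp(\beta^T Z_i)\,Y_i(x)}$$
• Specifically, if all the $\beta$'s were equal to zero, the denominator would become $\sum_{i=1}^n Y_i(x) = Y(x)$ and the formula above would reduce to $\sum_{x<t} \frac{dN(x)}{Y(x)}$, giving us back the Nelson-Aalen estimator.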
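The estimator above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the five observations are invented, there is a single covariate, `baseline_cumhaz` is a hypothetical helper name, and $\beta$ is treated as known (in practice it would be replaced by its MPLE). Each row of the result is the sum of the increments up to and including that death time.

```r
# Toy data, invented purely for illustration.
time   <- c(2, 3, 5, 7, 8)   # observed follow-up times
status <- c(1, 0, 1, 1, 0)   # 1 = death, 0 = censored
z      <- c(0, 1, 1, 0, 1)   # a single covariate

# Cumulative baseline hazard with beta treated as known (an assumption).
baseline_cumhaz <- function(time, status, z, beta) {
  death_times <- sort(unique(time[status == 1]))
  increments <- sapply(death_times, function(x) {
    dN      <- sum(status == 1 & time == x)   # dN(x): deaths at time x
    at_risk <- time >= x                      # subjects with Y_i(x) = 1
    dN / sum(exp(beta * z[at_risk]))          # estimate of lambda_0(x) * dx
  })
  data.frame(time = death_times, cumhaz = cumsum(increments))
}

# beta = 0 gives the Nelson-Aalen estimator; beta = log(2) doubles the
# hazard of subjects with z = 1 relative to baseline.
nelson_aalen <- baseline_cumhaz(time, status, z, beta = 0)
ph_estimate  <- baseline_cumhaz(time, status, z, beta = log(2))
print(nelson_aalen)
print(ph_estimate)
```

With `beta = 0` every subject at risk contributes weight 1, so the denominators are just the risk-set sizes (5, 3 and 2 here), which is the Nelson-Aalen special case noted above.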
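Property of $\hat\Lambda_0(t)$
• $\hat\Lambda_0(t)$ is approximately unbiased for $\Lambda_0(t)$.
Proof:
$$\begin{aligned}
E\left[\sum_{x<t} \frac{dN(x)}{\sum_{i=1}^n \exp(\beta^T Z_i)\,Y_i(x)}\right]
&= \sum_{x<t} E\left[\frac{dN(x)}{\sum_{i=1}^n \exp(\beta^T Z_i)\,Y_i(x)}\right] \\
&= \sum_{x<t} E\left\{E\left[\frac{dN(x)}{\sum_{i=1}^n \exp(\beta^T Z_i)\,Y_i(x)} \,\middle|\, F(x)\right]\right\}
\end{aligned}$$
Since $\sum_{i=1}^n \exp(\beta^T Z_i)\,Y_i(x)$ is fixed conditional on $F(x)$,
$$\begin{aligned}
E\left[\frac{dN(x)}{\sum_{i=1}^n \exp(\beta^T Z_i)\,Y_i(x)} \,\middle|\, F(x)\right]
&= \frac{E[dN(x) \mid F(x)]}{\sum_{i=1}^n \exp(\beta^T Z_i)\,Y_i(x)} \\
&= \frac{E\left[\sum_{i=1}^n dN_i(x) \mid F(x)\right]}{\sum_{i=1}^n \exp(\beta^T Z_i)\,Y_i(x)} \\
&= \frac{\sum_{i=1}^n \lambda_0(x)\exp(\beta^T Z_i)\,Y_i(x)\,\Delta x}{\sum_{i=1}^n \exp(\beta^T Z_i)\,Y_i(x)} \\
&= \lambda_0(x)\,\Delta x
\end{aligned}$$
So,
$$\sum_{x<t} E\left\{E\left[\frac{dN(x)}{\sum_{i=1}^n \exp(\beta^T Z_i)\,Y_i(x)} \,\middle|\, F(x)\right]\right\} = \sum_{x<t} \lambda_0(x)\,\Delta x \approx \Lambda_0(t)$$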
Estimate Survival Distribution
• Estimate the survival distribution for individuals with a certain combination of covariates $z_0$ (for a randomly sampled subject).
• $\lambda(t \mid Z) = \lambda_0(t)\exp(\beta^T Z)$
• $\lambda(t \mid Z = z_0) = \lambda_0(t)\exp(\beta^T z_0)$
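Estimate Survival Function
• $S(t \mid Z = z_0) = \exp\left(-\Lambda(t \mid Z = z_0)\right)$
• $\Lambda(t \mid Z = z_0) = \int_0^t \lambda_0(u)\exp(\beta^T z_0)\,du = \exp(\beta^T z_0)\int_0^t \lambda_0(u)\,du = \exp(\beta^T z_0)\,\Lambda_0(t)$
• $\hat S(t \mid Z = z_0) = \exp\left(-\exp(\hat\beta^T z_0)\,\hat\Lambda_0(t)\right)$
($\hat\beta$ is the MPLE of $\beta$)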
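As a rough numerical illustration of the plug-in formula above, the following base R sketch turns an estimated baseline cumulative hazard into a survival curve for a chosen covariate combination. All input values are hypothetical (they are not estimates from any data set in these slides), and `surv_given_z` is an illustrative helper name.

```r
# Hypothetical inputs, not taken from any real fit.
beta_hat    <- c(0.5, -0.2)           # assumed MPLE of beta
Lambda0_hat <- c(0.10, 0.25, 0.60)    # assumed baseline cumulative hazard at t = 1, 2, 4
z0          <- c(1, 2)                # covariate combination of interest

# S(t | Z = z0) = exp(-exp(beta' z0) * Lambda0(t))
surv_given_z <- function(Lambda0, beta, z0) {
  exp(-exp(sum(beta * z0)) * Lambda0)
}

S_hat  <- surv_given_z(Lambda0_hat, beta_hat, z0)
S0_hat <- surv_given_z(Lambda0_hat, beta_hat, c(0, 0))  # baseline curve, z0 = 0
print(round(S_hat, 3))
print(round(S0_hat, 3))
```

Equivalently, the curve for $z_0$ is the baseline survival curve raised to the power $\exp(\hat\beta^T z_0)$, which is why only $\hat\beta$ and $\hat\Lambda_0(t)$ are needed.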
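$\hat S(t \mid Z = z_0)$
• Asymptotic Gaussian distribution
• $E\left[\hat S(t \mid Z = z_0)\right] \to S(t \mid Z = z_0)$
• $\widehat{\mathrm{var}}\left[\hat S(t \mid Z = z_0)\right] = \hat S^2(t \mid Z = z_0)\, e^{2\hat\beta^T z_0} \left[Q_1(t) + Q_2(t; z_0)\right]$
• $Q_1(t) = \sum_{t_i \le t} \frac{d_i}{W^2(t_i; \hat\beta)}$, where $d_i$ is the number of deaths at time $t_i$,
$W(t_i; \hat\beta) = \sum_{j \in R(t_i)} e^{\hat\beta^T z_j}$, and $R(t_i) = \{j \mid Y_j(t_i) = 1\}$
• $Q_2(t; z_0) = Q_3(t; z_0)^T \,\mathrm{var}(\hat\beta)\, Q_3(t; z_0)$
$$Q_3(t; z_0) = \begin{pmatrix}
\sum_{t_i \le t} \left[\dfrac{W^{(1)}(t_i; \hat\beta)}{W(t_i; \hat\beta)} - z_{01}\right] \dfrac{d_i}{W(t_i; \hat\beta)} \\
\vdots \\
\sum_{t_i \le t} \left[\dfrac{W^{(p)}(t_i; \hat\beta)}{W(t_i; \hat\beta)} - z_{0p}\right] \dfrac{d_i}{W(t_i; \hat\beta)}
\end{pmatrix}, \qquad W^{(k)}(t_i; \hat\beta) = \sum_{j \in R(t_i)} z_{jk}\, e^{\hat\beta^T z_j}$$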
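The structure of the variance formula, in particular the factor $e^{2\beta^T z_0}$, can be motivated by a heuristic delta-method argument. The sketch below is not taken from the slides: it writes $\hat\Lambda_0(t; \beta) = \sum_{t_i \le t} d_i / W(t_i; \beta)$ and assumes that the variability of $\hat\Lambda_0(t; \beta)$ at fixed $\beta$ and the variability of $\hat\beta$ are asymptotically uncorrelated.

```latex
% Heuristic delta-method sketch (assumes the two sources of variation are uncorrelated)
\begin{align*}
\hat S(t \mid Z = z_0) &= \exp\{-g(\hat\beta)\}, \qquad
  g(\beta) = e^{\beta^T z_0}\,\hat\Lambda_0(t;\beta) \\
\mathrm{var}\bigl[\hat S(t \mid Z = z_0)\bigr]
  &\approx S^2(t \mid Z = z_0)\,\mathrm{var}\bigl[g(\hat\beta)\bigr] \\
\frac{\partial g}{\partial \beta_k}
  &= e^{\beta^T z_0}\Bigl[z_{0k}\sum_{t_i \le t}\frac{d_i}{W(t_i;\beta)}
     - \sum_{t_i \le t}\frac{d_i\,W^{(k)}(t_i;\beta)}{W^2(t_i;\beta)}\Bigr]
   = -\,e^{\beta^T z_0}\,Q_{3k}(t; z_0) \\
\mathrm{var}\bigl[g(\hat\beta)\bigr]
  &\approx e^{2\beta^T z_0}\Bigl[\underbrace{\sum_{t_i \le t}\frac{d_i}{W^2(t_i;\beta)}}_{Q_1(t)}
     + \underbrace{Q_3(t; z_0)^T\,\mathrm{var}(\hat\beta)\,Q_3(t; z_0)}_{Q_2(t; z_0)}\Bigr]
\end{align*}
```

Read this way, $Q_1$ is the variance contribution of the baseline cumulative hazard estimate when $\beta$ is held fixed, and $Q_2$ is the contribution from having to estimate $\beta$.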
Link CL. Confidence intervals for the survival function using Cox's proportional-hazard model with covariates. Biometrics. 1984, 40(3):601-9.
Note:
• $Q_2$ reflects the additional uncertainty that comes from estimating $\beta$
• $Q_3$ is large when $z_0$ is far from the average covariate in the risk set
• Confidence interval for the survival function:
$$\hat S(t \mid Z = z_0) \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{var}}\left[\hat S(t \mid Z = z_0)\right]}$$
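A minimal base R sketch of this interval is given below. It assumes the point estimate and its estimated variance have already been computed; the numbers used are hypothetical, `surv_ci` is an illustrative helper name, and clipping the limits to $[0, 1]$ is a convenience added here rather than part of the formula above.

```r
# Wald-type interval: S_hat +/- z_{alpha/2} * sqrt(var_hat)
surv_ci <- function(S_hat, var_hat, alpha = 0.05) {
  z_crit     <- qnorm(1 - alpha / 2)        # upper alpha/2 normal quantile
  half_width <- z_crit * sqrt(var_hat)
  data.frame(
    estimate = S_hat,
    lower    = pmax(0, S_hat - half_width), # clipped to [0, 1] (an added convenience)
    upper    = pmin(1, S_hat + half_width)
  )
}

# Hypothetical estimates and variances at two time points.
ci <- surv_ci(S_hat = c(0.80, 0.98), var_hat = c(0.0025, 0.01))
print(ci)
```

Because the untransformed interval can fall outside $[0, 1]$, software often builds the interval on a transformed scale such as the log or log-log scale instead; that is a different construction from the one shown here.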
Example
• Data on 90 males with larynx cancer
• Variables:
  • Stage of disease (stages 1 to 4)
  • Age at diagnosis of larynx cancer
  • Time of death or on-study time in months
  • Year of diagnosis of larynx cancer
  • Death indicator (0 = alive, 1 = dead)
See SAS output
R Code

    library(survival)

    # Read the larynx data (the first 11 lines of the file are skipped)
    larynx <- read.table(file = "data_chap9_larynx.txt", skip = 11,
                         col.names = c("stage", "time", "age", "year", "status"))

    # Age indicator and a combined category:
    # 1-4 = stages 1-4 with age <= 60, 5-8 = stages 1-4 with age > 60
    ageCat    <- larynx$age > 60
    stage.age <- as.factor(larynx$stage + (larynx$age > 60) * 4)
    larynx    <- cbind(larynx, ageCat, cat = stage.age)
    larynx    <- larynx[order(larynx$stage, larynx$ageCat), ]

    # Cox PH model with Breslow handling of ties
    larynx.ph <- coxph(Surv(time, status) ~ stage + ageCat, data = larynx,
                       ties = "breslow")

    # One representative covariate row per category, then the fitted survival curves
    stage.age2 <- larynx[match(levels(larynx$cat), larynx$cat), c("stage", "ageCat")]
    s <- summary(survfit(larynx.ph, newdata = stage.age2))

    # Plot the eight estimated survival curves
    cols <- rep(c("black", "red", "green", "blue"), 2)
    ltys <- rep(c(1, 2), each = 4)
    plot(0, 0, type = "n", xlab = "Survival time", ylab = "Survival probabilities",
         xlim = c(0, 8), ylim = c(0, 1))
    for (i in 1:8) lines(s$time, s$surv[, i], lty = ltys[i], col = cols[i])
    legend("bottomleft", legend = paste("Cat ", 1:8), lty = ltys, col = cols,
           cex = 0.9, title.adj = 0.2)