Survival Analysis
Bandit Thinkhamrop, PhD. (Statistics)
Department of Biostatistics and Demography
Faculty of Public Health, Khon Kaen University
Begin at the conclusion
Type of the study outcome: Key for selecting appropriate statistical methods
• Study outcome
  • Dependent variable or response variable
  • Focus on the primary study outcome if there is more than one
• Type of the study outcome
  • Continuous
  • Categorical (dichotomous, polytomous, ordinal)
  • Numerical (Poisson) count
  • Event-free duration
The outcome determines the statistics

Outcome type        Continuous     Categorical                       Count             Survival
Summary statistic   Mean, median   Proportion (prevalence or risk)   Rate per "space"  Median survival; risk of events at T(t)
Regression model    Linear Reg.    Logistic Reg.                     Poisson Reg.      Cox Reg.
Statistics quantify errors for judgments
• Parameter estimation [95% CI]
• Hypothesis testing [P-value]
Back to the conclusion

Outcome type        Continuous     Categorical                       Count             Survival
Summary statistic   Mean, median   Proportion (prevalence or risk)   Rate per "space"  Median survival; risk of events at T(t)

• Report the magnitude of effect, its 95% CI, and the P-value
• Answer the research question based on the lower or upper limit of the CI
• Appropriate statistical methods
Study outcome
• Survival outcome = event-free duration
• Event (1 = Yes; 0 = Censored)
• Duration or length of time between:
  • Start date
  • End date
• At the start, no one has had the event (event = 0) at time t(0)
• At any point after the start, the event could occur, giving a failure (event = 1) at time t(t)
• If the event has not occurred by the end of the study period, the subject is censored (event = 0)
• Thus, the duration is either a 'time-to-event' or a 'time-to-censoring'
Censoring
• Censored data = incomplete 'time to event' data
• In the presence of censoring, the 'time to event' is not known
• The duration indicates that no event has occurred from the start date up to the last date assessed or observed, a.k.a. the end date
• The end date could be
  • End of the study
  • Last observation prior to the end of the study, due to
    • Lost to follow-up
    • Withdrawn consent
    • Competing events occurred, prohibiting progression to the event under observation
    • Explanatory variables changed, making them irrelevant to the occurrence of the event under observation
Magnitude of effects
• Median survival
• Survival probability
• Hazard ratio
SURVIVAL ANALYSIS
Study aims:
• Median survival
  • Median survival of liver cancer
• Survival probability
  • Five-year survival of liver cancer
  • Five-year survival rate of liver cancer
• Hazard ratio
  • Factors affecting liver cancer survival
  • Effect of chemotherapy on liver cancer survival
SURVIVAL ANALYSIS
Event
• Negative: dead, infection, relapsed, etc.
• Positive: cured, improved, conception, discharged, etc.
• Neutral: smoking cessation, etc.
Natural History of Cancer
Accrual, Follow-up, and Event
[Figure: calendar-time follow-up of subjects ID 1 to 6, 2009 to 2012. The study begins at the start of accrual and ends at the end of follow-up; the recruitment period runs to the end of accrual and is followed by the follow-up period. Deaths are marked "Dead".]
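Time since the beginning of the study
[Figure: follow-up of subjects ID 1 to 6 on a common time axis running from 0 to 4 (years), with deaths marked "Dead"]

ID   Follow-up time
1    48 months
2    22 months
3    14 months
4    40 months
5    26 months
6    13 months

The data: >48, >22, 14, 40, >26, >13 (">" marks a censored time)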
DATA

ID   SURVIVAL TIME (Months)   OUTCOME AT THE END OF THE STUDY                   EVENT
1    48                       Still alive at the end of the study               Censored
2    22                       Dead due to accident                              Censored
3    14                       Dead caused by the disease under investigation    Dead
4    40                       Dead caused by the disease under investigation    Dead
5    26                       Still alive at the end of the study               Censored
6    13                       Lost to follow-up                                 Censored
DATA

ID   TIME   EVENT (label)   EVENT (coded)
1    48     Censored        0
2    22     Censored        0
3    14     Dead            1
4    40     Dead            1
5    26     Censored        0
6    13     Censored        0
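The coding step in the table above can be sketched in a few lines of Python. This is only an illustration: the outcome strings and the names `records`, `EVENT_OF_INTEREST`, and `code_event` are made up for this sketch and do not come from the slides. The point it shows is that only death from the disease under investigation counts as the event; an accidental death, loss to follow-up, or being alive at the end of the study are all coded as censored.

```python
# Illustrative records: (ID, survival time in months, outcome at the end of the study).
records = [
    (1, 48, "alive at end of study"),
    (2, 22, "dead, accident"),
    (3, 14, "dead, disease under investigation"),
    (4, 40, "dead, disease under investigation"),
    (5, 26, "alive at end of study"),
    (6, 13, "lost to follow-up"),
]

# Hypothetical label for the event under observation.
EVENT_OF_INTEREST = "dead, disease under investigation"


def code_event(outcome):
    """Return 1 if the outcome is the event of interest, otherwise 0 (censored)."""
    return 1 if outcome == EVENT_OF_INTEREST else 0


# Keep ID and time, replace the outcome text by the 0/1 event indicator.
coded = [(pid, months, code_event(outcome)) for pid, months, outcome in records]

for row in coded:
    print(row)
```

Keeping the full outcome text alongside the 0/1 code makes it easy to recode later, for example if a competing-risks analysis needs accidental deaths treated separately.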
ANALYSIS
(ID, TIME, and EVENT as in the DATA table: 2 deaths among 6 subjects, 163 person-months of follow-up)
• Prevalence = 2/6
• Incidence density = 2/163 person-months
• Proportion surviving at month 't'
• Median survival time
RESULTS
• Incidence density = 1.2 per 100 person-months (95% CI: 0.1 to 4.4)
• Proportion surviving at 24 months = 80% (95% CI: 20 to 97)
• Median survival time = 40 months (95% CI: 14 to 48)
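As a rough check on the incidence density, the arithmetic can be reproduced from the six observations. The sketch below uses plain Python with variable names chosen here for illustration; it computes point estimates only and does not attempt the confidence intervals.

```python
# Six subjects from the DATA table: follow-up time in months and event (1 = dead, 0 = censored).
times = [48, 22, 14, 40, 26, 13]
events = [0, 0, 1, 1, 0, 0]

n_subjects = len(times)
n_events = sum(events)          # number of deaths from the disease
person_months = sum(times)      # total follow-up time at risk

# Simple proportion of subjects with the event (the "2/6" on the ANALYSIS slide).
proportion_with_event = n_events / n_subjects

# Incidence density: events per unit of person-time, scaled to 100 person-months.
rate_per_100_person_months = 100 * n_events / person_months

print(f"events = {n_events}, person-months = {person_months}")
print(f"proportion with event = {proportion_with_event:.2f}")
print(f"incidence density = {rate_per_100_person_months:.1f} per 100 person-months")
```

The simple proportion ignores how long each subject was followed, which is why the incidence density and the survival estimates are the preferred summaries for this kind of data.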
Type of CensoringType of Censoring
1)1) Left censoringLeft censoring: When the patient experiences the event in : When the patient experiences the event in question before the beginning of the study observation question before the beginning of the study observation period.period.
2)2) Interval censoringInterval censoring: When the patient is followed for : When the patient is followed for awhile and then goes on a trip for awhile and then returns awhile and then goes on a trip for awhile and then returns to continue being studied. to continue being studied.
3)3) Right censoringRight censoring: : 1)1) single censoring: does not experience event during the study single censoring: does not experience event during the study
observation periodobservation period2)2) A patient is lost to follow-up within the study period.A patient is lost to follow-up within the study period.3)3) Experiences the event after the observation periodExperiences the event after the observation period4)4) multiple censoring: May experience event multiple times after multiple censoring: May experience event multiple times after
study observation ends, when the event in question is not death.study observation ends, when the event in question is not death.
Summary description of survival data set: stdes
• This command describes summary information about the data set. It provides summary statistics about the number of subjects, records, time at risk, failure events, etc.
Computation of S(t)
1) Suppose the study time is divided into periods, the number of which is designated by the letter t.
2) The survivorship probability is computed by multiplying together the proportions of people surviving each period of the study.
3) Subtracting the conditional probability of the failure event in each period from one gives that period's surviving proportion.
4) The product of these quantities constitutes the survivorship function.
Kaplan-Meier Methods
[Figure: Kaplan-Meier survival curve; median survival time]
Survival Function
• The number in the risk set is used as the denominator.
• For the numerator, the number dying in period t is subtracted from the number in the risk set. The product of these ratios over the study time:

$$S(t) = \prod_{t_i \le t} \frac{n_{t_i} - d_{t_i}}{n_{t_i}}$$
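The product-limit calculation can be sketched on the six-subject data set used earlier. This is a minimal illustration in plain Python, not the routine any statistical package uses; the function names `kaplan_meier`, `survival_at`, and `median_survival` are chosen here for the sketch. Subjects censored at a given time are counted as still at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time, using the product-limit formula."""
    curve = []
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        n_at_risk = sum(1 for x in times if x >= t)                      # risk set at t
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        surv *= (n_at_risk - deaths) / n_at_risk                          # (n - d) / n
        curve.append((t, surv))
    return curve


def survival_at(curve, t):
    """Survival probability at time t (step function, 1.0 before the first event)."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
    return s


def median_survival(curve):
    """First event time at which S(t) drops to 0.5 or below; None if never reached."""
    for time, surv in curve:
        if surv <= 0.5:
            return time
    return None


times = [48, 22, 14, 40, 26, 13]
events = [0, 0, 1, 1, 0, 0]

curve = kaplan_meier(times, events)
print(curve)
print("S(24) =", survival_at(curve, 24))
print("median =", median_survival(curve))
```

On this data the curve steps down only at months 14 and 40, which agrees with the 80% survival at 24 months and the 40-month median reported on the RESULTS slide.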
Survival experience
Survival curve more than one group
Comparing survival between groups

ID   TIME   DEAD   DRUG
1    48     0      1
2    22     0      1
3    14     1      1
4    40     1      1
5    26     0      1
6    13     0      1
7    13     0      0
8    6      1      0
9    12     1      0
10   14     1      0
11   22     1      0
12   13     1      0
[Figure: Kaplan-Meier survival estimates, by drug. x-axis: analysis time (0 to 60); y-axis: survival probability (0.00 to 1.00); one curve each for drug 0 and drug 1]
Log-rank test
• t = Time
• n = Number at risk for both groups at time t
• n1 = Number at risk for group 1 at time t
• n2 = Number at risk for group 2 at time t
• d = Dead for both groups at time t
• c = Censored for both groups at time t
• O1 = Number of dead for group 1 at time t
• O2 = Number of dead for group 2 at time t
• E1 = Number of expected dead for group 1 at time t
• E2 = Number of expected dead for group 2 at time t

$$\chi^2 = \sum_{i=1}^{\#\text{ of groups}} \frac{(O_i - E_i)^2}{E_i}$$
Log-rank test example ("+" marks a censored time)
• DRUG1 = 48+, 22+, 26+, 13+, 14, 40
• DRUG0 = 13+, 6, 12, 14, 22, 13
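The example can be worked through with a short Python sketch of the observed-versus-expected form of the log-rank statistic, i.e. the sum over groups of (O - E)^2 / E. The function name `logrank_chi2` is chosen here for illustration. Two assumptions are worth stating: subjects censored at an event time are counted as at risk at that time, and the statistic is the simple approximation rather than the variance-based version that statistical software typically reports, so packaged output may differ somewhat.

```python
def logrank_chi2(times, events, groups):
    """Observed and expected deaths per group, and the sum of (O - E)^2 / E."""
    labels = sorted(set(groups))
    observed = {g: 0.0 for g in labels}
    expected = {g: 0.0 for g in labels}
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # n: at risk in both groups at t; d: deaths in both groups at t.
        n = sum(1 for x in times if x >= t)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        for g in labels:
            n_g = sum(1 for x, k in zip(times, groups) if x >= t and k == g)
            d_g = sum(1 for x, e, k in zip(times, events, groups)
                      if x == t and e == 1 and k == g)
            observed[g] += d_g
            expected[g] += d * n_g / n   # deaths expected if both groups share one hazard
    chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in labels)
    return observed, expected, chi2


# Twelve subjects from the "Comparing survival between groups" table.
times = [48, 22, 14, 40, 26, 13, 13, 6, 12, 14, 22, 13]
dead = [0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
drug = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

observed, expected, chi2 = logrank_chi2(times, dead, drug)
print("observed:", observed)
print("expected:", {g: round(v, 3) for g, v in expected.items()})
print("chi-square (1 df for two groups):", round(chi2, 2))
```

With two groups the statistic is referred to a chi-square distribution with 1 degree of freedom; the expected counts always sum to the total number of deaths, which is a convenient sanity check.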
Hazard Function

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
Survival Function vs Hazard Function
H(t) = -ln(S(t))
S(t) = EXP(-H(t))
Hazard rate
• The conditional probability of the event under study, provided the patient has survived up to and including that time period
• Sometimes called the intensity function, the failure rate, or the instantaneous failure rate
Formulation of the hazard rate

$$h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}$$
The hazard rate can vary from 0 to infinity. It can increase, decrease, or remain constant over time. It is the focal point of much survival analysis.
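The relation H(t) = -ln(S(t)) stated earlier follows from h(t) = f(t)/S(t). A short derivation, assuming S(t) is differentiable with S(0) = 1 and using f(t) = -dS(t)/dt (the density written in terms of S, a step not shown on the slides):

```latex
\begin{aligned}
h(t) &= \frac{f(t)}{S(t)} = \frac{-\,dS(t)/dt}{S(t)} = -\frac{d}{dt}\ln S(t), \\
H(t) &= \int_0^t h(u)\,du = -\ln S(t) + \ln S(0) = -\ln S(t), \\
S(t) &= \exp\{-H(t)\}.
\end{aligned}
```

For a constant hazard h(t) = λ this gives H(t) = λt and S(t) = exp(-λt), the exponential survival curve.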
Cox Regression
• The Cox model presumes that the ratio of the hazard rate to a baseline hazard rate is an exponential function of the parameter vector.
h(t) = h0(t) EXP(b1X1 + b2X2 + b3X3 + . . . + bpXp )
Hazard ratio
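Under the Cox model h(t) = h0(t) EXP(b1X1 + ... + bpXp), the hazard ratio for a one-unit increase in X1, holding the other covariates fixed, can be worked out as follows. The value x is an arbitrary level of X1 introduced only for this illustration:

```latex
\mathrm{HR}
= \frac{h(t \mid X_1 = x + 1, X_2, \dots, X_p)}{h(t \mid X_1 = x, X_2, \dots, X_p)}
= \frac{h_0(t)\,\exp\{b_1 (x + 1) + b_2 X_2 + \dots + b_p X_p\}}
       {h_0(t)\,\exp\{b_1 x + b_2 X_2 + \dots + b_p X_p\}}
= \exp(b_1)
```

The baseline hazard h0(t) cancels, so the ratio does not depend on t; this is the proportional hazards assumption. For a binary covariate (x = 0 versus x = 1), exp(b1) compares the hazard in the two groups.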
Testing the Adequacy of the model
1. We save the Schoenfeld residuals of the model and the scaled Schoenfeld residuals.
2. For persons censored, the value of the residual is set to missing.
borrowed from Professor Robert A. Yaffee
A graphical test of the proportional hazards assumption
• A graph of the log hazard would reveal 2 lines over time, one for the baseline hazard (when x = 0) and the other for when x = 1
• The difference between these two curves over time should be constant = B
If we plot the Schoenfeld residuals over the line y=0, the best fitting line should be parallel to y=0.
Graphical tests
• Criteria of adequacy:
  The residuals, particularly the rescaled residuals, plotted against time should show no trend (slope) and should be more or less constant over time.
Other issues
• Time-Varying Covariates
• Interactions may be plotted
• Conditional Proportional Hazards models:
  • Stratification of the model may be performed. Then the stphtest should be performed for each stratum.
Suggested Readings for beginners
Suggested Readings for advanced learners
Survival analysis in practice
• For what type of research question should survival analysis be used?
Stata for one-group survival analysis
• stset time, failure(event)
• stdescribe
• tab event
• stsum
• strate
• stci
• sts list, at(12 24)
Stata for one-group survival analysis (cont.)
• sts g
• sts g, atrisk
• sts g, lost
• sts g, enter
• sts g, risktable
• sts g, cumhaz
• sts g, cumhaz ci
• sts g, hazard

Stata for multiple-group survival analysis
• stset time, failure(event)
• stdescribe
• stsum, by(group)
• sts test group
• sts test group, wilcoxon
• strate group
• stci, by(group)
• sts g, by(group) atrisk
• sts g, by(group) risktable
• sts g, by(group) cumhaz lost
• sts g, by(group) hazard ci

Stata for multiple-group survival analysis (cont.)
• sts list, by(group) at(12 24)
• sts list, by(group) at(12 24) compare
• ltable group, interval(#)
• ltable group, graph
• ltable group, hazard
• stmh group
• stmh group, by(strata)
• stmc group
• stcox group
• stir group
Stata for Model Fitting
• Continuous covariate
  • xtile newvar = varlist, nq(4)
  • tabstat varlist, stat(n min max) by(newvar)
  • xi: stcox i.newvar
  • stsum, by(newvar)
• Categorical covariate
  • tab exposure outcome, col
  • xi: stcox i.exposure
Sample size for Cox Model
• stpower cox, failprob(.2) hratio(0.1 0.3) sd(.3) r2(.1) power(0.8 0.9) hr
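To see roughly what goes into such a calculation, here is a Python sketch of the Hsieh and Lavori (2000) approximation for a Cox model, on which, to my understanding, the Stata command is based. Treat it as an illustration under that assumption rather than a reproduction of Stata's output: rounding conventions may differ, and the function names and the default two-sided alpha of 0.05 are choices made for this sketch. The required number of events is (z(1-alpha/2) + z(power))^2 / (sd^2 · ln(HR)^2 · (1 - R2)), and the sample size is that number divided by the overall probability of failure.

```python
from math import ceil, log
from statistics import NormalDist


def cox_events_required(hratio, sd, r2=0.0, alpha=0.05, power=0.8):
    """Approximate number of events needed to detect a hazard ratio (two-sided test).

    sd is the standard deviation of the covariate of interest and r2 is its
    squared multiple correlation with the other covariates in the model.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (sd ** 2 * log(hratio) ** 2 * (1 - r2))


def cox_sample_size(hratio, sd, failprob, r2=0.0, alpha=0.05, power=0.8):
    """Approximate total sample size: required events divided by the failure probability."""
    events = cox_events_required(hratio, sd, r2=r2, alpha=alpha, power=power)
    return ceil(events / failprob)


# Same inputs as the command above: two hazard ratios crossed with two power levels.
for hr in (0.1, 0.3):
    for pw in (0.8, 0.9):
        events = cox_events_required(hr, sd=0.3, r2=0.1, power=pw)
        n = cox_sample_size(hr, sd=0.3, failprob=0.2, r2=0.1, power=pw)
        print(f"HR={hr}, power={pw}: events ~ {events:.1f}, N ~ {n}")
```

Two things stand out in the formula: a hazard ratio closer to 1 sharply increases the required number of events, and a lower failure probability inflates the total sample size, because only a fraction of subjects contribute events.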