chapter 4: analysis of a single numerical...

27
STAT 405 – BIOSTATISTICS Handout 18 – Survival Analysis: Kaplan-Meier Method EXAMPLE : Survival of Leukemia Patients Suppose a study is conducted to compare survival times of patients receiving two different types of leukemia treatments. The data are given in the file Leukemia.JMP. The raw data for Treatment Group 1 only are presented in the following graphic. The horizontal axis represents survival time, and each of the horizontal lines represents a patient. An ‘X’ indicates that a death occurred at that point in time. An ‘O’ indicates the last point in time the patient was observed. The current survival status for those marked with an ‘O’ cannot be obtained because observation was terminated before death occurred, and we know only that these patients were alive at last observation. 1

Upload: others

Post on 23-Apr-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

STAT 405 – BIOSTATISTICS Handout 18 – Survival Analysis: Kaplan-Meier Method

EXAMPLE: Survival of Leukemia Patients

Suppose a study is conducted to compare survival times of patients receiving two different types of leukemia treatments. The data are given in the file Leukemia.JMP.

The raw data for Treatment Group 1 only are presented in the following graphic. The horizontal axis represents survival time, and each of the horizontal lines represents a patient. An ‘X’ indicates that a death occurred at that point in time. An ‘O’ indicates the last point in time the patient was observed. The current survival status for those marked with an ‘O’ cannot be obtained because observation was terminated before death occurred, and we know only that these patients were alive at last observation.

1

Page 2: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

The points marked with an ‘O’ are examples of right censored observations. We call this “right censored” because all we know about the survival time of these patients is that it is GREATER THAN some value.

Questions:

1. Using “conventional” statistical methods, how might you approach the analysis of these data?

2. What are the problems with these “conventional” methods? Explain.

Survival Analysis

Conventional methods discussed earlier in the semester are not appropriate for dealing with either censored data or time-dependent variables. In contrast, survival analysis consists of methods for studying both the occurrence and timing of events, and these methods allow for censoring. Several approaches to survival analysis exist, and we will start by discussing one known as the Kaplan-Meier method.

Kaplan-Meier Method

In biostatistics, the Kaplan-Meier (KM) estimator is the most widely used method for estimating survivor functions. When there are no censored data, the KM estimator is simple and intuitive. For example, suppose that five patients were observed for six months, and the following was observed:

Patie Month in Which Death

2

Page 3: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

nt Occurred1 12 13 24 45 5

The survivor function, S(t), is the probability that an event time is greater than t, where t can be any nonnegative number. In this example, an event time refers to the time at which death occurs.

Therefore, since there is no censoring, the KM estimator S( t )is simply the sample proportion of observations which are still alive at each time point:

i.e.

S ( t )=P (T ≥t )=¿ individualswithT ≥ttotal sample¿¿¿

Time, t S( t )0

1

2

3

4

5

Kaplan-Meier estimator for censored data

When the data are censored and some of the censoring times are smaller than some event times, the observed proportion of cases with event times greater than t can be biased. This happens because cases that are censored before time t may have died before time t, unbeknownst to us.

To handle this can consider using the following from conditional probability. Suppose t k<t ≤ tk+1 , then

S ( t )=P (T ≥ tk +1 )=P (T ≥ t1 , T ≥ t2 ,…,T ≥t k +1 )

3

Page 4: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

¿P(T ≥ t 1)×∏j=1

k

P (T ≥ t j+1∨T ≥t j)

¿∏j=1

k

[1−P (T=t j|T ≥ t j )]

¿∏j=1

k

[1−λ j]

where λ j=P(deathat timet j)

So

S ( t )≅∏j=1

k

(1−d jr j )

¿ ∏j : t j< t

(1−d jr j )

Here we are assuming there are k distinct event times. At each of these event times tj, there are rj individuals who are said to be “at risk” (they have not yet experienced an event and they have not been censored prior to time tj). Finally, we let dj be the number who experience an event (death) at time tj and let cj denote the number of censored observations between the j-th and (j+1)-st observed event time.

Useful identities:

r j=r j−1−d j−1−c j−1

r j=∑k ≥ j

(ck+dk )

The product derived above is called the Kaplan-Meier estimator or product-limit estimator and is defined as:

S ( t )=∏j : t j<t

(1−d jr j )

EXAMPLE: Back to Survival of Leukemia Patients

For the 21 patients who received Treatment 1, the survival times of the patients are given as follows:

6, 6, 6, 6+, 7, 9+, 10, 10+, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+

We can use the KM method to estimate the survivor function for Treatment Group 1:

4

Page 5: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

Timetj

# at riskrj

# of events

dj

# censored

cj

Estimated Probability of

Surviving

[1−d jr j ]

Estimated Survivor Function

S( t )= ∏j:t j≤t

[1−d jr j ]

Using JMP for Kaplan-Meier Estimation of the Survivor Function

5

Page 6: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

First, note that the data set should be constructed as follows by specifying three columns of data. One column indicates the treatment the subject received, the event time, and a censoring indicator which denotes whether the event time corresponds to the actual event of interest or a censoring time.

As mention above for each case in the data set, there must be one variable in the data set which contains either the time that an event occurred or, for censored cases, the last time at which that case was observed. A second variable is necessary if some of the cases are censored. It is common to set this equal to 1 for uncensored cases and 0 for censored cases.

Specifically for our data set, the variable Time gives the time in months from the beginning of the study to either death or censoring. The variable Censor has a value of 1 for those who died and a value of 0 for those who were censored. It is not uncommon to code the censoring in reverse, i.e. 1 = censored observation and 0 = observed event. An additional variable, group, is an indicator variable for Treatment 1 vs. Treatment 2 patients. We enter these into JMP as show below.

6

Page 7: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

To get the KM estimator for each treatment group separately we use Analyze > Reliability and Survival > Survival and set up the dialog box as shown below.

The following results are obtained for Treatment group 1.

Note: Censor = 1 means the event was observed and Censor = 0 means the observation was censored.

7

Page 8: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

Each of the lines in the above output corresponds to one of the unique time points (except for the first line, which is for time 0). Note that the estimated survivor function, S(t ), the failure function , (1− S (t )), the SE( S ( t )), the number of failures and the number censored cases are reported for each of the unique time points. The number of individuals at risk at the start of each time period is also reported.

The Survival Plot shows the KM estimate of the survivor function S(t ). You can add confidence confidence bands and show the data points by selecting those options from the Product-Limit … > Plot Options pull-out menu.

To obtain the Kaplan-Meier estimate table shown under the heading Combined you need to toggle it on.

The estimated median survival time is the first time point where the survivor function S(t ) drops below .50 or 50%. The mean generally is not used to summarize typically survival time because of the censoring and therefore is biased.

8

Page 9: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

Questions:

1. What is the estimated probability that a patient will survive for 16 months or more?

2. What is the estimated survival probability for any time from 10 months up to (but not including) 13 months?

The estimated survivor and failure functions are graphed below:

9

Page 10: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

10

Page 11: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

Here, the 25th percentile refers to the smallest event time such that the probability of dying earlier is greater than 25%. No value is reported for the 75th percentile because the KM estimator for these data never reaches a failure probability greater than 55.18% or a survival probability lower than 44.82%.

Note that the 50th percentile represents the median death time (here, it is 23 months). JMP also provides an estimated mean time of death, but this estimate is biased downward since there are censoring times greater than the largest event time. Even if this is not the case, when a substantial number of cases are censored, the median is a much preferred measure of central tendency for censored survival data.

Definition for Mean, Medians, and Quantiles based on the KM Estimators

Mean – ∑j=1

k

t jP (T=t j )

Median – by definition, this is the time τ such that S (τ )=0.5 . However, in practice, it is defined as the smallest time such that S(τ)≤0.5 . The median is more appropriate for censored survival data than the mean.

Lower quartile – the smallest time (Q1 ¿ such that S(Q1)≤0.75 Upper quartile – the smallest time (Q3 ) such that S(Q3)≤0.25

Constructing Confidence Intervals for the KM estimators

Note that JMP also provided us with a standard error for the “survival” at each point in time. These standard errors can be used to construct confidence intervals, which are easily obtained by selecting that option from the Product-Limit… pull-down menu.

Estimating the var (S (t ))

Uses the delta method which says, if Y N (μ , σ2) then g(Y )is approximately normally distributed with mean g(μ) and variance [g' (μ ) ]2σ 2. For example if we consider g (Y )=log (Y ) then

g (Y ) N (g (μ ) , [g' (μ ) ]2σ 2) or g (Y ) N (log (μ ) ,( 1μ )

2

σ2)

and if g (Y )=eY then

11

Page 12: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

g (Y ) N (eμ , [eμ ]2σ 2)

Instead estimating the var (S ( t )) we can use the delta method to approximate

the var¿. The log (S (t ))=∑j : t j< t

log (1− λ j) and using independence of the λ j ' s we have

var [ log (S ( t ) ) ]=∑j : t j<t

var [ log (1− λ j )]

¿ ∑j : t j< t ( 1

1− λ j )2

var ( λ j)

¿ ∑j : t j< t ( 1

1− λ j )2

λ j(1− λ j)/r j

¿ ∑j : t j< t

λ j(1− λ j )r j

¿ ∑j : t j< t

d j(r j−d j ) r j

Now S ( t )=exp [log ( S ( t ) ) ] thus by the delta method again we have

var ( S (t ) )=[ S (t ) ]2 var [log ( S ( t ) )]thus,

var ( S (t ) )=[ S ( t ) ]2 ∑j :t j<t

d j(r j−d j ) r j

This is called Greenwood’s Formula. The square root of this variance approximation gives approximate standard errors for the estimated survivor function, i.e. SE( S (t )). Using this standard error we can compute a 95% CI for S(t) as:

S (t )±1.96 ∙ SE[ S ( t )]

This can yield values outside the range [0,1]. A better approach is to exploit the delta method even further to obtain a CI for L ( t )=log [−log (S (t ) )]. Since this quantity is unrestricted, the confidence interval will be in the proper range when we transform back.

Log-log Approach for Confidence Intervals

1) Define L (t )=log (−log (S (t ) ) )2) Form a 95% CI for L(t)

L (t )±1.96 ∙ SE( L (t ))

12

Page 13: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

3) Since S ( t )=exp (−exp [L (t ) ]), the 95% CI for S(t) based on the CI for L(t) isgiven by

[exp (−e ( L ( t )+A ) ) ,exp (−e L (t )−A )]4) Substituting L ( t )=log (−log (S (t ) )) into the upper and lower bounds

gives confidence limits([ S (t ) ]e

A

, [ S (t ) ]e−A

)What is SE( L ( t ))? Using the delta method again…

var ( L (t ) )=var [log (−log ( S ( t )))]=( 1log ( S (t ) ) )

2

∑j : t j<t

d j( r j−d j ) r j

and the standard error is the square root of this estimated variance.

Constructing Confidence Bands for the Survivor FunctionThe pointwise confidence interval for the survivor function S(t) discussed above is valid for ONLY a SINGLE FIXED TIME at which the inference is to be made. In some cases, it is of interest to find the upper and lower confidence bands that guarantee, with a given confidence level, that the survivor function falls within the band for all t in some interval. One such approach was proposed by Hall and Wellner and is also implemented in JMP.

Pointwise CI displays a CI for the survivor function at each individual time point!

Simultaneous CI gives a CI for the entire survivor function S(t ) using the Hall & Wellener method.

For both you can select Shaded options which I have done for the simultaneous CI in the plot below.

13

Page 14: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

14

Page 15: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

Testing for Differences in Survivor Functions

Now, let’s consider the observations from both treatment groups 1 and 2. To determine whether the survivor functions differ across treatment, we test the following hypotheses:

Ho: S1(t) = S2(t) for all tHa: S1(t) ≠ S2(t) for all t

JMP provides two test statistics for this alternative hypothesis, the log-rank test and the Wilcoxon test, both of which yield a Chi-square statistic. The larger the chi-square statistic, the stronger the evidence against the null hypothesis that the survival experience of the two population of subjects is the same. Recall large chi-square statistics yield small p-values!

Analyze > Reliability and Survival > Survival

Here we can see the log rank test yields a χ2=16.79 ( p<.0001 ), thus we conclude the survival experiences differs between treatment groups. In particular those receiving treatment 1 have a better survival experience.

15

Page 16: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

We also obtain a plot of both survivor functions with point-wise confidence intervals added.

We can also plot each curve with simultaneous CI’s.

16

Page 17: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

17

Page 18: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

Kaplan-Meier Estimation and Testing in RFor the 21 patients who received Treatment 1, the survival times of the patients are given as follows: 6, 6, 6, 6+, 7, 9+, 10, 10+, 11+, 13, 16, 17+, 19+, 20+, 22, 23, 25+, 32+, 32+, 34+, 35+

Time Censor [1,] 6 1 [2,] 6 1 [3,] 6 1 [4,] 6 0 [5,] 7 1 [6,] 9 0 [7,] 10 1 [8,] 10 0 [9,] 11 0[10,] 13 1[11,] 16 1[12,] 17 0[13,] 19 0[14,] 20 0[15,] 22 1[16,] 23 1[17,] 25 0[18,] 32 0[19,] 32 0[20,] 34 0[21,] 35 0

> fit = survfit(Surv(Time,Censor)~1,type="kaplan",conf.type="plain")> summary(fit)Call: survfit(formula = Surv(Time, Censor) ~ 1, type = "kaplan", conf.type = "plain")

time n.risk n.event survival std.err lower 95% CI upper 95% CI 6 21 3 0.857 0.0764 0.707 1.000 7 17 1 0.807 0.0869 0.636 0.977 10 15 1 0.753 0.0963 0.564 0.942 13 12 1 0.690 0.1068 0.481 0.900 16 11 1 0.627 0.1141 0.404 0.851 22 7 1 0.538 0.1282 0.286 0.789 23 6 1 0.448 0.1346 0.184 0.712

> plot(fit)> title(main="Kaplan-Meier with Greenwood-Based CI's")

18

Page 19: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

> fit = survfit(Surv(Time,Censor)~1,type="kaplan",conf.type="log-log")> summary(fit)Call: survfit(formula = Surv(Time, Censor) ~ 1, type = "kaplan", conf.type = "log-log")

time n.risk n.event survival std.err lower 95% CI upper 95% CI 6 21 3 0.857 0.0764 0.620 0.952 7 17 1 0.807 0.0869 0.563 0.923 10 15 1 0.753 0.0963 0.503 0.889 13 12 1 0.690 0.1068 0.432 0.849 16 11 1 0.627 0.1141 0.368 0.805 22 7 1 0.538 0.1282 0.268 0.747 23 6 1 0.448 0.1346 0.188 0.680> plot(fit)> title(main="Kaplan-Meier with Log-Log CI's")

19

Page 20: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

Examining Difference in Survival Across Groups> Leuk.fit = survfit(Surv(Time,Censor)~Group,data=leukemia)> summary(Leuk.fit)Call: survfit(formula = Surv(Time, Censor) ~ Group, data = leukemia)

Group=1 time n.risk n.event survival std.err lower 95% CI upper 95% CI 6 21 3 0.857 0.0764 0.720 1.000 7 17 1 0.807 0.0869 0.653 0.996 10 15 1 0.753 0.0963 0.586 0.968 13 12 1 0.690 0.1068 0.510 0.935 16 11 1 0.627 0.1141 0.439 0.896 22 7 1 0.538 0.1282 0.337 0.858 23 6 1 0.448 0.1346 0.249 0.807

Group=2 time n.risk n.event survival std.err lower 95% CI upper 95% CI 1 21 2 0.9048 0.0641 0.78754 1.000 2 19 2 0.8095 0.0857 0.65785 0.996 3 17 1 0.7619 0.0929 0.59988 0.968 4 16 2 0.6667 0.1029 0.49268 0.902 5 14 2 0.5714 0.1080 0.39455 0.828 8 12 4 0.3810 0.1060 0.22085 0.657 11 8 2 0.2857 0.0986 0.14529 0.562 12 6 2 0.1905 0.0857 0.07887 0.460 15 4 1 0.1429 0.0764 0.05011 0.407 17 3 1 0.0952 0.0641 0.02549 0.356 22 2 1 0.0476 0.0465 0.00703 0.322 23 1 1 0.0000 NaN NA NA

20

Page 21: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

> plot(Leuk.fit,col=3:4,lty=1:2)> legend(locator(),legend=c("Maintained","Nonmaintained"),col=3:4,lty=1:2)

> plot(Leuk.fit,conf.int=T,col=3:4,lty=1:2)> legend(locator(),legend=c("Maintained","Nonmaintained"),col=3:4,lty=1:2)> title(main="Survival Curves with Confidence Bands")

Log Rank Test for Comparing Survival Experience> Leuk.test = survdiff(Surv(Time,Censor)~Group,data=leukemia)> print(Leuk.test)Call:survdiff(formula = Surv(Time, Censor) ~ Group, data = leukemia)

N Observed Expected (O-E)^2/E (O-E)^2/VGroup=1 21 9 19.3 5.46 16.8Group=2 21 21 10.7 9.77 16.8

Chisq= 16.8 on 1 degrees of freedom, p= 4.17e-05

Example 2: Oropharynx Cancer

21

Page 22: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

Name: A clinical Trial in the Trt. of Carcinoma of the Oropharynx SIZE: 195 observations, 13 variables.

SOURCE: The Statistical Analysis of Failure Time Data, by JD Kalbfleisch & RL Prentice, (1980), Published by John Wiley & Sons

DESCRIPTIVE ABSTRACT:

These data from a part of a large clinical trial carried out by the Radiation Therapy Oncology Group in the United States. The full study included patients with squamous carcinoma of 15 sites in the mouth and throat, with 16 participating institutions, though only data on three sites in the oropharynx reported by the six largest institutions are considered here. Patients entering the study were randomly assigned to one of two treatment groups, radiation therapy alone or radiation therapy together with a chemotherapeutic agent.

One objective of the study was to compare the two treatment policies with respect to patient survival.

LIST OF VARIABLES:

Variable Description_______________________________________________________________________________

CASE Case Number

INST Participating Institution

SEX 1=male, 2=female

TX Treatment: 1=standard, 2=test

GRADE 1=well differentiated, 2=moderately differentiated, 3=poorly differentiated, 9=missing

AGE In years at time of diagnosis

COND Condition: 1=no disability, 2=restricted work, 3=requires assistance with self care, 4=bed confined, 9=missing

SITE 1=faucial arch, 2=tonsillar fossa, 3=posterior pillar, 4=pharyngeal tongue, 5=posterior wall

T_STAGE 1=primary tumor measuring 2 cm or less in largest diameter, 2=primary tumor measuring 2 cm to 4 cm in largest diameter with minimal infiltration in depth, 3=primary tumor measuring more than 4 cm, 4=massive invasive tumor

N_STAGE 0=no clinical evidence of node metastases, 1=single positive node 3 cm or less in diameter, not fixed, 2=single positive node more than 3 cm in diameter, not fixed, 3=multiple positive nodes or fixed positive nodes

ENTRY_DT Date of study entry: Day of year and year, dddyy

STATUS 0=censored, 1=dead

TIME Survival time in days from day of diagnosis _______________________________________________________________________________

22

Page 23: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

STORY BEHIND THE DATA:

Approximately 30% of the survival times are censored owing primarily to patients surviving to the time of analysis. Some patients were lostto follow-up because the patient moved or transferred to an institution notparticipating in the study, though these cases were relatively rare. From a statistical point of view, an important feature of these data is the considerable lack of homogeneity between individuals being studied. Of course, as part of the study design, certain criteria for patient eligibility had to be met which eliminated extremes in the extent of disease, but still many factors are not controlled.

This study included measurements of many covariates which would be expected to relate to survival experience. Six such variables are given in the data (sex, T staging, N staging, age, general condition, and grade). The site of the primary tumor and possible differences between participating institutions require consideration as well. The T,N staging classification gives a measure of the extent of the tumor at the primary site and at regional lymph nodes. T=1, refers to a small primary tumor, 2 centimeters or less in largest diameter, whereas T=4 is a massive tumor with extension to adjoining tissue. T=2 and T=3 refer to intermediatecases. N=0 refers to there being no clinical evidence of a lymph node metastasis and N=1, N=2, N=3 indicate, in increasing magnitude, the extent of existing lymph node involvement. Patients with classifications T=1,N=0; T=1,N=1; T=2,N=0; or T=2,N=1, or with distant metastases were excluded from study.

The variable general condition gives a measure of the functional capacity of the patient at the time of diagnosis (1 refers to no disability whereas4 denotes bed confinement; 2 and 3 measure intermediate levels). The variablegrade is a measure of the degree of differentiation of the tumor (the degreeto which the tumor cell resembles the host cell) from 1 (well differentiated) to 3 (poorly differentiated)

In addition to the primary question whether the combined treatment mode ispreferable to the conventional radiation therapy, it is of considerable interest to determine the extent to which the several covariates relate tosubsequent survival. It is also imperative in answering the primary question to adjust the survivals for possible imbalance that may be present in the study with regard to the other covariates. Such problems are similar to those encountered in the classical theory of linear regression and the analysis of covariance. Again, the need to accommodate censoring is an important distinguishing point. In many situations it is also important to develop nonparametric and robust procedures since there is frequently little empiricalor theoretical work to support a particular family of failure time distributions.

Compare Survival Experience of Patients in the Treatment Groups> oro.fit = survfit(Surv(TIME,STATUS)~as.factor(TX),data=Pharynx)> plot(oro.fit,col=3:4,lty=1:2,conf.int=T)> title(main="Comparison of Therapies for Oropharynx Cancer",xlab=”Time (days)",ylab="S(t)")> legend(locator(),legend=c("Standard","Test"),col=3:4,lty=1:2)

23

Page 24: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

> oro.diff = survdiff(Surv(TIME,STATUS)~as.factor(TX),data=Pharynx)> print(oro.diff)Call:survdiff(formula = Surv(TIME, STATUS) ~ as.factor(TX), data = Pharynx) N Observed Expected (O-E)^2/E (O-E)^2/Vas.factor(TX)=1 100 73 78.7 0.410 0.926as.factor(TX)=2 95 69 63.3 0.509 0.926

Chisq= 0.9 on 1 degrees of freedom, p= 0.336Use Treatment Regimen and Patient Gender to Define Cohorts> oro.trtsex = survfit(Surv(TIME,STATUS)~as.factor(SEX)+as.factor(TX),data=Pharynx)> plot(oro.trtsex,col=1:4,lty=1:4)

24

Page 25: Chapter 4: Analysis of a Single Numerical Variablecourse1.winona.edu/bdeppa/Biostatistics/Handouts/HO18 D... · Web viewSeveral approaches to survival analysis exist, and we will

> oroTS.diff = survdiff(Surv(TIME,STATUS)~as.factor(SEX)+as.factor(TX),data=Pharynx)> print(oroTS.diff)Call:survdiff(formula = Surv(TIME, STATUS) ~ as.factor(SEX) + as.factor(TX), data = Pharynx)

N Observed Expected (O-E)^2/E (O-E)^2/Vas.factor(SEX)=1, as.factor(TX)=1 77 58 56.2 0.0558 0.093as.factor(SEX)=1, as.factor(TX)=2 72 52 49.5 0.1290 0.199as.factor(SEX)=2, as.factor(TX)=1 23 15 22.4 2.4715 2.976as.factor(SEX)=2, as.factor(TX)=2 23 17 13.8 0.7170 0.800

Chisq= 3.4 on 3 degrees of freedom, p= 0.33

In the next section we will examine how to better handle covariates in studies such as this, by using a regression model. For survival data the most common regression model is the Cox Proportional Hazards Model.

25