a comparison of modeling scales in flexible parametric models noori akhtar-danesh, phd mcmaster...

Post on 12-Jan-2016

217 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

A Comparison of Modeling Scales in Flexible Parametric Models

Noori Akhtar-Danesh, PhDMcMaster UniversityHamilton, Canadadaneshn@mcmaster.ca

2

Outline

BackgroundA review of splinesFlexible parametric modelsResults

• Ovarian cancer

• Colorectal cancer

Conclusions

Stata Conference 2015, Columbus, USA

Stata Conference 2015, Columbus, USA 3

Background

Cox-regression and parametric survival models are quite common in the analysis of survival data

Recently, Flexible Parametric Models (FPM), have been introduced as an extension to the parametric models such as Weibull model (hazard- scale), loglogistic model (odds-scale), and lognormal model (probit-scale)

Stata Conference 2015, Columbus, USA 4

Objectives & Methods

In this presentation different FPMs will be compared based on these modeling scales

Used two subsets of the U.S. National Cancer Institute's Surveillance, Epidemiology and End Results (SEER) dataset from the original 9 registries; • Ovarian cancer diagnosed 1991 - 2010

• Colorectal cancer in men 60+ diagnosed 2001 - 2010

5

A review of splines

The daily statistical practice usually involves assessing relationship between one outcome variable and one or more explanatory variables

We usually assume linear relationship between some function of the outcome variable and the explanatory variables

However, in many situations this assumption may not be appropriate

Stata Conference 2015, Columbus, USA

6

Linear splinesysuse auto

twoway (scatter mpg weight) (lfit mpg weight), xlabel(1700(300)4900)

Stata Conference 2015, Columbus, USA

10

20

30

40

1,700 2,000 2,300 2,600 2,900 3,200 3,500 3,800 4,100 4,400 4,700 5,000Weight (lbs.)

7

Linear splinesysuse auto

twoway (scatter mpg weight) (lfit mpg weight), xlabel(1700(300)4900)

Stata Conference 2015, Columbus, USA

10

20

30

40

1,700 2,000 2,300 2,600 2,900 3,200 3,500 3,800 4,100 4,400 4,700 5,000Weight (lbs.)

8

Linear spline mkspline lnweight1 2400 lnweight2= weight

regress mpg lnweight*

predict linsp

twoway (scatter mpg weight) (line linsp weight, sort)

10

20

30

40

2,000 3,000 4,000 5,000Weight (lbs.)

Mileage (mpg) Fitted values

Stata Conference 2015, Columbus, USA

9

Cubic Splines

Cubic splines are piecewise cubic polynomials with a separate cubic polynomial fit in each of the predefined number of intervals

The number of intervals is chosen by the user and the split points are known as knots

Continuity restrictions are imposed to join the splines at knots to fit a smooth function

Stata Conference 2015, Columbus, USA

10

Restricted Cubic Splines

In RCS the spline function is forced (restricted) to be linear before the first and after the last knot (the boundary knots)

When modeling survival time, the boundary knots are usually defined as the minimum and maximum of the uncensored survival times

Stata Conference 2015, Columbus, USA

11

Restricted Cubic SplinesLet s(x) be the restricted cubic spline function, if

we define m interior knots, k1,…, km, and two boundary knots, kmin and kmax, we can write s(x) as a function of parameters and some newly defined variables z1,…, zm+1,

0 1 1 2 2 1 1( ) ... m ms x z z z

Stata Conference 2015, Columbus, USA

12

Restricted Cubic SplinesThe derived variables (zj, also know as the basis

functions) are calculated as following

where for j=2,…,m+1, and

(Royston & Lambert, 2011)

1

3 3 3min max( ) ( ) (1 )( )j j j j

z x

z x k x k x k

max

max min

jj

k k

k k

Stata Conference 2015, Columbus, USA

3 3( ) ( ) if it is positive and 0 otherwisej jx k x k

13

Restricted Cubic Splines

These RCSs can be calculated using a number of Stata commands, including mkspline (an official Stata command), rcsgen, and splinegen (two user written commands)

The rcsgen command can orthogonalize the derived spline variables which can lead to more stable parameter estimates and quicker model convergence

Stata Conference 2015, Columbus, USA

14

rcsgen; an alternative spline macro

sysuse auto, clear

rcsgen weight, gen(rcs) df(3)

regress mpg rcs1-rcs3

predictnl pred = xb(), ci(lci uci)

twoway (rarea lci uci weight, sort) ///

(scatter mpg weight, sort) ///

(line pred weight, sort lcolor(black)), /// legend(off)

Stata Conference 2015, Columbus, USA

15

Flexible Parametric Models rcsgen; an alternative spline macro

10

20

30

40

2,000 3,000 4,000 5,000Weight (lbs.)

Stata Conference 2015, Columbus, USA

16

FPM: Royston-Parmer (RP) Models

RP models are a extension of the parametric models (Weibull, log-logistic, and log-normal) which offer greater flexibility with respect to shape of the survival distribution

The additional flexibility of an RP model is because, for instance for a hazard model, it represents the baseline distribution function as a restricted cubic spline function of log time instead of simply as a linear function of log time

The complexity of modeling spline functions is determined by the number and positions of the knots in the log time

Stata Conference 2015, Columbus, USA

17

FPM: Royston-Parmer (RP) Models

By default, the internal knots for modeling baseline distribution function in RP models are positioned on the distribution of uncensored log event-times

Internal knots d.f. Knot positions (centiles)1 2 50

2 3 33, 67

3 4 25, 50, 75

4 5 20, 40, 60, 80

5 6 17, 33, 50, 67, 83

6 7 14, 29, 43, 57, 71, 86

7 8 12.5, 25, 37.5, 50, 625.5, 75, 87.5

8 9 11.1, 22.2, 33.3, 44.4, 55.6, 66.7, 77.8. 88.9

9 10 10, 20, 30, 40, 50, 60, 70, 80, 90

Stata Conference 2015, Columbus, USA

18

FPM: Royston-Parmer (RP) Models

Spline models can be chosen by the appearance of the survival functions, hazard functions, etc. or more formally, by minimizing the value of an information criterion [Akaike (AIC) or Bayes (BIC)]Estimation of parameters is by maximum likelihood

Stata Conference 2015, Columbus, USA

19

FPM: A review of Weibull distribution

The cumulative hazard function for a Weibull distribution is

To make it consistent with rest of this presentation let’s change the notation as

where 1 is the shape parameter. Then, the Weibull hazard function is

( ) pH t t

1( )H t t

1 11( ) ( ) /h t dH t dt t

Stata Conference 2015, Columbus, USA

20

FPM: A review of Weibull distributionuse "C:Desktop/ovary.dta", clear

keep if agegrp==3

stset SRV_TIME_MON, f( STAT_REC=4) scale(12) exit(time 10*12)

sts gen S=s sts gen Slb=lb(s) sts gen Sub=ub(s)

sts gen Hna=nasts gen Hlb=lb(na)sts gen Hub=ub(na)

quietly streg, d(w)

gen Hweib=exp(_b[_cons])*(_t)^(e(aux_p))gen Sweib=exp(-Hweib)

bysort _t: drop if _n>1

twoway(rarea Hlb Hub _t, pstyle(ci) sort) /// (line Hna Hweib _t, sort), leg(off) ytitle(" Cumulative hazard function") /// xtitle("") ylab(, angle(h)) name(g1, replace) nodraw

twoway(rarea Slb Sub _t, pstyle(ci) sort) /// (line S Sweib _t, sort), leg(off) ytitle(" Survival function") /// xtitle("") ylab(, angle(h)) name(g2, replace) nodraw

graph combine g1 g2, b2title(Years from diagnosis)

Stata Conference 2015, Columbus, USA

21

FPM: A review of Weibull distribution Women age 60-69 diagnosed with ovarian cancer

Stata Conference 2015, Columbus, USA

0

.5

1

1.5

Cum

ulat

ive

haza

rd fu

nctio

n

0 2 4 6 8 10

.2

.4

.6

.8

1

Sur

viva

l fun

ctio

n

0 2 4 6 8 10

Years from diagnosis

22

FPM: A review of Weibull distribution

One reason that a Weibull model does not fit very well to the dataset is that it has a monotonic hazard function To have a more flexible form, we begin by writing the Weibull cumulative hazard function in logarithmic form

Now, suppose that f(t;) represents some general family of nonlinear functions of time t, with some parameter vector and

1 0 1ln ( ) ln ln lnH t t t

Stata Conference 2015, Columbus, USA

ln ( ) ( ; )H t f t

23

FPM: Royston-Parmer (RP) Models

Because cumulative hazard functions are monotonic in time, f(t;) must be monotonic too

Two potentially appropriate functions are fractional polynomials (Royston & Altman 1994) and splines (de Boor 2001)

Stata Conference 2015, Columbus, USA

24

FPM: Royston-Parmer (RP) Models

We write a restricted cubic spline function as s(lnt;) instead of f(t;) with s standing for spline and lnt to emphasize that we are working on the scale of log time

where lnt, z1(lnt), z2(lnt), ..., are the basis functions of the restricted cubic spline

2 1 30 1 2ln ( ) (ln ; ) ln (ln ) (ln ) ...z t zH s t t tt

Stata Conference 2015, Columbus, USA

25

FPM: Royston-Parmer (RP) Models

When we specify one or more knots, the spline function includes a constant term (0), a linear function of lnt with parameter 1, and a basis function for each knot

By convention, the “no knots” case for a hazard model corresponds to the linear function, s(lnt;)= 0+ 1 lnt, which is the Weibull model

Stata Conference 2015, Columbus, USA

26

FPM: Royston-Parmer (RP) Models

We estimate the parameters by maximum likelihood method using the stpm2 routine (Lambert & Royston 2009)

We identify df for each model based on AIC criteria and evaluate the variables in the model using lrtest

We use options of hazard, odds, and normal in stpm2 for fitting different scales

Stata Conference 2015, Columbus, USA

27

FPM: Ovarian cancer

. tab agegrpAge group | Freq. Percent Cum.-------------+-----------------------------------40- 49 years | 2,700 19.55 19.5550- 59 years | 3,896 28.21 47.7660- 69 years | 3,466 25.10 72.8670- 79 years | 2,606 18.87 91.73 >=80 years | 1,142 8.27 100.00-------------+----------------------------------- Total | 13,810 100.00

gen year=DATE_yr-1990mkspline yearsp=year, cubic nknots(3)

stpm2 agegrp2-agegrp5 yearsp*, df(7) tvc(agegrp2- /// agegrp5 yearsp1) dftvc(2) ///

scale(hazard) eform nolog

Stata Conference 2015, Columbus, USA

28

FPM: Ovarian cancer

. estat ic 

Scale | AIC

-------------+--------------

Hazard | 35615.92

Odds | 35616.51

Normal | 35564.12

Stata Conference 2015, Columbus, USA

29

FPM: Ovarian cancer

Stata Conference 2015, Columbus, USA

30

FPM: Ovarian cancer

Stata Conference 2015, Columbus, USA

0

.2

.4

.6

.8

1

Re

lativ

e su

rviv

al r

atio

0 1 2 3 4 5 6 7 8 9 10Years from diagnosis

40- 49 years

0

.2

.4

.6

.8

1

Re

lativ

e su

rviv

al r

atio

0 1 2 3 4 5 6 7 8 9 10Years from diagnosis

50- 59 years

0

.2

.4

.6

.8

1

Re

lativ

e su

rviv

al r

atio

0 1 2 3 4 5 6 7 8 9 10Years from diagnosis

60- 69 years

0

.2

.4

.6

.8

1

Re

lativ

e su

rviv

al r

atio

0 1 2 3 4 5 6 7 8 9 10Years from diagnosis

70- 79 years

0

.2

.4

.6

.8

1

Re

lativ

e su

rviv

al r

atio

0 1 2 3 4 5 6 7 8 9 10Years from diagnosis

>= 80 years

95%CI HazardOdds Normal

Scale

31

FPM: Ovarian cancer

Stata Conference 2015, Columbus, USA

0

.2

.4

.6

.8

1

On

e-y

ea

r R

SR

1991 1994 1997 2000 2003 2006 2009Years of diagnosis

40- 49 years

0

.2

.4

.6

.8

1

On

e-y

ea

r R

SR

1991 1994 1997 2000 2003 2006 2009Years of diagnosis

50- 59 years

0

.2

.4

.6

.8

1

On

e-y

ea

r R

SR

1991 1994 1997 2000 2003 2006 2009Years of diagnosis

60- 69 years

0

.2

.4

.6

.8

1

On

e-y

ea

r R

SR

1991 1994 1997 2000 2003 2006 2009Years of diagnosis

70- 79 years

0

.2

.4

.6

.8

1

On

e-y

ea

r R

SR

1991 1994 1997 2000 2003 2006 2009Years of diagnosis

>= 80 years

HazardOddsNormal

Scale

32

FPM: Ovarian cancer

Stata Conference 2015, Columbus, USA

.3

.5

.7

Fiv

e-y

ea

r R

SR

1991 1994 1997 2000 2003 2006 2009Years of diagnosis

40- 49 years

.3

.5

.7

Fiv

e-y

ea

r R

SR

1991 1994 1997 2000 2003 2006 2009Years of diagnosis

50- 59 years

.3

.5

.7

Fiv

e-y

ea

r R

SR

1991 1994 1997 2000 2003 2006 2009Years of diagnosis

60- 69 years

.3

.5

.7

Fiv

e-y

ea

r R

SR

1991 1994 1997 2000 2003 2006 2009Years of diagnosis

70- 79 years

.3

.5

.7

Fiv

e-y

ea

r R

SR

1991 1994 1997 2000 2003 2006 2009Years of diagnosis

>= 80 years

HazardOddsNormal

Scale

33

FPM: Ovarian cancer

Stata Conference 2015, Columbus, USA

0

10

20

30

40

50

60

Mo

rta

lity

rate

(p

er

10

0 P

Y)

0 1 2 3 4 5Years from Diagnosis

40- 49 years

0

10

20

30

40

50

60

Mo

rta

lity

rate

(p

er

10

0 P

Y)

0 1 2 3 4 5Years from Diagnosis

50- 59 years

0

10

20

30

40

50

60

Mo

rta

lity

rate

(p

er

10

0 P

Y)

0 1 2 3 4 5Years from Diagnosis

60- 69 years

0

10

20

30

40

50

60

Mo

rta

lity

rate

(p

er

10

0 P

Y)

0 1 2 3 4 5Years from Diagnosis

70- 79 years

0

10

20

30

40

50

60

Mo

rta

lity

rate

(p

er

10

0 P

Y)

0 1 2 3 4 5Years from Diagnosis

>= 80 years

Hazard

OddsNormal

Scale:

34

FPM: Colorectal cancer

. tab agegrp 

Age group | Freq. Percent Cum.-------------+-----------------------------------60- 69 years | 14,837 35.32 35.3270- 79 years | 16,026 38.16 73.48 >=80 years | 11,139 26.52 100.00-------------+----------------------------------- Total | 42,002 100.00

stpm2 agegrp2-agegrp3 year, df(9) tvc(agegrp2- /// agegrp3) dftvc(2) scale(hazard) eform nolog

Stata Conference 2015, Columbus, USA

35

FPM: Colorectal cancer

. estat ic 

Scale | AIC

-------------+--------------

Hazard | 111997.6

Odds | 112056.2

Normal | 111829.9

Stata Conference 2015, Columbus, USA

36

FPM: Colorectal cancer

Stata Conference 2015, Columbus, USA

0

.1

.2

.3

.4

.5

.6

.7

.8

.9

1

Su

rviv

al r

ate

0 1 2 3 4 5 6 7 8 9 10Years from diagnosis

60- 69 years

0

.1

.2

.3

.4

.5

.6

.7

.8

.9

1

Su

rviv

al r

ate

0 1 2 3 4 5 6 7 8 9 10Years from diagnosis

70- 79 years

0

.1

.2

.3

.4

.5

.6

.7

.8

.9

1

Su

rviv

al r

ate

0 1 2 3 4 5 6 7 8 9 10Years from diagnosis

>= 80 years

95%CI HazardOdds Normal

Scale

37

FPM: Colorectal cancer

Stata Conference 2015, Columbus, USA

.7

.75

.8

.85

.9

On

e-y

ea

r su

rviv

al r

ate

2001 2002 2003 2004 2005 2006 2007 2008 2009 2010Years of diagnosis

60- 69 years

.7

.75

.8

.85

.9

On

e-y

ea

r su

rviv

al r

ate

2001 2002 2003 2004 2005 2006 2007 2008 2009 2010Years of diagnosis

70- 79 years

.7

.75

.8

.85

.9

On

e-y

ea

r su

rviv

al r

ate

2001 2002 2003 2004 2005 2006 2007 2008 2009 2010Years of diagnosis

>= 80 years

Hazard

OddsNormal

Scale

38

FPM: Colorectal cancer

Stata Conference 2015, Columbus, USA

0

10

20

30

40

50

60

70

80

Mo

rta

lity

rate

(p

er

10

0 P

Y)

0 1 2 3 4 5Years from Diagnosis

60- 69 years

0

10

20

30

40

50

60

70

80

Mo

rta

lity

rate

(p

er

10

0 P

Y)

0 1 2 3 4 5Years from Diagnosis

70- 79 years

0

10

20

30

40

50

60

70

80

Mo

rta

lity

rate

(p

er

10

0 P

Y)

0 1 2 3 4 5Years from Diagnosis

>= 80 years

HazardOdds

Normal

Scale

39

Conclusion

In general, there were no substantial differences between the estimates from the three modeling scales, although the probit-scale showed slightly better fit based on the Akaike information criterion (AIC) for both datasets

Stata Conference 2015, Columbus, USA

40

References de Boor, C. 2001. A Practical Guide to Splines, Revised ed ed. New York,

Springer. Durrleman, S. & Simon, R. 1989. Flexible regression models with cubic splines.

Stat.Med., 8, (5) 551-561 available from: PM:2657958 Lambert, P.C. & Royston, P. 2009. Further development of flexible parametric

models for survival analysis. The Stata Journal, 9, 265-290 Royston, P. & Altman, D.G. 1994. Regression using fractional polynomials of

continuous covariates: Parsimonious parametric modelling (with discussion). Applied Statistics, 43, 429-467

Royston, P. & Lambert, P.C. 2011. Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model Stata Press.

Royston, P. & Parmar, M.K. 2002. Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat.Med., 21, (15) 2175-2197 available from: PM:12210632

Stata Conference 2015, Columbus, USA

top related