S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?

© Willett, Harvard University Graduate School of Education, 11/03/2022

Upload: xantha-beck

Post on 31-Dec-2015

Page 1: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?

© Willett, Harvard University Graduate School of Education, 04/19/2023

S052/II.2(a2) – Slide 1

More details can be found in the “Course Objectives and Content” handout on the course webpage.

Multiple Regression Analysis (MRA): $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

Do your residuals meet the required assumptions?
• Test for residual normality.
• Use influence statistics to detect atypical datapoints.

If your residuals are not independent, replace OLS by GLS regression analysis:
• Use individual growth modeling.
• Specify a multi-level model.

If your sole predictor is continuous, MRA is identical to correlational analysis.

If your sole predictor is dichotomous, MRA is identical to a t-test.

If your several predictors are categorical, MRA is identical to ANOVA.

If time is a predictor, you need discrete-time survival analysis …

If your outcome is categorical, you need to use …
• Binomial logistic regression analysis (dichotomous outcome).
• Multinomial logistic regression analysis (polytomous outcome).

If you have more predictors than you can deal with:
• Create taxonomies of fitted models and compare them.
• Form composites of the indicators of any common construct.
• Conduct a Principal Components Analysis.
• Use Cluster Analysis.

If your outcome vs. predictor relationship is non-linear:
• Use non-linear regression analysis.
• Transform the outcome or predictor.

How do you deal with missing data?

Today’s Topic Area

S052/II.2(a2) – Slide 2

S052/II.2(a2): Introducing Discrete-Time Survival Analysis Printed Syllabus – What Is Today’s Topic?

Today, in the second part of Syllabus Section II.2(a), on Discrete-Time Survival Analysis, I will:
• Replicate life-table analyses by conducting logistic regression analyses of EVENT as a function of PERIOD in the person-period dataset, using a general specification for PERIOD (Slides #4 - #17).
• Show how the completely general specification for PERIOD can be represented in a useful “no intercept” version (Slides #18 - #23).
• Appendix 1 compares the “intercept” and “no intercept” specifications of the DTSA model algebraically (Slide #24).
• Appendix 2 demonstrates the arithmetic equivalence of the “intercept” and “no intercept” specifications (Slide #25).
• Appendix 3 shows how the general specification of PERIOD can be replaced by more parsimonious polynomial specifications (Slides #26 - #35).

Please check inter-connections among the Roadmap, the Daily Topic Area, the Printed Syllabus, and the content of today’s class when you pre-read the day’s materials.

S052/II.2(a2) – Slide 3

S052/II.2(a2): Introducing Discrete-Time Survival Analysis Three Kinds Of Survival Analysis

Classical Methods of Survival Analysis (Last Time)

Simple data-analytic approaches for summarizing survival data appropriately:
• Estimation of the sample hazard function.
• Estimation of the sample survivor function.
• Estimation of the median lifetime.

Simple tests of differences in survivor function, by “group”:
• Survival analytic equivalent of the t-test.

Discrete-Time Survival Analysis (Today, & Next Time)

Easily replicates classical methods of survival analysis, using logistic regression analysis.

Reframes classical survival analytic methods in a regression format:
• Permits the inclusion of multiple predictors, including interactions.
• Provides testing with the Wald test & differences in the –2LL statistic.
• Fitted hazard & survivor functions, & median lifetimes, are easily recovered from the fitted logistic model.

Continuous-Time Survival Analysis (Next year, … ?)

A replacement for discrete-time survival analysis when time has been measured continuously.

Imposes additional assumptions on the data.

Reframes classical survival analytic methods in a regression format:
• Permits the inclusion of predictors, including interactions.
• Accompanied by its own testing procedures, based on standard practices.
• Fitted hazard & survivor functions, & median lifetimes, are easily recovered from fitted models.

S052/II.2(a2) – Slide 4

S052/II.2(a2): Introducing Discrete-Time Survival Analysis The Person-Period Dataset for the Special Educator Data

Dataset: SPEC_ED_PP.txt

Overview: Person-period dataset containing the same information as the SPEC_ED.txt person dataset, on the career duration of special education teachers who began their teaching careers in the Michigan public schools between 1972 and 1978, and who were followed uninterruptedly until 1985.

Source: State Department of Education, Michigan.

Sample size: 24875 annual person-period records.

More Info: Singer & Willett, 2003

In order to proceed, let’s continue to work in the person-period dataset and with the new summary statistics I have introduced:
• Hazard probability & the hazard function.
• Survival probability & the survivor function.
• Median lifetime.

But, let’s use logistic regression analysis to model & estimate them …

S052/II.2(a2) – Slide 5

S052/II.2(a2): Introducing Discrete-Time Survival Analysis The Person-Period Dataset for the Special Educator Data

Recall that, in the person-period dataset from the previous class, each teacher has one row of data for each year of their career, and that each row contains the following information …

Col   Var      Variable Description                                                                              Labels
1     ID       Teacher identification code.                                                                      Integer
2     PERIOD   Records the discrete time period to which each record refers.                                     Integer
3     EVENT    Dummy variable indicating whether the teacher experienced the event of interest in this period.   0 = no; 1 = yes
4     P1
5     P2
6     P3
Etc.

The earlier YRSTCH variable, which recorded the duration of the teaching career in the person-level dataset, has been replaced by variable PERIOD, which labels the time-period to which each row of the person-period dataset refers.

We’ve also acquired a new variable called EVENT, which records whether a teacher experienced the event of interest (“quit teaching”) in the particular discrete time-period in question.

The person-period dataset contains other variables too, that are labeled and explained in these rows of the codebook. We will incorporate these variables into the analysis in today’s presentation.

S052/II.2(a2) – Slide 6

S052/II.2(a2): Introducing Discrete-Time Survival Analysis The Person-Period Dataset for the Special Educator Data

Person-Period Dataset

ID   PERIOD   EVENT
1    1        1
2    1        0
2    2        1
3    1        1
4    1        1
5    1        0
5    2        0
5    3        0
5    4        0
5    5        0
5    6        0
5    7        0
5    8        0
5    9        0
5    10       0
5    11       0
5    12       0
6    1        1
7    1        0
7    2        0
7    3        0
7    4        0
7    5        0
7    6        0
7    7        0
7    8        0
7    9        0
7    10       0
7    11       0
7    12       0
Etc.

In a person-period dataset:
• Each person has one row of data in each time-period.
• Their data record continues until, and includes, the time-period in which they experience the event of interest, or are censored:
  - A person cannot be present in a time-period unless they had a value of 0 for EVENT in the previous period.
  - In other words, they must have survived the prior period.
• So, the person-period dataset has been formatted to permit each person to be present in a particular time period only if they are a legitimate member of the risk set in that period.

Notice how, in the person-period dataset, outcome EVENT has been encoded to embody the same conditionality present in the definition of the hazard probability …

In our earlier life-table analysis in the person-period dataset:
• EVENT recorded whether the teacher experienced the event of interest (quitting teaching) in each time PERIOD.

Conceptually, in these analyses:
• EVENT served as a (dichotomous) outcome.
• PERIOD served as a predictor.

So, why not replace life table analysis by the logistic regression analysis of EVENT on PERIOD in the person-period dataset?
• From a technical perspective, this turns out to be exactly the right thing to do.
• It’s then called Discrete-Time Survival Analysis.
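The row-per-period structure described on this slide can be sketched in a few lines of Python. This is only an illustration, not the course's own procedure: the function name `to_person_period`, the tuple layout `(id, duration, quit_flag)`, and the small `people` list are assumptions made up for the example (the durations mirror the first teachers listed on the slide).

```python
def to_person_period(person_rows):
    """Expand person-level records into person-period rows.

    Each input record is (id, duration, quit_flag), where quit_flag is 1 if
    the person experienced the event in their final recorded period and 0 if
    they were censored. Returns a list of (id, period, event) tuples.
    """
    rows = []
    for person_id, duration, quit_flag in person_rows:
        for period in range(1, duration + 1):
            # EVENT is 1 only in the final period, and only if the event occurred;
            # every earlier period must carry EVENT = 0 (the person "survived" it).
            event = 1 if (period == duration and quit_flag == 1) else 0
            rows.append((person_id, period, event))
    return rows


# Hypothetical person-level records (illustrative, not read from SPEC_ED.txt).
people = [(1, 1, 1), (2, 2, 1), (3, 1, 1), (4, 1, 1), (5, 12, 0), (6, 1, 1)]
person_period = to_person_period(people)
for row in person_period[:5]:
    print(row)
```

Because a person contributes a row to a period only after surviving every earlier period, the rows present in period j are exactly the risk set for that period.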

S052/II.2(a2) – Slide 7

S052/II.2(a2): Introducing Discrete-Time Survival Analysis The Most General Way of Specifying Time PERIOD That Is Possible?

To conduct logistic regression analyses in the person-period dataset, we must think about how we represent time PERIOD in our models -- recall that the dataset contains a vector of predictors that we have not yet used …

Col   Var      Variable Description                                                                              Labels
1     ID       Teacher identification code.                                                                      Integer
2     PERIOD   Indicates discrete time period to which record refers.                                            Integer
3     EVENT    Dummy variable indicating whether the teacher experienced the event of interest in this period.   0 = no; 1 = yes
4     P1       Is this the first year of the teaching career?                                                    0 = no; 1 = yes
5     P2       Is this the second year of the teaching career?                                                   0 = no; 1 = yes
6     P3       Is this the third year of the teaching career?                                                    0 = no; 1 = yes
7     P4       Is this the fourth year of the teaching career?                                                   0 = no; 1 = yes
8     P5       Is this the fifth year of the teaching career?                                                    0 = no; 1 = yes
9     P6       Is this the sixth year of the teaching career?                                                    0 = no; 1 = yes
10    P7       Is this the seventh year of the teaching career?                                                  0 = no; 1 = yes
11    P8       Is this the eighth year of the teaching career?                                                   0 = no; 1 = yes
12    P9       Is this the ninth year of the teaching career?                                                    0 = no; 1 = yes
13    P10      Is this the tenth year of the teaching career?                                                    0 = no; 1 = yes
14    P11      Is this the eleventh year of the teaching career?                                                 0 = no; 1 = yes
15    P12      Is this the twelfth year of the teaching career?                                                  0 = no; 1 = yes

Dichotomous predictors, P1 thru P12, are defined to distinguish among the discrete time periods.

For each person in each period, each of the time period indicators, P1 thru P12, is set to 1 in the corresponding period, and 0 in other periods.

Representing PERIOD by these time indicators in our logistic regression analysis provides the most general specification possible for any potential relationship between EVENT and PERIOD.
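The coding rule for the time-period indicators can be written out as a short Python sketch. The helper name `period_indicators` and its `n_periods` argument are hypothetical, introduced only for this illustration; in the course dataset the indicators P1 thru P12 are already present as columns.

```python
def period_indicators(period, n_periods=12):
    """Return a dict {'P1': 0 or 1, ..., 'P12': 0 or 1} for one person-period row.

    Exactly one indicator is 1: the one whose index matches `period`.
    """
    if not 1 <= period <= n_periods:
        raise ValueError("period must lie between 1 and n_periods")
    return {f"P{k}": int(k == period) for k in range(1, n_periods + 1)}


# A row that refers to the third year of the teaching career.
row = period_indicators(3)
print(row)
```

Since each row switches on exactly one indicator, the set P1 thru P12 places no constraint on how the hazard varies from period to period, which is the sense in which the specification is completely general.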

S052/II.2(a2) – Slide 8

S052/II.2(a2): Introducing Discrete-Time Survival Analysis The Most General Way of Specifying Time PERIOD That Is Possible?

Here I have printed out the values of the time-period indicators for a few folk from the person-period dataset …

ID   EVENT     PERIOD   P1   P2   P3   P4   P5   P6   P7   P8   P9   P10   P11   P12
1    Quit      1        1    0    0    0    0    0    0    0    0    0     0     0
2    No Quit   1        1    0    0    0    0    0    0    0    0    0     0     0
2    Quit      2        0    1    0    0    0    0    0    0    0    0     0     0
3    Quit      1        1    0    0    0    0    0    0    0    0    0     0     0
4    Quit      1        1    0    0    0    0    0    0    0    0    0     0     0
5    No Quit   1        1    0    0    0    0    0    0    0    0    0     0     0
5    No Quit   2        0    1    0    0    0    0    0    0    0    0     0     0
5    No Quit   3        0    0    1    0    0    0    0    0    0    0     0     0
5    No Quit   4        0    0    0    1    0    0    0    0    0    0     0     0
5    No Quit   5        0    0    0    0    1    0    0    0    0    0     0     0
5    No Quit   6        0    0    0    0    0    1    0    0    0    0     0     0
5    No Quit   7        0    0    0    0    0    0    1    0    0    0     0     0
5    No Quit   8        0    0    0    0    0    0    0    1    0    0     0     0
5    No Quit   9        0    0    0    0    0    0    0    0    1    0     0     0
5    No Quit   10       0    0    0    0    0    0    0    0    0    1     0     0
5    No Quit   11       0    0    0    0    0    0    0    0    0    0     1     0
5    No Quit   12       0    0    0    0    0    0    0    0    0    0     0     1
6    Quit      1        1    0    0    0    0    0    0    0    0    0     0     0

Here’s the original 12 years of data on the time periods in which Teacher #5 was present in the person-period dataset.

The time-period indicators, P1 - P12, identify each time-period in a very general way:

In the 1st time period:
• P1 = 1
• P2 thru P12 = 0

In the 2nd time period:
• P2 = 1
• P1 & P3 thru P12 = 0

……

In the 12th time period:
• P12 = 1
• P1 thru P11 = 0

S052/II.2(a2) – Slide 9

S052/II.2(a2): Introducing Discrete-Time Survival Analysis Fitting the First DTSA Model, Using the Most General Specification for PERIOD

Here’s the SAS code for Data-Analytic Handout II.2(a).3, in which I conduct the suggested logistic regression analyses of EVENT for the first time …

    * Read the person-period dataset, including the time-period indicators;
    DATA SPEC_ED_PP;
        INFILE 'C:\DATA\S052\SPEC_ED_PP.txt';
        INPUT ID PERIOD EVENT P1-P12;
        LABEL ID     = 'Teacher Identification Code'
              PERIOD = 'Current Time Period'
              EVENT  = 'Did Teacher Quit in this Time Period?';
    RUN;

    * Define value labels for EVENT;
    PROC FORMAT;
        VALUE EFMT 0='No Quit' 1='Quit';
    RUN;

    * Print first 33 rows from the person-period dataset, to reveal the coding of time period dummies, P1-P12;
    PROC PRINT DATA=SPEC_ED_PP(OBS=33);
        VAR ID EVENT PERIOD P1-P12;
        FORMAT EVENT EFMT.;
    RUN;

    * Predict event occurrence ("quitting teaching") by P2-P12;
    PROC LOGISTIC DATA=SPEC_ED_PP;
        MODEL EVENT(event='Quit') = P2-P12;
        FORMAT EVENT EFMT.;
        OUTPUT OUT=PREDICTED1 PREDICTED=PREDQUIT1;
    RUN;

Here are the time-period indicators -- P1 through P12 -- that were present in the person-period dataset, but were input and ignored up to this point.

Here I list the values of EVENT and P1 thru P12 for the few cases we inspected on the previous slide.

Here, I specify EVENT as a logistic function of time-period indicators, P2 thru P12, and fit the model in the person-period dataset:
• I have omitted one time-period indicator, by choice – here, P1 – as usual, to avoid complete multi-collinearity among them all.
• As with dichotomous predictors in any analysis, the omission of one dummy predictor defines a “reference category” for interpretation later.
• The hypothesized probability of event occurrence for the ith person, in the jth time-period, is then:

$$h(t_{ij}) = \frac{1}{1 + e^{-(\beta_0 + \beta_2 P2_{ij} + \beta_3 P3_{ij} + \dots + \beta_{12} P12_{ij})}}$$

Here I output predicted values, PREDQUIT1, into a new dataset called PREDICTED1, to facilitate subsequent listing of the fitted hazard probabilities and plotting of the fitted hazard function.
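One way to see why this specification replicates the life table: with a separate indicator for every period except the reference, the model is saturated in PERIOD, so (provided every period's sample hazard lies strictly between 0 and 1) the maximum-likelihood fitted hazards equal the sample proportions of events in each risk set. The Python sketch below works backwards from those proportions to the implied intercept and period coefficients. It is an illustration under that assumption, not the PROC LOGISTIC algorithm; the function names and the toy data are made up for the example.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def general_spec_estimates(rows, n_periods):
    """Return (b0, slopes, hazards) for the intercept specification with P1 omitted.

    rows is an iterable of (period, event) pairs from a person-period dataset.
    """
    at_risk = {j: 0 for j in range(1, n_periods + 1)}
    events = {j: 0 for j in range(1, n_periods + 1)}
    for period, event in rows:
        at_risk[period] += 1
        events[period] += event
    # Life-table hazard: number of events divided by the size of the risk set.
    hazards = {j: events[j] / at_risk[j] for j in at_risk}
    b0 = logit(hazards[1])  # period 1 is the reference category
    slopes = {j: logit(hazards[j]) - b0 for j in range(2, n_periods + 1)}
    return b0, slopes, hazards


# Made-up toy data: (period, event) pairs, not the special educator data.
toy_rows = ([(1, 1)] * 2 + [(1, 0)] * 8 +
            [(2, 1)] * 1 + [(2, 0)] * 7 +
            [(3, 1)] * 2 + [(3, 0)] * 5)
b0, slopes, hazards = general_spec_estimates(toy_rows, n_periods=3)
for j in (2, 3):
    refit = 1.0 / (1.0 + math.exp(-(b0 + slopes[j])))
    print(j, round(hazards[j], 4), round(refit, 4))
```

The intercept carries the logit hazard of the reference period, and each remaining coefficient is the difference in logit hazard between its period and the reference period.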

S052/II.2(a2) – Slide 10

S052/II.2(a2): Introducing Discrete-Time Survival Analysis Fitting the First DTSA Model, Using the Most General Specification for PERIOD

Here’s the fitted logistic regression model … interpreting the associated hypothesis tests is straightforward!

Model Fit Statistics

Criterion   Intercept Only   Intercept and Covariates
-2 Log L    14903.844        14583.742

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   320.1019     11   <.0001

Analysis of Maximum Likelihood Estimates

Parameter   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   -2.0337    0.0498           1667.8155         <.0001
P2          -0.0551    0.0735           0.5617            0.4536
P3          0.000610   0.0750           0.0001            0.9935
P4          -0.0819    0.0792           1.0687            0.3012
P5          -0.2911    0.0867           11.2736           0.0008
P6          -0.3745    0.0917           16.6879           <.0001
P7          -0.7152    0.1055           45.9575           <.0001
P8          -0.9512    0.1256           57.3427           <.0001
P9          -1.0886    0.1489           53.4261           <.0001
P10         -1.2277    0.1793           46.8833           <.0001
P11         -1.6425    0.2580           40.5338           <.0001
P12         -2.3104    0.4524           26.0867           <.0001

Notice that the current model contains time-period indicators, P2 thru P12, as predictors.

Using this difference in –2LL statistic between unconditional and current models, we can test the null hypothesis that time period indicators, P2 thru P12, have no joint effect on EVENT, in the population.

Model Fit Statistics:
• Intercept only, -2LL = 14903.8
• Intercept & covariates, -2LL = 14583.7
• Difference in –2LL = 320.1

We reject the H0 that the time-period indicators, P2 thru P12, have no joint effect on EVENT, in the population (p<.0001).
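The difference in –2LL reported on this slide can be reproduced with a few lines of arithmetic. The sketch below is a check under stated assumptions: the two –2LL values and the 11 degrees of freedom come from the output above, while the .05 significance level and the tabled critical value of 19.675 are choices made for the illustration (the slide itself reports p<.0001). The variable names are hypothetical.

```python
# -2LL values as printed in the PROC LOGISTIC output.
neg2ll_intercept_only = 14903.844
neg2ll_with_periods = 14583.742

df = 11  # one degree of freedom for each included indicator, P2 thru P12

# Assumption for this illustration: test at alpha = .05, using the tabled
# chi-square critical value for 11 degrees of freedom.
CRITICAL_CHI2_05_DF11 = 19.675

lr_chi2 = neg2ll_intercept_only - neg2ll_with_periods
reject_h0 = lr_chi2 > CRITICAL_CHI2_05_DF11
print(f"Difference in -2LL = {lr_chi2:.3f} on {df} df; reject H0: {reject_h0}")
```

The difference computed from the printed values agrees with the reported likelihood-ratio chi-square of 320.1019 up to rounding of the printed –2LL statistics.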

S052/II.2(a2) – Slide 11

S052/II.2(a2): Introducing Discrete-Time Survival Analysis Fitting the First DTSA Model, Using the Most General Specification for PERIOD

Fitted values of the outcome are obtained as usual by substituting predictor values into the fitted model …

Model Fit Statistics

Criterion   Intercept Only   Intercept and Covariates
-2 Log L    14903.844        14583.742

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   320.1019     11   <.0001

Analysis of Maximum Likelihood Estimates

Parameter   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   -2.0337    0.0498           1667.8155         <.0001
P2          -0.0551    0.0735           0.5617            0.4536
P3          0.000610   0.0750           0.0001            0.9935
P4          -0.0819    0.0792           1.0687            0.3012
P5          -0.2911    0.0867           11.2736           0.0008
P6          -0.3745    0.0917           16.6879           <.0001
P7          -0.7152    0.1055           45.9575           <.0001
P8          -0.9512    0.1256           57.3427           <.0001
P9          -1.0886    0.1489           53.4261           <.0001
P10         -1.2277    0.1793           46.8833           <.0001
P11         -1.6425    0.2580           40.5338           <.0001
P12         -2.3104    0.4524           26.0867           <.0001

To compute fitted probability of quitting teaching in TIME PERIOD #1, I set all included time-period indicators -- P2 thru P12 -- to a value of 0, as follows:

Fitted hazard probability in time period #1:

$$\hat{h}(t_{i1}) = \frac{1}{1 + e^{-(\hat\beta_0 + \hat\beta_2(0) + \hat\beta_3(0) + \dots + \hat\beta_{12}(0))}} = \frac{1}{1 + e^{-\hat\beta_0}} = \frac{1}{1 + e^{-(-2.0337)}} = \frac{1}{1 + e^{2.0337}} = \frac{1}{1 + 7.642} = 0.1157$$
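The period #1 calculation can be checked numerically. This is a minimal sketch that takes the fitted intercept from the output above; the variable names are made up for the illustration.

```python
import math

b0 = -2.0337  # fitted intercept from the PROC LOGISTIC output

# With P2 thru P12 all set to 0, only the intercept remains in the exponent.
exp_term = math.exp(-b0)             # e^{2.0337}
hazard_p1 = 1.0 / (1.0 + exp_term)   # fitted hazard probability in period 1

print(round(exp_term, 3), round(hazard_p1, 4))
```

Because P1 is the omitted reference category, the intercept alone determines the fitted hazard in the first period.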

Page 12: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?

© Willett, Harvard University Graduate School of Education, 04/19/2023

S052/II.2(a2) – Slide 12

Model Fit Statistics

Criterion    Intercept Only    Intercept and Covariates
-2 Log L          14903.844                   14583.742

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      320.1019    11        <.0001

Analysis of Maximum Likelihood Estimates

Parameter    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     -2.0337            0.0498          1667.8155        <.0001
P2            -0.0551            0.0735             0.5617        0.4536
P3           0.000610            0.0750             0.0001        0.9935
P4            -0.0819            0.0792             1.0687        0.3012
P5            -0.2911            0.0867            11.2736        0.0008
P6            -0.3745            0.0917            16.6879        <.0001
P7            -0.7152            0.1055            45.9575        <.0001
P8            -0.9512            0.1256            57.3427        <.0001
P9            -1.0886            0.1489            53.4261        <.0001
P10           -1.2277            0.1793            46.8833        <.0001
P11           -1.6425            0.2580            40.5338        <.0001
P12           -2.3104            0.4524            26.0867        <.0001

S052/II.2(a2): Introducing Discrete-Time Survival Analysis Fitting the First DTSA Model, Using the Most General Specification for PERIOD

Fitted values of the outcome are obtained, as usual, by substituting predictor values into the fitted model!

To compute the fitted probability of quitting teaching in time period #2, I set time-period indicator P2 to 1 and the rest of the indicators to 0, as follows:

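Fitted hazard probability in time period #2:

$$
\begin{aligned}
\hat{h}(t_{i2}) &= \frac{1}{1+e^{-\left(\hat{\beta}_0+\hat{\beta}_2(1)+\hat{\beta}_3(0)+\dots+\hat{\beta}_{12}(0)\right)}} \\
&= \frac{1}{1+e^{-\left(\hat{\beta}_0+\hat{\beta}_2\right)}} = \frac{1}{1+e^{-(-2.0337-0.0551)}} = \frac{1}{1+e^{2.0888}} \\
&= \frac{1}{1+8.075} = 0.1102
\end{aligned}
$$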

Page 13: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Fitting the First DTSA Model, Using the Most General Specification for PERIOD

Fitted values of the outcome are obtained, as usual, by substituting predictor values into the fitted model!

To compute the fitted probability of quitting teaching in time period #3, I set time-period indicator P3 to 1 and the rest of the indicators to 0, as follows:

Fitted hazard probability for time period #3:

$$
\begin{aligned}
\hat{h}(t_{i3}) &= \frac{1}{1+e^{-\left(\hat{\beta}_0+\hat{\beta}_2(0)+\hat{\beta}_3(1)+\dots+\hat{\beta}_{12}(0)\right)}} \\
&= \frac{1}{1+e^{-\left(\hat{\beta}_0+\hat{\beta}_3\right)}} = \frac{1}{1+e^{-(-2.0337+0.0006)}} = \frac{1}{1+e^{2.0331}} \\
&= \frac{1}{1+7.638} = 0.1158
\end{aligned}
$$

Etc. … Of course, you don’t need to do these calculations yourself … you can use the predicted values!
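As a rough cross-check outside PC-SAS, the same substitution can be sketched in Python for all twelve periods at once. The estimates are copied from the maximum likelihood listing for the intercept specification; the names INTERCEPT, PERIOD_EFFECTS and fitted_hazard are hypothetical and exist only for this illustration.

```python
import math

# Estimates copied from the "Analysis of Maximum Likelihood Estimates" listing
# for the intercept specification (P1 is the omitted reference period).
INTERCEPT = -2.0337
PERIOD_EFFECTS = {
    2: -0.0551, 3: 0.000610, 4: -0.0819, 5: -0.2911, 6: -0.3745, 7: -0.7152,
    8: -0.9512, 9: -1.0886, 10: -1.2277, 11: -1.6425, 12: -2.3104,
}

def fitted_hazard(period):
    """Inverse-logit of the fitted log-odds for one discrete time period."""
    # In period 1 every indicator is 0, so only the intercept remains.
    log_odds = INTERCEPT + PERIOD_EFFECTS.get(period, 0.0)
    return 1.0 / (1.0 + math.exp(-log_odds))

fitted = {period: fitted_hazard(period) for period in range(1, 13)}
for period, hazard in fitted.items():
    print(f"period {period:2d}: fitted hazard = {hazard:.4f}")
```

Setting an indicator to 1 simply adds its coefficient to the intercept, which is why a dictionary lookup with a default of 0 is enough here.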

Page 14: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Obtaining, Inspecting and Plotting Fitted Hazard Probabilities Automatically

* Predict event occurrence ("quitting teaching") by P2-P12;
PROC LOGISTIC DATA=SPEC_ED_PP;
  MODEL EVENT(event='Quit') = P2-P12;
  FORMAT EVENT EFMT.;
  OUTPUT OUT=PREDICTED1 PREDICTED=PREDQUIT1;
RUN;

* Re-sort output dataset and pick out the 12 unique values of predicted
  hazard probability, one per discrete time period;
PROC SORT DATA=PREDICTED1;
  BY PERIOD;
RUN;
DATA PREDICTED1;
  SET PREDICTED1;
  BY PERIOD;
  IF FIRST.PERIOD=1;
RUN;

* List & plot the unique predicted hazard probabilities, one/discrete period;
PROC PRINT DATA=PREDICTED1;
  VAR PERIOD PREDQUIT1;
RUN;
PROC PLOT DATA=PREDICTED1;
  PLOT PREDQUIT1*PERIOD='P' / VAXIS=0 TO .14 BY .02 HAXIS=0 TO 13 BY 1;
RUN;

PC-SAS code for obtaining, inspecting & plotting the predicted values in each of the discrete time periods …

The OUTPUT statement writes the predicted values, PREDQUIT1, into a new dataset called PREDICTED1, to facilitate subsequent listing of the fitted hazard probabilities and plotting of the fitted hazard function.

Here, I sort the predicted values by time period and pick out the first value listed in each time period.

List the fitted values for inspection. These turn out to be the fitted hazard probabilities.

Plot the fitted values versus time period. This turns out to be the fitted hazard function.
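The sort-then-keep-first step may be easier to see on a toy example. Here is a minimal Python sketch of it, assuming a handful of made-up person-period records; the id values and the field name predquit are hypothetical, and only the three predicted values are taken from the PC-SAS listing.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical person-period records: every record in a given period carries
# the same predicted hazard, so one record per period is all we need.
records = [
    {"id": 1, "period": 1, "predquit": 0.11571},
    {"id": 1, "period": 2, "predquit": 0.11019},
    {"id": 2, "period": 1, "predquit": 0.11571},
    {"id": 3, "period": 1, "predquit": 0.11571},
    {"id": 3, "period": 2, "predquit": 0.11019},
    {"id": 3, "period": 3, "predquit": 0.11577},
]

# Counterpart of PROC SORT ... BY PERIOD (Python's sort is stable).
by_period = sorted(records, key=itemgetter("period"))

# Counterpart of IF FIRST.PERIOD=1: keep the first record in each BY group.
first_per_period = [
    next(group) for _, group in groupby(by_period, key=itemgetter("period"))
]

for row in first_per_period:
    print(row["period"], row["predquit"])
```

Because the most general specification of PERIOD gives every person in a period the same fitted value, it does not matter which record in the group is kept.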

Page 15: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


And here are the sample hazard probabilities, from the life-table analysis:

Table of EVENT (Did Teacher Quit in this Time Period?) by PERIOD (Current Time Period)
Cell entries: frequency and column percent

PERIOD    No Quit    Col Pct     Quit    Col Pct    Total
1            3485      88.43      456      11.57     3941
2            3101      88.98      384      11.02     3485
3            2742      88.42      359      11.58     3101
4            2447      89.24      295      10.76     2742
5            2229      91.09      218       8.91     2447
6            2045      91.75      184       8.25     2229
7            1922      93.99      123       6.01     2045
8            1563      95.19       79       4.81     1642
9            1203      95.78       53       4.22     1256
10            913      96.31       35       3.69      948
11            632      97.53       16       2.47      648
12            386      98.72        5       1.28      391
Total       22668                2207               24875

S052/II.2(a2): Introducing Discrete-Time Survival Analysis Inspecting the Fitted Probabilities by PERIOD

Here are the fitted probabilities, direct from the PC-SAS output …

PERIOD    PREDQUIT1
1           0.11571
2           0.11019
3           0.11577
4           0.10759
5           0.08909
6           0.08255
7           0.06015
8           0.04811
9           0.04220
10          0.03692
11          0.02469
12          0.01282

Notice that the fitted probabilities obtained in the logistic regression analysis are identical to the sample hazard probabilities obtained in the life table analysis …
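The comparison can be sketched in a few lines of Python. The risk-set sizes and quit counts are the column totals and Quit frequencies from the crosstab, and the fitted values are the PREDQUIT1 listing; the names LIFE_TABLE, sample_hazard and largest_gap are hypothetical. Treat this as an illustrative check, not as part of the PC-SAS analysis.

```python
# period: (teachers in the risk set, teachers who quit), from the crosstab.
LIFE_TABLE = {
    1: (3941, 456), 2: (3485, 384), 3: (3101, 359), 4: (2742, 295),
    5: (2447, 218), 6: (2229, 184), 7: (2045, 123), 8: (1642, 79),
    9: (1256, 53), 10: (948, 35), 11: (648, 16), 12: (391, 5),
}

# Fitted hazard probabilities copied from the PREDQUIT1 listing.
PREDQUIT1 = {
    1: 0.11571, 2: 0.11019, 3: 0.11577, 4: 0.10759, 5: 0.08909, 6: 0.08255,
    7: 0.06015, 8: 0.04811, 9: 0.04220, 10: 0.03692, 11: 0.02469, 12: 0.01282,
}

# Sample hazard = proportion of the risk set that quits in the period.
sample_hazard = {p: quits / at_risk for p, (at_risk, quits) in LIFE_TABLE.items()}

largest_gap = max(abs(sample_hazard[p] - PREDQUIT1[p]) for p in LIFE_TABLE)
for p in LIFE_TABLE:
    print(f"period {p:2d}: sample {sample_hazard[p]:.5f}  fitted {PREDQUIT1[p]:.5f}")
print(f"largest absolute gap: {largest_gap:.6f}")
```

With these numbers the two columns agree to about four decimal places in every period.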

Page 16: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Fitted Hazard Probabilities vs. PERIOD

[Plot: Fitted Hazard Function, Most General Specification of PERIOD. Vertical axis: Estimated Probability (0.00 to 0.14); horizontal axis: Current Time Period (0 to 13); each value of PREDQUIT1 is plotted with the symbol P.]

And here are the fitted probabilities plotted versus time period, from the PC-SAS output …

Notice that the fitted probabilities from the logistic regression analysis provide the same sample hazard function that we obtained in the life table analysis.

We conclude that we can replicate life-table analysis by conducting logistic regression analyses in the person-period dataset … we refer to this as Discrete-Time Survival Analysis.

Page 17: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


[Figure: Fitted Survivor Function. Vertical axis: Sample Survivor Probability (0.0000 to 1.0000); horizontal axis: Year in Teaching (0 to 12).]

Once you’ve estimated the fitted hazard probabilities in each time period, you can plot the hazard function and, from it, estimate the fitted survivor function and median lifetime in the usual way …

Estimated median lifetime: 6.6 years

S052/II.2(a2): Introducing Discrete-Time Survival Analysis Finishing The Job – Fitted Survivor Function & Median Lifetime Statistic

[Figure: Fitted Hazard Function. Vertical axis: Hazard Probability (0.0000 to 0.1400); horizontal axis: Year in Teaching Career (1 to 12).]
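One way to go from fitted hazard probabilities to the survivor function and the median lifetime is sketched below in Python. The hazards are the PREDQUIT1 values; the function names are hypothetical, and the linear interpolation between the two periods that bracket a survival probability of .5 is my assumption about what "the usual way" means here.

```python
# Fitted hazard probabilities for periods 1..12, from the PREDQUIT1 listing.
PREDQUIT1 = [0.11571, 0.11019, 0.11577, 0.10759, 0.08909, 0.08255,
             0.06015, 0.04811, 0.04220, 0.03692, 0.02469, 0.01282]

def survivor_function(hazards):
    """Return [S(0), S(1), ..., S(J)], where S(j) = S(j-1) * (1 - h(j))."""
    survival = [1.0]
    for hazard in hazards:
        survival.append(survival[-1] * (1.0 - hazard))
    return survival

def median_lifetime(survival):
    """Linearly interpolate the time at which the survivor function crosses 0.5."""
    for j in range(1, len(survival)):
        if survival[j] <= 0.5:
            before, after = survival[j - 1], survival[j]
            return (j - 1) + (before - 0.5) / (before - after)
    return None  # the survivor function never falls to 0.5

survival = survivor_function(PREDQUIT1)
median = median_lifetime(survival)
print(f"estimated median lifetime: {median:.1f} years")
```

Under that interpolation assumption, the survivor function drops below .5 between years 6 and 7, consistent with the 6.6 years marked on the plot.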

Page 18: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


* Predict event occurrence ("quitting teaching") again by time-period dummies,
  but avoid collinearity by retaining all time-period dummies & dropping intercept;
PROC LOGISTIC DATA=SPEC_ED_PP;
  MODEL EVENT(event='Quit') = P1-P12 / NOINT;
  FORMAT EVENT EFMT.;
  OUTPUT OUT=PREDICTED2 PREDICTED=PREDQUIT2;
RUN;

* Re-sort output dataset and pick out the twelve unique values of predicted
  hazard probability, one per discrete time period;
PROC SORT DATA=PREDICTED2;
  BY PERIOD;
RUN;
DATA PREDICTED2;
  SET PREDICTED2;
  BY PERIOD;
  IF FIRST.PERIOD=1;
RUN;

* List & plot the unique predicted hazard probabilities, one per discrete period;
PROC PRINT DATA=PREDICTED2;
  VAR PERIOD PREDQUIT2;
RUN;
PROC PLOT DATA=PREDICTED2;
  PLOT PREDQUIT2*PERIOD='P' / VAXIS=0 TO .14 BY .02 HAXIS=0 TO 13 BY 1;
RUN;

Usefully, you can specify the discrete-time hazard model in another equivalent way … with “no intercept”

S052/II.2(a2): Introducing Discrete-Time Survival Analysis An Interesting Alternative General Specification of PERIOD, This Time With “No Intercept”

Here are the usual listing and bivariate plot of the fitted (hazard) probabilities versus time period.

Notice you can regress EVENT on all the time-period dummies, P1 through P12, by dropping the intercept from the model. Notice the “NOINT” option.

Omission of the intercept parameter changes the interpretation of the parameters associated with the time dummies, but the new interpretation is very useful!

The new discrete-time hazard model, for the ith person on the jth occasion, is then:

$$
h(t_{ij}) = \frac{1}{1+e^{-\left(\alpha_1 P1_{ij}+\alpha_2 P2_{ij}+\alpha_3 P3_{ij}+\dots+\alpha_{12} P12_{ij}\right)}}
$$

Page 19: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


Model Fit Statistics

Criterion    Without Covariates    With Covariates
-2 Log L              34484.072          14583.742

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio    19900.3302    12        <.0001

Analysis of Maximum Likelihood Estimates

Parameter    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
P1            -2.0337            0.0498          1667.8155        <.0001
P2            -2.0888            0.0541          1490.8689        <.0001
P3            -2.0331            0.0561          1312.1587        <.0001
P4            -2.1156            0.0616          1178.3470        <.0001
P5            -2.3248            0.0710          1073.2694        <.0001
P6            -2.4082            0.0770           979.0219        <.0001
P7            -2.7489            0.0930           873.5642        <.0001
P8            -2.9849            0.1153           670.0029        <.0001
P9            -3.1223            0.1404           494.8756        <.0001
P10           -3.2614            0.1722           358.5382        <.0001
P11           -3.6763            0.2531           210.9037        <.0001
P12           -4.3464            0.4501            93.2502        <.0001

S052/II.2(a2): Introducing Discrete-Time Survival Analysis An Interesting Alternative General Specification of PERIOD, This Time With “No Intercept”

Notice the current model contains all the time-period indicators, P1 through P12, as predictors.

The -2LL statistic for this model is identical to its earlier value in the general “intercept” specification.

But the difference in the -2LL statistic between the unconditional and current models is not the same as the value obtained under the “intercept” specification.

Why? Because now a comparison of model -2LL statistics is testing the null hypothesis that all twelve period parameters are jointly equal to zero in the population (that is, that the log-odds of quitting is zero, and the hazard probability .5, in every time period), rather than the null hypothesis that the hazard probabilities in periods #2 through #12 are all equal to the hazard probability in time period #1 (Appendix 1).
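The two baselines can be checked by hand. The Python sketch below assumes only the crosstab totals (24875 person-period records, 2207 of them quits) and the -2LL values printed in the two listings; the function and variable names are hypothetical.

```python
import math

N_RECORDS = 24875   # person-period records (crosstab grand total)
N_EVENTS = 2207     # records on which EVENT = Quit

def neg2_log_lik(events, records, hazard):
    """-2 log-likelihood when every record shares one hazard probability."""
    return -2.0 * (events * math.log(hazard)
                   + (records - events) * math.log(1.0 - hazard))

# "Intercept" specification: the baseline model fits one common hazard.
baseline_intercept = neg2_log_lik(N_EVENTS, N_RECORDS, N_EVENTS / N_RECORDS)

# "No intercept" specification: the baseline model has no parameters at all,
# so every log-odds is 0 and every hazard probability is 0.5.
baseline_noint = neg2_log_lik(N_EVENTS, N_RECORDS, 0.5)

FITTED_NEG2LL = 14583.742                   # same under both specifications
lr_intercept = 14903.844 - FITTED_NEG2LL    # 11 degrees of freedom
lr_noint = 34484.072 - FITTED_NEG2LL        # 12 degrees of freedom

print(f"baseline -2LL, intercept only: {baseline_intercept:.3f}")
print(f"baseline -2LL, no covariates:  {baseline_noint:.3f}")
print(f"LR chi-square: {lr_intercept:.3f} (intercept), {lr_noint:.3f} (NOINT)")
```

The fitted model is the same in both runs; only the baseline it is compared against changes, which is why the two likelihood ratio chi-squares differ so much.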

Page 20: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


Under the general “no intercept” specification for PERIOD, the recovery of the fitted hazard probabilities in each time period is simpler … here’s the computation of the fitted hazard probability in time period #1 …

S052/II.2(a2): Introducing Discrete-Time Survival Analysis An Interesting Alternative General Specification of PERIOD, This Time With “No Intercept”

To compute the fitted probability of quitting teaching in time period #1, I set time indicator P1 to 1 and all other time indicators to 0, as follows:

Fitted hazard probability in time period #1:

$$
\begin{aligned}
\hat{h}(t_{i1}) &= \frac{1}{1+e^{-\left(\hat{\alpha}_1(1)+\hat{\alpha}_2(0)+\hat{\alpha}_3(0)+\dots+\hat{\alpha}_{12}(0)\right)}} \\
&= \frac{1}{1+e^{-\hat{\alpha}_1}} = \frac{1}{1+e^{-(-2.0337)}} = \frac{1}{1+e^{2.0337}} \\
&= \frac{1}{1+7.642} = 0.1157
\end{aligned}
$$

• Identical to the value obtained in the earlier life table and discrete-time survival analyses.

Page 21: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


Here’s the computation of the fitted hazard probability in time period #2 …

S052/II.2(a2): Introducing Discrete-Time Survival Analysis An Interesting Alternative General Specification of PERIOD, This Time With “No Intercept”

To compute the fitted probability of quitting teaching in time period #2, I set time indicator P2 to 1 and all other time indicators to 0, as follows:

Fitted hazard probability for time period #2:

$$
\begin{aligned}
\hat{h}(t_{i2}) &= \frac{1}{1+e^{-\left(\hat{\alpha}_1(0)+\hat{\alpha}_2(1)+\hat{\alpha}_3(0)+\dots+\hat{\alpha}_{12}(0)\right)}} \\
&= \frac{1}{1+e^{-\hat{\alpha}_2}} = \frac{1}{1+e^{-(-2.0888)}} = \frac{1}{1+e^{2.0888}} \\
&= \frac{1}{1+8.075} = 0.1102
\end{aligned}
$$

• Identical to the value obtained in the earlier life table and discrete-time survival analyses.

Page 22: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


Here’s the computation of the fitted hazard probability during time period j …

S052/II.2(a2): Introducing Discrete-Time Survival Analysis An Interesting Alternative General Specification of PERIOD, This Time With “No Intercept”

In general, with the “no intercept” specification, the fitted hazard probability in any time period tj is:

$$
\hat{h}(t_j) = \frac{1}{1+e^{-\hat{\alpha}_j}}
$$

And the “no intercept” specification is so useful because this formula can be programmed in PC-SAS, as we will see!
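To see how little work the general formula needs, here is a minimal Python sketch of it (not the PC-SAS program referred to above). The twelve estimates are copied from the "no intercept" listing; ALPHA_HAT and fitted_hazard are hypothetical names.

```python
import math

# Estimates for P1..P12 from the "no intercept" listing.
ALPHA_HAT = [-2.0337, -2.0888, -2.0331, -2.1156, -2.3248, -2.4082,
             -2.7489, -2.9849, -3.1223, -3.2614, -3.6763, -4.3464]

def fitted_hazard(alpha_j):
    """Fitted hazard in period j is the inverse-logit of that period's estimate."""
    return 1.0 / (1.0 + math.exp(-alpha_j))

hazards = [fitted_hazard(alpha_j) for alpha_j in ALPHA_HAT]
for period, hazard in enumerate(hazards, start=1):
    print(f"period {period:2d}: fitted hazard = {hazard:.4f}")
```

No summing of an intercept and a period effect is needed: each estimate maps to its own period's hazard directly.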

Page 23: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


Time      # Teachers in the risk     # Teachers who quit     Sample hazard    Fitted hazard,
period    set in this time period    in this time period     prob             general PERIOD
1                            3941                     456           0.1157            0.1157
2                            3485                     384           0.1102            0.1102
3                            3101                     359           0.1158            0.1158
4                            2742                     295           0.1076            0.1076
5                            2447                     218           0.0891            0.0891
6                            2229                     184           0.0825            0.0826
7                            2045                     123           0.0601            0.0602
8                            1642                      79           0.0481            0.0481
9                            1256                      53           0.0422            0.0422
10                            948                      35           0.0369            0.0369
11                            648                      16           0.0247            0.0247
12                            391                       5           0.0128            0.0128

[Figure: hazard function. Vertical axis: Hazard Probability (0.0000 to 0.1400); horizontal axis: Year in Teaching Career (1 to 12).]

S052/II.2(a2): Introducing Discrete-Time Survival Analysis The “Intercept” and “No Intercept” Specifications Provide the Same Hazard Function

Fitted hazard functions are identical: we can replicate life-table analysis with the DTSA “no intercept” approach!

Sample hazard prob: life table (sample) estimates of the hazard probability, obtained from the earlier life-table analysis.

Fitted hazard, general PERIOD: discrete-time survival analysis estimates of hazard probability, assuming a general specification of PERIOD, using time indicators P1 through P12. These are the predicted values of EVENT, PREDQUIT2, obtained from the no intercept specification of the logistic regression model.

Page 24: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Appendix 1: Null Hypotheses Tested Under Each Time Indicator Specification

With an intercept:

$$
\text{log-odds}_i(t_j) = \beta_0+\beta_2 P2_i+\beta_3 P3_i+\dots+\beta_{12} P12_i
$$

[Figure: Log-odds_i(t_j) versus Period (1, 2, 3, 4, …, 11, 12), annotated with β0, β2, β3, β4, …, β11, β12.]

… tests that the population values of the outcome in periods #2 through #12 are identical to the population value of the outcome in the reference period (Period #1).

$$
H_0:\ \beta_2=0;\ \beta_3=0;\ \dots;\ \beta_{11}=0;\ \beta_{12}=0
$$

Without an intercept:

$$
\text{log-odds}_i(t_j) = \alpha_1 P1_i+\alpha_2 P2_i+\alpha_3 P3_i+\dots+\alpha_{12} P12_i
$$

[Figure: Log-odds_i(t_j) versus Period (1, 2, 3, 4, …, 11, 12), annotated with α1, α2, α3, α4, …, α11, α12.]

… tests that all population values of the outcome in periods #1 through #12 are zero.

$$
H_0:\ \alpha_1=0;\ \alpha_2=0;\ \dots;\ \alpha_{11}=0;\ \alpha_{12}=0
$$

Coding of Time Period Dummies

PERIOD    P1    P2    P3    ..    P11    P12
1          1     0     0    ..      0      0
2          0     1     0    ..      0      0
3          0     0     1    ..      0      0
4          0     0     0    ..      0      0
..
11         0     0     0    ..      1      0
12         0     0     0    ..      0      1

Page 25: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Appendix 2: The Arithmetic Equivalence of the Two Time Indicator Specifications

With an intercept …

Model Fit Statistics

            Intercept Only   Intercept and Covariates
-2 Log L    14903.844        14583.742

Maximum Likelihood Estimates

Parameter   Estimate
Intercept   -2.0337
P2          -0.0551
P3           0.000610
P4          -0.0819
P5          -0.2911
P6          -0.3745
P7          -0.7152
P8          -0.9512
P9          -1.0886
P10         -1.2277
P11         -1.6425
P12         -2.3104

Without an intercept …

Model Fit Statistics

            Without Covariates   With Covariates
-2 Log L    34484.072            14583.742

Maximum Likelihood Estimates

Parameter   Estimate
P1          -2.0337
P2          -2.0888
P3          -2.0331
P4          -2.1156
P5          -2.3248
P6          -2.4082
P7          -2.7489
P8          -2.9849
P9          -3.1223
P10         -3.2614
P11         -3.6763
P12         -4.3464

Identical goodness-of-fit statistics: -2 Log L = 14583.742 for the fitted model under both specifications.

Identical estimates and identical interpretation for the fitted logit hazard associated with Time Period #1: the Intercept in the first model and the coefficient on P1 in the second are both -2.0337.

With an intercept, the coefficient on P2 is the difference in fitted logit hazard between time periods #2 and #1.

Without an intercept, the coefficient on P2 is the fitted logit hazard in time period #2.

(-2.0337) + (-0.0551) = -2.0888

Etc.
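The arithmetic on this slide can be checked for every period, not just Period #2. The following is a minimal sketch in Python (not the course's SAS), using only the estimates printed above; the function and variable names are illustrative assumptions. It adds the intercept to each period's coefficient and compares the sum with the corresponding no-intercept estimate.

```python
import math

# Estimates copied from the two fitted models shown on this slide.
INTERCEPT = -2.0337
WITH_INTERCEPT = {  # coefficients on P2..P12 (Period #1 is the reference)
    2: -0.0551, 3: 0.000610, 4: -0.0819, 5: -0.2911, 6: -0.3745, 7: -0.7152,
    8: -0.9512, 9: -1.0886, 10: -1.2277, 11: -1.6425, 12: -2.3104,
}
NO_INTERCEPT = {  # coefficients on P1..P12
    1: -2.0337, 2: -2.0888, 3: -2.0331, 4: -2.1156, 5: -2.3248, 6: -2.4082,
    7: -2.7489, 8: -2.9849, 9: -3.1223, 10: -3.2614, 11: -3.6763, 12: -4.3464,
}


def implied_logit_hazard(period):
    """Fitted logit hazard implied by the with-intercept model."""
    return INTERCEPT + WITH_INTERCEPT.get(period, 0.0)


def hazard(logit):
    """Convert a logit hazard back to a hazard probability."""
    return 1.0 / (1.0 + math.exp(-logit))


# Gap between the two specifications, period by period.
gaps = {p: implied_logit_hazard(p) - NO_INTERCEPT[p] for p in NO_INTERCEPT}
largest_gap = max(abs(g) for g in gaps.values())

for p in sorted(NO_INTERCEPT):
    print(p, round(implied_logit_hazard(p), 4), NO_INTERCEPT[p],
          round(hazard(NO_INTERCEPT[p]), 4))
```

With the printed estimates, the two columns agree to the displayed precision for most periods; the largest gap, for P12, is a little over 0.002 on the logit scale.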

Page 26: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Appendix 3: Conducting DTSA Using Polynomial Functions of Period

You can replace the general “dummy” specification by a polynomial specification, often quite successfully … I tried this several times in the back half of Data Analytic Handout II_2a_3 …

    *---------------------------------------------------------------------------------*
     Now refit the discrete-time hazard model, replacing the general specification
     of PERIOD -- which used the time-period dummies P1-P12 -- by more parsimonious
     polynomial representations of period.
    *---------------------------------------------------------------------------------*;
    DATA SPEC_ED_PP;
      SET SPEC_ED_PP;
      * Create power transformations of PERIOD, that will serve as predictors in
        place of time-period dummies P1-P12, in the discrete-time hazard model;
      * Create the square of PERIOD;
      PERIODSQ = PERIOD*PERIOD;
      * Create the cube of PERIOD;
      PERIODCUB = PERIOD*PERIOD*PERIOD;
      * Could also create the quartic, quintic, etc. of PERIOD, if needed;
      KEEP ID PERIOD EVENT PERIODSQ PERIODCUB;
    RUN;

    * Print first few rows of person-period dataset, showing PERIOD and its
      corresponding power transformations;
    PROC PRINT DATA=SPEC_ED_PP(OBS=33);
      VAR ID EVENT PERIOD PERIODSQ PERIODCUB;
      FORMAT EVENT EFMT.;
    RUN;

Since we are investigating the relationship between EVENT and PERIOD, let’s not assume that it is completely general: let’s create some polynomial transformations of PERIOD to try out as potential predictors. Linear, quadratic, cubic?, quartic?, etc.

Print out a few cases for inspection.
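For readers working outside SAS, the same power transformations can be sketched in Python. This is only an illustration: the helper name `add_polynomial_terms` and the three example records are hypothetical, while the column names ID, PERIOD, EVENT, PERIODSQ and PERIODCUB follow the SAS code on this slide.

```python
def add_polynomial_terms(rows):
    """Return copies of person-period rows with squared and cubed PERIOD added."""
    expanded_rows = []
    for row in rows:
        period = row["PERIOD"]
        expanded_rows.append(
            {**row, "PERIODSQ": period ** 2, "PERIODCUB": period ** 3}
        )
    return expanded_rows


# Hypothetical person-period records for one teacher who quit in period 3.
sample = [{"ID": 1, "PERIOD": t, "EVENT": int(t == 3)} for t in (1, 2, 3)]
expanded = add_polynomial_terms(sample)

for row in expanded:
    print(row)
```

The input rows are left untouched, so the original person-period records remain available for the dummy-variable specification.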

Page 27: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Appendix 3: Conducting DTSA Using Polynomial Functions of Period

Here are the discrete-time survival analyses … first, let’s use logistic regression analysis to check whether the log-odds of event occurrence is linear in PERIOD …

    *---------------------------------------------------------------------------------*
     Fit discrete-time hazard models, in which event occurrence ("quitting teaching")
     is predicted by polynomial functions of PERIOD of gradually increasing
     complexity, rather than by the time-period dummies, P1-P12
    *---------------------------------------------------------------------------------*;
    * Include only the linear effect of PERIOD;
    PROC LOGISTIC DATA=SPEC_ED_PP;
      MODEL EVENT(event='Quit') = PERIOD;
      FORMAT EVENT EFMT.;
      OUTPUT OUT=PREDICTED3 PREDICTED=PREDQUIT3;
    RUN;

    * Re-sort output dataset and pick out the twelve unique values of predicted
      hazard probability, one per discrete time period;
    PROC SORT DATA=PREDICTED3;
      BY PERIOD;
    RUN;
    DATA PREDICTED3;
      SET PREDICTED3;
      BY PERIOD;
      IF FIRST.PERIOD=1;
    RUN;

    * List & plot the unique predicted hazard probabilities, one per discrete period;
    PROC PRINT DATA=PREDICTED3;
      VAR PERIOD PREDQUIT3;
    RUN;
    PROC PLOT DATA=PREDICTED3;
      PLOT PREDQUIT3*PERIOD='P' / VAXIS=0 TO .14 BY .02 HAXIS=0 TO 13 BY 1;
    RUN;

Conduct a logistic regression analysis of EVENT on PERIOD in the person-period dataset.

Regress the (log-odds of) EVENT on a linear function of PERIOD in the usual way.

Output the predicted values of EVENT (these will be the fitted hazard probabilities), here called PREDQUIT3, into the person-period dataset.

Plot the fitted probabilities of EVENT occurrence against PERIOD; this provides the fitted hazard function.

Print out values of the fitted hazard probability of a few cases for inspection …
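To see what the logistic regression step is doing, here is a rough Python sketch that fits the same linear-in-PERIOD model by Newton-Raphson. It is not the SAS procedure itself: it works from the grouped life-table counts tabulated elsewhere in these slides rather than from the person-period records, which gives the same likelihood because every record in a period shares the same PERIOD value. The function names are assumptions made for illustration.

```python
import math

# Risk-set sizes and numbers of quits per period, from the course's life table.
PERIODS = list(range(1, 13))
RISK_SET = [3941, 3485, 3101, 2742, 2447, 2229, 2045, 1642, 1256, 948, 648, 391]
QUITS = [456, 384, 359, 295, 218, 184, 123, 79, 53, 35, 16, 5]


def fit_linear_logit(periods, n_at_risk, n_events, iterations=25):
    """Maximum-likelihood fit of logit(hazard) = b0 + b1*PERIOD via Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, n, d in zip(periods, n_at_risk, n_events):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
            resid = d - n * p              # observed minus expected events
            weight = n * p * (1.0 - p)     # binomial variance weight
            g0 += resid
            g1 += resid * t
            h00 += weight
            h01 += weight * t
            h11 += weight * t * t
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def minus_two_log_likelihood(b0, b1, periods, n_at_risk, n_events):
    """-2LL of the person-period data implied by the grouped counts."""
    total = 0.0
    for t, n, d in zip(periods, n_at_risk, n_events):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * t)))
        total += d * math.log(p) + (n - d) * math.log(1.0 - p)
    return -2.0 * total


b0, b1 = fit_linear_logit(PERIODS, RISK_SET, QUITS)
m2ll = minus_two_log_likelihood(b0, b1, PERIODS, RISK_SET, QUITS)
print(round(b0, 4), round(b1, 4), round(m2ll, 3))
```

If the grouped counts match the person-period dataset, the estimates should land close to the Intercept (-1.7560) and PERIOD (-0.1353) values in the SAS output for this model.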

Page 28: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Appendix 3: Conducting DTSA Using Polynomial Functions of Period

Here’s the fitted discrete-time hazard model with PERIOD specified as a linear effect …

Model Fit Statistics

Criterion   Intercept Only   Intercept and Covariates
-2 Log L    14903.844        14627.030

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   276.8144     1    <.0001

Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1    -1.7560    0.0395           1976.4130         <.0001
PERIOD      1    -0.1353    0.00854          251.0336          <.0001

-2LL statistic:
Intercept only, -2LL = 14903.8
Intercept & covariates, -2LL = 14627.0
Difference in -2LL = 276.8

Using either the approximate test based on the Wald $\chi^2$ statistic or the preferred difference in -2LL test, we can reject the null hypothesis that linear PERIOD has no effect on EVENT occurrence in the population ($\chi^2$ = 251.0, df = 1, p < .0001; $\chi^2$ = 276.8, df = 1, p < .0001, respectively).
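Both test statistics can be recovered by hand from the printed output. A small Python sketch, assuming only the figures reported on this slide (the helper `chi2_upper_tail_1df` is an illustrative name, and it relies on the standard identity that a 1-df chi-square upper tail equals erfc(sqrt(x/2))):

```python
import math


def chi2_upper_tail_1df(x):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Difference in -2LL between the intercept-only and linear-PERIOD models.
lr_chi2 = 14903.844 - 14627.030

# Wald chi-square: (estimate / standard error) squared, from rounded output.
wald_chi2 = (-0.1353 / 0.00854) ** 2

print(round(lr_chi2, 3), chi2_upper_tail_1df(lr_chi2))
print(round(wald_chi2, 1), chi2_upper_tail_1df(wald_chi2))
```

Because the estimate and standard error are rounded in the printout, the recomputed Wald statistic matches the reported 251.0336 only approximately.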

Page 29: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Appendix 3: Conducting DTSA Using Polynomial Functions of Period

[Figure: Fitted Hazard Function Assuming A Linear Specification of PERIOD. Estimated Probability (vertical axis, 0.00 to 0.14) plotted against Current Time Period (horizontal axis, 0 to 13), one plotted “P” per period.]

These predicted values represent the fitted probabilities of EVENT occurrence in each period, assuming that PERIOD appears as a linear function in the logistic model.

Page 30: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Appendix 3: Conducting DTSA Using Polynomial Functions of Period

Time     # Teachers in the risk set   # Teachers who quit    Sample        Fitted hazard,
period   in this time period          in this time period    hazard prob   linear PERIOD
1        3941                         456                    0.1157        0.1311
2        3485                         384                    0.1102        0.1164
3        3101                         359                    0.1158        0.1032
4        2742                         295                    0.1076        0.0913
5        2447                         218                    0.0891        0.0807
6        2229                         184                    0.0825        0.0712
7        2045                         123                    0.0601        0.0628
8        1642                         79                     0.0481        0.0553
9        1256                         53                     0.0422        0.0486
10       948                          35                     0.0369        0.0427
11       648                          16                     0.0247        0.0375
12       391                          5                      0.0128        0.0329

Sample hazard probabilities obtained from the earlier life-table analysis.

Predicted values of EVENT, PREDQUIT3, obtained from the logistic regression output.

[Figure: Hazard Probability (0.00 to 0.14) plotted against Year in Teaching Career (1 to 12), comparing the life table (sample) estimates of the hazard probability with the discrete-time survival analysis estimates of hazard probability, assuming linear specification of PERIOD.]

It’s a pretty good fit. But can we do better by using different specifications of PERIOD?
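The fitted hazard column can be reproduced from the two estimates in the linear model by inverting the logit. A minimal Python sketch, assuming the reported Intercept (-1.7560) and PERIOD coefficient (-0.1353); the function name is illustrative:

```python
import math


def fitted_hazard_linear(period, intercept=-1.7560, slope=-0.1353):
    """Fitted hazard probability when logit hazard is linear in PERIOD."""
    logit = intercept + slope * period
    return 1.0 / (1.0 + math.exp(-logit))


fitted = {t: round(fitted_hazard_linear(t), 4) for t in range(1, 13)}
for t, h in fitted.items():
    print(t, h)
```

Small differences in the fourth decimal place are possible because the coefficients are rounded in the printed output.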

Page 31: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Appendix 3: Conducting DTSA Using Polynomial Functions of Period

Now, let’s check whether we should add the quadratic effect of PERIOD …

    * Now add the quadratic effect of PERIOD to check if this improves the fit;
    PROC LOGISTIC DATA=SPEC_ED_PP;
      MODEL EVENT(event='Quit') = PERIOD PERIODSQ;
      FORMAT EVENT EFMT.;
      OUTPUT OUT=PREDICTED4 PREDICTED=PREDQUIT4;
    RUN;

    * Re-sort output dataset and pick out the twelve unique values of predicted
      hazard probability, one per discrete time period;
    PROC SORT DATA=PREDICTED4;
      BY PERIOD;
    RUN;
    DATA PREDICTED4;
      SET PREDICTED4;
      BY PERIOD;
      IF FIRST.PERIOD=1;
    RUN;

    * List & plot the unique predicted hazard probabilities, one per discrete period;
    PROC PRINT DATA=PREDICTED4;
      VAR PERIOD PREDQUIT4;
    RUN;
    PROC PLOT DATA=PREDICTED4;
      PLOT PREDQUIT4*PERIOD='P' / VAXIS=0 TO .14 BY .02 HAXIS=0 TO 13 BY 1;
    RUN;

Conduct a logistic regression analysis of EVENT in the person-period dataset.

Regress the (log-odds of) EVENT on a linear and a quadratic function of PERIOD.

Output the fitted hazard probabilities, here called PREDQUIT4, into the person-period dataset.

Plot the fitted hazard function.

Print out values of the fitted hazard probability of a few cases for inspection …

Page 32: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Appendix 3: Conducting DTSA Using Polynomial Functions of Period

Model Fit Statistics

Criterion   Intercept Only   Intercept and Covariates
-2 Log L    14903.844        14590.517

Testing Global Null Hypothesis: BETA=0

Test               Chi-Square   DF   Pr > ChiSq
Likelihood Ratio   313.3273     2    <.0001

Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   1    -2.0628    0.0661           974.6043          <.0001
PERIOD      1     0.0485    0.0323           2.2610            0.1327
PERIODSQ    1    -0.0188    0.00323          33.7750           <.0001

Using either the approximate test based on the Wald $\chi^2$ statistic or the preferred difference in -2LL test, we can reject the null hypothesis that the linear & quadratic effects of PERIOD have no joint effect on EVENT occurrence in the population ($\chi^2$ = 313.3, df = 2, p < .0001).

-2LL statistic:
Intercept only, -2LL = 14903.8
Intercept & covariates, -2LL = 14590.5
Difference in -2LL = 313.3
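The joint test above compares the quadratic model with the intercept-only model. One further check, sketched here in Python as an illustration rather than as part of the original analysis, compares the quadratic model with the linear-PERIOD model fitted earlier (-2LL = 14627.030), which isolates the contribution of PERIODSQ on 1 degree of freedom. The helper names are assumptions; the tail formulas are the standard closed forms for 1 and 2 degrees of freedom.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability for df = 1 or df = 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


M2LL_INTERCEPT_ONLY = 14903.844
M2LL_LINEAR = 14627.030      # from the linear-PERIOD model
M2LL_QUADRATIC = 14590.517   # from the linear + quadratic model

# Joint test of PERIOD and PERIODSQ (2 df), as reported on this slide.
lr_joint = M2LL_INTERCEPT_ONLY - M2LL_QUADRATIC

# Incremental test of PERIODSQ over and above linear PERIOD (1 df).
lr_increment = M2LL_LINEAR - M2LL_QUADRATIC

print(round(lr_joint, 3), chi2_upper_tail(lr_joint, 2))
print(round(lr_increment, 3), chi2_upper_tail(lr_increment, 1))
```

The incremental statistic is of the same order as the Wald chi-square reported for PERIODSQ (33.7750), as would be expected when both tests address the same single parameter.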

Page 33: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Appendix 3: Conducting DTSA Using Polynomial Functions of Period

[Figure: Fitted Hazard Function Assuming Linear & Quadratic Specifications of PERIOD. Estimated Probability (vertical axis, 0.00 to 0.14) plotted against Current Time Period (horizontal axis, 0 to 13), one plotted “P” per period.]

Notice that the shape of the fitted hazard function is now a little more curvilinear, since it contains both the linear and quadratic specifications of PERIOD.

Perhaps it is now capturing the underlying risk profile a little better than a purely linear specification of PERIOD?

Page 34: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Appendix 3: Conducting DTSA Using Polynomial Functions of Period

Time     # Teachers in the risk set   # Teachers who quit    Sample        Fitted hazard,
period   in this time period          in this time period    hazard prob   lin & quad PERIOD
1        3941                         456                    0.1157        0.1158
2        3485                         384                    0.1102        0.115
3        3101                         359                    0.1158        0.1104
4        2742                         295                    0.1076        0.1025
5        2447                         218                    0.0891        0.092
6        2229                         184                    0.0825        0.076
7        2045                         123                    0.0601        0.0664
8        1642                         79                     0.0481        0.0533
9        1256                         53                     0.0422        0.0412
10       948                          35                     0.0369        0.0306
11       648                          16                     0.0247        0.0218
12       391                          5                      0.0128        0.015

Sample hazard probabilities obtained from the earlier life-table analysis.

Predicted values of EVENT, PREDQUIT4, obtained from the logistic regression output.

[Figure: Hazard Probability (0.00 to 0.14) plotted against Year in Teaching Career (1 to 12), comparing the life table (sample) estimates of the hazard probability with the discrete-time survival analysis estimates of hazard probability, assuming linear & quadratic specifications of PERIOD.]

It fits a little better. Let’s continue the process of seeking a better specification for PERIOD.
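As with the linear model, the quadratic fitted hazards follow from inverting the logit, now with a squared term. A minimal Python sketch, assuming the reported estimates for the quadratic model (Intercept -2.0628, PERIOD 0.0485, PERIODSQ -0.0188); the function name is illustrative:

```python
import math


def fitted_hazard_quadratic(period, b0=-2.0628, b1=0.0485, b2=-0.0188):
    """Fitted hazard when logit hazard is quadratic in PERIOD."""
    logit = b0 + b1 * period + b2 * period ** 2
    return 1.0 / (1.0 + math.exp(-logit))


fitted_quad = {t: round(fitted_hazard_quadratic(t), 4) for t in range(1, 13)}
for t, h in fitted_quad.items():
    print(t, h)
```

With these rounded coefficients the recomputed values track the fitted column closely for most periods; the value for period 6 comes out nearer 0.080 than the 0.076 listed, so that entry may be worth checking against the original output.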

Page 35: S052/II.2(a2): Applied Data Analysis Roadmap of the Course – What Is Today’s Topic Area?


S052/II.2(a2): Introducing Discrete-Time Survival Analysis Appendix 3: Conducting DTSA Using Polynomial Functions of Period

Time     # Teachers in the risk set   # Teachers who quit    Sample        Fitted hazard,
period   in this time period          in this time period    hazard prob   L, Q & C PERIOD
1        3941                         456                    0.1157        0.1137
2        3485                         384                    0.1102        0.1162
3        3101                         359                    0.1158        0.1126
4        2742                         295                    0.1076        0.104
5        2447                         218                    0.0891        0.0921
6        2229                         184                    0.0825        0.0784
7        2045                         123                    0.0601        0.0646
8        1642                         79                     0.0481        0.0517
9        1256                         53                     0.0422        0.0405
10       948                          35                     0.0369        0.0312
11       648                          16                     0.0247        0.0238
12       391                          5                      0.0128        0.018

Sample hazard probabilities obtained from the earlier life-table analysis.

Predicted values of EVENT obtained from the logistic regression output for the linear, quadratic & cubic model.

[Figure: Hazard Probability (0.00 to 0.14) plotted against Year in Teaching Career (1 to 12), comparing the life table (sample) estimates of the hazard probability with the discrete-time survival analysis estimates of hazard probability, assuming linear, quadratic & cubic specifications of PERIOD.]

Even better! What would be the best specification for PERIOD?
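One informal way to put numbers on "pretty good", "a little better" and "even better" is to total the absolute gaps between the sample hazards and each fitted column. The Python sketch below uses the values tabulated in these slides. The summary measure is an assumption chosen for illustration: it is unweighted and is not the likelihood-based comparison that the -2LL statistics provide.

```python
SAMPLE = [0.1157, 0.1102, 0.1158, 0.1076, 0.0891, 0.0825,
          0.0601, 0.0481, 0.0422, 0.0369, 0.0247, 0.0128]

FITTED = {
    "linear": [0.1311, 0.1164, 0.1032, 0.0913, 0.0807, 0.0712,
               0.0628, 0.0553, 0.0486, 0.0427, 0.0375, 0.0329],
    "linear + quadratic": [0.1158, 0.115, 0.1104, 0.1025, 0.092, 0.076,
                           0.0664, 0.0533, 0.0412, 0.0306, 0.0218, 0.015],
    "linear + quadratic + cubic": [0.1137, 0.1162, 0.1126, 0.104, 0.0921, 0.0784,
                                   0.0646, 0.0517, 0.0405, 0.0312, 0.0238, 0.018],
}


def total_abs_gap(fitted, sample=SAMPLE):
    """Sum over periods of |fitted hazard - sample hazard|."""
    return sum(abs(f - s) for f, s in zip(fitted, sample))


gaps = {name: total_abs_gap(values) for name, values in FITTED.items()}
for name, gap in gaps.items():
    print(f"{name}: {gap:.4f}")
```

On this crude measure the total gap shrinks with each added polynomial term, in line with the verdicts on the slides, although most of the improvement comes from adding the quadratic term.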