
ROB CRIBBIE

QUANTITATIVE METHODS PROGRAM – DEPARTMENT OF PSYCHOLOGY

COORDINATOR - STATISTICAL CONSULTING SERVICE

COURSE MATERIALS AVAILABLE AT: WWW.PSYCH.YORKU.CA/CRIBBIE

Introduction to Structural Equation Modeling (SEM)

Day 2: November 15, 2012

What are we going to do today?

Confirmatory factor analysis

Full structural equation models

Review from last week…

• Definitions of SEM
• SEM lingo
• SEM assumptions
• Model Identification
• Fit indices (RMSEA, CFI, TLI, IFI, SRMR)

Least Squares Regression Example

This example helps to bridge the gap between regression and SEM

Study Description: 100 Psychology Graduate Students

Outcome: Depression

Predictors:
• Hours Worked per Week
• Quality of Relationship with Supervisor
• Research Productivity

Multiple Regression Output

depression ~ hours + rel_superv + res_prod

Coefficients:

             Estimate  Std. Error       t  Pr(>|t|)
(Intercept)   8.81743     2.14132   4.118   8.1e-05 ***
hours         0.13045     0.05561   2.346   0.02106 *
rel_superv   -0.05621     0.08027  -0.700   0.48544
res_prod     -0.35047     0.10379  -3.377   0.00106 **

Residual standard error: 3.005 on 96 degrees of freedom
Multiple R-squared: 0.1716, Adjusted R-squared: 0.1457
F-statistic: 6.628 on 3 and 96 DF, p-value: 0.0004072

SEM Model

SEM Results

                             Estimate   S.E.    C.R.      P
depression <--- res_prod       -.350    .102   -3.429    ***
depression <--- rel_superv     -.056    .079    -.711   .477
depression <--- hours           .130    .055    2.382   .017

Number of distinct sample moments: 10

Number of distinct parameters to be estimated: 10

Degrees of freedom (10 - 10): 0
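The counts above can be reproduced by hand. The sketch below is only an illustration: the breakdown of the 10 free parameters is an assumption (three paths, freely estimated variances and covariances among the three predictors, and one residual variance), since the slide reports only the totals.

```python
def distinct_sample_moments(n_observed: int) -> int:
    """Variances plus unique covariances among the observed variables."""
    return n_observed * (n_observed + 1) // 2


# Assumed breakdown of free parameters for the regression written as an SEM.
# The slide gives only the total (10); this split is an illustration.
free_parameters = {
    "regression paths (hours, rel_superv, res_prod)": 3,
    "predictor variances": 3,
    "predictor covariances": 3,
    "residual variance of depression": 1,
}

moments = distinct_sample_moments(4)      # depression + 3 predictors
n_params = sum(free_parameters.values())
df = moments - n_params

print(f"moments={moments}, parameters={n_params}, df={df}")
```

With zero degrees of freedom the model is just-identified, so it reproduces the sample covariances exactly and overall fit cannot be tested.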

Squared Multiple Correlation (R²) = .17

Regression/SEM Example Summary

The only difference between the Regression and SEM analyses is the estimation method

SEM: Maximum Likelihood
  Iterative attempt to find parameter values that fit the data

Regression: Least Squares
  Parameter values that minimize the sum of squared residuals (i.e., observed – predicted)

Regression and SEM produced parameter estimates and R-squared values for depression that were almost identical
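One plausible reason the standard errors differ slightly while the estimates match is the divisor used for the residual variance. The sketch below rescales the least squares standard errors reported earlier and compares them with the SEM standard errors. The divisor n − 1 is an assumption, chosen because it reproduces the rounded values; the slides do not say which divisor the SEM software used.

```python
import math

n = 100   # sample size from the study description
k = 4     # regression coefficients, including the intercept

# Standard errors copied from the two outputs above
ols_se = {"hours": 0.05561, "rel_superv": 0.08027, "res_prod": 0.10379}
sem_se = {"hours": 0.055, "rel_superv": 0.079, "res_prod": 0.102}

# Least squares divides the residual sum of squares by n - k.
# ASSUMPTION: the SEM standard errors behave as if the divisor were n - 1.
scale = math.sqrt((n - k) / (n - 1))

rescaled = {name: round(se * scale, 3) for name, se in ols_se.items()}

for name in ols_se:
    print(f"{name}: rescaled OLS SE = {rescaled[name]}, SEM SE = {sem_se[name]}")
```

Because the coefficients themselves do not depend on the divisor, only the standard errors and test statistics shift, and the shift shrinks as the sample size grows.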

Confirmatory Factor Analysis (CFA)

A structural equation model often consists of two components:
• a measurement model linking a set of observed variables to a usually smaller set of latent variables
• a structural model linking the latent variables through a series of specified relationships.

CFA corresponds to the measurement model of SEM

Exploratory Factor Analysis (EFA) versus CFA

With EFA, investigators are interested in exploring patterns within the data, whereas with CFA, investigators are interested in explicitly testing specific hypotheses about how the observed variables are related

Exploratory factor analysis (EFA) imposes no substantive constraints on the data
  There are no restrictions on the pattern of relationships between observed and latent variables (e.g., cross-loadings are permitted and the number of factors is generally not fixed)

EFA is data driven

EFA

• For EFA, each common factor is assumed to affect every observed variable, with the common factors being either all correlated or all uncorrelated (i.e., oblique or orthogonal factors)
  – Can be estimated with ordinary statistical software packages (e.g., R, SPSS)

• Once the model is estimated, factor scores, proxies of latent variables, are calculated and used for follow-up analysis
  – e.g., use factor scores to predict a different outcome in a separate analysis

CFA

Confirmatory factor analysis (CFA), on the other hand, is theory- or hypothesis driven.

With CFA it is possible to place substantively meaningful constraints on the factor model
  For example, researchers can specify the number of factors, which observed variables should load on which latent variables, which factors should be correlated, etc.

Unlike EFA, CFA produces many goodness-of-fit measures to evaluate the model
  Recall that it is the constraints on the model (e.g., limited number of factors, observed variables that load on only one factor) that determine how well the model fits (i.e., are those constraints reasonable?)

Path Diagrams with Latent Variables

• Measurement models
  – Generally, latent variables “cause” the observed/indicator variables (reflective indicators), as shown by single-headed arrows pointing away from the latent variable and towards the observed variables
    – E.g., Latent depression with indicators representing scores on three different depression scales (latent depression “causes” scores on the observed variables)
  – However, in some instances the observed variables ‘combine’ to determine the latent variable (formative indicators)
    – E.g., Latent socioeconomic status variable with indicators income, occupation prestige, and level of education

Sample CFA

Sample CFA with higher order factor

Model Identification- Review

Need to scale the latent variables in order to identify the model

1) set the regression coefficient (loading) for one indicator equal to 1
   All other indicators are interpreted relative to this value

OR

2) set the variance of the latent variable to 1 (standardizing)
   Most common method for CFA
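The two scaling options describe the same model on different metrics. As a minimal sketch with hypothetical loadings and residual variances (not values from these slides), the model-implied covariance matrix of a one-factor model comes out the same under either option:

```python
def implied_cov(loadings, factor_var, resid_vars):
    """Model-implied covariances for one factor: lambda * phi * lambda' + Theta."""
    p = len(loadings)
    return [
        [
            loadings[i] * factor_var * loadings[j]
            + (resid_vars[i] if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


# Hypothetical values, for illustration only
loadings = [2.0, 1.5, 0.8]
resid_vars = [1.0, 0.7, 0.9]

# Option 2: latent variance fixed at 1, all loadings free
sigma_std = implied_cov(loadings, 1.0, resid_vars)

# Option 1: first loading fixed at 1, latent variance free
ref = loadings[0]
marker_loadings = [value / ref for value in loadings]
sigma_marker = implied_cov(marker_loadings, ref ** 2, resid_vars)

same = all(
    abs(a - b) < 1e-12
    for row_a, row_b in zip(sigma_std, sigma_marker)
    for a, b in zip(row_a, row_b)
)
print("Same implied covariance matrix:", same)
```

Since both options imply the same covariances, model fit is unaffected by the choice; only the metric of the unstandardized estimates changes.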

Confirmatory Factor Analysis

Indicators are assumed to be normally distributed variables
  What about items from a scale?

Likert-type items are by nature categorical: covariances are smaller than they should be, model fit tests are biased, and parameter estimates and standard errors are biased.

However, research has found that ordered variables with more than 5 categories can often be treated as continuous

For categorical items (e.g., items with fewer than 5 categories), it is better to use polychoric correlations or item response theory

CFA Example

Greenglass et al. were interested in the influence of a construct called “energized state”; more specifically, whether it could influence coping and stress outcomes
  Do individuals experiencing this energized state cope better with stress?

A measurement model was needed to examine the validity of this construct

Several positive personality variables were measured as indicative of an energized state
  Optimism, positive affect, tendency to perceive difficulties as a challenge, and vigor

N = 404

Variance of the latent variables is set at 1

Example CFA

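[Path diagram: latent variable “energized state” (variance fixed at 1) with four indicators, PositiveAffect, Vigor, Challenge, and Optimism, each with its own error term (e1 to e4, error paths fixed at 1)]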

CFA

Review: Stages of Modeling

• Ensure that model is identified
• Screen SEM assumptions: multivariate outliers, univariate normality, multivariate normality, linearity in the relationships between your variables
• Check overall model fit
• Check standardized residual covariance matrix/modification indices if model fit is poor
• Post hoc/exploratory analyses: Make theoretically appropriate changes to model and re-fit
• Interpret parameter estimates.

CFA example

Model fit was good (although RMSEA is a little high):

χ²    df   p-value   CFI   TLI   RMSEA   90% CI        SRMR
6.1   2    .047      .99   .97   .07     .006, .139    .0254
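The RMSEA in this table can be checked from the chi-square, its degrees of freedom, and the sample size. The sketch below uses the common point-estimate formula with N − 1 in the denominator; some programs use N instead, so treat the exact formula as an assumption about the software used here.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values from the CFA example: chi-square = 6.1, df = 2, N = 404
cfa_rmsea = rmsea(6.1, 2, 404)
print(f"RMSEA = {cfa_rmsea:.3f}")
```

Whenever the chi-square is no larger than its degrees of freedom, the numerator is truncated at zero and the RMSEA is reported as 0.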

Example CFA

Parameter Estimates

                                       Estimate     S.E.       C.R.         P
Positive Affect <--- energized state   8.9886581    .477506    18.824168    ***
Vigor           <--- energized state   4.2036306    .290409    14.474831    ***
Challenge       <--- energized state   2.3353240    .225466    10.357735    ***
Optimism        <--- energized state   1.4259721    .221145    6.4481178    ***

Example CFA

Standardized Parameter Estimates:
  Numerous rules of thumb, but standardized parameter estimates are often expected to be > .5

                                       Estimate
Positive Affect <--- energized state   .9435927
Vigor           <--- energized state   .7250835
Challenge       <--- energized state   .5214540
Optimism        <--- energized state   .3317228
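For an indicator that loads on a single factor, the squared standardized loading is the proportion of that indicator's variance explained by the factor. A short sketch applying this, together with the > .5 rule of thumb, to the estimates above (the variable names in the code are illustrative):

```python
# Standardized loadings copied from the table above
std_loadings = {
    "Positive Affect": 0.9435927,
    "Vigor": 0.7250835,
    "Challenge": 0.5214540,
    "Optimism": 0.3317228,
}

# Proportion of each indicator's variance explained by "energized state"
explained = {name: round(value ** 2, 3) for name, value in std_loadings.items()}

# Indicators falling short of the > .5 rule of thumb
below_threshold = [name for name, value in std_loadings.items() if value <= 0.5]

for name, r2 in explained.items():
    print(f"{name}: loading = {std_loadings[name]:.3f}, explained variance = {r2}")
print("Below the .5 rule of thumb:", below_threshold)
```

Only about 11% of the variance in Optimism is shared with the latent variable, compared with roughly 89% for Positive Affect.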

Example CFA

Model is a reasonable fit to the data

All loadings on the general factor are statistically significant

There is a question regarding whether optimism is an important contributor to the latent construct since its loading is relatively small

Could use “energized state” variable as part of a full structural equation model

Full Structural Equation Models

Once you have established the measurement models for your latent variables, you can evaluate the structural portion of your hypothesized model
  i.e., the relationships among the latent variables and observed variables of interest.

Full SEM Example

A researcher was interested in whether attitudes regarding quantitative ability at the start of a statistics course predicted quantitative performance at the end of the course

2 latent variables

Quant Attitudes – 3 indicators
  Anxiety, hindrances to doing well in a stats course, self-efficacy

Quant Performance – 2 indicators
  Average homework grade, average exam grade

One indicator for each latent variable had its loading fixed to 1

Quantitative Attitudes and Performance

Full SEM example

N = 129

χ² (4) = 3.23, p = .519 (Excellent!)

PROBLEM: The following variances are negative

e10    -10.0533927

e10 is the residual variance for the observed variable “exam average”

Improper Solutions

Tempting to look at the “problem” variables (e.g., the residual variance for exam average) and deal with the issue by “fixing” the variance to a positive value (e.g., .01)
  In some instances this is necessary, especially when the value is close to 0 and all other parts of the model fit well

Better to think carefully about the variables in the whole model
  Is something misspecified?
  Are there important parameters missing?

Full SEM Example

If we look through the output, we see that homework average is not a significant indicator of quantitative performance
  Further, if we go back to the bivariate correlations among our variables, we see that homework average is not correlated with any of the indicator variables for quantitative attitudes

Perhaps exam average alone is a better representation of quantitative performance?

χ² (2) = 1.8, p = .413
CFI = 1.00
IFI = 1.00
TLI = 1.00
RMSEA = 0, 90% CI = (0, .169)
SRMR = .026

Quantitative Attitudes and Performance: No Homework Average

Quantitative Attitudes and Performance: Parameter Estimates

                     Estimate   S.E.    C.R.     P
QPerf   <--- QAtt     -9.955    2.575   -3.865   ***
HINDR1  <--- QAtt      1.000
ANX1    <--- QAtt      1.720     .380    4.531   ***
SEFF1   <--- QAtt     -1.043     .241   -4.331   ***
EXAMAVG <--- QPerf     1.000

Note: The model on the previous slide is identical to just including the observed ‘exam average’ variable as the outcome (instead of creating a latent ‘quantitative performance’ variable)

Quantitative Attitudes and Performance: Summary

Model fit well with homework average present, but it was not contributing to the model (and in fact it was leading to other issues)

Without homework average, the latent Quantitative Attitudes variable was a significant predictor of Quantitative Performance (now simply exam scores), explaining approximately 20% of the variability in Quantitative Performance
  The relationship was negative, as expected, with higher levels of negative attitudes predicting lower scores (and vice versa)

SEM Example Two

Evaluate the effects of a sixth grade intervention for reducing early sexual behaviours
  More specifically, do these sixth grade intervention strategies reduce the amount of sexual behaviour in grade 7 (time 2) and grade 8 (time 3)?

‘Sexual Gestalt’ is a latent variable that is made up of psychosocial variables related to the individual’s views toward early sexual behaviour

SEM Example Two

[Path diagram: latent variable SexualGestalt with indicators PeerNorms, SexualLimits, UnwantedAdvances, and ParentalViews (error terms e1 to e4); outcomes SexualBehavior (T2) and SexualBehavior (T3), each with a residual term]

Results for Model

Chi-square test of absolute model fit
  Chi-square = 33.42 with 9 DF, p < .0001
  Our model does not fit the data on an absolute basis (which is extremely common given that sample sizes are usually large and any non-zero residuals will result in a significant chi-square)

Does our model fit the data on a descriptive or approximate basis?

Descriptive fit measures
  CFI = .96
  RMSEA = .062
  SRMR = .04

Reasonable fit … but can we do better?

SEM Example: Model Modification

Largest standardized residual covariances:
  Sexual Limits - Sexual Behavior (t3): -2.43
  Peer Norms - Sexual Behavior (t3): -1.84

Modification indices suggest that Sexual Limits to Sexual Behavior (t3) is the single best path to free for estimation
  Index value = 9.10
  The index value is the (minimum) expected drop in the model chi-square fit statistic if we were to free this parameter

Modification indices suggest that Sexual Gestalt to Sexual Behavior (t3) is the next best path to free for estimation

SEM Example: Model Modification

Makes more sense (probably) to connect Sexual Gestalt to Sexual Behavior at time 3 than it does to connect Sexual Limits to Behavior
  It is important to always consider which of the possible modifications makes most sense (in terms of parsimony, theory, etc.), instead of blindly making modifications

Re-specify model with one additional path from the Sexual Gestalt factor to Sexual Behavior at time 3

SEM Example: Modified Model

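[Path diagram: latent variable SexualGestalt with indicators PeerNorms, SexualLimits, UnwantedAdvances, and ParentalViews (error terms e1 to e4); outcomes SexualBehavior (T2) and SexualBehavior (T3), each with a residual term; one additional path from SexualGestalt to SexualBehavior (T3)]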

SEM Example: Modified Model Results

Chi-square = 17.91 with 8 DF, p = .02
  15 unit drop in chi-square value for only one DF: a good tradeoff!

Approximate Fit Indices:
  CFI = .98
  RMSEA = .04
  SRMR = .02
  GFI = .98
  AGFI = .95

No standardized residual covariances exceed |1.50|; most are below |1.00|
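The "good tradeoff" can be put more formally with a chi-square difference test, assuming the original and modified models are nested and estimated on the same data (which holds here, since the modified model only frees one path). A minimal sketch using the two reported chi-square values; for a 1-df difference the p-value has a closed form through the complementary error function:

```python
import math

# Fit statistics reported for the original and the modified model
chi_original, df_original = 33.42, 9
chi_modified, df_modified = 17.91, 8

delta_chi = chi_original - chi_modified
delta_df = df_original - df_modified

# For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
# This shortcut is only valid because delta_df equals 1 here.
p_value = math.erfc(math.sqrt(delta_chi / 2))

print(f"Chi-square difference = {delta_chi:.2f} on {delta_df} df, p = {p_value:.5f}")
```

The drop is far larger than 3.84, the .05 critical value for 1 df, so freeing the path improves fit significantly. That is a statistical statement only, and the theoretical justification for the path still has to come from the researcher.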

SEM Example: Modified Model Results

[Path diagram: modified model with unstandardized estimates; Chi-square (8 df) = 17.912, p = .022. Values shown on the diagram: 2.83, .52, .29, .27, .07, .39, .28; path values: 1.00, -.95*, -.28*, -.30*, -2.20*, .31*, -1.49*]

SEM Example: Modified Model Results

[Path diagram: modified model with the following values beside each variable: SexualBehavior (T3) .33, PeerNorms .48, SexualLimits .49, UnwantedAdvances .25, ParentalViews .06, SexualBehavior (T2) .72; path values: .70, -.69, -.50, -.25, -.85, .21, -.39]