
8/3/2019 Statistical Modelling 2


Statistical Modelling (Special Topic: SEM)

Applied Statistics and Data Management (Statistik Terapan dan Manajemen Data). Email: [email protected]



Relationship Analysis

Be aware that some relationships/models are 'meaningful' and some are not.

Meaningful relationships/models normally have a theoretical basis (underlying theory) and exhibit 'causality' or 'cause and effect'.

For such 'cause-and-effect' relationships, SEM provides a formal way of analysis.



Agenda

Part I: SEM, the Basics

i. SEM Nomenclature/Terminology

ii. SEM & Related Models

Part II: Modeling and Computing

i. How to draw a model using AMOS.

ii. How to run the AMOS model and evaluate several key components of the AMOS graphics and text output, including overall model fit and test statistics for individual path coefficients.

iii. How to modify and respecify a non-fitting model.

Part III: SEM and Its Applications



Part One

SEM: The Basics

http://58.26.137.12/byatim/



SEM

SEM is a statistical methodology that takes a confirmatory (hypothesis-testing) approach to the analysis of a structural theory bearing on some phenomenon. Most other multivariate procedures are descriptive/exploratory in nature.

The theory represents "causal" processes that generate observations on multiple variables.



SEM

SEM conveys two important aspects of the procedure:

- the causal processes under study are represented by a series of structural equations, and
- these structural equations can be modeled pictorially to enable a clearer conceptualization of the theory under study.

The model can be tested simultaneously to determine the extent to which it is consistent with the data: if the goodness of fit is adequate, the model is not rejected; otherwise the hypothesized relations are rejected.



SEM: A Note

SEM is a very general, very powerful and very popular multivariate analysis technique.

It provides a comprehensive method for the quantification and testing of theories.

It has been applied in econometrics, psychology, sociology, political science, education, marketing and medical research, among other fields.

It is also known as:
- covariance structure analysis,
- covariance structure modeling,
- latent variable modeling,
- confirmatory factor analysis,
- linear structural relationships, and
- analysis of covariance structures.



SEM is a family of statistical techniques that incorporates and integrates:
- path analysis,
- linear regression, and
- factor analysis.



SEM

- serves purposes similar to multiple regression, but in a more powerful way that takes into account the modeling of interactions, nonlinearities, correlated independents, measurement error, correlated error terms, multiple latent independents each measured by multiple indicators, and one or more latent dependents, also each with multiple indicators;
- may be used as a more powerful alternative to multiple regression, path analysis, factor analysis, time series analysis, and analysis of covariance; these procedures are special cases of SEM. Put another way, SEM
- is an extension of the general linear model.



Advantages of SEM compared to multiple regression

- more flexible assumptions (particularly allowing interpretation even in the face of multicollinearity);
- use of confirmatory factor analysis to reduce measurement error by having multiple indicators per latent variable;
- the attraction of SEM's graphical modeling interface;
- the desirability of testing models overall rather than coefficients individually;
- the ability to test models with multiple dependents;
- the ability to model mediating variables and error terms;
- the ability to test coefficients across multiple between-subjects groups; and
- the ability to handle difficult data (time series with autocorrelated error, non-normal data, incomplete data).



Major applications of structural equation modeling

1. Causal modeling, or path analysis: hypothesizes causal relationships among variables and tests the causal models with a linear equation system. Causal models can involve manifest variables, latent variables, or both.
2. Confirmatory factor analysis: an extension of factor analysis in which specific hypotheses about the structure of the factor loadings and intercorrelations are tested.
3. Regression models, in which regression weights may be constrained to be equal to each other, or to specified numerical values.
4. Covariance structure models, which hypothesize that a covariance matrix has a particular form. For example, you can test the hypothesis that a set of variables all have equal variances with this procedure.
5. Correlation structure models, which hypothesize that a correlation matrix has a particular form.



Aims and Objectives

By the end of this course you should:
- have a working knowledge of the principles behind causality;
- understand the basic steps to building a model of the phenomenon of interest;
- be able to construct/interpret path diagrams;
- understand the basic principles of how models are tested using SEM;
- be able to test model adequacy using SEM;
- be able to use AMOS intelligently.



SEM: Another Note

Assumption 1: you are familiar with the basic logic of statistical reasoning as described in Elementary Concepts.

Assumption 2: you are familiar with the concepts of variance, covariance, correlation and regression analysis; if not, you are advised to read Basic Statistics.

It is highly desirable that you have some background in factor analysis before attempting to use structural modeling.



Introduction to SEM

How Useful Is a Statistical Model?

The Basic Idea Behind SEM

Causality (Cause-and-Effect Relationship)

SEM Nomenclature/Terminologies

SEM & related Statistical Models



How Useful Is a Statistical Model?

- "All models are wrong, but some are useful." (G.E.P. Box)
- "SEM models can never be accepted (as absolute truth); they can only fail to be rejected." This leads researchers to provisionally accept a given model.

While models that fit the data well can only be provisionally accepted, models that do not fit the data well can be absolutely rejected.




- In a Distribution Theory course you are taught that if you multiply every number in a list by some constant K, you multiply the mean of the numbers by K. Similarly, you multiply the standard deviation by the absolute value of K.
- Suppose you have the list of numbers 1, 2, 3, having a mean of 2 and a standard deviation of 1. Suppose also you take these 3 numbers and multiply them by 4. Then the mean would become 8 and the standard deviation would become 4.
- The point is, if you have a set of numbers X related to another set of numbers Y by the equation Y = 4X, then the variance of Y must be 16 times that of X, so you can test the hypothesis that Y and X are related by the equation Y = 4X indirectly by comparing the variances of the Y and X variables. This idea generalizes, in various ways, to several variables inter-related by a group of linear equations. The rules become more complex and the calculations more difficult, but the basic message remains the same: you can test whether variables are interrelated through a set of linear relationships by examining the variances and covariances of the variables.
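The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch using the standard `statistics` module, which computes the sample standard deviation and variance (n - 1 denominator); that is the convention under which 1, 2, 3 has a standard deviation of exactly 1.

```python
from statistics import mean, stdev, variance

x = [1, 2, 3]
y = [4 * v for v in x]  # the hypothesized relation Y = 4X

# Mean and SD are both scaled by |K| = 4: (2, 1) becomes (8, 4)
print(mean(x), stdev(x))
print(mean(y), stdev(y))

# The variance is scaled by K**2 = 16, which is what lets us test
# Y = 4X indirectly by comparing the variances of Y and X
print(variance(y) / variance(x))
```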



The Basic Idea Behind SEM

Statisticians have developed procedures for testing whether a set of variances and covariances in a covariance matrix fits a specified structure. The way SEM works is as follows:

1. You state the way that you (the theory) believe the variables are inter-related, often with the use of a path diagram.
2. You (AMOS) work out, via some complex internal rules, what the implications of this are for the variances and covariances of the variables.
3. You test whether the variances and covariances fit this model of them.
4. Results of the statistical testing, together with parameter estimates and standard errors for the numerical coefficients in the linear equations, are reported.
5. On the basis of this information, you decide whether the model seems like a good fit to your data.
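To make steps 2 and 3 concrete, here is a minimal sketch in plain Python (not AMOS; all function names are hypothetical) for the model Y = 4X + e, with e assumed uncorrelated with X. Step 2 works out the covariance matrix Sigma that the model implies for (X, Y) from its two free parameters, Var(X) and Var(e); step 3 compares Sigma with a sample covariance matrix S using the maximum-likelihood discrepancy F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p, with p = 2 observed variables. The sketch only evaluates F_ML at parameter values we supply; SEM software such as AMOS instead searches for the values that minimise it, and the model chi-square it reports is typically (N - 1) times that minimum.

```python
import math

def implied_cov(phi, psi, b=4.0):
    """Covariance matrix of (X, Y) implied by Y = b*X + e.

    phi = Var(X), psi = Var(e); e is assumed uncorrelated with X.
    """
    return [[phi, b * phi],
            [b * phi, b * b * phi + psi]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

def f_ml(s, sigma):
    """ML discrepancy between sample matrix S and implied matrix Sigma (p = 2)."""
    si = inv2(sigma)
    trace = sum(s[i][k] * si[k][i] for i in range(2) for k in range(2))
    return math.log(det2(sigma)) + trace - math.log(det2(s)) - 2

sigma = implied_cov(phi=1.0, psi=1.0)
s_fits = [[1.0, 4.0], [4.0, 17.0]]    # sample moments consistent with Y = 4X + e
s_misfit = [[1.0, 2.0], [2.0, 17.0]]  # covariance too small for Y = 4X + e

n = 200  # hypothetical sample size
print(f_ml(s_fits, sigma))              # 0.0: perfect fit
print((n - 1) * f_ml(s_misfit, sigma))  # large chi-square-type statistic
```

With the slope fixed at 4 there are three distinct sample moments (two variances, one covariance) and two free parameters, so this model has one degree of freedom; that surplus is what makes the fit testable.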



A Simple SEM

SEM is an attempt to model causal relations between variables by including all variables that are known to have some involvement in the process of interest; for example, to test the effect of a drug on some psychological disorder (e.g. obsessive-compulsive disorder, OCD).



Causality

Causality has a theoretical basis.

[Figure: examples of causal relationships: Education and Success in Life; Price, Demand and Supply; Unemployment Rate, Windows of Opportunity for Crime and No. of Crimes.]



Cause and Effect

Philosophers have had a great deal to say about the conditions necessary to infer causality: cause and effect should occur close together in time, the cause should occur before an effect is observed, and the cause should never occur without the presence of the effect.



John Stuart Mill (1865) described three conditions necessary to infer cause:
1. Cause has to precede effect.
2. Cause and effect must be related.
3. All other explanations of the cause-effect relationship must be ruled out.



To verify the third criterion, Mill proposed:
- the method of agreement, which states that an effect is present when the cause is present;
- the method of difference, which states that when the cause is absent the effect will be absent also; and
- the method of concomitant variation, which states that when the above relationships are observed, causal inference will be made stronger because most other interpretations of the cause-effect relationship will have been ruled out.



Example: If we wanted to say that me talking about causality causes boredom, we would have to satisfy the following conditions:

(1) I talk about causality before boredom occurs.
(2) Whenever I talk about causality, boredom occurs shortly afterwards.
(3) The correlation between boredom and my talking about causality must be strong (e.g. on 4 out of 4 occasions when I talk about causality, boredom is observed).
(4) When the cause is absent, the effect is absent: when I don't talk about causality, no boredom is observed.
(5) The manipulation of the cause leads to an associated change in the effect. So, if we manipulated whether someone is listening to me talking about causality or to my cat mewing, the effect elicited should change according to the manipulation.

This final manipulation serves to rule out external variables that might affect the cause-effect relationship.



Continued…

In situations in which the cause cannot be manipulated, we cannot make causal attributions about our variables.

Statistically speaking, this means that when we analyze data from non-experimental situations we cannot conclude anything about cause and effect. Structural Equation Modeling (SEM) is an attempt to provide a flexible framework within which causal models can be built.



Statistical Modeling

A statistical model DOES NOT necessarily have a theoretical basis: it may be interpreted as either 'making sense' or 'nonsense'.

[Figure: examples of statistical associations: No. of Road Accidents and No. of Newspaper Readers; Heart Disease, Weight, Income and Smoking.]



SEM & Related Statistical Models

- General Linear Model (GLM)
- Regression Model
- Time Series Model
- Log-linear Model
- Mixed Models
- Survival Models
- Many more …

All these statistical models may or may not have a theoretical basis.



[Figure: path diagram with two exogenous latent variables/constructs, Verbal Ability (indicators x1 to x4, errors d1 to d4) and Quantitative Ability (indicators x5 to x8, errors d5 to d8), and one endogenous latent variable, General Intelligence (indicators y1 to y4, errors e1 to e4).]



SEM Nomenclature

- Independent variables, which are assumed to be measured without error, are called exogenous or upstream variables.
- Dependent or mediating variables are called endogenous or downstream variables.
- Manifest or observed variables, or indicators, are directly measured by researchers.
- Latent or unobserved variables are not directly measured but are inferred from the relationships or correlations among measured variables in the analysis. Examples: self-concept, motivation, powerlessness, anomie, verbal ability, capitalism, social class.



SEM Nomenclature (cont.)

- SEM illustrates relationships among observed and unobserved variables using path diagrams.
- Ovals or circles represent latent variables.
- Rectangles or squares represent measured variables.
- Residuals are always unobserved, so they are represented by ovals or circles.



SEM: Definition

SEM is an extension of the general linear model (GLM) that enables a researcher to test a set of regression equations simultaneously.

SEM consists of TWO components:
- Structural Model: illustrates the relationships among the latent constructs or endogenous variables.
- Measurement Model: represents how the constructs are related to their indicators or manifest variables.



Example

[Figure: Ability/Intelligence (ξ1, exogenous latent construct), Aspirations (η1, endogenous latent construct) and Achievement (η2, endogenous latent construct).]

In psychology, the theory postulates that …



Full Latent Variable Model

[Figure: Ability, measured by Academic Skill (x1), Interpersonal Skill (x2) and Communication Skill (x3); Aspiration, measured by Family Status (y1), Father's Occupation (y2) and Peer's Influence (y3); Achievement, measured by Personal Actualization (y4), Professional Status (y5) and Social Status (y6).]



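Example: ONE Latent (unobserved) Exogenous Variable & TWO Latent (unobserved) Endogenous Variables

[Figure: ξ1 measured by x1, x2, x3 (loadings λx11, λx21, λx31; errors δ1 to δ3); η1 measured by y1, y2, y3 (loadings λy11, λy21, λy31; errors ε1 to ε3); η2 measured by y4, y5, y6 (loadings λy42, λy52, λy62; errors ε4 to ε6); structural paths among ξ1, η1 and η2 with disturbances ζ1 and ζ2. Panels: Structural Model; Measurement Model.]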



Structural Model

- The structural model allows for certain relationships among the latent variables, depicted by lines or arrows (in a path diagram).
- In the path diagram, we specified that Ability and Achievement were related in a specific way; that is, intelligence had some influence on later achievement.
- Thus, one result from the structural model is an indication of the extent to which these a priori hypothesized relationships are supported by our sample data.



Structural Model (Cont.)

The structural equation addresses the following questions:
- Are Ability and Achievement related?
- Exactly how strong is the influence of Ability on Achievement?
- Could there be other latent variables that we need to consider to get a better understanding of the influence on Achievement?








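Mathematical Form of Structural Model

$$\eta_1 = \gamma_{11}\xi_1 + \zeta_1, \qquad \eta_2 = \beta_{21}\eta_1 + \gamma_{21}\xi_1 + \zeta_2$$

or, in matrix form,

$$\begin{pmatrix}\eta_1\\ \eta_2\end{pmatrix} = \begin{pmatrix}0 & 0\\ \beta_{21} & 0\end{pmatrix}\begin{pmatrix}\eta_1\\ \eta_2\end{pmatrix} + \begin{pmatrix}\gamma_{11}\\ \gamma_{21}\end{pmatrix}\xi_1 + \begin{pmatrix}\zeta_1\\ \zeta_2\end{pmatrix}, \qquad \text{i.e.}\quad \boldsymbol\eta = \mathbf{B}\boldsymbol\eta + \boldsymbol\Gamma\boldsymbol\xi + \boldsymbol\zeta$$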



Measurement Model

- Specifies the relationship between the latent variables and the observed variables.
- Answers the questions:
  1) To what extent are the observed variables actually measuring the hypothesized latent variables?
  2) Which observed variable is the best measure of a particular latent variable?
  3) To what extent are the observed variables actually measuring something other than the hypothesized latent variable?
- Uses Exploratory Factor Analysis (EFA) or Confirmatory Factor Analysis (CFA) to determine the significant observed variables related to each of the latent variables.



Exploratory FA (EFA)

In EFA the factor structure or theory about a phenomenon is NOT KNOWN.

For example, the researcher is interested in measuring "the achievement of personnel". Suppose he has no knowledge (very little theory) regarding:
- the factors that contribute to achievement,
- the number of indicators of each factor, and
- which indicators represent which factor.

In such a case, the researcher may collect data and 'explore' for a factor structure or theory that can explain the correlations among the indicators.



Confirmatory FA (CFA)

In CFA the precise factor structure or theory about a phenomenon is KNOWN or specified a priori.

For example, a researcher is interested in measuring "consumer preference" for a product. Suppose that, based on previous research, it is hypothesized (the theory) that the construct or factor measuring 'consumer preference' is a one-dimensional construct with 7 indicators or items as its measures.

The obvious question is: how well do the empirical data conform to the theory of consumer preferences? Or, how well do the data fit the model?

In such a case, CFA is used for the empirical 'confirmation' or 'testing' of the theory.



Using Factor Analysis

[Figure: three one-factor measurement models with factor loadings: Ability (ξ1) measured by Academic Skill (x1), Interpersonal Skill (x2) and Communication Skill (x3); Aspiration (η1) measured by Family Status (y1), Father's Occupation (y2) and Peer's Influence (y3); Achievement (η2) measured by Personal Actualisation (y4), Professional Status (y5) and Social Status (y6).]



Measurement Model (Cont.)

- The relationships between the observed variables and the latent variables are described by factor loadings.
- Factor loadings provide information about the extent to which a given observed variable is able to measure the latent variable. They serve as validity coefficients.
- Measurement error is defined as that portion of an observed variable that is measuring something other than what the latent variable is hypothesized to measure. It serves as a measure of reliability.



Measurement Model (Cont.)

Measurement error could be the result of:
1. an unobserved variable that is measuring some other latent variable;
2. unreliability;
3. a second-order factor.



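Mathematical Form of Measurement Model

$$x_1 = \lambda^{x}_{11}\xi_1 + \delta_1, \qquad x_2 = \lambda^{x}_{21}\xi_1 + \delta_2, \qquad x_3 = \lambda^{x}_{31}\xi_1 + \delta_3$$

or, in matrix form,

$$\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix}\lambda^{x}_{11}\\ \lambda^{x}_{21}\\ \lambda^{x}_{31}\end{pmatrix}\xi_1 + \begin{pmatrix}\delta_1\\ \delta_2\\ \delta_3\end{pmatrix}, \qquad \text{i.e.}\quad \mathbf{x} = \boldsymbol\Lambda_x\boldsymbol\xi + \boldsymbol\delta$$

How the latent (unobservable) exogenous variable ξ1 is related to its indicators or manifest/observed variables x1, x2 and x3.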
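These measurement equations fix the variances and covariances of x1, x2 and x3. Assuming the usual conventions that the δ's are uncorrelated with ξ1 and with each other, Cov(xi, xj) = λi1·λj1·φ11 for i ≠ j and Var(xi) = λi1²·φ11 + θii, that is Σxx = Λx Φ Λx' + Θδ. The sketch below computes this in plain Python; the loadings, factor variance and error variances are made-up numbers, and fixing λ11 = 1 to set the scale of ξ1 is an assumption mirroring the common practice of placing a "1" on one path per construct.

```python
def implied_cov_one_factor(loadings, phi, theta):
    """Sigma_xx = Lambda * phi * Lambda' + Theta_delta for a single factor xi_1.

    loadings: [lambda_11, lambda_21, lambda_31]; phi = Var(xi_1);
    theta: error variances Var(delta_i), assumed mutually uncorrelated.
    """
    p = len(loadings)
    sigma = [[loadings[i] * loadings[j] * phi for j in range(p)] for i in range(p)]
    for i in range(p):
        sigma[i][i] += theta[i]  # unique (error) variance appears only on the diagonal
    return sigma

# Hypothetical values; lambda_11 fixed to 1 to give xi_1 a scale
sigma_xx = implied_cov_one_factor([1.0, 0.8, 0.6], phi=2.0, theta=[0.5, 0.4, 0.3])
for row in sigma_xx:
    print([round(v, 3) for v in row])
```

Fitting a CFA amounts to choosing the loadings, φ11 and θii so that this implied matrix is as close as possible to the sample covariance matrix of x1, x2, x3.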



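Measurement Model (cont.)

$$y_1 = \lambda^{y}_{11}\eta_1 + \varepsilon_1, \quad y_2 = \lambda^{y}_{21}\eta_1 + \varepsilon_2, \quad y_3 = \lambda^{y}_{31}\eta_1 + \varepsilon_3$$
$$y_4 = \lambda^{y}_{42}\eta_2 + \varepsilon_4, \quad y_5 = \lambda^{y}_{52}\eta_2 + \varepsilon_5, \quad y_6 = \lambda^{y}_{62}\eta_2 + \varepsilon_6$$

or, in matrix form,

$$\begin{pmatrix}y_1\\ y_2\\ y_3\\ y_4\\ y_5\\ y_6\end{pmatrix} = \begin{pmatrix}\lambda^{y}_{11} & 0\\ \lambda^{y}_{21} & 0\\ \lambda^{y}_{31} & 0\\ 0 & \lambda^{y}_{42}\\ 0 & \lambda^{y}_{52}\\ 0 & \lambda^{y}_{62}\end{pmatrix}\begin{pmatrix}\eta_1\\ \eta_2\end{pmatrix} + \begin{pmatrix}\varepsilon_1\\ \varepsilon_2\\ \varepsilon_3\\ \varepsilon_4\\ \varepsilon_5\\ \varepsilon_6\end{pmatrix}, \qquad \text{i.e.}\quad \mathbf{y} = \boldsymbol\Lambda_y\boldsymbol\eta + \boldsymbol\varepsilon$$

How the TWO latent (unobservable) constructs or endogenous variables η1 and η2 are related to their indicators or manifest variables y1, …, y6.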











Reliability

- Definition: the extent to which a variable or set of variables is consistent in what it is intended to measure.
- If multiple measurements are taken, reliable measures will all be consistent in their values.
- It is the degree to which the observed variable measures the "true" value and is "error free".
- It is different from validity.



True Score and Measurement Error

- True score: a component which indicates where the subject actually stands on the variable (statement) of interest.
- Measurement error: a component which indicates the inaccuracies in measuring "true scores" due to the fallibility of the survey instrument, response scales, data entry or respondent error.



Reliability

- The degree to which scores are free from random measurement error
- Reliability measures:
  - Internal Consistency Reliability
  - Test-retest Reliability
  - Alternate Forms Reliability



Reliability

Levels of Reliability

0.90 Excellent

0.80 Very Good

0.70 Adequate

<0.70 Poor



Example: Reliability of Observed Variables

Cronbach's alpha was computed for all variables.

Variable    No. of items  Reliability
Variable1   10            .91
Variable2   10            .87
Variable3   10            .58
Variable4   10            .70
Variable5   12            .72
Variable6   12            .80
Variable7   12            .80
Variable8   12            .87
Variable9   10            .84
Variable10  7             .71
Variable11  4             .48



Summated Scale Reliability

- When reliability involves multiple scaled items, reliability must be measured on a summated scale.
- A summated rating scale is a short list of statements, questions or other items that the subject responds to.
- A summated score is the sum of the responses to a list of statements, creating an overall score.



Reliability Coefficient (1)

- There are several ways to measure reliability, which will be discussed later.
- The measure is normally called the reliability coefficient.
- This coefficient is the percentage of variance in an observed variable that is accounted for by the true scores of the underlying construct.



Reliability Coefficient (2)

- Imagine you have collected 2 scores from a survey: true and observed scores of customer satisfaction.
- You compute the correlation between the scores.
- The square of the correlation coefficient will be your reliability coefficient, which is:
  - the total variance in the observed scores explained by the true score, or
  - the percentage of variance in observed scores that is accounted for by true scores.
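A sketch of the calculation described above, assuming hypothetical true and observed satisfaction scores (in real data true scores are never observed, so this only illustrates the definition). Pearson's r is computed by hand so that nothing beyond the standard library is needed.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

true_scores = [3, 4, 2, 5, 4, 3, 1, 5]  # hypothetical true scores
observed = [3, 5, 2, 4, 4, 3, 2, 5]     # true score plus measurement error

r = pearson_r(true_scores, observed)
reliability = r ** 2  # share of observed-score variance accounted for by true scores
print(round(r, 3), round(reliability, 3))
```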




Types of Reliability

- Test-retest: assessed by administering the same instrument to the same sample of respondents at two points in time, and computing the correlation between the two sets of scores.
- Internal consistency reliability: the extent to which the individual items that constitute a test correlate with one another or with the test total. In short, it measures how consistently respondents respond to the items within a scale.



Types of Reliability (2)

- For example, if the first half of an instrument consists of educational items which correlate highly among themselves and the second half of political items which correlate highly among themselves, the instrument would still have high internal consistency, even though it measures two distinct dimensions.
- Note that measures of internal consistency are often called measures of "internal consistency reliability" or even "reliability", but this merges the distinct concepts of internal consistency and reliability, which do not necessarily go together.
- How do we solve this problem? The most commonly used measure of internal consistency reliability is Cronbach's alpha.
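Cronbach's alpha is usually written alpha = k/(k - 1) × (1 - sum of item variances / variance of the summated score), where k is the number of items. A minimal Python sketch of that formula, with made-up Likert responses (the data and the function name are illustrative only):

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item).

    alpha = k/(k-1) * (1 - sum of item variances / variance of summated score)
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # summated scale per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical 5-point Likert responses: 3 items x 6 respondents
items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [3, 5, 2, 4, 1, 4],
]
print(round(cronbach_alpha(items), 2))  # 0.95
```

For these made-up responses alpha comes out at about 0.95, which would rate as "Excellent" on the reliability levels given earlier.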



Validity

- Definition: the extent to which an item or set of items correctly represents the construct of study, i.e. the degree to which it is free from any systematic or non-random error.
- Validity deals with how well the construct is defined by the item(s) (WHAT should be measured), while reliability deals with how consistently the item(s) measure the construct (HOW it is measured).



Validity

- Whether the scores measure what they are supposed to measure
- Types of validity:
  - Construct validity (SEM confirmatory factor analysis helps to establish construct validity)
  - Criterion-related validity (correlation with an external standard)
  - Convergent/discriminant validity (can be determined through SEM confirmatory factor analysis)



Examples

Example 1: "How happy are you?" This example is about validity: whether the measure accurately represents what it is supposed to measure.

Example 2: "How happy are you when you are smoking?" Ask this question repeatedly of the same subject or of multiple subjects, and see how consistent their answers are. This example is about reliability (sometimes I'd like to call it consistency).



I Am an Indicator
Lyrics by Alan Reifman
(May be sung to the tune of "The Entertainer," Billy Joel)

I am an indicator, a latent construct I represent,
I'm measurable, sometimes pleasurable,
A manifestation of what is meant,

I am an indicator, I usually come in a multiple set,
With other signs of the same construct, you may instruct,
I'm correlated with my co-indicators, you can bet,

I am an indicator, from my presence the construct is inferred,
I'm tap-able, the construct is not palpable,
The distinction should not be blurred




At Least Three
Lyrics by Alan Reifman
(May be sung to the tune of "Think of Me," Lloyd Webber/Hart/Stilgoe, from Phantom of the Opera)

At least three, indicators are urged,
For each latent construct shown,
At least three, indicators should help,
Avoid output where you groan,
With less than three, your construct sure will be, locally unidentified,
Though the model might still run, you could have a rough ride



Total, Direct and Indirect Effects

- There is a direct effect between two latent variables when a single directed line or arrow connects them.
- There is an indirect effect between two variables when the second latent variable is connected to the first latent variable through one or more other latent variables.
- The total effect between two latent variables is the sum of any direct effect and all indirect effects that connect them.
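For the Ability, Aspirations and Achievement example these definitions reduce to simple arithmetic. Assuming the paths γ11 (ξ1 to η1), γ21 (ξ1 to η2) and β21 (η1 to η2), the direct effect of Ability on Achievement is γ21, the indirect effect is the product of the coefficients along ξ1 to η1 to η2, γ11·β21, and the total effect is their sum. The coefficient values below are hypothetical.

```python
def effects_on_achievement(gamma11, gamma21, beta21):
    """Direct, indirect and total effect of Ability (xi_1) on Achievement (eta_2)
    for the paths xi_1 -> eta_1 (gamma11), xi_1 -> eta_2 (gamma21) and
    eta_1 -> eta_2 (beta21)."""
    direct = gamma21               # the single arrow xi_1 -> eta_2
    indirect = gamma11 * beta21    # product of the paths along xi_1 -> eta_1 -> eta_2
    return direct, indirect, direct + indirect

# Hypothetical standardized path coefficients
direct, indirect, total = effects_on_achievement(gamma11=0.5, gamma21=0.3, beta21=0.4)
print(f"direct = {direct:.2f}, indirect = {indirect:.2f}, total = {total:.2f}")
```

For larger recursive models the same total effects of ξ on η can be collected in matrix form as (I - B)^-1 Γ, using the B and Γ matrices of the structural equation η = Bη + Γξ + ζ.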



Example: Direct and Indirect Effects

[Figure: Ability/Intelligence (ξ1, exogenous latent construct), Aspirations (η1, endogenous latent construct) and Achievement (η2, endogenous latent construct).]



Semantics

Types of measurement scale

Metric and Non-metric

Correlation coefficient

Correlation and Covariance Matrix

Standardized and Unstandardized Estimates



Types of Measurement Scale

- There are 4 types of measurement scale in a scale instrument: nominal, ordinal, interval and ratio scales.
- Some other common scales, such as Likert scales, semantic differential scales and dichotomous scales, can be categorized into the 4 above.
- This is important, as the assumptions of SEM rely on what we know on this page.



Metric and Non-metric Scales

- Metric scales are quantitative data where the parameters of the scale form a continuum: interval or ratio scale data.
- Non-metric scales are qualitative data: attributes, characteristics or categorical properties that identify or describe a subject or object; possibly nominal or ordinal scale data.
- But metric and non-metric scales can sometimes be misused or abused… how?



VARIABLE SCALES

- SEM in general assumes that observed variables are measured on a linear continuous scale.
- Dichotomous and ordinal variables cause problems because correlations/covariances tend to be truncated. These scores are not normally distributed, and responses to individual items may not be very reliable.



Correlation

- Perhaps the most basic semantic. Definition: the linear relationship between two variables.
- The strength of the relationship is determined by the correlation coefficient, r (explained later).
- There are 2 common types of correlation coefficient:
  - Pearson product-moment correlation (interval)
  - Spearman rank correlation (ordinal)
- The former is the one we will use in this course.



Correlation Matrix (1)

- The correlation matrix of n random variables X1, …, Xn is the n × n matrix whose (i, j) entry is corr(Xi, Xj).
- If the measures of correlation used are product-moment coefficients, the correlation matrix is the same as the covariance matrix of the standardized random variables Xi/SD(Xi) for i = 1, …, n.
- Consequently it is necessarily a non-negative definite matrix (an important assumption).
- The correlation matrix is symmetric because the correlation between Xi and Xj is the same as the correlation between Xj and Xi.
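One practical way to check these properties before fitting a model is to attempt a Cholesky factorisation, which succeeds only for a symmetric positive definite matrix. Strictly, a correlation matrix need only be non-negative definite, but maximum-likelihood estimation needs the input matrix to be positive definite (it uses the matrix's determinant and inverse), so that is what this plain-Python sketch tests. The example matrices are made up; the second contains three correlations that cannot hold simultaneously.

```python
import math

def is_symmetric(m, tol=1e-10):
    """True if m[i][j] == m[j][i] for all i, j (within tol)."""
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(n))

def is_positive_definite(m):
    """Attempt a Cholesky factorisation m = L L'; it succeeds only if m is
    symmetric positive definite."""
    if not is_symmetric(m):
        return False
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - s
                if pivot <= 0:
                    return False  # not positive definite (e.g. singular)
                L[i][j] = math.sqrt(pivot)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return True

ok = [[1.0, 0.6, 0.5],
      [0.6, 1.0, 0.4],
      [0.5, 0.4, 1.0]]
bad = [[1.0, 0.9, -0.9],  # these three correlations cannot all hold at once
       [0.9, 1.0, 0.9],
       [-0.9, 0.9, 1.0]]
print(is_positive_definite(ok), is_positive_definite(bad))  # True False
```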




Correlation Matrix (2)

Correlation coefficients for variables A1 to A7 and B1 to B3; the second line of each row gives the p-values.

    A1       A2       A3       A4       A5       A6       A7       B1       B2       B3
A1  1.00000  0.65579  0.46296  0.58812  0.62082  0.62629  0.64288  0.34385  0.57904  0.56353
a1           <.0001   <.0001   <.0001   <.0001   <.0001   <.0001   0.0004   <.0001   <.0001
A2  0.65579  1.00000  0.45951  0.66297  0.72727  0.77384  0.76693  0.40987  0.67796  0.59493
a2  <.0001            <.0001   <.0001   <.0001   <.0001   <.0001   <.0001   <.0001   <.0001
A3  0.46296  0.45951  1.00000  0.51913  0.46652  0.45752  0.44520  0.33407  0.35833  0.33623
a3  <.0001   <.0001            <.0001   <.0001   <.0001   <.0001   0.0006   0.0002   0.0006
A4  0.55812  0.66297  0.51913  1.00000  0.69905  0.64969  0.59358  0.34148  0.58859  0.44284
a4  <.0001   <.0001   <.0001            <.0001   <.0001   <.0001   0.0004   <.0001   <.0001
A5  0.62082  0.72727  0.46652  0.69905  1.00000  0.67281  0.66939  0.31277  0.63133  0.54744
a5  <.0001   <.0001   <.0001   <.0001            <.0001   <.0001   <.0014   <.0001   <.0001
A6  0.62629  0.77384  0.45752  0.64969  0.67281  1.00000  0.86014  0.40483  0.66758  0.56944
a6  <.0001   <.0001   <.0001   <.0001   <.0001            <.0001   <.0001   <.0001   <.0001
A7  0.64288  0.76693  0.44520  0.59358  0.66939  0.86014  1.00000  0.39913  0.68141  0.62075
a7  <.0001   <.0001   <.0001   <.0001   <.0001   <.0001            <.0001   <.0001   <.0001
B1  0.34385  0.40987  0.33407  0.34148  0.31277  0.40483  0.39913  1.00000  0.58187  0.62583
b1  <.0004   <.0001   <.0006   <.0004   <.0014   <.0001   <.0001            <.0001   <.0001
B2  0.57904  0.67796  0.35833  0.58859  0.63133  0.66758  0.68141  0.58187  1.00000  0.85335
b2  <.0001   <.0001   <.0002   <.0001   <.0001   <.0001   <.0001   <.0001            <.0001
B3  0.56353  0.59493  0.33623  0.44284  0.54744  0.56944  0.62075  0.62583  0.85335  1.00000
b3  <.0001   <.0001   <.0006   <.0001   <.0001   <.0001   <.0001   <.0001   <.0001




So we say that:
- if the input matrix used is the covariance matrix, the estimated coefficients for the parameters are unstandardized estimates;
- if the input matrix used is the correlation matrix, the estimated coefficients for the parameters are standardized estimates.

So what?
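One answer to "so what?": analysing a correlation matrix is equivalent to analysing standardized (z-score) variables, so coefficients come out in standard-deviation units rather than the variables' own units. In the simplest case, a regression of y on a single x (an assumption made here to keep the sketch short), the standardized slope equals the unstandardized slope times SD(x)/SD(y), which is also Pearson's r. A small sketch with hypothetical data:

```python
from statistics import mean, stdev

def slope(x, y):
    """OLS slope of y on x: Cov(x, y) / Var(x), both with n - 1 denominators."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / stdev(x) ** 2

def zscores(v):
    """Standardize a list to mean 0 and SD 1."""
    m, s = mean(v), stdev(v)
    return [(a - m) / s for a in v]

x = [2.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical data
y = [10.0, 14.0, 13.0, 19.0, 22.0]

b_unstd = slope(x, y)                  # units of y per unit of x
b_std = slope(zscores(x), zscores(y))  # SD units; equals Pearson's r here
print(round(b_unstd, 3), round(b_std, 3), round(b_unstd * stdev(x) / stdev(y), 3))
```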



Covariance

The covariance between two variables equals the correlation times the product of the variables' standard deviations. The covariance of a variable with itself is the variable's variance.
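Applied to a whole matrix, this rule gives Cov(Xi, Xj) = r_ij × SD(Xi) × SD(Xj), so a correlation matrix plus the standard deviations is enough to rebuild the covariance matrix. The sketch below uses the A1, A2 and A3 correlations from the correlation matrix example; the standard deviations are hypothetical, since none were reported.

```python
def cor_to_cov(r, sd):
    """Cov(Xi, Xj) = r_ij * SD(Xi) * SD(Xj); the diagonal gives the variances."""
    n = len(sd)
    return [[r[i][j] * sd[i] * sd[j] for j in range(n)] for i in range(n)]

# Correlations of A1, A2, A3 from the correlation matrix example
r = [[1.0, 0.65579, 0.46296],
     [0.65579, 1.0, 0.45951],
     [0.46296, 0.45951, 1.0]]
sd = [1.2, 0.9, 1.5]  # hypothetical standard deviations

cov = cor_to_cov(r, sd)
for row in cov:
    print([round(v, 4) for v in row])
```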




- Therefore, when we want to test a theory, we use the variance-covariance matrix (to validate the causal relationships among constructs).
- When we just want to explain the pattern of the relationships, we use the correlation matrix (theory testing is not required).




Factors Affecting Correlation/Covariance Coefficients

- Type of scale and range of values: the Pearson correlation is the basis for analysis in regression, path analysis, factor analysis and SEM, hence the data must be in metric form. There must be enough variation in scores to allow a correlation relationship to manifest.
- Linearity: the Pearson correlation coefficient measures the degree of linear relationship between two variables, hence the need to test linearity.
- Sample size: SEM requires a big sample. Rule of thumb: 10-20 times the number of variables. Ding, Velicer and Harlow (1995): 100-150; Boomsma (1982, 1983): 400; Hu, Bentler and Kano (1992): in some cases 5000 is still insufficient; Schumacker and Lomax (1999): 250-500 in many articles; Bentler and Chou (1987): for normal data, 5 subjects per variable is sufficient.




Covariance
Lyrics by Alan Reifman (May be sung to the tune of "Aquarius," Rado/Ragni/MacDermot, from Hair, also popularized by the Fifth Dimension)

You draw paths to show relationships,
You hope align with the known r's,
Your model will guide the tracings,
From constructs near to constructs far,
You will compare this with the data's covariance,
The data's covariance...
Covariance!
Covariance!

Similar to correlation,
With the variables unstandardized,
Does each known covariance match up with,
The one the model tracings will derive?
Covariance!
Covariance!



SEM AssumptionsSEM Assumptions

Sample Size

• A good rule of thumb is >15 cases per predictor/indicator (James Stevens' Applied Multivariate Statistics for the Social Sciences)

• Model with TWO factors: recommended sample size > 100

• Model with FOUR factors: recommended sample size > 200
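The per-variable rules of thumb quoted in these slides translate into quite different minimum sample sizes for the same model. A small arithmetic sketch in Python; the function name and the 12-indicator example are hypothetical, and each rule is applied naively as a cases-per-variable multiplier:

```python
def sample_size_guidelines(n_indicators):
    """Approximate N implied by several cases-per-variable rules of thumb.

    The rules disagree, so they are reported side by side rather than
    combined into a single recommendation.
    """
    return {
        "5 per variable (Bentler & Chou, normal data)": 5 * n_indicators,
        "10 per variable": 10 * n_indicators,
        "15 per indicator (Stevens)": 15 * n_indicators,
        "20 per variable": 20 * n_indicators,
    }

# Hypothetical model with 12 observed indicators
for rule, n in sample_size_guidelines(12).items():
    print(f"{rule}: about {n} cases")
```

The spread between the results (60 to 240 cases here) shows why these figures are treated as heuristics rather than hard requirements.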



SEM Assumptions (cont.)

Sample Size

• Consequences of using smaller samples: convergence failures (the software cannot reach a satisfactory solution), improper solutions (including negative error variance estimates for measured variables), and lowered accuracy of parameter estimates and, in particular, of standard errors.

• SEM program standard errors are computed under the assumption of large sample sizes.




Normality

• Many SEM estimation procedures assume multivariate normal distributions.

• Severe lack of univariate normality is indicated when the absolute skew index is > 3.0 or the kurtosis index is > 10.

• Departures from multivariate normality can be detected with indices of multivariate skewness or kurtosis.

• Non-normal distributions can sometimes be corrected by transforming variables.
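The univariate skew and kurtosis indices can be computed from central moments. A minimal standard-library Python sketch, assuming the moment-based definitions g1 = m3/m2^1.5 and g2 = m4/m2^2 - 3 (excess kurtosis, so a normal distribution scores 0) and treating the skew cut-off of 3.0 and the kurtosis cut-off of 10 as two separate checks. Statistical packages often apply small-sample corrections, so their values can differ slightly. The data are hypothetical:

```python
def moments(x):
    """Second, third and fourth central moments (population form, divide by n)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m2, m3, m4

def skew_index(x):
    """Skewness g1 = m3 / m2**1.5 (0 for a symmetric distribution)."""
    m2, m3, _ = moments(x)
    return m3 / m2 ** 1.5

def kurtosis_index(x):
    """Excess kurtosis g2 = m4 / m2**2 - 3 (0 for a normal distribution)."""
    m2, _, m4 = moments(x)
    return m4 / m2 ** 2 - 3

def severe_nonnormality(x, skew_cut=3.0, kurt_cut=10.0):
    """Flag a variable if |skew| > 3.0 or kurtosis > 10."""
    return abs(skew_index(x)) > skew_cut or kurtosis_index(x) > kurt_cut

symmetric = [1, 2, 3, 4, 5]
skewed = [0] * 49 + [100]   # hypothetical: one extreme case among 49 zeros
for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(name, round(skew_index(data), 2), round(kurtosis_index(data), 2),
          severe_nonnormality(data))
```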




Multicollinearity

Occurs when intercorrelations among some variables are so high that certain mathematical operations are impossible or results are unstable because denominators are close to 0.

• Indicated by bivariate correlations > 0.85 or multiple correlations > 0.90

• May cause a non-positive definite/singular covariance matrix

• May be due to inclusion of both individual and composite variables

• Detection: Tolerance = 1 - R² < 0.10; Variance Inflation Factor (VIF) = 1/(1 - R²) > 10

• Can be corrected by eliminating or combining redundant variables
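Tolerance and VIF can be computed without running a separate regression for each variable: VIF_j equals the j-th diagonal element of the inverse of the correlation matrix among the variables, and tolerance is its reciprocal. A minimal standard-library Python sketch; the Gauss-Jordan helper `invert` and the data are hypothetical illustrations, not part of any SEM package. Here x3 is built as almost exactly x1 + x2, mimicking the inclusion of individual and composite variables:

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augment A with the identity matrix: [A | I]
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def vif(columns):
    """VIF_j = j-th diagonal element of the inverse correlation matrix = 1/(1 - R_j^2)."""
    R = [[pearson(a, b) for b in columns] for a in columns]
    Rinv = invert(R)
    return [Rinv[j][j] for j in range(len(columns))]

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
x3 = [a + b + e for a, b, e in zip(x1, x2, [0.01, -0.01, 0.02, 0.0, -0.02])]

for name, v in zip(["x1", "x2", "x3"], vif([x1, x2, x3])):
    flag = "PROBLEM" if v > 10 else "ok"   # VIF > 10 is the same as tolerance < 0.10
    print(f"{name}: VIF = {v:.1f}, tolerance = {1 / v:.5f} ({flag})")
```

Since x1 can also be recovered from x3 - x2, every variable in this set is flagged, not only the composite.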




Outliers

• Univariate outliers: scores more than three SDs away from the mean

• Detection by inspecting frequency distributions and univariate measures of skewness and kurtosis

• Multivariate outliers may have extreme scores on two or more variables, or their configurations of scores may be unusual

• Detection by inspecting indices of multivariate skewness and kurtosis. Under multivariate normality, the squared Mahalanobis distance is approximately distributed as chi-square with df equal to the number of variables.

• Can be remedied by correcting errors, or by dropping these cases or transforming the variables
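The univariate and multivariate checks can be combined in one pass. A minimal standard-library Python sketch for two variables, assuming the sample covariance matrix (n - 1 denominator) and an alpha of .001; with df = 2 the chi-square critical value has the closed form -2 ln(alpha), which avoids needing a statistics library. The data are hypothetical: case 21 is unremarkable on each variable alone but breaks the strong x-y pattern:

```python
import math

def zscores(x):
    """Standardize a variable using the sample SD."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))
    return [(v - m) / s for v in x]

def mahalanobis_sq_2d(x, y):
    """Squared Mahalanobis distance of each case from the centroid (2 variables)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Inverse of the 2 x 2 covariance matrix [[sxx, sxy], [sxy, syy]]
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    return [ixx * (a - mx) ** 2 + 2 * ixy * (a - mx) * (b - my) + iyy * (b - my) ** 2
            for a, b in zip(x, y)]

def chi2_cutoff_df2(alpha):
    """Upper-tail chi-square critical value for df = 2."""
    return -2 * math.log(alpha)

x = list(range(1, 21)) + [5]
y = [i + 0.5 * (-1) ** i for i in range(1, 21)] + [15]
d2 = mahalanobis_sq_2d(x, y)
cut = chi2_cutoff_df2(0.001)

for case, (zx, zy, d) in enumerate(zip(zscores(x), zscores(y), d2), start=1):
    zmax = max(abs(zx), abs(zy))
    if zmax > 3 or d > cut:
        print(f"case {case}: max |z| = {zmax:.2f}, D^2 = {d:.2f} (cutoff {cut:.2f})")
```

Only case 21 is reported: its z-scores are well under 3, yet its squared Mahalanobis distance exceeds the critical value of about 13.82.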



VIOLATIONS OF ASSUMPTIONS (1)

• The best-known distribution with no (excess) kurtosis is the multivariate normal.

• Leptokurtic (more peaked) distributions result in too many rejections of H0 based on the chi-square statistic.

• Platykurtic distributions lead to chi-square estimates that are too low.




VIOLATIONS OF ASSUMPTIONS (2)

• High degrees of skewness lead to excessively large chi-square estimates.

• In small samples (N < 100), the chi-square statistic tends to be too large.




Model Identification (Identified Equations)

• Identification refers to the idea that there is at least one unique solution for each parameter estimate in an SEM model.

• Models in which there is only one possible solution for each parameter estimate are said to be just-identified.

• Models for which there are an infinite number of possible parameter estimate values are said to be underidentified.

• Finally, models that have more than one possible solution (but one best or optimal solution) for each parameter estimate are considered overidentified.
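The knowns-versus-unknowns idea behind these definitions is usually applied through the counting rule: with p observed variables there are p(p + 1)/2 distinct variances and covariances (the knowns), and the degrees of freedom are that number minus the number of free parameters (the unknowns). A minimal Python sketch; the rule is necessary but not sufficient, so a model with df >= 0 can still be structurally underidentified. The function name and the examples (one-factor models with the factor variance fixed to 1) are hypothetical:

```python
def identification_status(p, free_params):
    """Apply the counting (t-) rule to a covariance-structure model.

    p: number of observed variables
    free_params: number of freely estimated parameters
    """
    knowns = p * (p + 1) // 2          # distinct variances + covariances
    df = knowns - free_params
    if df < 0:
        status = "underidentified"
    elif df == 0:
        status = "just-identified"
    else:
        status = "overidentified"
    return knowns, df, status

# One-factor models with the factor variance fixed to 1:
# free parameters = one loading + one error variance per indicator
for p in (2, 3, 4):
    print(p, identification_status(p, 2 * p))
```

With two indicators there are fewer knowns than unknowns, three indicators give df = 0, and four indicators leave df = 2 for testing fit.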

Model Identification (Identified Equations)

• Underidentification can be either empirical underidentification or structural underidentification.

• Empirical underidentification occurs when a parameter that establishes model identification has a very small (close to zero) estimate.

• A path coefficient whose value is estimated as being close to zero may be treated as zero by the SEM program's matrix inversion algorithm. If that path coefficient is necessary to identify the model, the model thus becomes underidentified.

• Remedy for empirical underidentification: collect more data or respecify the model.

• Remedy for structural underidentification: respecify the model.



Examples of Identified Models

Case 1: Let's say we have an equation x + 2y = 7

Question: Is this equation / model identified?

Answer: No, it is "underidentified" because there are an infinite number of solutions for x and y (e.g., x = 5 and y = 1, or x = 3 and y = 2). These values are therefore underidentified because there are fewer "knowns" than "unknowns."

Case 2: Let's say we have a set of equations

x + 2y = 7

3x - y = 7

Question: Is this equation / model identified?

Answer: Yes, it is a "just-identified" model, as there are as many knowns as unknowns. There is exactly one pair of values that satisfies both equations (x = 3, y = 2).
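The two cases can be checked mechanically. A minimal Python sketch using Cramer's rule (an illustrative choice, not how SEM software estimates parameters): a non-zero determinant means the two-equation system has exactly one solution, while a single equation in two unknowns leaves y free to follow whatever x is chosen. The helper name is hypothetical:

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("no unique solution: the equations are dependent or inconsistent")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y

# Case 1: x + 2y = 7 alone; every choice of x gives a valid y = (7 - x) / 2
for x in (5, 3, 1):
    print("x =", x, "y =", (7 - x) / 2)

# Case 2: x + 2y = 7 and 3x - y = 7 (as many knowns as unknowns)
print(solve_2x2(1, 2, 7, 3, -1, 7))   # prints (3.0, 2.0)
```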
