Statistical Modelling 2
8/3/2019 Statistical Modelling 2
http://slidepdf.com/reader/full/statistical-modelling-2 1/89
Statistical Modelling (Special Topic: SEM)
Applied Statistics and Data Management (Statistik Terapan dan Manajemen Data)
Relationship Analysis
Be aware that some relationships / models are 'meaningful' and some are not.
Meaningful relationships / models normally have a theoretical basis (an underlying theory) and exhibit 'causality', or 'cause-and-effect'.
For such cause-and-effect relationships, SEM provides a formal way of analysis.
Agenda
Part I: SEM, the Basics
i. SEM Nomenclature / Terminologies
ii. SEM & Related Models
Part II: Modeling and Computing
i. How to draw a model using AMOS.
ii. How to run the AMOS model and evaluate several key components of the AMOS graphics and text output, including overall model fit and test statistics for individual path coefficients.
iii. How to modify and respecify a non-fitting model.
Part III: SEM and Its Applications
Part One
SEM: The Basics
http://58.26.137.12/byatim/
SEM
SEM is a statistical methodology for analysing a structural theory that bears on some phenomenon, using a confirmatory (hypothesis-testing) approach. Most other multivariate procedures are descriptive/exploratory in nature.
The theory represents "causal" processes that generate observations on multiple variables.
SEM
SEM conveys 2 important aspects of the procedure:
the causal processes under study are represented by a series of structural equations, and
these structural equations can be modeled pictorially to enable a clearer conceptualization of the theory under study.
The model can then be tested simultaneously to determine the extent to which it is consistent with the data: if the goodness of fit is adequate, the model is not rejected; otherwise, the hypothesized relations are rejected.
SEM: A Note
SEM is a very general, very powerful and very popular multivariate analysis technique.
It provides a comprehensive method for the quantification and testing of theories.
It has been applied in econometrics, psychology, sociology, political science, education, marketing and medical research, among other fields.
Also known as:
covariance structure analysis,
covariance structure modeling,
latent variable modeling,
confirmatory factor analysis,
linear structural relationships, and
analysis of covariance structures.
SEM is ...
a family of statistical techniques which incorporates and integrates:
Path analysis
Linear regression
Factor analysis
SEM
Serves purposes similar to multiple regression, but in a more powerful way that takes into account the modeling of interactions, nonlinearities, correlated independents, measurement error, correlated error terms, multiple latent independents each measured by multiple indicators, and one or more latent dependents also each with multiple indicators.
May be used as a more powerful alternative to multiple regression, path analysis, factor analysis, time series analysis, and analysis of covariance. These procedures can be seen as special cases of SEM; equivalently, SEM is an extension of the general linear model.
Advantages of SEM compared to multiple regression
more flexible assumptions (particularly allowing interpretation even in the face of multicollinearity),
use of confirmatory factor analysis to reduce measurement error by having multiple indicators per latent variable,
the attraction of SEM's graphical modeling interface,
the desirability of testing models overall rather than coefficients individually,
the ability to test models with multiple dependents,
the ability to model mediating variables and error terms,
the ability to test coefficients across multiple between-subjects groups, and
the ability to handle difficult data (time series with autocorrelated error, non-normal data, incomplete data).
Major applications of structural equation modeling
1. Causal modeling, or path analysis: hypothesizes causal relationships among variables and tests the causal models with a linear equation system. Causal models can involve manifest variables, latent variables, or both;
2. Confirmatory factor analysis: an extension of factor analysis in which specific hypotheses about the structure of the factor loadings and intercorrelations are tested;
3. Regression models, in which regression weights may be constrained to be equal to each other, or to specified numerical values;
4. Covariance structure models, which hypothesize that a covariance matrix has a particular form. For example, you can test the hypothesis that a set of variables all have equal variances with this procedure;
5. Correlation structure models, which hypothesize that a correlation matrix has a particular form.
Aims and Objectives
By the end of this course you should:
Have a working knowledge of the principles behind causality.
Understand the basic steps to building a model of the phenomenon of interest.
Be able to construct / interpret path diagrams.
Understand the basic principles of how models are tested using SEM.
Be able to test model adequacy using SEM.
Be able to use AMOS intelligently.
SEM: Another Note
Assumption 1: you are familiar with the basic logic of statistical reasoning as described in Elementary Concepts.
Assumption 2: you are familiar with the concepts of variance, covariance, correlation and regression analysis; if not, you are advised to read Basic Statistics.
It is highly desirable that you have some background in factor analysis before attempting to use structural modeling.
Introduction to SEM
How Useful is a Statistical Model?
The Basic Idea Behind SEM
Causality (Cause-and-Effect Relationship)
SEM Nomenclature / Terminologies
SEM & Related Statistical Models
How Useful is a Statistical Model?
• "All models are wrong, but some are useful." (G.E.P. Box)
• "SEM models can never be accepted (as absolute truth); they can only fail to be rejected." This leads researchers to provisionally accept a given model.
• While models that fit the data well can only be provisionally accepted, models that do not fit the data well can be absolutely rejected.
The Basic Idea Behind SEM
• In a Distribution Theory course you are taught that, if you multiply every number in a list by some constant K, you multiply the mean of the numbers by K. Similarly, you multiply the standard deviation by the absolute value of K.
• Suppose you have the list of numbers 1, 2, 3, having a mean of 2 and a standard deviation of 1. Suppose also you take these 3 numbers and multiply them by 4. Then the mean would become 8, the standard deviation would become 4, and the variance would thus become 16.
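The arithmetic above can be checked with a short sketch using Python's standard-library `statistics` module. Note that a standard deviation of 1 for the list 1, 2, 3 holds for the sample (n - 1) definition, which `statistics.stdev` uses.

```python
import statistics

values = [1, 2, 3]
K = 4
scaled = [K * v for v in values]   # multiply every number in the list by K

# statistics.stdev uses the sample (n - 1) definition, which gives SD = 1 for 1, 2, 3.
print("before:", statistics.mean(values), statistics.stdev(values))
print("after: ", statistics.mean(scaled), statistics.stdev(scaled))
# The variance scales by K**2, here 16.
print("variance ratio:", statistics.variance(scaled) / statistics.variance(values))
```

With the population definition (`statistics.pstdev`) the SD of 1, 2, 3 would be about 0.816 instead, but the scaling rule holds either way.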
The Basic Idea Behind SEM
• The point is, if you have a set of numbers X related to another set of numbers Y by the equation Y = 4X, then the variance of Y must be 16 times that of X, so you can test the hypothesis that Y and X are related by the equation Y = 4X indirectly by comparing the variances of the Y and X variables. This idea generalizes, in various ways, to several variables inter-related by a group of linear equations. The rules become more complex and the calculations more difficult, but the basic message remains the same: you can test whether variables are interrelated through a set of linear relationships by examining the variances and covariances of the variables.
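A minimal sketch of the "test through variances" idea, assuming the hypothesized model is exactly Y = 4X with no error term. Such a model implies Var(Y) = 16 Var(X) and Cov(X, Y) = 4 Var(X); the hypothetical helper `implied_moments` computes these, and the script compares them with the sample moments. The X values are made-up numbers for illustration only.

```python
import statistics

def implied_moments(var_x, b):
    """Variance of Y and covariance of X and Y implied by the model Y = b * X."""
    return b * b * var_x, b * var_x

def sample_cov(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

x = [2.0, 3.5, 1.0, 4.0, 2.5]      # illustrative data, not from the course
y = [4 * xi for xi in x]           # data generated exactly by Y = 4X

var_x = statistics.variance(x)
implied_var_y, implied_cov_xy = implied_moments(var_x, b=4)

# If the hypothesis Y = 4X holds, observed and implied moments agree.
print(statistics.variance(y), implied_var_y)
print(sample_cov(x, y), implied_cov_xy)
# A wrong hypothesis (b = 3) implies a variance that does not match the data.
print(statistics.variance(y), implied_moments(var_x, 3)[0])
```

Real data contain noise, so in practice the agreement is judged with a formal test statistic rather than exact equality.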
The Basic Idea Behind SEM
Statisticians have developed procedures for testing whether a set of variances and covariances in a covariance matrix fits a specified structure. The way SEM works is as follows:
1. You state the way that you (the theory) believe the variables are inter-related, often with the use of a path diagram.
2. You (AMOS) work out, via some complex internal rules, what the implications of this are for the variances and covariances of the variables.
3. You test whether the variances and covariances fit this model of them.
4. Results of the statistical testing, and also parameter estimates and standard errors for the numerical coefficients in the linear equations, are reported.
5. On the basis of this information, you decide whether the model seems like a good fit to your data.
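The five steps can be sketched for the smallest possible case: two observed variables under the hypothetical model Y = b*X + e. This is an illustration under stated assumptions, not how AMOS is implemented. It evaluates the maximum-likelihood discrepancy F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p at fixed, hypothesized parameter values, whereas SEM software also estimates free parameters by minimising this function. The helper names, the sample covariance matrix S and the sample size N are all hypothetical.

```python
import math

# Step 1: hypothesised model Y = b*X + e, with Var(X) = phi, Var(e) = psi.
def implied_cov(phi, b, psi):
    """Step 2: 2x2 covariance matrix of (X, Y) implied by the model."""
    return [[phi, b * phi],
            [b * phi, b * b * phi + psi]]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

def f_ml(S, sigma):
    """Step 3: ML discrepancy between sample S and implied sigma (p = 2)."""
    si = inv2(sigma)
    trace = sum(S[i][k] * si[k][i] for i in range(2) for k in range(2))  # tr(S @ sigma^-1)
    return math.log(det2(sigma)) + trace - math.log(det2(S)) - 2

S = [[1.0, 4.0], [4.0, 17.0]]   # hypothetical sample covariance matrix
N = 101                          # hypothetical sample size

# Step 4: report the discrepancy and the chi-square statistic (N - 1) * F.
for b in (4, 3):
    F = f_ml(S, implied_cov(phi=1.0, b=b, psi=1.0))
    print(f"b = {b}: F_ML = {F:.3f}, chi-square = {(N - 1) * F:.1f}")
```

Here b = 4 reproduces S exactly (F_ML = 0, chi-square = 0), while b = 3 gives F_ML = 1 and a chi-square of 100: the kind of misfit that, in step 5, would lead you to reject that version of the model.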
A Simple SEM
SEM is an attempt to model causal relations between variables by including all variables that are known to have some involvement in the process of interest, for example, to test the effect of a drug on some psychological disorder (e.g. obsessive compulsive disorder, OCD).
Causality
Causality has a theoretical basis.
[Figure: examples of causal relationships with a theoretical basis: Education and Success in Life; Price, Demand and Supply; Unemployment Rate, Windows of Opportunity for Crime and No. of Crimes.]
Cause and Effect
Philosophers have had a great deal to say about the conditions necessary to infer causality:
cause and effect should occur close together in time,
the cause should occur before the effect is observed, and
the cause should never occur without the presence of the effect.
John Stuart Mill (1865) described three conditions necessary to infer cause:
1. Cause has to precede effect.
2. Cause and effect must be related.
3. All other explanations of the cause-effect relationship must be ruled out.
To verify the third criterion, Mill proposed:
the method of agreement, which states that an effect is present when the cause is present;
the method of difference, which states that when the cause is absent the effect will also be absent; and
the method of concomitant variation, which states that when the above relationships are observed, causal inference is made stronger because most other interpretations of the cause-effect relationship will have been ruled out.
Example: If we wanted to say that my talking about causality causes boredom, we would have to satisfy the following conditions:
(1) I talk about causality before boredom occurs.
(2) Whenever I talk about causality, boredom occurs shortly afterwards.
(3) The correlation between boredom and my talking about causality must be strong (e.g. on 4 out of 4 occasions when I talk about causality, boredom is observed).
(4) When the cause is absent, the effect is absent: when I don't talk about causality, no boredom is observed.
(5) The manipulation of the cause leads to an associated change in the effect. So, if we manipulated whether someone is listening to me talking about causality or to my cat mewing, the effect elicited should change according to the manipulation.
This final manipulation serves to rule out external variables that might affect the cause-effect relationship.
Continued...
In situations in which the cause cannot be manipulated, we cannot make causal attributions about our variables.
Statistically speaking, this means that when we analyze data from non-experimental situations, we cannot conclude anything about cause and effect.
Structural Equation Modeling (SEM) is an attempt to provide a flexible framework within which causal models can be built.
Statistical Modeling
A statistical model DOES NOT necessarily have a theoretical basis: it may be interpreted as either 'making sense' or 'nonsense'.
[Figure: example models: No. of Road Accidents and No. of Newspaper Readers; Heart Disease with Weight, Income and Smoking.]
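To illustrate how a statistical model can fit well yet make no sense, the sketch below correlates two invented series that both simply grow over time. The numbers are made up for illustration, not real accident or readership data; the high Pearson r reflects a shared trend, not a causal link.

```python
def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

years = range(10)
# Invented series: each is a linear upward trend plus its own small wiggle.
wiggle_a = [3, -2, 1, 0, -3, 2, -1, 4, -2, 1]
wiggle_b = [-5, 8, 0, -6, 4, 2, -7, 3, 6, -4]
road_accidents = [100 + 5 * t + w for t, w in zip(years, wiggle_a)]
newspaper_readers = [2000 + 40 * t + w for t, w in zip(years, wiggle_b)]

r = pearson(road_accidents, newspaper_readers)
print(f"r = {r:.2f}")
```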
SEM & Related Statistical Models
• General Linear Model (GLM)
• Regression Model
• Time Series Model
• Log-linear Model
• Mixed Models
• Survival Models
• Many more ...
All these statistical models may or may not have a theoretical basis.
[Figure: path diagram with two exogenous latent variables (constructs), Verbal Ability (indicators x1 to x4, error terms d1 to d4) and Quantitative Ability (indicators x5 to x8, error terms d5 to d8), and one endogenous latent variable, General Intelligence (indicators y1 to y4, error terms e1 to e4). In each construct one loading, and every error path, is fixed to 1.]
SEM Nomenclature
• Independent variables, which are assumed to be measured without error, are called exogenous or upstream variables;
• Dependent or mediating variables are called endogenous or downstream variables.
• Manifest or observed variables, or indicators, are directly measured by researchers.
• Latent or unobserved variables are not directly measured but are inferred from the relationships or correlations among measured variables in the analysis. Examples: self-concept, motivation, powerlessness, anomie, verbal ability, capitalism, social class.
SEM Nomenclature (cont.)
• SEM illustrates relationships among observed and unobserved variables using path diagrams.
• Ovals or circles represent latent variables.
• Rectangles or squares represent measured variables.
• Residuals are always unobserved, so they are represented by ovals or circles.
SEM: Definition
SEM is an extension of the general linear model (GLM) that enables a researcher to test a set of regression equations simultaneously.
SEM consists of TWO components:
Structural Model: illustrates the relationships among the latent constructs (exogenous and endogenous variables).
Measurement Model: represents how the constructs are related to their indicators or manifest variables.
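Example
[Figure: Ability / Intelligence (ξ1, exogenous latent construct), Aspirations (η1, endogenous latent construct) and Achievement (η2, endogenous latent construct), with paths from Ability to Aspirations and to Achievement, and from Aspirations to Achievement.]
In psychology, the theory postulates that ...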
Full Latent Variable Model
[Figure: Ability measured by Academic Skill (x1), Interpersonal Skill (x2) and Communication Skill (x3); Aspiration measured by Family Status (y1), Father's Occupation (y2) and Peer's Influence (y3); Achievement measured by Personal Actualization (y4), Professional Status (y5) and Social Status (y6).]
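Example: ONE Latent (unobserved) Exogenous Variable & TWO Latent (unobserved) Endogenous Variables
[Figure: path diagram. The exogenous latent variable ξ1 is measured by x1, x2, x3 (loadings λx11, λx21, λx31; errors δ1, δ2, δ3). The endogenous latent variable η1 is measured by y1, y2, y3 (loadings λy11, λy21, λy31) and η2 by y4, y5, y6 (loadings λy42, λy52, λy62), with errors ε1 to ε6. Structural paths: γ11 (ξ1 → η1), γ21 (ξ1 → η2) and β21 (η1 → η2), with disturbances ζ1 and ζ2. The paths among ξ1, η1 and η2 form the Structural Model; the paths from the latent variables to their indicators form the Measurement Model.]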
Structural Model
• The structural model allows for certain relationships among the latent variables, depicted by lines or arrows in a path diagram.
• In the path diagram, we specified that Ability and Achievement were related in a specific way; that is, intelligence had some influence on later achievement.
• Thus, one result from the structural model is an indication of the extent to which these a priori hypothesized relationships are supported by our sample data.
Structural Model (Cont.)
• The structural equations address the following questions:
• Are Ability and Achievement related?
• Exactly how strong is the influence of Ability on Achievement?
• Could there be other latent variables that we need to consider to get a better understanding of the influence on Achievement?
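Mathematical Form of Structural Model
$$\eta_1 = \gamma_{11}\xi_1 + \zeta_1$$
$$\eta_2 = \beta_{21}\eta_1 + \gamma_{21}\xi_1 + \zeta_2$$
or, in matrix form,
$$\begin{pmatrix}\eta_1\\ \eta_2\end{pmatrix} = \begin{pmatrix}0 & 0\\ \beta_{21} & 0\end{pmatrix}\begin{pmatrix}\eta_1\\ \eta_2\end{pmatrix} + \begin{pmatrix}\gamma_{11}\\ \gamma_{21}\end{pmatrix}\xi_1 + \begin{pmatrix}\zeta_1\\ \zeta_2\end{pmatrix}, \qquad \text{i.e.} \qquad \boldsymbol{\eta} = \mathbf{B}\boldsymbol{\eta} + \boldsymbol{\Gamma}\xi_1 + \boldsymbol{\zeta}$$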
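Solving the structural equations for the endogenous variables gives the reduced form. Starting from η1 = γ11ξ1 + ζ1 and η2 = β21η1 + γ21ξ1 + ζ2 (so β21 is the only non-zero entry of B), substituting the first equation into the second shows how Ability (ξ1) reaches Achievement (η2) both directly and through Aspiration (η1):

$$\boldsymbol{\eta} = (\mathbf{I} - \mathbf{B})^{-1}(\boldsymbol{\Gamma}\xi_1 + \boldsymbol{\zeta}), \qquad (\mathbf{I} - \mathbf{B})^{-1} = \begin{pmatrix}1 & 0\\ \beta_{21} & 1\end{pmatrix}$$

$$\eta_2 = \beta_{21}(\gamma_{11}\xi_1 + \zeta_1) + \gamma_{21}\xi_1 + \zeta_2 = (\gamma_{21} + \beta_{21}\gamma_{11})\xi_1 + \beta_{21}\zeta_1 + \zeta_2$$

The coefficient γ21 + β21γ11 is the total effect of ξ1 on η2: γ21 is the direct part and β21γ11 the indirect part running through η1.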
Measurement Model
• Specifies the relationship between the latent variables and the observed variables.
• Answers the questions:
1) To what extent are the observed variables actually measuring the hypothesized latent variables?
2) Which observed variable is the best measure of a particular latent variable?
3) To what extent are the observed variables actually measuring something other than the hypothesized latent variable?
• Uses Exploratory Factor Analysis (EFA) or Confirmatory Factor Analysis (CFA) to determine the significant observed variables related to each of the latent variables.
Exploratory FA (EFA)
In EFA the factor structure or theory about a phenomenon is NOT KNOWN.
For example, suppose a researcher is interested in measuring "the achievement of personnel" and has no knowledge (or very little theory) regarding:
the factors that contribute to achievement,
the no. of indicators of each factor,
which indicators represent which factor.
In such a case, the researcher may collect data and 'explore' for a factor structure or theory which can explain the correlations among the indicators.
Confirmatory FA (CFA)
In CFA the precise factor structure or theory about a phenomenon is KNOWN or specified a priori.
For example, a researcher is interested in measuring "consumer preference" for a product.
Suppose that, based on previous research, it is hypothesized (the theory) that the construct or factor measuring 'consumer preference' is a one-dimensional construct with 7 indicators or items as its measures.
The obvious question is: how well do the empirical data conform to the theory of consumer preferences? Or, how well do the data fit the model?
In such a case, CFA is used for the empirical 'confirmation' or 'testing' of the theory.
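Whether the data can 'confirm' a CFA model at all depends partly on having more distinct variances and covariances than free parameters. A rough counting sketch (the t-rule, a necessary but not sufficient condition for identification), assuming a single factor whose scale is set by fixing one loading to 1 and uncorrelated error terms: p indicators give p(p+1)/2 moments and 2p free parameters (p - 1 loadings, 1 factor variance, p error variances). The function name is illustrative.

```python
def one_factor_df(p):
    """Degrees of freedom of a one-factor CFA model with p indicators,
    one loading fixed to 1 and uncorrelated error terms."""
    moments = p * (p + 1) // 2        # distinct variances and covariances
    free_params = (p - 1) + 1 + p     # free loadings + factor variance + error variances
    return moments - free_params

for p in (2, 3, 4, 7):
    df = one_factor_df(p)
    if df < 0:
        status = "under-identified"
    elif df == 0:
        status = "just-identified (fits perfectly, nothing to test)"
    else:
        status = "over-identified (fit can be tested)"
    print(f"{p} indicators: df = {df} -> {status}")
```

For the 7-item consumer-preference construct this gives df = 14, so the fit of the model can be tested. With 3 items a lone one-factor model is just-identified, and with 2 items it cannot be estimated on its own, although it may become identified when embedded in a larger model with other constructs.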
Using Factor Analysis
[Figure: factor Ability (ξ1) with factor loadings on Academic Skill (x1), Interpersonal Skill (x2) and Communication Skill (x3).]
Using Factor Analysis
[Figure: factor Aspiration (η1) with factor loadings on Family Status (y1), Father's Occupation (y2) and Peer's Influence (y3).]
Using Factor Analysis
[Figure: factor Achievement (η2) with factor loadings on Personal Actualisation (y4), Professional Status (y5) and Social Status (y6).]
Measurement Model (Cont.)
• The relationships between the observed variables and the latent variables are described by factor loadings.
• Factor loadings provide information about the extent to which a given observed variable is able to measure the latent variable. They serve as validity coefficients.
• Measurement error is defined as that portion of an observed variable that is measuring something other than what the latent variable is hypothesized to measure. It serves as a measure of reliability.
Measurement Model (Cont.)
• Measurement error could be the result of:
1. An unobserved variable that is measuring some other latent variable
2. Unreliability
3. A second-order factor
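Mathematical Form of Measurement Model
$$x_1 = \lambda^{x}_{11}\xi_1 + \delta_1, \qquad x_2 = \lambda^{x}_{21}\xi_1 + \delta_2, \qquad x_3 = \lambda^{x}_{31}\xi_1 + \delta_3$$
or
$$\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix}\lambda^{x}_{11}\\ \lambda^{x}_{21}\\ \lambda^{x}_{31}\end{pmatrix}\xi_1 + \begin{pmatrix}\delta_1\\ \delta_2\\ \delta_3\end{pmatrix}, \qquad \text{i.e.} \qquad \mathbf{x} = \boldsymbol{\Lambda}_x\xi_1 + \boldsymbol{\delta}$$
How the latent (unobservable) exogenous variable ξ1 is related to its indicators, the manifest/observed variables x1, x2 and x3.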
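The measurement equations x_i = λ_i ξ1 + δ_i imply a covariance matrix for x1, x2, x3. Assuming Var(ξ1) = φ and uncorrelated errors with variances θ1, θ2, θ3, Cov(x_i, x_j) = λ_i λ_j φ for i ≠ j and Var(x_i) = λ_i² φ + θ_i, i.e. Sigma = Λx φ Λx' + Θδ. A small sketch computing this; all numeric values are hypothetical, chosen only for illustration.

```python
def implied_cov_one_factor(loadings, phi, error_vars):
    """Implied covariance matrix of the indicators for x_i = lambda_i * xi + delta_i."""
    p = len(loadings)
    return [[loadings[i] * phi * loadings[j] + (error_vars[i] if i == j else 0.0)
             for j in range(p)]
            for i in range(p)]

loadings = [1.0, 0.8, 0.6]      # lambda_11 fixed to 1 to set the scale of xi_1
phi = 1.0                       # variance of xi_1
error_vars = [0.5, 0.36, 0.64]  # variances of delta_1, delta_2, delta_3

sigma = implied_cov_one_factor(loadings, phi, error_vars)
for row in sigma:
    print(["%.2f" % v for v in row])

# Share of each indicator's variance explained by xi_1 (its squared multiple correlation).
r2 = [loadings[i] ** 2 * phi / sigma[i][i] for i in range(len(loadings))]
print(["%.2f" % v for v in r2])
```

Estimation in SEM software runs this in reverse: it searches for the λ, φ and θ values whose implied matrix is as close as possible to the sample covariance matrix.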
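Measurement Model (cont.)
$$\begin{aligned} y_1 &= \lambda^{y}_{11}\eta_1 + \varepsilon_1, & y_4 &= \lambda^{y}_{42}\eta_2 + \varepsilon_4,\\ y_2 &= \lambda^{y}_{21}\eta_1 + \varepsilon_2, & y_5 &= \lambda^{y}_{52}\eta_2 + \varepsilon_5,\\ y_3 &= \lambda^{y}_{31}\eta_1 + \varepsilon_3, & y_6 &= \lambda^{y}_{62}\eta_2 + \varepsilon_6 \end{aligned}$$
or, in matrix form,
$$\begin{pmatrix}y_1\\ y_2\\ y_3\\ y_4\\ y_5\\ y_6\end{pmatrix} = \begin{pmatrix}\lambda^{y}_{11} & 0\\ \lambda^{y}_{21} & 0\\ \lambda^{y}_{31} & 0\\ 0 & \lambda^{y}_{42}\\ 0 & \lambda^{y}_{52}\\ 0 & \lambda^{y}_{62}\end{pmatrix}\begin{pmatrix}\eta_1\\ \eta_2\end{pmatrix} + \begin{pmatrix}\varepsilon_1\\ \varepsilon_2\\ \varepsilon_3\\ \varepsilon_4\\ \varepsilon_5\\ \varepsilon_6\end{pmatrix}, \qquad \text{i.e.} \qquad \mathbf{y} = \boldsymbol{\Lambda}_y\boldsymbol{\eta} + \boldsymbol{\varepsilon}$$
How the TWO latent (unobservable) constructs or endogenous variables η1 and η2 are related to their indicators or manifest variables y1, ..., y6.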
Reliability
• Definition: the extent to which a variable or set of variables is consistent in what it is intended to measure.
• If multiple measurements are taken, reliable measures will all be consistent in their values.
• It is the degree to which the observed variable measures the "true" value and is "error free".
• It is different from validity.
True Score and Measurement Error
True score: a component which indicates where the subject actually stands on the variable (statement) of interest.
Measurement error: a component which indicates the inaccuracies in measuring "true scores" due to the fallibility of the survey instrument, response scales, data entry or respondent error.
Reliability
• The degree to which scores are free from random measurement error
• Reliability measures:
Internal Consistency Reliability
Test-retest Reliability
Alternate Forms Reliability
Reliability
• Levels of Reliability
0.90 Excellent
0.80 Very Good
0.70 Adequate
<0.70 Poor
Example: Reliability of Observed Variables
Cronbach's alpha was computed for all variables.
Variable      No. of items   Reliability
Variable1     10             .91
Variable2     10             .87
Variable3     10             .58
Variable4     10             .70
Variable5     12             .72
Variable6     12             .80
Variable7     12             .80
Variable8     12             .87
Variable9     10             .84
Variable10    7              .71
Variable11    4              .48
Summated Scale Reliability
• When reliability involves multiple scaled items, reliability must be measured on a summated scale.
• A summated rating scale is a short list of statements, questions or other items that the subject responds to.
• A summated score is the sum of the responses to a list of statements, creating an overall score.
Reliability Coefficient (1)
• There are several ways to measure reliability, which will be discussed later.
• The measurement is normally called the reliability coefficient.
• This coefficient is the percentage of variance in an observed variable that is accounted for by the true scores of the underlying construct.
Reliability Coefficient (2)
• Imagine you have collected 2 scores from a survey: true and observed scores of customer satisfaction.
• You compute the correlation between the scores.
• The square of the correlation coefficient will be your reliability coefficient, which is:
the total variance in the observed scores explained by the true score, or
the percentage of variance in observed scores that is accounted for by true scores.
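A simulation sketch of this true-score idea, assuming observed = true + error with independent, normally distributed components (the variances 1.0 and 0.25, the seed and the sample size are arbitrary choices). The theoretical reliability is Var(T) / Var(X) = 1 / 1.25 = 0.8, and the squared correlation between true and observed scores should come out close to that.

```python
import random

def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(42)
n = 20_000
true_scores = [rng.gauss(0.0, 1.0) for _ in range(n)]        # Var(T) = 1.0
observed = [t + rng.gauss(0.0, 0.5) for t in true_scores]     # Var(E) = 0.5 ** 2 = 0.25

reliability = pearson(true_scores, observed) ** 2
print(f"squared correlation = {reliability:.3f} (theory: 0.800)")
```

In practice true scores are never observed, which is why reliability has to be estimated indirectly, for example through test-retest, alternate forms or internal consistency.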
Types of Reliability
Test-retest: assessed by administering the same instrument to the same sample of respondents at two points in time, and computing the correlation between the two sets of scores.
Internal consistency reliability: the extent to which the individual items that constitute a test correlate with one another or with the test total. In short, it measures how consistently respondents respond to the items within a scale.
Types of Reliability (2)
For example, if the first half of an instrument consists of educational items which correlate highly among themselves and the second half consists of political items which correlate highly among themselves, the instrument would still have high internal consistency, even though the two halves measure two distinct dimensions.
Note that measures of internal consistency are often called measures of "internal consistency reliability" or even "reliability", but this merges the distinct concepts of internal consistency and reliability, which do not necessarily go together.
How do we solve this problem? The most commonly used measure of internal consistency reliability is Cronbach's Alpha.
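Cronbach's alpha for k items is alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score), where the total score is the summated scale for each respondent. A minimal sketch using the standard-library `statistics` module; the response data are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """items: list of k item-score lists, each holding one score per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]    # summated scale per respondent
    item_var_sum = sum(statistics.variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / statistics.variance(totals))

# Hypothetical 5-point responses from 6 respondents to 4 items.
items = [
    [4, 5, 3, 2, 4, 5],
    [4, 4, 3, 2, 5, 5],
    [3, 5, 2, 2, 4, 4],
    [4, 5, 3, 1, 4, 5],
]
print(f"alpha = {cronbach_alpha(items):.2f}")
```

These made-up items agree closely, so alpha comes out around 0.96, "excellent" on the levels listed earlier. As the slide warns, a high alpha alone does not show that the items measure a single dimension.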
Validity
Definition: the extent to which an item or set of items correctly represents the construct of study; the degree to which it is free from any systematic or non-random error.
Validity deals with how well the construct is defined by the item(s) (WHAT should be measured), while reliability deals with how consistently the item(s) measure the construct (HOW it is measured).
Validity
Whether the scores measure what they are supposed to measure.
Types of validity:
Construct Validity (SEM Confirmatory Factor Analysis helps to establish construct validity)
Criterion-Related Validity (correlation with an external standard)
Convergent Validity / Discriminant Validity (can be determined through SEM Confirmatory Factor Analysis)
Examples
Example 1: How happy are you?
This example is about validity: whether the measure accurately represents what it is supposed to measure.
Example 2: How happy are you when you are smoking? Ask this question repeatedly of the same subject or of multiple subjects and see how consistent their answers are.
This example is about reliability (sometimes I'd like to call it consistency).
I Am an Indicator (Lyrics by Alan Reifman)
(May be sung to the tune of "The Entertainer," Billy Joel)
I am an indicator, a latent construct I represent,
I'm measurable, sometimes pleasurable,
A manifestation of what is meant,
I am an indicator, I usually come in a multiple set,
With other signs of the same construct, you may instruct,
I'm correlated with my co-indicators, you can bet,
I am an indicator, from my presence the construct is inferred,
I'm tap-able, the construct is not palpable,
The distinction should not be blurred
At Least Three (Lyrics by Alan Reifman)
(May be sung to the tune of "Think of Me," Lloyd Webber/Hart/Stilgoe, from Phantom of the Opera)
At least three, indicators are urged,
For each latent construct shown,
At least three, indicators should help,
Avoid output where you groan,
With less than three, your construct sure will be, locally unidentified,
Though the model might still run, you could have a rough ride
Total, Direct and Indirect Effects
• There is a direct effect between two latent variables when a single directed line or arrow connects them.
• There is an indirect effect between two variables when the second latent variable is connected to the first latent variable through one or more other latent variables.
• The total effect between two latent variables is the sum of any direct effect and all indirect effects that connect them.
Example: Direct and Indirect ...
[Figure: Ability / Intelligence (ξ1, exogenous latent construct) affects Achievement (η2, endogenous latent construct) both directly and indirectly through Aspirations (η1, endogenous latent construct).]
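In this example, assume the structural paths are γ11 (Ability → Aspirations), β21 (Aspirations → Achievement) and γ21 (Ability → Achievement). The direct effect of Ability on Achievement is then γ21, the indirect effect is the product of the coefficients along the path through Aspirations, β21 * γ11, and the total effect is their sum. A minimal sketch; the coefficient values are hypothetical.

```python
def effects_on_achievement(gamma11, gamma21, beta21):
    """Direct, indirect and total effect of Ability (xi_1) on Achievement (eta_2)."""
    direct = gamma21
    indirect = gamma11 * beta21   # Ability -> Aspirations -> Achievement
    return direct, indirect, direct + indirect

direct, indirect, total = effects_on_achievement(gamma11=0.5, gamma21=0.3, beta21=0.4)
print(f"direct = {direct:.2f}, indirect = {indirect:.2f}, total = {total:.2f}")
```

In a linear recursive model with more than one mediating path, the indirect effect is the sum of such products over every path connecting the two variables.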
Semantics
Types of measurement scale
Metric and Non-metric
Correlation coefficient
Correlation and Covariance Matrix
Standardized and Unstandardized Estimates
Types of Measurement Scale
• There are 4 types of measurement scale in a scale instrument: Nominal, Ordinal, Interval and Ratio.
• Some other common scales, like Likert scales, Semantic Differential scales and Dichotomous scales, can be categorized into the 4 above.
• This is important because the assumptions of SEM rely on these distinctions.
Metric and Non-metric Scales
• Metric scales yield quantitative data measured on a continuum: interval or ratio scale data.
• Non-metric scales yield qualitative data, i.e. attributes, characteristics or categorical properties that identify or describe a subject or object: nominal or ordinal scale data.
• However, metric and non-metric scales are sometimes misused or abused. How?
VARIABLE SCALES
• SEM in general assumes that observed variables are measured on a linear, continuous scale.
• Dichotomous and ordinal variables cause problems because their correlations/covariances tend to be truncated. These scores are not normally distributed, and responses to individual items may not be very reliable.
Correlation
• Perhaps the most basic concept. Definition: the linear relationship between two variables.
• The strength of the relationship is measured by the correlation coefficient, r (explained later).
• There are two common types of correlation coefficient:
  - Pearson product-moment correlation (interval data)
  - Spearman rank correlation (ordinal data)
• The former is the one we will use in this course.
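The two coefficients differ only in what they correlate: Spearman's coefficient is the Pearson coefficient computed on ranks, which is why it suits ordinal data. A minimal pure-Python sketch of both (the function names are our own, not from any particular package):

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation: co-variation scaled by both spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [1, 4, 9, 16, 25]  # monotonic but not linear
    print(round(pearson(x, y), 4), round(spearman(x, y), 4))
```

For this monotonic but non-linear pair, Spearman's coefficient is exactly 1 while Pearson's is slightly below 1 (about 0.98), because Pearson measures only the linear part of the relationship.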
Correlation Matrix (1)
• The correlation matrix of n random variables X1, …, Xn is the n × n matrix whose (i, j) entry is corr(Xi, Xj).
• If the correlations used are product-moment (Pearson) coefficients, the correlation matrix is the same as the covariance matrix of the standardized random variables Xi/SD(Xi) for i = 1, …, n.
• Consequently, it is necessarily a non-negative definite matrix, an important assumption.
• The correlation matrix is symmetric because the correlation between Xi and Xj is the same as the correlation between Xj and Xi.
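The second bullet can be checked numerically: divide each variable by its standard deviation, take ordinary sample covariances, and the result has a unit diagonal and is symmetric, i.e. it is the correlation matrix. The item scores in this sketch are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance with an n - 1 denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def correlation_matrix(columns: Sequence[Sequence[float]]) -> Matrix:
    """Correlation matrix computed as the covariance matrix of X_i / SD(X_i)."""
    scaled = []
    for col in columns:
        sd = math.sqrt(covariance(col, col))
        scaled.append([v / sd for v in col])
    k = len(scaled)
    return [[covariance(scaled[i], scaled[j]) for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    # Hypothetical scores of five respondents on three items.
    items = [
        [3.0, 4.0, 2.0, 5.0, 4.0],
        [2.0, 4.0, 1.0, 5.0, 3.0],
        [5.0, 3.0, 4.0, 2.0, 3.0],
    ]
    for row in correlation_matrix(items):
        print(" ".join(f"{v:6.3f}" for v in row))
```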
Correlation Matrix (2)
Pearson correlation coefficients
        A1      A2      A3      A4      A5      A6      A7      B1      B2      B3
A1 1.00000 0.65579 0.46296 0.58812 0.62082 0.62629 0.64288 0.34385 0.57904 0.56353
A2 0.65579 1.00000 0.45951 0.66297 0.72727 0.77384 0.76693 0.40987 0.67796 0.59493
A3 0.46296 0.45951 1.00000 0.51913 0.46652 0.45752 0.44520 0.33407 0.35833 0.33623
A4 0.55812 0.66297 0.51913 1.00000 0.69905 0.64969 0.59358 0.34148 0.58859 0.44284
A5 0.62082 0.72727 0.46652 0.69905 1.00000 0.67281 0.66939 0.31277 0.63133 0.54744
A6 0.62629 0.77384 0.45752 0.64969 0.67281 1.00000 0.86014 0.40483 0.66758 0.56944
A7 0.64288 0.76693 0.44520 0.59358 0.66939 0.86014 1.00000 0.39913 0.68141 0.62075
B1 0.34385 0.40987 0.33407 0.34148 0.31277 0.40483 0.39913 1.00000 0.58187 0.62583
B2 0.57904 0.67796 0.35833 0.58859 0.63133 0.66758 0.68141 0.58187 1.00000 0.85335
B3 0.56353 0.59493 0.33623 0.44284 0.54744 0.56944 0.62075 0.62583 0.85335 1.00000

p-values
        A1      A2      A3      A4      A5      A6      A7      B1      B2      B3
A1       -  <.0001  <.0001  <.0001  <.0001  <.0001  <.0001  0.0004  <.0001  <.0001
A2  <.0001       -  <.0001  <.0001  <.0001  <.0001  <.0001  <.0001  <.0001  <.0001
A3  <.0001  <.0001       -  <.0001  <.0001  <.0001  <.0001  0.0006  0.0002  0.0006
A4  <.0001  <.0001  <.0001       -  <.0001  <.0001  <.0001  0.0004  <.0001  <.0001
A5  <.0001  <.0001  <.0001  <.0001       -  <.0001  <.0001  <.0014  <.0001  <.0001
A6  <.0001  <.0001  <.0001  <.0001  <.0001       -  <.0001  <.0001  <.0001  <.0001
A7  <.0001  <.0001  <.0001  <.0001  <.0001  <.0001       -  <.0001  <.0001  <.0001
B1  0.0004  <.0001  0.0006  0.0004  <.0014  <.0001  <.0001       -  <.0001  <.0001
B2  <.0001  <.0001  0.0002  <.0001  <.0001  <.0001  <.0001  <.0001       -  <.0001
B3  <.0001  <.0001  0.0006  <.0001  <.0001  <.0001  <.0001  <.0001  <.0001       -
Correlation Matrix (3)
• So we say that:
  - If the input matrix is the covariance matrix, the estimated parameter coefficients are unstandardized estimates.
  - If the input matrix is the correlation matrix, the estimated parameter coefficients are standardized estimates.
• So what?
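A small sketch may help answer the question. In a single-predictor path, the unstandardized coefficient is expressed in the raw units of the variables, while the standardized coefficient rescales it by SD(x)/SD(y), which in this one-predictor case equals the correlation r. The study-hours and test-score data below are hypothetical; the sketch shows that changing the unit of x (hours to minutes) changes the unstandardized estimate but leaves the standardized one untouched.

```python
import math
from typing import Sequence, Tuple


def mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


def cov(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def path_estimates(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return the (unstandardized, standardized) slope of y on a single predictor x."""
    b = cov(x, y) / cov(x, x)                               # units of y per unit of x
    beta = b * math.sqrt(cov(x, x)) / math.sqrt(cov(y, y))  # unit-free
    return b, beta


if __name__ == "__main__":
    hours = [2.0, 4.0, 6.0, 8.0, 10.0]      # hypothetical predictor
    score = [50.0, 58.0, 61.0, 70.0, 76.0]  # hypothetical outcome
    minutes = [h * 60 for h in hours]       # same predictor, different unit
    print(path_estimates(hours, score))
    print(path_estimates(minutes, score))
```

The unstandardized slope is the one to interpret in substantive units (3.2 score points per hour here); the standardized slope is the one to compare across paths measured on different scales.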
Covariance
The covariance between two variables equals the correlation times the product of the two variables' standard deviations. The covariance of a variable with itself is that variable's variance.
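The relationship cov(X, Y) = r × SD(X) × SD(Y) is what lets software move between the two input matrices. The helper functions below are a sketch (their names are our own); they reuse the A5, A6 and A7 correlations from the correlation matrix above and pair them with hypothetical standard deviations, since the slides do not report SDs.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def corr_to_cov(corr: Matrix, sd: Sequence[float]) -> Matrix:
    """cov_ij = r_ij * SD_i * SD_j; the diagonal becomes each variable's variance."""
    k = len(sd)
    return [[corr[i][j] * sd[i] * sd[j] for j in range(k)] for i in range(k)]


def cov_to_corr(cov: Matrix) -> Matrix:
    """r_ij = cov_ij / (SD_i * SD_j), with SD_i taken from the diagonal."""
    k = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(k)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]


if __name__ == "__main__":
    # Correlations among A5, A6 and A7 (taken from the correlation matrix above).
    r = [
        [1.00000, 0.67281, 0.66939],
        [0.67281, 1.00000, 0.86014],
        [0.66939, 0.86014, 1.00000],
    ]
    sds = [1.2, 0.9, 1.5]  # hypothetical standard deviations
    s = corr_to_cov(r, sds)
    print([round(s[i][i], 4) for i in range(3)])  # variances = SD squared
    print(round(cov_to_corr(s)[1][2], 5))         # recovers the A6-A7 correlation
```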
Correlation Matrix (4)
• Therefore, when we want to test a theory (to validate the causal relationships among constructs), we use the variance-covariance matrix.
• When we just want to explain the pattern of the relationships, we use the correlation matrix (theory testing is not required).
Factors Affecting the Correlation/Covariance Coefficient
• Type of scale and range of values
  - The Pearson correlation is the basis for analysis in regression, path analysis, factor analysis and SEM; hence the data must be metric.
  - There must be enough variation in the scores for a correlation to manifest.
• Linearity
  - The Pearson correlation coefficient measures the degree of linear relationship between two variables, so linearity needs to be tested.
• Sample size
  - SEM requires a large sample. Rule of thumb: 10-20 times the number of variables. Ding, Velicer and Harlow (1995): 100-150; Boomsma (1982, 1983): 400; Hu, Bentler and Kano (1992): in some cases 5,000 is still insufficient; Schumacker and Lomax (1999): many articles use 250-500; Bentler and Chou (1987): for normal data, 5 subjects per variable is sufficient.
Covariance
Lyrics by Alan Reifman (may be sung to the tune of "Aquarius," Rado/Ragni/MacDermot, from Hair, also popularized by the Fifth Dimension)
You draw paths to show relationships,
You hope align with the known r's,
Your model will guide the tracings,
From constructs near to constructs far,
You will compare this with the data's covariance,
The data's covariance...
Covariance!
Covariance!
Similar to correlation,
With the variables unstandardized,
Does each known covariance match up with,
The one the model tracings will derive?
Covariance!
Covariance!
SEM Assumptions
Sample Size
• A good rule of thumb is more than 15 cases per predictor/indicator (James Stevens, Applied Multivariate Statistics for the Social Sciences).
• Model with two factors: recommended sample size > 100.
• Model with four factors: recommended sample size > 200.
SEM Assumptions (cont.)
Sample Size
• Consequences of using smaller samples: convergence failures (the software cannot reach a satisfactory solution), improper solutions (including negative error variance estimates for measured variables), and lowered accuracy of parameter estimates and, in particular, of standard errors.
• SEM program standard errors are computed under the assumption of large sample sizes.
SEM Assumptions (cont.)
Normality
• Many SEM estimation procedures assume multivariate normal distributions.
• Lack of univariate normality is indicated when the absolute skew index exceeds 3.0 or the kurtosis index exceeds 10.
• Departures from multivariate normality can be detected with indices of multivariate skewness or kurtosis.
• Non-normal distributions can sometimes be corrected by transforming variables.
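A sketch of the univariate check, assuming the moment-based definitions of skewness and excess kurtosis (so a normal distribution scores 0 on both) and treating the cut-offs above as absolute values. The data and function names are hypothetical.

```python
from typing import Sequence, Tuple


def skew_and_kurtosis(x: Sequence[float]) -> Tuple[float, float]:
    """Moment-based skewness and excess kurtosis (0 and 0 for a normal distribution)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def looks_non_normal(x: Sequence[float], skew_cut: float = 3.0, kurt_cut: float = 10.0) -> bool:
    """Flag a variable whose |skew index| exceeds 3.0 or whose |kurtosis index| exceeds 10."""
    skew, kurt = skew_and_kurtosis(x)
    return abs(skew) > skew_cut or abs(kurt) > kurt_cut


if __name__ == "__main__":
    symmetric = [1, 2, 3, 4, 5]
    one_extreme = [0] * 50 + [100]  # hypothetical variable with a single extreme score
    print(skew_and_kurtosis(symmetric), looks_non_normal(symmetric))
    print(skew_and_kurtosis(one_extreme), looks_non_normal(one_extreme))
```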
SEM Assumptions (cont.)
Multicollinearity
• Occurs when intercorrelations among some variables are so high that certain mathematical operations are impossible or results are unstable because denominators are close to 0.
• Warning signs: bivariate correlations > 0.85; multiple correlations > 0.90.
• May cause a non-positive definite (singular) covariance matrix.
• May be due to the inclusion of both individual and composite variables.
• Detection: Tolerance = 1 - R² < 0.10; Variance Inflation Factor (VIF) = 1/(1 - R²) > 10.
• Can be corrected by eliminating or combining redundant variables.
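Tolerance and VIF do not require running separate regressions: for a set of variables with correlation matrix R, the j-th diagonal element of R⁻¹ equals 1/(1 - R²_j), the VIF of variable j. The sketch below uses that identity with a small hand-written Gauss-Jordan inverse so it needs no libraries. The inputs are the A5, A6 and A7 correlations from the correlation matrix above, and the "check"/"ok" labels simply apply the cut-offs on this slide.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def invert(matrix: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (fine for small matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_and_tolerance(corr: Matrix) -> List[Tuple[float, float]]:
    """VIF_j is the j-th diagonal element of inv(R); tolerance_j = 1 / VIF_j = 1 - R_j^2."""
    inv = invert(corr)
    return [(inv[j][j], 1.0 / inv[j][j]) for j in range(len(corr))]


if __name__ == "__main__":
    # A5, A6 and A7 correlations (from the correlation matrix above).
    r = [
        [1.00000, 0.67281, 0.66939],
        [0.67281, 1.00000, 0.86014],
        [0.66939, 0.86014, 1.00000],
    ]
    for name, (vif, tol) in zip(["A5", "A6", "A7"], vif_and_tolerance(r)):
        flag = "check" if vif > 10 or tol < 0.10 else "ok"
        print(f"{name}: VIF={vif:.2f} tolerance={tol:.3f} {flag}")
```

A singular correlation matrix makes `invert` raise an error, which is the computational face of the "denominators close to 0" problem described above.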
SEM Assumptions (cont.)
Outliers
• Univariate outliers are cases more than three standard deviations away from the mean.
• Detection: inspect frequency distributions and univariate measures of skewness and kurtosis.
• Multivariate outliers may have extreme scores on two or more variables, or their configuration of scores may be unusual.
• Detection: inspect indices of multivariate skewness and kurtosis. For multivariate normal data, the squared Mahalanobis distance is approximately distributed as chi-square with df equal to the number of variables.
• Can be remedied by correcting errors, dropping these cases, or transforming the variables.
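A sketch of the multivariate check for the two-variable case, where the chi-square distribution with 2 df has the closed-form upper tail exp(-D²/2), so no statistics library is needed. The data are hypothetical, and flagging cases with p < .001 is a common convention rather than something stated on this slide.

```python
import math
from typing import List, Sequence, Tuple


def mahalanobis_sq_2d(data: Sequence[Tuple[float, float]]) -> List[float]:
    """Squared Mahalanobis distance of each case from the centroid (two variables, sample covariance)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Elements of the inverse of the 2 x 2 covariance matrix.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    out = []
    for x, y in data:
        dx, dy = x - mx, y - my
        out.append(dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy)
    return out


def chi2_upper_tail_df2(d2: float) -> float:
    """P(chi-square with 2 df > d2); for exactly 2 df this equals exp(-d2 / 2)."""
    return math.exp(-d2 / 2.0)


if __name__ == "__main__":
    # Hypothetical data: ten cases close to y = x, plus one case (9, 2.0)
    # whose x and y values are each within range but whose combination is unusual.
    cases = [(1, 1.1), (2, 2.2), (3, 2.9), (4, 4.1), (5, 5.2), (6, 5.8),
             (7, 7.1), (8, 8.1), (9, 8.9), (10, 10.2), (9, 2.0)]
    for case, d2 in zip(cases, mahalanobis_sq_2d(cases)):
        print(case, round(d2, 2), round(chi2_upper_tail_df2(d2), 4))
```

The case (9, 2.0) gets the largest D² even though neither of its scores is extreme on its own; with only 11 cases it does not reach p < .001, which illustrates why the sample-size cautions above also apply to outlier screening.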
VIOLATIONS OF ASSUMPTIONS (1)
• The best-known distribution with no (excess) kurtosis is the multivariate normal.
• Leptokurtic (more peaked) distributions result in too many rejections of H0 based on the chi-square statistic.
• Platykurtic distributions lead to chi-square estimates that are too low.
VIOLATIONS OF ASSUMPTIONS (2)
• High degrees of skewness lead to excessively large chi-square estimates.
• In small samples (N < 100), the chi-square statistic tends to be too large.
Model Identification (Identified Equations)
• Identification refers to the idea that there is at least one unique solution for each parameter estimate in an SEM model.
• Models in which there is only one possible solution for each parameter estimate are said to be just-identified.
• Models for which there are an infinite number of possible parameter estimate values are said to be underidentified.
• Finally, models that have more than one possible solution (but one best or optimal solution) for each parameter estimate are considered overidentified.
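A common first check compares "knowns" (the p(p + 1)/2 distinct variances and covariances of p observed variables) with "unknowns" (the free parameters). This counting check, often called the t-rule, is necessary but not sufficient: a model that passes it can still be underidentified. The sketch applies it to a one-factor measurement model, assuming the factor variance is fixed to 1 so each indicator contributes one loading and one error variance; the function name is our own.

```python
from typing import Tuple


def count_identification(n_observed: int, n_free_parameters: int) -> Tuple[int, int, str]:
    """Compare knowns (distinct variances and covariances) with unknowns (free parameters)."""
    knowns = n_observed * (n_observed + 1) // 2
    df = knowns - n_free_parameters
    if df < 0:
        status = "underidentified"
    elif df == 0:
        status = "just-identified"
    else:
        status = "overidentified"
    return knowns, df, status


if __name__ == "__main__":
    # One-factor model, factor variance fixed to 1:
    # free parameters = one loading + one error variance per indicator.
    for k in (2, 3, 4):
        print(k, "indicators:", count_identification(k, 2 * k))
```

Two indicators give 3 knowns for 4 unknowns (underidentified), three give 6 for 6 (just-identified) and four give 10 for 8 (overidentified), which is the arithmetic behind the "at least three indicators" advice.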
Model Identification (Identified Equations) (cont.)
• Underidentification can be empirical or structural.
• Empirical underidentification occurs when a parameter estimate that establishes model identification has a very small (close to zero) estimate.
• A path coefficient whose value is estimated as being close to zero may be treated as zero by the SEM program's matrix inversion algorithm. If that path coefficient is necessary to identify the model, the model thus becomes underidentified.
• Remedy for empirical underidentification: collect more data or respecify the model.
• Remedy for structural underidentification: respecify the model.
Examples of Identified Models
Case 1: Let's say we have the equation x + 2y = 7.
Question: Is this equation/model identified?
Answer: No, it is "underidentified" because there are an infinite number of solutions for x and y (e.g., x = 5 and y = 1, or x = 3 and y = 2). These values are therefore underidentified because there are fewer "knowns" than "unknowns."
Case 2: Let's say we have a set of equations:
x + 2y = 7
3x - y = 7
Question: Is this equation/model identified?
Answer: Yes, it is a "just-identified" model, as there are as many knowns as unknowns. There is exactly one pair of values that satisfies both equations (x = 3, y = 2).
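Both cases can be checked with a few lines of Python. The sketch below (function name our own) lists several solutions of the single equation in Case 1 and solves the Case 2 system with Cramer's rule, using exact fractions to avoid rounding.

```python
from fractions import Fraction
from typing import Optional, Tuple


def solve_2x2(a11: int, a12: int, b1: int,
              a21: int, a22: int, b2: int) -> Optional[Tuple[Fraction, Fraction]]:
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule.

    Returns None when the determinant is zero (no unique solution)."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        return None
    x = Fraction(b1 * a22 - a12 * b2, det)
    y = Fraction(a11 * b2 - b1 * a21, det)
    return x, y


if __name__ == "__main__":
    # Case 1: one equation, two unknowns; every y gives a matching x = 7 - 2y.
    print([(7 - 2 * y, y) for y in range(4)])
    # Case 2: two equations, two unknowns; exactly one solution.
    print(solve_2x2(1, 2, 7, 3, -1, 7))
```

A zero determinant, for example when the second equation is 2x + 4y = 14 (the first one doubled), means there is no unique solution even though there are two equations, so counting equations alone does not guarantee identification.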