Lecture 1: Correlations and multiple regression
Aims & Objectives
-Should know about a variety of correlational techniques
-Multiple correlations and the Bonferroni correction
-Partial correlations
-3 types of multiple regression
-Simultaneous, stepwise and hierarchical
Questions & techniques
• What is the association between a set of variables?
• This takes a number of multivariate forms:
– Associations between a number of variables
• multiple correlations
– Associations between 1 variable (DV) and many variables (IVs): MODEL BUILDING
• regression and partial correlations
– Associations between 1 set of variables and another set of variables
• canonical correlations
Correlations
Vary between –1 and +1
[Figure: scatterplots illustrating correlations of +1 and –1; both axes run from Low to High]
Types of correlation
• Pearson’s (Interval and ratio data)
• Spearman’s (Ordinal data)
• Phi (both true dichotomies)
• Kendall’s tau (rating)
• Biserial (Interval & dichotomised)
• Point-biserial (interval & true dichotomy)
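As a rough illustration of how the choice of coefficient matters, the sketch below computes Pearson's r and Spearman's rho by hand. The helper names and the data are made up for the example; Spearman's rho is obtained here as Pearson's r applied to average ranks.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied scores share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of scores tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical scores: the relationship is monotonic but not linear
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print("Pearson r:", round(pearson_r(x, y), 3))
print("Spearman rho:", round(spearman_rho(x, y), 3))
```

With these scores rho is 1 because the ordering is perfectly preserved, while r falls a little short of 1 because the relationship is curved.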
Factors affecting correlations
• Outliers
• Homoscedasticity
• Restriction of range
• Multi-collinearity
• Singularity
Outliers
Outlier or influential point: Cook’s distance of 1 or greater
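A minimal sketch of the Cook's distance check, assuming a simple regression with one IV so that leverage has a closed form. The function name and the data are hypothetical; in practice a statistics package would report these values.

```python
def cooks_distances(x, y):
    """Cook's distance for each case in a one-predictor regression of y on x."""
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    distances = []
    for a, e in zip(x, residuals):
        leverage = 1 / n + (a - mean_x) ** 2 / sxx
        # D_i = (e_i^2 / (p * MSE)) * h_i / (1 - h_i)^2
        distances.append((e ** 2 / (p * mse)) * leverage / (1 - leverage) ** 2)
    return distances

# Hypothetical data: the last case is far from the others on x and off the trend
x = [1, 2, 3, 4, 5, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]
d = cooks_distances(x, y)
flagged = [i for i, value in enumerate(d) if value >= 1]
print("Cases with Cook's distance >= 1:", flagged)
```

Only the last case crosses the cut-off of 1 here, because it combines a large residual with high leverage.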
Homoscedasticity
When the variability of scores (errors) in one continuous variable is the same at all values of a second variable
With group-level data this is termed homogeneity of variance
Heteroscedasticity
One variable is skewed or the relationship is non-linear
Singularity & Multicollinearity
• Singularity:
– when variables are redundant: one variable is a combination of two or more other variables.
• Multi-collinearity:
– when variables are highly correlated (.90+). For example, two measures of IQ
• Problems
– Logical: Don’t want to measure the same thing twice.
– Statistical: Singularity prevents matrix inversion (division) because the determinant = zero; for multi-collinearity the determinant is zero to many decimal places
• Screening
– Bivariate correlations
– Examine SMC (squared multiple correlation): large = problems
– Tolerance (1 – SMC)
• Solutions:
– Composite score
– Remove 1 variable
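The SMC and tolerance screening can be sketched from bivariate correlations alone when one variable is checked against just two others. The function names and the correlations below are invented for illustration; with more variables the SMC would normally come from a regression routine.

```python
def smc_two_others(r_12, r_13, r_23):
    """Squared multiple correlation of variable 1 with variables 2 and 3.

    Closed form for the two-predictor case:
    R^2 = (r12^2 + r13^2 - 2*r12*r13*r23) / (1 - r23^2)
    """
    return (r_12 ** 2 + r_13 ** 2 - 2 * r_12 * r_13 * r_23) / (1 - r_23 ** 2)

def tolerance(smc):
    """Tolerance is the proportion of variance NOT shared with the other variables."""
    return 1 - smc

# Hypothetical correlations: two IQ tests correlate .95, each correlates .30 with memory
smc_iq1 = smc_two_others(r_12=0.95, r_13=0.30, r_23=0.30)
print("SMC for IQ1:", round(smc_iq1, 3))
print("Tolerance for IQ1:", round(tolerance(smc_iq1), 3))
```

A large SMC (and so a small tolerance) for IQ1 signals that it adds little that the other variables do not already carry.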
IQ: Multi-collinearity & Singularity
[Figure: total scores IQ1 and IQ2 with sub-scales Verbal, Spatial, Memory and Maths; relationships labelled Multicollinear and Singular]
Total IQ is singular with its own sub-scales (the total is a function of combining the sub-scales)
One total IQ test (MD5) is multicollinear with another (MAT)
Multiple correlations
         Stress   ES     Cont   Dep
Stress   1
ES       .32*     1
Cont     .24*     .12    1
Dep      .23*     .62*   .43*   1
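Partial correlations
[Figure: Venn diagram of a DV overlapping IV1 and IV2, with regions a, b, c and d; in the example the variables are Neuroticism (N), Stress (S) and Depression (Dep)]
Partial r for Neuroticism (N) = association of N with Depression once the overlap of Stress with N and of Stress with Depression is removed
Semi-partial r for N = association of N with Depression once the overlap of Stress with N only is removed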
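The two definitions can be written as first-order formulas on three bivariate correlations. This is a sketch with invented correlation values; x stands for Neuroticism, y for Depression and z for Stress.

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z removed from BOTH x and y."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y with z removed from x ONLY."""
    return (r_xy - r_xz * r_yz) / sqrt(1 - r_xz ** 2)

# Hypothetical correlations (not taken from the lecture data)
r_n_dep = 0.50     # Neuroticism with Depression
r_n_stress = 0.40  # Neuroticism with Stress
r_stress_dep = 0.30  # Stress with Depression

pr = partial_r(r_n_dep, r_n_stress, r_stress_dep)
sr = semipartial_r(r_n_dep, r_n_stress, r_stress_dep)
print("Partial r for N:", round(pr, 3))
print("Semi-partial r for N:", round(sr, 3))
```

The semi-partial r is never larger in absolute size than the partial r, because its denominator keeps all of the DV's variance.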
Bonferroni correction
• With a matrix of multiple correlations [R], or many (k) IVs in a regression analysis, the possibility of chance effects increases
• Correct the alpha level (0.05/N, where N = number of tests)
• Or allow for the number of effects expected by chance = alpha * N (0.05 * N)
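A short worked example of both corrections, assuming a matrix of 4 variables (so 6 distinct correlations) and alpha = .05. The function names are made up for the sketch.

```python
def n_pairs(k):
    """Number of distinct correlations among k variables."""
    return k * (k - 1) // 2

def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha level after the Bonferroni correction."""
    return alpha / n_tests

def expected_chance_effects(alpha, n_tests):
    """Number of 'significant' results expected by chance alone."""
    return alpha * n_tests

n_tests = n_pairs(4)  # 4 variables give 6 correlations
print("Tests:", n_tests)
print("Corrected alpha:", round(bonferroni_alpha(0.05, n_tests), 4))
print("Expected by chance:", round(expected_chance_effects(0.05, n_tests), 2))
```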
Multiple regression
$Y = A + B_1 X_1 + B_2 X_2 + \dots + B_k X_k + e$
[Figure: regression line of Y on X, where A is the intercept and B is the slope]
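To make the equation concrete, here is a minimal sketch that estimates A and the B weights by least squares using the normal equations, in plain Python. The function names and the data are hypothetical, and a real analysis would use a statistics package.

```python
def solve(a, b):
    """Solve a.x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # a numerically zero pivot means the IVs are redundant (singularity)
            raise ValueError("singular matrix: predictors are redundant")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(xs, y):
    """xs: one row of IV scores per case. Returns [A, B1, ..., Bk]."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 estimates the intercept A
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Predicted Y for one case: A + B1*X1 + ... + Bk*Xk."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))

# Hypothetical data generated without error from Y = 2 + 3*X1 - 1*X2
xs = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [3, 7, 6, 11, 9]
coefs = fit_ols(xs, y)
print("A, B1, B2:", [round(c, 3) for c in coefs])
print("Prediction for X1=6, X2=2:", round(predict(coefs, [6, 2]), 3))
```

Because the example data contain no error term e, the fitted weights recover the generating values up to rounding.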
Regression assumptions
• N:IVs ratio
– Assume medium effect size
• For multiple correlations N > 50 + 8m (m = N of IVs)
• For simple linear regression N > 104 + m
– Or N > (8/f²) + (m – 1), where f² = effect size (ES) = .10, .15 or .35
• f² = R²/(1 – R²) for a more accurate estimate
– Stepwise 40:1
• Outlier = Cook’s distance
• Singularity/multi-collinearity = SMCs
• Normality = residual plots
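The sample-size rules of thumb above are simple arithmetic, sketched here for a hypothetical study with m = 5 IVs. Function names are invented; each function returns the threshold that N should exceed.

```python
from math import ceil

def n_multiple_correlation(m):
    """Threshold from the rule N > 50 + 8m."""
    return 50 + 8 * m

def n_second_rule(m):
    """Threshold from the rule N > 104 + m."""
    return 104 + m

def n_effect_size_rule(f2, m):
    """Threshold from the rule N > (8 / f2) + (m - 1)."""
    return 8 / f2 + (m - 1)

def n_stepwise(m):
    """Cases needed for stepwise regression at a 40:1 ratio of cases to IVs."""
    return 40 * m

def f_squared(r2):
    """Effect size f2 computed from R2."""
    return r2 / (1 - r2)

m = 5  # hypothetical number of IVs
print("50 + 8m:", n_multiple_correlation(m))
print("104 + m:", n_second_rule(m))
print("Effect-size rule, f2 = .15:", ceil(n_effect_size_rule(0.15, m)))
print("Stepwise 40:1:", n_stepwise(m))
print("f2 for R2 = .20:", round(f_squared(0.20), 3))
```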
Types of regression
• Simultaneous (Standard)
– No theory; enter all IVs in one block
• Stepwise
– No theory. Allows the computer to choose, on statistical grounds, the best sub-set of IVs to fit the equation. Capitalises on chance effects
• Hierarchical (sequential)
– Theory driven. A-priori sequence of entry.
Types of regression: An example
Simultaneous
Age, Gender, Stress, N, Control
Stepwise
Age, Control
Hierarchical
Step 1
Age, Gender
Step 2
Stress, N, Control
Venn Diagrams
[Figure: Venn diagram of Depression overlapping Age, Sex, Neuroticism and Stress, with regions a to g]
Standard Regression
[Figure: Venn diagram of Depression overlapping Age, Sex, Neuroticism and Stress, with regions a, c, e and g marked]
Hierarchical
[Figure: Venn diagram of Depression overlapping Age, Sex, Neuroticism and Stress, with regions a to g assigned to Step 1 and Step 2]
Stepwise
[Figure: Venn diagram of Depression overlapping Age, Sex, Neuroticism and Stress, with regions a to g]
Statistical terms
• B = unstandardized regression coefficient
• Beta = standardized regression coefficient (-1 to +1)
• t-test = is the beta significant?
• R² = 0 to 1 (amount of variance accounted for)
• R² change = change in R² from one block to the next
• F change = is the change in R² significant?
• F = is the equation significant?
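The F statistics for the whole equation and for the change between blocks can be sketched from R² values, the sample size and the number of IVs. The function names and all numbers below are invented, loosely following the hierarchical example (Step 1 with 2 IVs, Step 2 adding 3 more).

```python
def f_equation(r2, n, k):
    """F for the whole equation with k IVs and n cases; df = (k, n - k - 1)."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

def f_change(r2_before, r2_after, n, k_before, k_after):
    """F for the R2 change when IVs are added; df = (k_after - k_before, n - k_after - 1)."""
    r2_change = r2_after - r2_before
    return (r2_change / (k_after - k_before)) / ((1 - r2_after) / (n - k_after - 1))

# Hypothetical results for n = 100 cases
n = 100
r2_step1 = 0.10  # Step 1: Age, Gender (2 IVs)
r2_step2 = 0.40  # Step 2: adds Stress, N, Control (5 IVs in total)

print("R2 change:", round(r2_step2 - r2_step1, 2))
print("F change:", round(f_change(r2_step1, r2_step2, n, 2, 5), 2))
print("F for final equation:", round(f_equation(r2_step2, n, 5), 2))
```

Each F would then be compared with the critical value for its degrees of freedom to decide significance.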