8/4/2019 Correlations, Simple Regression
1/43
(Not) Relationships among Variables
Descriptive stats (e.g., mean, median, mode, standard deviation)
  describe a sample of data
z-test and/or t-test for a single population parameter (e.g., mean)
  infer the true value of a single variable
  ex: mean # of random digits that people can memorize
-
Relationships among Variables
But relationships among more than one variable are the crucial feature of almost all scientific research.
Examples:
  How does the perception of a stimulus vary with the physical intensity of that stimulus?
  How does the attitude towards the President vary with the socio-economic properties of the survey respondent?
  How does the performance on a mental task vary with age?
-
Relationships among Variables
More Examples:
  How does depression vary with number of traumatic experiences?
  How does undergraduate drinking vary with performance in quantitative courses?
  How does memory performance vary with attention span?
  etc...
We've already learned a few ways to analyze relationships among 2 variables.
-
Relationships among Two Variables: Chi-Square
Chi-Square test of independence (2-way contingency table)
  compare observed cell frequencies to the cell frequencies you'd expect if the two variables are independent.
  ex: X = geographical region: West coast, Midwest, East coast
      Y = favorite color: red, blue, green
  Note: both variables are categorical
-
Relationships among Two Variables: Chi-Square
Observed frequencies:
         West Coast   Midwest   East Coast   total
Red          49          30         18         97
Blue         52          32         20        104
Green       130          62        107        299
total       231         124        145        500
Expected frequencies: (row total)(column total) / (grand total)
  ex: Red, West Coast: (97)(231) / 500 = 44.814
         West Coast   Midwest   East Coast   total
Red        44.814      24.06      28.13        97
Blue       48.05       25.79      30.16       104
Green     138.14       74.15      86.71       299
total     231         124        145          500
-
Relationships among Two Variables: Chi-Square
Compare the observed frequencies with the expected frequencies, cell by cell:
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = 17.97$$
If this exceeds the critical value, reject H0 that the 2 variables are independent (unrelated).
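The chi-square computation for the region-by-color example can be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the significance level (0.05, giving a critical value of 9.488 for 4 degrees of freedom) is an assumption, since the slides do not state one.

```python
# Observed counts from the slide: one row per color,
# columns in the order West Coast, Midwest, East Coast.
observed = {
    "Red": [49, 30, 18],
    "Blue": [52, 32, 20],
    "Green": [130, 62, 107],
}
n_cols = 3

row_totals = {color: sum(counts) for color, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(n_cols)]
grand_total = sum(row_totals.values())


def expected_count(color, j):
    """(row total)(column total) / (grand total) for one cell."""
    return row_totals[color] * col_totals[j] / grand_total


# Sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (observed[color][j] - expected_count(color, j)) ** 2 / expected_count(color, j)
    for color in observed
    for j in range(n_cols)
)

degrees_of_freedom = (len(observed) - 1) * (n_cols - 1)
CRITICAL_VALUE = 9.488  # assumption: alpha = 0.05 with 4 degrees of freedom

print(round(chi_square, 2), degrees_of_freedom)
print("reject H0" if chi_square > CRITICAL_VALUE else "fail to reject H0")
```

Under that assumed significance level the statistic exceeds the critical value, so independence of region and favorite color would be rejected.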
-
Relationships among Two Variables: z, t tests
z-test and/or t-test for difference of population means
  compare values of one variable (Y) for 2 different levels/groups of another variable (X)
  ex: X = age: young people vs. old people
      Y = # random digits can memorize
  Q: Is the mean # digits the same for the 2 age groups?
-
Relationships among Two Variables: ANOVA
ANOVA
  compare values of one variable (Y) for 3+ different levels/groups of another variable (X)
  ex: X = age: young people, middle-aged, old people
      Y = # random digits can memorize
  Q: Is the mean # digits the same for all 3 age groups?
-
Relationships among Two Variables: z, t & ANOVA
NOTE: for z/t tests for differences, and for ANOVA, there are a small number of possible values for one of the variables (X)
[Figure, z/t panel: # Digits Memorized (0 to 15) for two groups: young, old]
[Figure, ANOVA panel: # Digits Memorized (0 to 15) for three groups: young, middle, old]
-
Relationships among Two Variables: many values of X?
What about when there are many possible values of BOTH variables? Maybe they're even continuous (rather than discrete)?
[Figure: scatter plot; image not recoverable]
Correlation and Simple Linear Regression will be used to analyze the relationship between two such variables.
-
Correlation: Scatter Plots
[Figure: scatter plot; image not recoverable]
Does it look like there is a relationship?
-
Correlation
measures the direction and strength of a linear relationship between two variables
that is, it answers in a general way the question: as the values of one variable change, how do the corresponding values of the other variable change?
-
Linear Relationship
linear relationship: y = a + bx (straight line)
[Figures: three example plots, labeled "Linear" and "Not (strictly) Linear"; images not recoverable]
-
Correlation Coefficient: r
sign: direction of relationship
magnitude (number): strength of relationship
-1 ≤ r ≤ 1
  r = 0 is no linear relationship
  r = -1 is perfect negative correlation
  r = 1 is perfect positive correlation
Notes:
  Symmetric measure (you can exchange X and Y and get the same value)
  Measures linear relationship only
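These properties of r can be checked on a small made-up data set. The sketch below is illustrative only: `pearson_r` is a helper written for this example, and all the data values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]        # hypothetical X values
line_up = [2 * x + 1 for x in xs]    # perfectly linear, positive slope
line_down = [5 - 4 * x for x in xs]  # perfectly linear, negative slope
parabola = [x ** 2 for x in xs]      # strong relationship, but not linear

print(pearson_r(xs, line_up))    # perfect positive correlation
print(pearson_r(xs, line_down))  # perfect negative correlation
print(pearson_r(xs, parabola))   # no linear relationship, despite Y = X^2
print(pearson_r(xs, line_up) == pearson_r(line_up, xs))  # symmetric
```

The parabola case is the reason for the "linear relationship only" note: Y is completely determined by X, yet r comes out as zero.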
-
Correlation Coefficient: r
Formula (product of standardized values):
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
alt. formula (ALEKS):
$$r = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{(n-1) s_x s_y}$$
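The two formulas for r should give the same number. A minimal sketch, using a tiny hypothetical data set and helper names invented for this example, shows the agreement:

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def r_standardized(xs, ys):
    """Sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


def r_alternate(xs, ys):
    """Alternate form: (sum of x*y - n * xbar * ybar) / ((n - 1) * sx * sy)."""
    n = len(xs)
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * mean(xs) * mean(ys)
    return numerator / ((n - 1) * sample_sd(xs) * sample_sd(ys))


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]

print(round(r_standardized(xs, ys), 4))
print(round(r_alternate(xs, ys), 4))
```

The alternate form needs only the sums, means, and standard deviations, which is convenient for hand calculation; the standardized form makes the meaning of r easier to see.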
-
Correlation: Examples
Population: undergraduates
[Example figures (three slides); images not recoverable]
Others?
-
Correlation: Interpretation
Correlation ≠ Causation!
When 2 variables are correlated, the causality may be:
  X --> Y
  X <-- Y
  Z --> X & Y (lurking third variable)
  or a combination of the above
Examples: ice cream & murder, violence & video games, SAT verbal & math, booze & GPA
Inferring causation requires consideration of: how the data were gathered (e.g., experiment vs. observation), other relevant knowledge, logic...
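The lurking-variable case can be illustrated with a small simulation. Everything here is made up for illustration (the seed, sample size, and noise levels are arbitrary assumptions, and no real data are involved): Z drives both X and Y, while X and Y have no direct effect on each other.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


random.seed(0)  # arbitrary seed so the run is repeatable
n = 2000

z = [random.gauss(0, 1) for _ in range(n)]          # lurking variable Z
x_noise = [random.gauss(0, 0.5) for _ in range(n)]  # part of X unrelated to Z
y_noise = [random.gauss(0, 0.5) for _ in range(n)]  # part of Y unrelated to Z

# X and Y each depend on Z, but neither depends on the other.
x = [zi + e for zi, e in zip(z, x_noise)]
y = [zi + e for zi, e in zip(z, y_noise)]

r_xy = pearson_r(x, y)
r_without_z = pearson_r(x_noise, y_noise)

print("r(X, Y) =", round(r_xy, 2))
print("r after removing Z's contribution =", round(r_without_z, 2))
```

X and Y come out strongly correlated even though neither causes the other; once the shared contribution of Z is taken away, the correlation is close to zero.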
-
Simple Linear Regression
PREDICTING one variable (Y) from another (X)
  No longer symmetric like Correlation
  One variable is used to explain another variable
X Variable: Independent Variable, Explaining Variable, Exogenous Variable, Predictor Variable
Y Variable: Dependent Variable, Response Variable, Endogenous Variable, Criterion Variable
-
Simple Linear Regression
idea: find a line (linear function) that best fits the scattered data points
this will let us characterize the relationship between X & Y, and predict new values of Y for a given X value.
-
Reminder: (Simple) Linear Function Y = a + bX
We are interested in this to model the relationship between an independent variable X and a dependent variable Y
[Figure: the line Y = a + bX plotted against X, with intercept at (0, a) and slope b]
-
Simple Linear Regression
If we had errorless predictions: Y = a + bX
  all data points would fall right on the line
[Figure: line of Y against X with intercept a and slope b (rise of b per 1 unit of X)]
-
[Figures: scatter plots of Y against X, each with a candidate regression line]
A guess at the location of the regression line
Another guess at the location of the regression line:
  same slope, different intercept
  same intercept, different slope
  different intercept and slope, same center
We will end up being reasonably confident that the true regression line is somewhere in the indicated region.
-
Estimated Regression Line
[Figures: scatter plot of Y against X with the estimated regression line; the errors/residuals are drawn as vertical segments between each data point and the line]
Error terms have to be drawn vertically
Equation of the regression line: Y = a + bX
$\hat{y}_i$ (y hat): predicted value of Y for $x_i$
$e_i = y_i - \hat{y}_i$
-
Estimating the Regression Line
Idea: find the formula for the line that minimizes the sum of squared errors
  error: vertical distance between the actual data point and the predicted value
Y = a + bX, also written Y = b0 + b1 X
  b1 = slope of regression line
  b0 = Y intercept of regression line
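The "minimize the squared errors" idea can be tried by brute force: score many candidate lines by their sum of squared errors and keep the best one. This is only a sketch of the idea, not how the line is computed in practice; the data and the grid of candidate intercepts and slopes are hypothetical.

```python
def sum_squared_errors(b0, b1, xs, ys):
    """Sum of squared vertical distances between the points and Y = b0 + b1*X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]

# Candidate lines: intercepts 0.0 to 5.0 and slopes 0.0 to 2.0, in steps of 0.1.
candidates = [(i / 10, j / 10) for i in range(51) for j in range(21)]

best_b0, best_b1 = min(
    candidates, key=lambda line: sum_squared_errors(line[0], line[1], xs, ys)
)
best_sse = sum_squared_errors(best_b0, best_b1, xs, ys)

print("best intercept:", best_b0)
print("best slope:", best_b1)
print("sum of squared errors:", round(best_sse, 2))
```

Shifting the intercept or tilting the slope away from the winning candidate always increases the sum of squared errors, which is what it means for that line to fit best.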
-
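b1 (slope):
$$b_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$$
b0 (Y intercept):
$$b_0 = \bar{Y} - b_1 \bar{X}$$
Y = b0 + b1 X
ALEKS:
$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y}}{(n-1) s_x^2}$$
using correlation coefficient:
$$b_1 = r \frac{s_y}{s_x}$$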
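The three expressions for the slope can be compared numerically. The sketch below uses a small hypothetical data set and variable names chosen for this example; it computes b1 each way, then b0, and checks the residuals.

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

# Slope and intercept from the least-squares formulas.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Same slope from the alternate (ALEKS) form and from the correlation coefficient.
s_x = sqrt(sxx / (n - 1))
s_y = sqrt(syy / (n - 1))
b1_alternate = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / ((n - 1) * s_x ** 2)
r = sxy / ((n - 1) * s_x * s_y)
b1_from_r = r * s_y / s_x

# Residuals e_i = y_i - y_hat_i, measured vertically.
predictions = [b0 + b1 * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predictions)]

print("b1 =", round(b1, 4), round(b1_alternate, 4), round(b1_from_r, 4))
print("b0 =", round(b0, 4))
print("sum of residuals =", round(sum(residuals), 10))
```

All three slope values agree, and the residuals of the fitted line sum to zero (up to rounding), since the least-squares line passes through the point of means.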