2 correlation & regression

– Working with relationships between two variables
• Size of Teaching Tip & Stats Test Score
[Scatterplot: Size of Teaching Tip ($0 to $80) and Stats Test Score (0 to 100)]

Correlation & Regression
• Univariate & Bivariate Statistics
– U: frequency distribution, mean, mode, range, standard deviation
– B: correlation – two variables
• Correlation
– linear pattern of relationship between one variable (x) and another variable (y) – an association between two variables
– relative position of one variable correlates with relative distribution of another variable
– graphical representation of the relationship between two variables
• Warning:
– No proof of causality
– Cannot assume x causes y

Scatterplot!
• No Correlation
– Random or circular assortment of dots
• Positive Correlation
– ellipse leaning to right
– GPA and SAT
– Smoking and Lung Damage
• Negative Correlation
– ellipse leaning to left
– Depression & Self-esteem
– Studying & test errors

Pearson’s Correlation Coefficient
• “r” indicates…
– strength of relationship (strong, weak, or none)
– direction of relationship
• positive (direct) – variables move in same direction
• negative (inverse) – variables move in opposite directions
• r ranges in value from –1.0 to +1.0
Strong Negative      No Rel.      Strong Positive
-1.0                 0.0          +1.0
• Go to website! – playing with scatterplots
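To make "strength" and "direction" concrete, here is a minimal sketch of how r can be computed from raw scores. The function name `pearson_r` and the data are hypothetical, invented for illustration; they are not from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: summed co-deviations divided by the product of deviation norms."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # how x and y vary together
    sxx = sum(a * a for a in dx)              # how x varies on its own
    syy = sum(b * b for b in dy)              # how y varies on its own
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
hours_studied = [1, 2, 3, 4, 5, 6]
test_errors = [12, 10, 9, 7, 4, 3]

r = pearson_r(hours_studied, test_errors)
print(f"r = {r:.3f}")  # negative sign: more studying goes with fewer errors
```

The sign of the result gives the direction, and its distance from zero gives the strength.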

Practice with Scatterplots
r = .__ __
r = .__ __
r = .__ __
r = .__ __


Correlation Guesstimation

Correlations

                                           Miles walked
                                           per day        Weight    Depression   Anxiety
Miles walked per day  Pearson Correlation  1              -.797**   -.800**      -.774**
                      Sig. (2-tailed)                     .002      .002         .003
                      N                    12             12        12           12
Weight                Pearson Correlation  -.797**        1         .648*        .780**
                      Sig. (2-tailed)      .002                     .023         .003
                      N                    12             12        12           12
Depression            Pearson Correlation  -.800**        .648*     1            .753**
                      Sig. (2-tailed)      .002           .023                   .005
                      N                    12             12        12           12
Anxiety               Pearson Correlation  -.774**        .780**    .753**       1
                      Sig. (2-tailed)      .003           .003      .005
                      N                    12             12        12           12

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Samples vs. Populations
• Sample statistics estimate Population parameters
– M tries to estimate μ
– r tries to estimate ρ (“rho”, the Greek letter, not “p”)
• r: correlation for a sample
• based on the limited observations we have
• ρ: actual correlation in population
• the true correlation
• Beware Sampling Error!!
– even if ρ = 0 (there’s no actual correlation), you might get r = .08 or r = -.26 just by chance.
– We look at r, but we want to know about ρ
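Sampling error can be seen directly by simulation. The sketch below assumes normally distributed, truly unrelated x and y (ρ = 0) and draws many small samples. The helper names, the seed, and the choice of n = 6 are illustrative assumptions, not part of the original slides.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's r for two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_null_rs(n, trials, seed=0):
    """Draw `trials` samples of size n in which x and y are independent (rho = 0)."""
    rng = random.Random(seed)
    rs = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        rs.append(pearson_r(xs, ys))
    return rs

rs = simulate_null_rs(n=6, trials=2000)
share_big = sum(abs(r) >= 0.5 for r in rs) / len(rs)
print(f"smallest r = {min(rs):.2f}, largest r = {max(rs):.2f}")
print(f"share of samples with |r| >= .5: {share_big:.2f}")
```

With samples this small, a sizeable share of the sample correlations land far from zero even though the population correlation is exactly zero.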

Hypothesis testing with Correlations
• Two possibilities
– Ho: ρ = 0 (no actual correlation; The Null Hypothesis)
– Ha: ρ ≠ 0 (there is some correlation; The Alternative Hyp.)
• Case #1 (see correlation worksheet)
– Correlation between distance and points, r = -.904
– Sample small (n=6), but r is very large
– We guess ρ < 0 (we guess there is some correlation in the pop.)
• Case #2
– Correlation between aiming and points, r = .628
– Sample small (n=6), and r is only moderate in size
– We guess ρ = 0 (we guess there is NO correlation in pop.)
• Bottom-line
– We can only guess about ρ
– We can be wrong in two ways (rejecting Ho when ρ really is 0, or keeping Ho when ρ really is not 0)

Reading Correlation Matrix

Correlations (a)

                                                    (1)     (2)     (3)     (4)     (5)     (6)     (7)
(1) Total ball toss points     Pearson Correlation  1       -.904*  -.582   .628    .821*   -.037   -.502
                               Sig. (2-tailed)      .       .013    .226    .181    .045    .945    .310
                               N                    6       6       6       6       6       6       6
(2) Distance from target       Pearson Correlation  -.904*  1       .279    -.653   -.883*  .228    .522
                               Sig. (2-tailed)      .013    .       .592    .159    .020    .664    .288
                               N                    6       6       6       6       6       6       6
(3) Time spun before throwing  Pearson Correlation  -.582   .279    1       -.390   -.248   -.087   .267
                               Sig. (2-tailed)      .226    .592    .       .445    .635    .869    .609
                               N                    6       6       6       6       6       6       6
(4) Aiming accuracy            Pearson Correlation  .628    -.653   -.390   1       .758    -.546   -.250
                               Sig. (2-tailed)      .181    .159    .445    .       .081    .262    .633
                               N                    6       6       6       6       6       6       6
(5) Manual dexterity           Pearson Correlation  .821*   -.883*  -.248   .758    1       -.553   -.101
                               Sig. (2-tailed)      .045    .020    .635    .081    .       .255    .848
                               N                    6       6       6       6       6       6       6
(6) College grade point avg    Pearson Correlation  -.037   .228    -.087   -.546   -.553   1       -.524
                               Sig. (2-tailed)      .945    .664    .869    .262    .255    .       .286
                               N                    6       6       6       6       6       6       6
(7) Confidence for task        Pearson Correlation  -.502   .522    .267    -.250   -.101   -.524   1
                               Sig. (2-tailed)      .310    .288    .609    .633    .848    .286    .
                               N                    6       6       6       6       6       6       6

*. Correlation is significant at the 0.05 level (2-tailed).
a. Day sample collected = Tuesday
Zooming in on one cell of the matrix (Total ball toss points with Distance from target):
r = -.904
p = .013: probability of getting a correlation this size by sheer chance (that is, if ρ = 0). Reject Ho if p ≤ .05.
Reported as r(4) = -.904, p < .05, where the number in parentheses reflects the sample size (degrees of freedom, n - 2 = 4).
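The p-value that SPSS reports can be approximated by hand. This sketch assumes the standard t test for a correlation, t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, and integrates the t density numerically so that it needs only the standard library. The function names are hypothetical and this is not SPSS's own procedure.

```python
import math

def t_from_r(r, n):
    """t statistic for testing Ho: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, using Simpson's rule on the density from 0 to |t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - central_half)

r, n = -0.904, 6  # values from the correlation matrix
t = t_from_r(r, n)
p = two_tailed_p(t, n - 2)
print(f"t({n - 2}) = {t:.3f}, p = {p:.3f}")
```

The result should land close to the Sig. (2-tailed) value of .013 shown in the matrix, which is below .05, so Ho is rejected.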

Predictive Potential
• Coefficient of Determination
– r²
– Amount of variance accounted for in y by x
– Percentage increase in accuracy you gain by using the regression line to make predictions
– Without correlation, you can only guess the mean of y
– [Used with regression]
0%   20%   40%   60%   80%   100%
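One way to see "variance accounted for" is to compare two prediction strategies: guessing the mean of y for everyone, versus predicting from the regression line. The sketch below assumes a least-squares line and uses made-up distance and points values, not the worksheet data.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for y' = b*x + a."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx

# Hypothetical distances (x) and points (y), invented for illustration
distance = [8, 10, 14, 18, 22, 26]
points = [95, 80, 70, 40, 35, 10]

b, a = fit_line(distance, points)
mean_y = sum(points) / len(points)

# Squared error when guessing the mean of y for everyone
ss_total = sum((y - mean_y) ** 2 for y in points)
# Squared error when predicting from the regression line instead
ss_residual = sum((y - (b * x + a)) ** 2 for x, y in zip(distance, points))

r_squared = 1 - ss_residual / ss_total
print(f"r^2 = {r_squared:.3f}")
```

Here `r_squared` is the proportion by which the squared prediction error shrinks when the line replaces the mean as the guess.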

Limitations of Correlation
• linearity:
– can’t describe non-linear relationships
– e.g., relation between anxiety & performance
• truncation of range:
– underestimate strength of relationship if you can’t see full range of x values
• no proof of causation
– third variable problem:
• could be 3rd variable causing change in both variables
– directionality: can’t be sure which way causality “flows”
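The linearity limitation is easy to demonstrate. In this sketch, performance depends perfectly on anxiety through an inverted-U curve, yet r comes out at zero. The anxiety scale and the formula 9 − x² are hypothetical, chosen only to make the point.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical anxiety scores centered on a moderate level (0)
anxiety = [-3, -2, -1, 0, 1, 2, 3]
# Performance peaks at moderate anxiety and drops off on both sides
performance = [9 - x * x for x in anxiety]

r = pearson_r(anxiety, performance)
print(f"r = {abs(r):.3f}")  # no linear trend, despite a perfect curved relationship
```

A scatterplot would reveal the pattern immediately, which is one reason to plot the data before trusting r.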

Regression
• Regression: Correlation + Prediction
– predicting y based on x
– e.g., predicting….
• throwing points (y)
• based on distance from target (x)
• Regression equation
– formula that specifies a line
– y’ = bx + a
– plug in an x value (distance from target) and predict y (points)
– note
• y = actual value of a score
• y’ = predicted value
• Go to website! – Regression Playground
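The slides give the form of the line but not how b and a are found. Assuming the usual least-squares fit for simple linear regression, the slope and intercept can be written as below. The symbols x̄, ȳ (sample means) and s_x, s_y (sample standard deviations) are introduced here and do not appear in the original slides.

```latex
\[
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
  = r \, \frac{s_y}{s_x},
\qquad
a = \bar{y} - b\,\bar{x}
\]
```

Because s_x and s_y are positive, b always has the same sign as r: a negative correlation gives a downward-sloping line.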

Regression Graphic – Regression Line
[Scatterplot with regression line: Distance from target (x-axis, 8 to 26) vs. Total ball toss points (y-axis, 0 to 120); Rsq = 0.6031]
if x=18 then… y’=47
if x=24 then… y’=20
See correlation & regression worksheet

Regression Equation
• y’ = bx + a
– y’ = predicted value of y
– b = slope of the line
– x = value of x that you plug in
– a = y-intercept (where line crosses y-axis)
• In this case….
– y’ = -4.263(x) + 125.401
• So if the distance is 20 feet
– y’ = -4.263(20) + 125.401
– y’ = -85.26 + 125.401
– y’ = 40.141
See correlation & regression worksheet
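The plug-in step can be written as a small function. The slope and intercept below are the values from the slide; the function and constant names are hypothetical.

```python
SLOPE_B = -4.263       # b, from the slide
INTERCEPT_A = 125.401  # a, from the slide

def predict_points(distance_ft):
    """Predicted total ball toss points (y') for a given distance from target (x)."""
    return SLOPE_B * distance_ft + INTERCEPT_A

for x in (10, 20, 25):
    print(f"x = {x:>2} ft  ->  y' = {predict_points(x):.3f}")
```

For x = 20 this reproduces the slide's worked answer of 40.141.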

SPSS Regression Set-up
• “Criterion”
– y-axis variable
– what you’re trying to predict
• “Predictor”
– x-axis variable
– what you’re basing the prediction on
Note: Never refer to the IV or DV when doing regression

Getting Regression Info from SPSS
Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .777a   .603       .581                18.476

a. Predictors: (Constant), Distance from target

Coefficients (a)

                            Unstandardized Coefficients   Standardized Coefficients
Model                       B           Std. Error        Beta                        t        Sig.
1   (Constant)              125.401     14.265                                        8.791    .000
    Distance from target    -4.263      .815              -.777                       -5.230   .000

a. Dependent Variable: Total ball toss points

y’ = b(x) + a
y’ = -4.263(20) + 125.401
a = 125.401 (B for the Constant)
b = -4.263 (B for Distance from target)
See correlation & regression worksheet

Predictive Ability
• Mantra!!
– As variability decreases, prediction accuracy ___
– if we can account for variance, we can make better predictions
• As r increases (in absolute size):
– r² increases
• “variance accounted for” increases
• the prediction accuracy increases
– prediction error decreases (distance between y’ and y)
– Sy’ decreases
• the standard error of the estimate (based on the residuals)
• measures overall amount of prediction error
• We like big r’s!!!
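The link between r and Sy’ can be sketched numerically. This assumes the standard relationship Sy’ = s_y · √((1 − r²)(n − 1)/(n − 2)) for a least-squares line. The values of s_y and n below are hypothetical, not taken from the SPSS output.

```python
import math

def std_error_of_estimate(s_y, r, n):
    """Sy' from the SD of y, the correlation r, and the sample size n."""
    return s_y * math.sqrt((1 - r * r) * (n - 1) / (n - 2))

S_Y = 25.0  # hypothetical standard deviation of y
N = 20      # hypothetical sample size

r_values = (0.0, 0.3, 0.6, 0.9)
errors = [std_error_of_estimate(S_Y, r, N) for r in r_values]
for r, err in zip(r_values, errors):
    print(f"r = {r:.1f}  ->  Sy' = {err:.2f}")
```

Holding s_y and n fixed, each step up in r shrinks Sy’, and a perfect correlation would bring it to zero.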

Drawing a Regression Line by Hand
Three steps
1. Plug zero in for x to get a y’ value, and then plot this value
– Note: It will be the y-intercept
2. Plug in a large value for x (just so it falls on the right end of the graph), then plot the resulting point
3. Connect the two points with a straight line!
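The three steps can be sketched with the slide's equation. The function name is hypothetical, and x = 26 is chosen here only because it is the right edge of the x-axis in the regression graphic.

```python
def line_endpoints(b, a, x_right):
    """Two points that pin down the regression line y' = b*x + a."""
    left = (0, a)                       # step 1: x = 0 gives the y-intercept
    right = (x_right, b * x_right + a)  # step 2: a large x near the right edge
    return left, right

left, right = line_endpoints(b=-4.263, a=125.401, x_right=26)
print(f"plot {left} and ({right[0]}, {right[1]:.3f})")  # step 3: connect them with a ruler
```

Any two distinct x values would define the same line; picking points far apart just makes the hand-drawn line less sensitive to small plotting errors.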