# 2 correlation & regression


– Working with relationships between two variables

• Size of Teaching Tip & Stats Test Score

[Scatterplot: Stats Test Score (y-axis, 0 to 100) against size of teaching tip (x-axis, $0 to $80)]

Correlation & Regression

• Univariate & Bivariate Statistics

– U: frequency distribution, mean, mode, range, standard deviation

– B: correlation – two variables

• Correlation

– linear pattern of relationship between one variable (x) and another variable (y) – an association between two variables

– relative position of one variable correlates with relative distribution of another variable

– graphical representation of the relationship between two variables

• Warning:

– No proof of causality

– Cannot assume x causes y

Scatterplot!

• No Correlation

– Random or circular assortment of dots

• Positive Correlation

– ellipse leaning to right

– GPA and SAT

– Smoking and Lung Damage

• Negative Correlation

– ellipse leaning to left

– Depression & Self-esteem

– Studying & test errors

Pearson’s Correlation Coefficient

• “r” indicates…

– strength of relationship (strong, weak, or none)

– direction of relationship

• positive (direct) – variables move in same direction

• negative (inverse) – variables move in opposite directions

• r ranges in value from –1.0 to +1.0

r = -1.0: Strong Negative | r = 0.0: No Rel. | r = +1.0: Strong Positive
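A minimal sketch of how r can be computed by hand, assuming the usual deviation-score formula (sum of cross-products divided by the square root of the product of the sums of squares). The data and the names `pearson_r`, `hours`, and `errors` are hypothetical and only illustrate the "studying & test errors" example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data (not from the slides): hours studied vs. test errors
hours = [1, 2, 3, 4, 5, 6]
errors = [9, 8, 8, 5, 4, 2]

r = pearson_r(hours, errors)
print(f"r = {r:.3f}")  # negative sign: more studying goes with fewer errors
```

The sign of the result gives the direction of the relationship, and its distance from zero gives the strength.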

• Go to website! – playing with scatterplots

Practice with Scatterplots

r = .__ __

r = .__ __

r = .__ __

r = .__ __

Correlation Guesstimation

Correlations

| | | Miles walked per day | Weight | Depression | Anxiety |
|---|---|---|---|---|---|
| Miles walked per day | Pearson Correlation | 1 | -.797** | -.800** | -.774** |
| | Sig. (2-tailed) | | .002 | .002 | .003 |
| | N | 12 | 12 | 12 | 12 |
| Weight | Pearson Correlation | -.797** | 1 | .648* | .780** |
| | Sig. (2-tailed) | .002 | | .023 | .003 |
| | N | 12 | 12 | 12 | 12 |
| Depression | Pearson Correlation | -.800** | .648* | 1 | .753** |
| | Sig. (2-tailed) | .002 | .023 | | .005 |
| | N | 12 | 12 | 12 | 12 |
| Anxiety | Pearson Correlation | -.774** | .780** | .753** | 1 |
| | Sig. (2-tailed) | .003 | .003 | .005 | |
| | N | 12 | 12 | 12 | 12 |

**. Correlation is significant at the 0.01 level (2-tailed).

*. Correlation is significant at the 0.05 level (2-tailed).
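The star flags in the table follow directly from the Sig. (2-tailed) values. A small sketch of that rule, using two cells copied from the table; the function name `flag` is a hypothetical helper, not something SPSS exposes.

```python
def flag(r, p):
    """Format r with SPSS-style stars: ** if p <= .01, * if p <= .05."""
    if p <= 0.01:
        stars = "**"
    elif p <= 0.05:
        stars = "*"
    else:
        stars = ""
    return f"{r:.3f}{stars}"

# (r, p) pairs copied from the Correlations table
miles_weight = (-0.797, 0.002)
weight_depression = (0.648, 0.023)

print(flag(*miles_weight))       # two stars: significant at the 0.01 level
print(flag(*weight_depression))  # one star: significant at the 0.05 level
```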

Samples vs. Populations

• Sample statistics estimate Population parameters

– M tries to estimate μ

– r tries to estimate ρ (“rho”, a Greek symbol, not “p”)

• r: correlation for a sample

• based on the limited observations we have

• ρ: actual correlation in population

• the true correlation

• Beware Sampling Error!!

– even if ρ = 0 (there’s no actual correlation), you might get r = .08 or r = -.26 just by chance.

– We look at r, but we want to know about ρ
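Sampling error can be seen in a quick simulation. The sketch below draws many small samples in which x and y are generated independently (so ρ = 0 by construction) and looks at how far the sample r values stray from zero. The sample size of 12, the 1000 repetitions, the normal distributions, and the seed are all assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

random.seed(0)  # fixed seed so the run is repeatable
n = 12          # small sample size (assumed for illustration)

sample_rs = []
for _ in range(1000):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]  # independent of xs, so rho = 0
    sample_rs.append(pearson_r(xs, ys))

# Share of samples whose r is at least as far from zero as .26
share_large = sum(abs(r) >= 0.26 for r in sample_rs) / len(sample_rs)
print(f"smallest r = {min(sample_rs):.2f}, largest r = {max(sample_rs):.2f}")
print(f"share of samples with |r| >= .26: {share_large:.2f}")
```

Even though the true correlation is zero, a sizeable share of small samples produce an r as large as the -.26 mentioned above.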

Hypothesis testing with Correlations

• Two possibilities

– Ho: ρ = 0 (no actual correlation; The Null Hypothesis)

– Ha: ρ ≠ 0 (there is some correlation; The Alternative Hyp.)

• Case #1 (see correlation worksheet)

– Correlation between distance and points, r = -.904

– Sample small (n=6), but r is very large

– We guess ρ < 0 (we guess there is some correlation in the pop.)

• Case #2

– Correlation between aiming and points, r = .628

– Sample small (n=6), and r is only moderate in size

– We guess ρ = 0 (we guess there is NO correlation in pop.)

• Bottom-line

– We can only guess about ρ

– We can be wrong in two ways
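One common way to make the "guess" in the two cases formal is to convert r to a t statistic with n - 2 degrees of freedom and compare it with a critical value. This is a sketch of that standard test, not necessarily the procedure on the worksheet; the critical value 2.776 (two-tailed, α = .05, df = 4) is taken from a standard t table, and the names `t_for_r` and `T_CRIT_DF4` are illustrative.

```python
import math

T_CRIT_DF4 = 2.776  # two-tailed critical t for alpha = .05 with df = 4

def t_for_r(r, n):
    """t statistic for testing Ho: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

cases = [("distance & points", -0.904), ("aiming & points", 0.628)]
for label, r in cases:
    t = t_for_r(r, n=6)
    decision = "reject Ho" if abs(t) > T_CRIT_DF4 else "fail to reject Ho"
    print(f"{label}: r = {r}, t(4) = {t:.2f}, {decision}")
```

With n = 6, only the first correlation is large enough to reject Ho, which matches the guesses in Case #1 and Case #2.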

Reading Correlation Matrix

Correlations (a)

| | | Total ball toss points | Distance from target | Time spun before throwing | Aiming accuracy | Manual dexterity | College grade point avg | Confidence for task |
|---|---|---|---|---|---|---|---|---|
| Total ball toss points | Pearson Correlation | 1 | -.904* | -.582 | .628 | .821* | -.037 | -.502 |
| | Sig. (2-tailed) | . | .013 | .226 | .181 | .045 | .945 | .310 |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Distance from target | Pearson Correlation | -.904* | 1 | .279 | -.653 | -.883* | .228 | .522 |
| | Sig. (2-tailed) | .013 | . | .592 | .159 | .020 | .664 | .288 |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Time spun before throwing | Pearson Correlation | -.582 | .279 | 1 | -.390 | -.248 | -.087 | .267 |
| | Sig. (2-tailed) | .226 | .592 | . | .445 | .635 | .869 | .609 |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Aiming accuracy | Pearson Correlation | .628 | -.653 | -.390 | 1 | .758 | -.546 | -.250 |
| | Sig. (2-tailed) | .181 | .159 | .445 | . | .081 | .262 | .633 |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Manual dexterity | Pearson Correlation | .821* | -.883* | -.248 | .758 | 1 | -.553 | -.101 |
| | Sig. (2-tailed) | .045 | .020 | .635 | .081 | . | .255 | .848 |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| College grade point avg | Pearson Correlation | -.037 | .228 | -.087 | -.546 | -.553 | 1 | -.524 |
| | Sig. (2-tailed) | .945 | .664 | .869 | .262 | .255 | . | .286 |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Confidence for task | Pearson Correlation | -.502 | .522 | .267 | -.250 | -.101 | -.524 | 1 |
| | Sig. (2-tailed) | .310 | .288 | .609 | .633 | .848 | .286 | . |
| | N | 6 | 6 | 6 | 6 | 6 | 6 | 6 |

*. Correlation is significant at the 0.05 level (2-tailed).

a. Day sample collected = Tuesday

[Excerpt from the correlation matrix: the cell for Total ball toss points and Distance from target, with annotations]

r = -.904

p = .013: probability of getting a correlation this size by sheer chance (that is, if ρ = 0). Reject Ho if p ≤ .05.

N = 6: sample size

Reported as r(4) = -.904, p < .05, where 4 is the degrees of freedom (N - 2)

Predictive Potential

• Coefficient of Determination

– r²

– Amount of variance accounted for in y by x

– Percentage increase in accuracy you gain by using the regression line to make predictions

– Without correlation, you can only guess the mean of y

– [Used with regression]

[Scale: 0% 20% 40% 60% 80% 100%]
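One way to read "variance accounted for" is as the proportional drop in squared prediction error when you switch from guessing the mean of y to using the least-squares line. The sketch below checks that this drop equals r² on a small data set. The data are hypothetical (not from the slides), and "accuracy" is measured here as squared error, which is an assumption about what the slide intends.

```python
import math

# Hypothetical data (not from the slides)
xs = [8, 10, 14, 18, 22, 26]
ys = [95, 80, 70, 45, 40, 10]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_y = sum((y - mean_y) ** 2 for y in ys)

r = sp / math.sqrt(ss_x * ss_y)
b = sp / ss_x            # slope of the least-squares line
a = mean_y - b * mean_x  # y-intercept

# Squared error when guessing the mean of y for every case
error_mean = ss_y
# Squared error when predicting each y from the regression line
error_line = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

reduction = 1 - error_line / error_mean
print(f"r^2 = {r ** 2:.3f}")
print(f"proportional reduction in squared error = {reduction:.3f}")
```

The two printed values agree, which is why r² is described as the share of variance in y accounted for by x.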

Limitations of Correlation

• linearity:

– can’t describe non-linear relationships

– e.g., relation between anxiety & performance

• truncation of range:

– underestimate strength of relationship if you can’t see full range of x values

• no proof of causation

– third variable problem:

• could be 3rd variable causing change in both variables

• directionality: can’t be sure which way causality “flows”
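The linearity limitation is easy to demonstrate: in the sketch below y is completely determined by x, yet r comes out as zero because the relationship is a symmetric curve, not a line. The data are hypothetical and chosen so the effect is exact.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical U-shaped relationship: y depends entirely on x, but not linearly
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curve = pearson_r(xs, ys)
print(f"r = {r_curve:.3f}")  # no linear trend despite a perfect curved pattern
```

A scatterplot would reveal the pattern immediately, which is one reason to look at the plot before trusting r.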

Regression

• Regression: Correlation + Prediction

– predicting y based on x

– e.g., predicting….

• throwing points (y)

• based on distance from target (x)

• Regression equation

– formula that specifies a line

– y’ = bx + a

– plug in an x value (distance from target) and predict y (points)

– note

• y = actual value of a score

• y’ = predicted value

• Go to website! – Regression Playground
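The slides take b and a from SPSS. As a sketch of where such values come from, the least-squares slope is the sum of cross-products divided by the sum of squares of x, and the intercept makes the line pass through the two means. The function name `fit_line` and the data are hypothetical; the raw ball-toss data are not in the slides.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for y' = bx + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x            # slope
    a = mean_y - b * mean_x  # intercept: line passes through (mean_x, mean_y)
    return b, a

# Hypothetical data: distance from target (x) and points scored (y)
distance = [8, 10, 14, 18, 22, 26]
points = [95, 80, 70, 45, 40, 10]

b, a = fit_line(distance, points)
print(f"y' = {b:.3f}x + {a:.3f}")
```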

[Scatterplot with fitted regression line: Total ball toss points (y-axis, 0 to 120) against Distance from target (x-axis, 8 to 26); Rsq = 0.6031]

Regression Graphic – Regression Line

if x = 18 then… y’ = 47

if x = 24 then… y’ = 20

See correlation & regression worksheet

Regression Equation

• y’ = bx + a

– y’ = predicted value of y

– b = slope of the line

– x = value of x that you plug in

– a = y-intercept (where line crosses y-axis)

• In this case….

– y’ = -4.263(x) + 125.401

• So if the distance is 20 feet

– y’ = -4.263(20) + 125.401

– y’ = -85.26 + 125.401

– y’ = 40.141
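The worked example can be wrapped in a small function. The slope and intercept are the values from the slide; the function name `predict_points` is a hypothetical helper.

```python
B = -4.263   # slope (b) from the slide
A = 125.401  # y-intercept (a) from the slide

def predict_points(distance_ft):
    """Predicted ball toss points (y') for a given distance from target (x)."""
    return B * distance_ft + A

print(f"{predict_points(20):.3f}")  # 40.141, matching the worked example
```

Plugging in x = 18 and x = 24 gives about 48.7 and 23.1, close to the values of 47 and 20 read off the regression graphic.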


SPSS Regression Set-up

• “Criterion”

– y-axis variable

– what you’re trying to predict

• “Predictor”

– x-axis variable

– what you’re basing the prediction on

Note: Never refer to the IV or DV when doing regression

Getting Regression Info from SPSS

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .777 (a) | .603 | .581 | 18.476 |

a. Predictors: (Constant), Distance from target

Coefficients (a)

| Model 1 | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 125.401 | 14.265 | | 8.791 | .000 |
| Distance from target | -4.263 | .815 | -.777 | -5.230 | .000 |

a. Dependent Variable: Total ball toss points

a = 125.401 (the B value in the (Constant) row)

b = -4.263 (the B value in the Distance from target row)

y’ = b(x) + a

y’ = -4.263(20) + 125.401
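The columns of the SPSS output are related to one another, which gives a quick way to sanity-check a table. In simple regression each t is B divided by its standard error, and R Square is the square of the standardized Beta. The sketch below recomputes these from the printed values; small differences in the third decimal are expected because SPSS works from unrounded numbers.

```python
# Values copied from the SPSS Coefficients and Model Summary tables
constant_b, constant_se = 125.401, 14.265
slope_b, slope_se, slope_beta = -4.263, 0.815, -0.777

t_constant = constant_b / constant_se  # SPSS reports 8.791
t_slope = slope_b / slope_se           # SPSS reports -5.230
r_square = slope_beta ** 2             # SPSS reports .603

print(f"t (Constant) = {t_constant:.3f}")
print(f"t (Distance from target) = {t_slope:.3f}")
print(f"R Square = {r_square:.3f}")
```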

Predictive Ability

• Mantra!! – As variability decreases, prediction accuracy ___

– if we can account for variance, we can make better predictions

• As r increases:

– r² increases

• “variance accounted for” increases

• the prediction accuracy increases

– prediction error decreases (distance between y’ and y)

– Sy’ decreases

• the standard error of the residual/predictor

• measures overall amount of prediction error

• We like big r’s!!!
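A sketch of why bigger r values mean less prediction error, assuming the standard formula for the standard error of the estimate in simple regression: the standard deviation of y scaled by the square root of (1 - r²), with an (n - 1)/(n - 2) adjustment for degrees of freedom. The values of `s_y` and `n` are hypothetical.

```python
import math

def std_error_of_estimate(s_y, r, n):
    """Standard error of the estimate for simple regression.

    s_y is the sample standard deviation of y (computed with n - 1).
    """
    return s_y * math.sqrt((1 - r ** 2) * (n - 1) / (n - 2))

s_y = 25.0  # hypothetical standard deviation of y
n = 20      # hypothetical sample size

errors_by_r = [(r, std_error_of_estimate(s_y, r, n)) for r in (0.0, 0.3, 0.6, 0.9)]
for r, se in errors_by_r:
    print(f"r = {r:.1f}: standard error of estimate = {se:.2f}")
```

The printed standard error shrinks as r grows, so predictions land closer to the actual scores.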

Drawing a Regression Line by Hand

Three steps

1. Plug zero in for x to get a y’ value, and then plot this value

– Note: It will be the y-intercept

2. Choose a large value for x (just so it falls on the right end of the graph), plug it in to get y’, then plot the resulting point

3. Connect the two points with a straight line!
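The three steps can be sketched with the slope and intercept from the slides. The choice of x = 26 for the right-hand point is an assumption based on the upper end of the x-axis in the regression graphic, and `two_points_for_line` is a hypothetical helper.

```python
B = -4.263   # slope (b) from the slides
A = 125.401  # y-intercept (a) from the slides

def two_points_for_line(x_right):
    """Return the two (x, y') points needed to draw the regression line."""
    left = (0, A)                       # step 1: x = 0 gives the y-intercept
    right = (x_right, B * x_right + A)  # step 2: a large x near the right edge
    return left, right                  # step 3: connect them with a straight line

left, right = two_points_for_line(26)
print(f"plot {left} and ({right[0]}, {right[1]:.3f}), then connect them")
```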