september 151. 2 in chapter 14: 14.1 data 14.2 scatterplots 14.3 correlation 14.4 regression

Jun 15, 2022

Chapter 14: Correlation and Regression

Upload: moses-harrison

Post on 28-Dec-2015




In Chapter 14:

14.1 Data

14.2 Scatterplots

14.3 Correlation

14.4 Regression


14.1 Data

• Quantitative response variable Y (“dependent variable”)

• Quantitative explanatory variable X (“independent variable”)

• Historically important public health data set used to illustrate techniques (Doll, 1955)
  – n = 11 countries
  – Explanatory variable = per capita cigarette consumption in 1930 (CIG1930)
  – Response variable = lung cancer mortality per 100,000 (LUNGCA)


Data, cont.


§14.2 Scatterplot

Bivariate (xᵢ, yᵢ) points plotted as a scatterplot.


Inspect the scatterplot’s:

• Form: Can the relation be described with a straight line or some other type of line?

• Direction: Do points tend to trend upward or downward?

• Strength of association: Do points adhere closely to an imaginary trend line?

• Outliers (if any): Are there any striking deviations from the overall pattern?


Judging Correlational Strength

• Correlational strength refers to the degree to which points adhere to a trend line.

• The eye is not a good judge of strength.

• The top plot appears to show a weaker correlation than the bottom plot. However, these are plots of the same data set. (The perception of a difference is an artifact of axis scaling.)


§14.3. Correlation

• Correlation coefficient r quantifies the linear relationship with a number between −1 and 1.

• When all points fall on a line with an upward slope, r = 1. When all data points fall on a line with a downward slope, r = −1.

• When data points trend upward, r is positive; when data points trend downward, r is negative.

• The closer r is to 1 or −1, the stronger the correlation.
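These properties can be checked numerically. The sketch below is only an illustration: the helper name `pearson_r` and all data values are made up for the example and are not from the slides. It shows r = 1 and r = −1 for points lying exactly on a line, and that rescaling an axis leaves r unchanged.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values, chosen only to illustrate the bullets above.
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])     # all points on an upward line
r_down = pearson_r(x, [10 - 3 * v for v in x])  # all points on a downward line

noisy = [2.1, 3.9, 6.2, 7.8, 10.1]              # upward trend with scatter
r_noisy = pearson_r(x, noisy)
r_rescaled = pearson_r([100 * v for v in x], noisy)  # same data, X axis rescaled

print(r_up, r_down)
print(r_noisy, r_rescaled)
```

Because r is unaffected by the units on either axis, it is a more reliable judge of strength than the visual impression of a plot.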


Examples of correlations


Calculating r

• Formula

The correlation coefficient tracks the degree to which X and Y “go together.”

• Recall that z scores quantify the amount a value lies above or below its mean in standard deviation units.

• When z scores for X and Y track in the same direction, their products are positive and r is positive (and vice versa).
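The z-score description above can be sketched in code. This assumes the common z-score form of the formula, r = Σ(z_X · z_Y) / (n − 1), with sample standard deviations; the slide's own formula is not preserved in this transcript, so treat that form as an assumption. The function names and data values are hypothetical.

```python
import statistics

def z_scores(values):
    """Standardize values using the sample mean and sample SD (n - 1 denominator)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def r_from_z(xs, ys):
    """Assumed formula: r = sum of z-score products divided by (n - 1)."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

products = [a * b for a, b in zip(z_scores(x), z_scores(y))]
r = r_from_z(x, y)
print(products)  # mostly non-negative products, so r comes out positive
print(round(r, 4))
```

Points where X and Y are both above (or both below) their means contribute positive products, which is what pushes r toward 1.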


Calculating r, Example


Calculating r

In practice, we rely on computers and calculators to calculate r. I encourage my students to use these tools whenever possible.


Calculating r

SPSS output for Analyze > Correlate > Bivariate using the illustrative data:

Correlations

                                                              Cig Consumption      Lung Cancer Mortality
                                                              per capita, 1930     per 100000, 1950
Cig Consumption per capita, 1930        Pearson Correlation   1                    .737**
                                        Sig. (2-tailed)                            .010
                                        N                     11                   11
Lung Cancer Mortality per 100000, 1950  Pearson Correlation   .737**               1
                                        Sig. (2-tailed)       .010
                                        N                     11                   11

**. Correlation is significant at the 0.01 level (2-tailed).


Interpretation of r

1. Direction. The sign of r indicates the direction of the association: positive (r > 0), negative (r < 0), or no association (r ≈ 0).

2. Strength. The closer r is to 1 or −1, the stronger the association.

3. Coefficient of determination. The square of the correlation coefficient (r²) is called the coefficient of determination. This statistic quantifies the proportion of the variance in Y [mathematically] “explained” by X. For the illustrative data, r = 0.737 and r² = 0.54. Therefore, 54% of the variance in Y is explained by X.


Notes, cont.

4. Reversible relationship. With correlation, it does not matter whether variable X or Y is specified as the explanatory variable; calculations come out the same either way. [This will not be true for regression.]

5. Outliers. Outliers can have a profound effect on r. This figure has an r of 0.82 that is fully accounted for by the single outlier.


Notes, cont.

6. Linear relations only. Correlation applies only to linear relationships. This figure shows a strong non-linear relationship, yet r = 0.00.

7. Correlation does not necessarily mean causation. Beware lurking variables (next slide).
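Notes 4 through 6 are easy to confirm with small made-up data sets. The following is a minimal sketch; the helper `pearson_r` and every data value are hypothetical and are not the data behind the slide figures.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Note 4 (reversible): swapping the roles of X and Y leaves r unchanged.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)

# Note 5 (outliers): a trendless cluster plus one distant point.
cluster_x = [1, 2, 3, 1, 2, 3]
cluster_y = [1, 1, 1, 3, 3, 3]
r_cluster = pearson_r(cluster_x, cluster_y)
r_with_outlier = pearson_r(cluster_x + [10], cluster_y + [10])

# Note 6 (linear relations only): y = x^2 is a perfect curve, yet r = 0.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [v ** 2 for v in curve_x]
r_curve = pearson_r(curve_x, curve_y)

print(r_xy, r_yx)
print(r_cluster, round(r_with_outlier, 2))
print(r_curve)
```

In this toy example the single added point moves r from 0 to above 0.9, which is the kind of effect note 5 warns about.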


Confounded Correlation

A near perfect negative correlation (r = −.987) was seen between cholera mortality and elevation above sea level during a 19th-century epidemic.

We now know that cholera is transmitted by water. The observed relationship between cholera and elevation was confounded by the lurking variable proximity to polluted water.


Hypothesis Test

We conduct the hypothesis test to guard against identifying too many random correlations.

Random selection from a random scatter can result in an apparent correlation.


Hypothesis Test

A. Hypotheses. Let ρ represent the population correlation coefficient. H0: ρ = 0 vs. Ha: ρ ≠ 0 (two-sided) [or Ha: ρ > 0 (right-sided) or Ha: ρ < 0 (left-sided)]

B. Test statistic

$$t_{\text{stat}} = \frac{r}{SE_r} \quad \text{where} \quad SE_r = \sqrt{\frac{1 - r^2}{n - 2}}, \qquad df = n - 2$$

C. P-value. Convert tstat to P-value with software or Table C.


Hypothesis Test – Illustrative Example

A. H0: ρ = 0 vs. Ha: ρ ≠ 0 (two-sided)

B. Test stat

$$SE_r = \sqrt{\frac{1 - 0.737^2}{11 - 2}} = 0.2253, \qquad t_{\text{stat}} = \frac{0.737}{0.2253} = 3.27, \qquad df = 11 - 2 = 9$$

C. .005 < P < .01 by Table C. P = .0097 by computer. The evidence against H0 is highly significant.
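The arithmetic in this example can be reproduced with a short script. This is a sketch rather than the method used for the slides: r = 0.737 and n = 11 come from the illustrative data, while the helper functions `t_pdf` and `two_sided_p` are hypothetical names that approximate the P-value by numerically integrating the t density (Simpson's rule) so that no statistics package is needed.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value from Simpson's rule on the t density over [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t|)
    return 2 * (0.5 - central_area)

r, n = 0.737, 11                      # values from the illustrative example
df = n - 2
se_r = math.sqrt((1 - r ** 2) / df)   # standard error of r
t_stat = r / se_r
p = two_sided_p(t_stat, df)

print(f"SE_r = {se_r:.4f}, t = {t_stat:.2f}, df = {df}, P = {p:.4f}")
```

In practice a statistics package would supply the P-value directly; the integration here only makes the sketch self-contained.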


Confidence Interval for ρ



Conditions for Inference

• Independent observations

• Bivariate Normality (r can still be used descriptively when data are not bivariate Normal)


§14.4. Regression

• Regression describes the relationship in the data with a line that predicts the average change in Y per unit X.

• The best fitting line is found by minimizing the sum of squared residuals, as shown in this figure.


Regression Line, cont.

• The regression line equation is:

$$\hat{y} = a + bx$$

where ŷ ≡ predicted value of Y, a ≡ the intercept of the line, and b ≡ the slope of the line

• Equations to calculate a and b

SLOPE: $b = r \dfrac{s_Y}{s_X}$

INTERCEPT: $a = \bar{y} - b\bar{x}$
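A minimal sketch of fitting the line, assuming the standard least-squares formulas b = r · s_Y / s_X and a = ȳ − b · x̄. The function names `fit_line` and `sse` and the data values are hypothetical. The last lines check that nudging the slope or intercept away from the fitted values increases the sum of squared residuals.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares intercept a and slope b, via b = r * s_Y / s_X."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = mean_y - b * mean_x
    return a, b

def sse(a, b, xs, ys):
    """Sum of squared residuals for the line y-hat = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
best = sse(a, b, x, y)

print(f"y-hat = {a:.2f} + {b:.2f} x")
print("residuals sum to", round(sum(residuals), 10))
print(best < sse(a, b + 0.1, x, y), best < sse(a + 0.1, b, x, y))
```

The residuals from a least-squares line always sum to zero (up to rounding), which is a quick sanity check on any hand calculation.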


Regression Line, cont.

Slope b is the key statistic produced by the regression.


Regression Line, illustrative example

Here’s the output from SPSS:


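Inference

• Let α represent the population intercept, β represent the population slope, and εᵢ represent the residual “error” for point i. The population regression model is

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

• The estimated standard error of the regression is

$$s_{Y|x} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$$

• A (1−α)100% CI for population slope β is

$$b \pm t_{n-2,\,1-\alpha/2} \cdot SE_b \quad \text{where} \quad SE_b = \frac{s_{Y|x}}{\sqrt{n - 1}\; s_X}$$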


Confidence Interval for β–Example

                    95% Confidence Interval for B
Model               Lower Bound    Upper Bound
1   (Constant)      -4.342         17.854
    cig1930         .007           .039
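These bounds can be approximately reproduced by hand as estimate ± t × SE. The sketch below assumes the rounded estimates and standard errors reported in the SPSS coefficients table for the same data (B = 6.756, SE = 4.906 for the constant; B = .023, SE = .007 for cig1930) and the tabled critical value t = 2.262 for df = 9 at 95% confidence. Variable names are hypothetical, and small discrepancies from SPSS come from rounding.

```python
T_CRIT = 2.262  # t critical value for df = 9, two-sided 95% confidence (from a t table)

# (estimate, standard error), rounded values as reported by SPSS
coefficients = {
    "(Constant)": (6.756, 4.906),
    "cig1930": (0.023, 0.007),
}

intervals = {}
for name, (estimate, se) in coefficients.items():
    margin = T_CRIT * se
    intervals[name] = (estimate - margin, estimate + margin)
    print(f"{name}: {estimate - margin:.3f} to {estimate + margin:.3f}")
```

Because the interval for the slope excludes 0, it agrees with rejecting H0: β = 0 at the 5% level.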


t Test of Slope Coefficient

A. Hypotheses. H0: β = 0 against Ha: β ≠ 0

B. Test statistic.

$$t_{\text{stat}} = \frac{b}{SE_b} \quad \text{where} \quad SE_b = \frac{s_{Y|x}}{\sqrt{n - 1}\; s_X}, \qquad df = n - 2$$

C. P-value. Convert the tstat to a P-value.


t Test: Illustrative Example

                 Unstandardized Coefficients    Standardized Coefficients
Model            B        Std. Error            Beta                         t        Sig.
1   (Constant)   6.756    4.906                                              1.377    .202
    cig1930      .023     .007                  .737                         3.275    .010
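Each t in the SPSS table is the estimate divided by its standard error. A quick check using the rounded table values follows (variable names are hypothetical). The slope's t comes out slightly above the reported 3.275 because B and Std. Error are shown rounded to three decimals.

```python
# (B, Std. Error) as printed in the SPSS coefficients table
table = {
    "(Constant)": (6.756, 4.906),
    "cig1930": (0.023, 0.007),
}

t_stats = {name: b / se for name, (b, se) in table.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
```

Note that the slope's t statistic (3.275) matches the t statistic for testing ρ = 0 (3.27), as expected for simple linear regression.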


Analysis of Variance of the Regression Model

An ANOVA technique equivalent to the t test can also be used to test H0: β = 0. This technique is covered on pp. 321–324 in the text but is not included in this presentation.


Conditions for Inference

Inference about the regression line requires these conditions:

• Linearity

• Independent observations

• Normality at each level of X

• Equal variance at each level of X


Conditions for Inference

This figure illustrates Normal and equal variation around the regression line at all levels of X.


Assessing Conditions

• The scatterplot should be visually inspected for linearity, Normality, and equal variance.

• Plotting the residuals from the model can be helpful in this regard.

• The table lists residuals for the illustrative data.


Assessing Conditions, cont.

• A stemplot of the residuals shows no major departures from Normality

-1|6
-0|2336
 0|01366
 1|4
×10

• This residual plot shows more variability at higher X values (but the data are very sparse)
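The stemplot can be read back into approximate residual values. The sketch below assumes the "×10" multiplier means each stem counts tens and each leaf counts units, so that -1|6 stands for −16. The helper `decode` is a hypothetical name. Under that reading the 11 rounded residuals sum to zero, as least-squares residuals should.

```python
# (stem, leaves) pairs transcribed from the stemplot; stems are in tens (assumed)
stemplot = [("-1", "6"), ("-0", "2336"), ("0", "01366"), ("1", "4")]

def decode(stem, leaves):
    """Turn one stemplot row into signed values, e.g. ('-0', '23') -> [-2, -3]."""
    sign = -1 if stem.startswith("-") else 1
    tens = abs(int(stem))
    return [sign * (tens * 10 + int(leaf)) for leaf in leaves]

residuals = [value for stem, leaves in stemplot for value in decode(stem, leaves)]

print(sorted(residuals))
print("n =", len(residuals), "sum =", sum(residuals))
```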


Residual Plots

With a little experience, you can get good at reading residual plots. Here’s an example of linearity with equal variance.


Residual Plots

Example of linearity with unequal variance


Example of Residual Plots

Example of non-linearity with equal variance