Chapter 7 (found in Unit 5): Correlation & Causality

Section 1: Seeking Correlation

Can't type? Press F11. Can't hear? Check your speakers and volume, or re-enter the seminar. Put ? in front of questions so they are easier to see.

Upload: tyrone-nelson

Post on 13-Dec-2015



Definition

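A correlation exists between two variables when they are related, for example when higher values of one tend to go with higher (or lower) values of the other. Measuring the correlation also tells us how strong the relationship is.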

A scatter diagram (or scatterplot) is a graph in which each point represents the values of two variables.


Types of Correlation

Figure 7.3 Types of correlation seen on scatter diagrams.

Page 289

Correlation

The correlation coefficient, r, is a unitless measure that describes the strength and direction of the linear relationship between two variables.

– If r is positive, the other variable tends to increase as one variable increases.
– If r is negative, the other variable tends to decrease as one variable increases.
– The value of r is always between -1 and 1, inclusive.


Linear Correlation Coefficient

The formula to calculate the correlation coefficient (r) is as follows:

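$$
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
$$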

Page 294


The sums in this formula can be organized in a table, just as we did for the standard deviation.

We can calculate this in StatCrunch > Stats > Summary Stats > Correlation
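For readers without StatCrunch, the table-of-sums calculation can be sketched in Python. This is only a minimal illustration: the function name `correlation_coefficient` and the sample data (hours studied and quiz scores) are made up for this sketch and do not come from the textbook.

```python
import math

def correlation_coefficient(xs, ys):
    """Compute r from the column sums of an x, y, xy, x^2, y^2 table."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return numerator / denominator

# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = correlation_coefficient(hours, scores)
print(round(r, 4))  # 0.7746
```

Here the sums are n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, and Σy² = 86, so r = 30 / (√50 · √30) ≈ 0.77.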


StatCrunch


7.2 Interpreting Correlations

Page 299


Cautions

• Outliers can distort a correlation and lead to misleading interpretations. If you remove them from the calculations, you must say so and explain why.

• Inappropriate groupings can also lead to misleading interpretations: combining or splitting groups can hide a real correlation, or create the appearance of one that someone wants you to see.
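To see how much a single outlier can matter, here is a small Python sketch. The data are invented for illustration: five points that lie exactly on a line, plus one hypothetical outlier at (6, 0). The helper name `pearson_r` is also just an assumption of this sketch.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r, computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: five points exactly on the line y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_without_outlier = pearson_r(xs, ys)

# The same data with one hypothetical outlier added at (6, 0).
r_with_outlier = pearson_r(xs + [6], ys + [0])

print(round(r_without_outlier, 3))  # 1.0
print(round(r_with_outlier, 3))     # 0.143
```

One extra point drops r from 1 to about 0.14, which is why any decision to keep or remove an outlier has to be reported along with the result.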


Correlation Does Not Imply Causality

Possible Explanations for a Correlation

1. The correlation may be a coincidence.

2. Both correlated variables might be directly influenced by some common underlying cause.

3. One of the correlated variables may actually be a cause of the other. But note that, even in this case, it may be just one of several causes.
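Explanation 2 can be illustrated with a small simulation. Everything here is hypothetical: the variable names, the numbers, and the assumption that temperature drives both quantities are invented purely to show how a common cause can produce a strong correlation between two variables that do not influence each other.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r, computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up common cause: daily temperature over 20 days.
temperature = list(range(15, 35))

# Each quantity depends only on temperature plus a small fixed wobble;
# neither one is computed from the other.
ice_cream_sales = [3 * t + ((i * 7) % 5 - 2) for i, t in enumerate(temperature)]
sunburn_cases = [2 * t + ((i * 3) % 7 - 3) for i, t in enumerate(temperature)]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(round(r, 3))  # close to 1, even though neither variable causes the other
```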

Page 303


7.3 Best-Fit Lines & Prediction

Page 307


The line of best fit (also called the regression line or the least-squares line) is the line that fits the data better than any other line: it minimizes the sum of the squared vertical distances between the data points and the line.

This line can be calculated as y = mx + b, where:

Slope: $m = r\,(s_y / s_x)$, where $s_y$ is the standard deviation of y and $s_x$ is the standard deviation of x.

Y-intercept: $b = \bar{y} - m\,\bar{x}$, where $\bar{y}$ is the mean of the y values and $\bar{x}$ is the mean of the x values.

(again, StatCrunch or another program is handy)
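As one example of "another program", the slope and intercept formulas can be sketched in Python using the standard `statistics` module. The function name `best_fit_line` and the data are made up for this illustration.

```python
import statistics

def best_fit_line(xs, ys):
    """Return (m, b, r) for the best-fit line y = m*x + b."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # sample standard deviation of x
    s_y = statistics.stdev(ys)  # sample standard deviation of y
    # r as the sample covariance divided by the product of the deviations
    covariance = sum((x - mean_x) * (y - mean_y)
                     for x, y in zip(xs, ys)) / (n - 1)
    r = covariance / (s_x * s_y)
    m = r * (s_y / s_x)        # slope
    b = mean_y - m * mean_x    # y-intercept
    return m, b, r

# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
m, b, r = best_fit_line(hours, scores)
print(round(m, 2), round(b, 2), round(r, 4))  # 0.6 2.2 0.7746
```

For this data the best-fit line is y = 0.6x + 2.2.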

Page 313


Example: Graph

[Figure: graph with the best-fit line labeled]


StatCrunch


StatCrunch


Cautions in Making Predictions from Best-Fit Lines

1. Don’t expect a best-fit line to give a good prediction unless the correlation is strong and there are many data points. If the sample points lie very close to the best-fit line, the correlation is very strong and the prediction is more likely to be accurate. If the sample points lie away from the best-fit line by substantial amounts, the correlation is weak and predictions tend to be much less accurate.

2. Don’t use a best-fit line to make predictions beyond the bounds of the data points to which the line was fit.

3. A best-fit line based on past data is not necessarily valid now and might not result in valid predictions of the future.

4. Don’t make predictions about a population that is different from the population from which the sample data were drawn.

5. Remember that a best-fit line is meaningless when there is no significant correlation or when the relationship is nonlinear.
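Caution 2 is easy to enforce in code. The sketch below is a hypothetical helper (the name `predict_within_bounds`, the slope, the intercept, and the data range are all made up) that refuses to make a prediction for an x value outside the range of the data used to fit the line. It does not check the other cautions, which require judgment about the data.

```python
def predict_within_bounds(x, m, b, x_min, x_max):
    """Predict y = m*x + b, but only for x inside the fitted data range."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x = {x} is outside the data range [{x_min}, {x_max}]; "
            "the best-fit line should not be used here"
        )
    return m * x + b

# Hypothetical line fitted to data with x values from 1 to 5.
m, b = 0.6, 2.2
print(round(predict_within_bounds(3, m, b, x_min=1, x_max=5), 2))  # 4.0

try:
    predict_within_bounds(18, m, b, x_min=1, x_max=5)
except ValueError as error:
    print("Refused:", error)
```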


EXAMPLE 1 Valid Predictions?

State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not.

You’ve found a best-fit line for a correlation between the number of hours per day that people exercise and the number of calories they consume each day. You’ve used this correlation to predict that a person who exercises 18 hours per day would consume 15,000 calories per day.

Solution: No one exercises 18 hours per day on an ongoing basis, so this much exercise must be beyond the bounds of any data collected. Therefore, a prediction about someone who exercises 18 hours per day should not be trusted.


EXAMPLE 1 Valid Predictions?

State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not.

Historical data have shown a strong negative correlation between national birth rates and affluence. That is, countries with greater affluence tend to have lower birth rates. These data predict a high birth rate in Russia.

Solution: We cannot automatically assume that the historical data still apply today. In fact, Russia currently has a very low birth rate, despite also having a low level of affluence.


EXAMPLE 1 Valid Predictions?

State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not.

Scientific studies have shown a very strong correlation between children’s ingesting of lead and mental retardation. Based on this correlation, paints containing lead were banned.

Solution: Given the strength of the correlation and the severity of the consequences, this prediction and the ban that followed seem quite reasonable. In fact, later studies established lead as an actual cause of mental retardation, making the rationale behind the ban even stronger.


EXAMPLE 1 Valid Predictions?

State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not.

Based on a large data set, you’ve made a scatter diagram for salsa consumption (per person) versus years of education. The diagram shows no significant correlation, but you’ve drawn a best-fit line anyway. The line predicts that someone who consumes a pint of salsa per week has at least 13 years of education.

Solution: Because there is no significant correlation, the best-fit line and any predictions made from it are meaningless.


The square of the correlation coefficient, or r², is the proportion of the variation in a variable that is accounted for by the best-fit line.

The use of multiple regression allows the calculation of a best-fit equation that represents the best fit between one variable (such as price) and a combination of two or more other variables (such as weight and color). The coefficient of determination, R², tells us the proportion of the scatter in the data accounted for by the best-fit equation.
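For a single best-fit line, the statement that r² is the proportion of variation accounted for can be checked numerically. The Python sketch below computes r² directly and also computes 1 minus the ratio of leftover (unexplained) variation to total variation in y. The function name and data are invented for this illustration, and the sketch covers only the one-variable case, not multiple regression.

```python
import math

def r_squared_two_ways(xs, ys):
    """Return (r squared, share of y variation explained by the best-fit line)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y
    r = sxy / math.sqrt(sxx * syy)
    m = sxy / sxx               # same slope as r * (s_y / s_x)
    b = mean_y - m * mean_x
    # Variation left over after using the line to predict each y
    unexplained = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    explained_share = 1 - unexplained / syy
    return r ** 2, explained_share

# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r_squared, explained_share = r_squared_two_ways(hours, scores)
print(round(r_squared, 3), round(explained_share, 3))  # 0.6 0.6
```

Both numbers come out to 0.6 for this data, meaning the best-fit line accounts for about 60% of the variation in y.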


EXAMPLE 4 Voter Turnout and Unemployment

Political scientists are interested in knowing what factors affect voter turnout in elections. One such factor is the unemployment rate. Data collected in presidential election years since 1964 show a very weak negative correlation between voter turnout and the unemployment rate, with a correlation coefficient of about r = -0.1. Based on this correlation, should we use the unemployment rate to predict voter turnout in the next presidential election?

Note that there is a scatter diagram of the voter turnout data on page 312.

Solution: The square of the correlation coefficient is r² = (-0.1)² = 0.01, which means that only about 1% of the variation in the data is accounted for by the best-fit line. Nearly all of the variation in the data must therefore be explained by other factors. We conclude that unemployment is not a reliable predictor of voter turnout.


7.4 The Search for Causality

Page 315


Guidelines for Establishing Causality

If you suspect that a particular variable (the suspected cause) is causing some effect:

1. Look for situations in which the effect is correlated with the suspected cause even while other factors vary.

2. Among groups that differ only in the presence or absence of the suspected cause, check that the effect is similarly present or absent.

3. Look for evidence that larger amounts of the suspected cause produce larger amounts of the effect.

4. If the effect might be produced by other potential causes (besides your suspected cause), make sure that the effect still remains after accounting for these other potential causes.

5. If possible, test the suspected cause with an experiment. If the experiment cannot be performed with humans for ethical reasons, consider doing the experiment with animals, cell cultures, or computer models.

6. Try to determine the physical mechanism by which the suspected cause produces the effect.


Hidden Causality

Sometimes correlations—or the lack of a correlation—can hide an underlying causality.

For example, early studies suggested that patients who had heart bypass surgery fared no better than those who didn’t. But researchers later found confounding variables that those studies had not considered, such as amount of blockage and surgical techniques.

These confounding variables prevented the studies from finding a real correlation between the surgery and prolonged life.


Broad Levels of Confidence in Causality

Possible cause: We have discovered a correlation, but cannot yet determine whether the correlation implies causality.

Probable cause: We have good reason to suspect that the correlation involves cause, perhaps because some of the guidelines for establishing causality are satisfied.

Cause beyond reasonable doubt: We have found a physical model that is so successful in explaining how one thing causes another that it seems unreasonable to doubt the causality.
