
CHAPTER 4: MORE ON TWO VARIABLE DATA Sec. 4.2 – Cautions about Correlation and Regression

Upload: elke

Posted on 23-Feb-2016


Page 1: Chapter 4: More on Two Variable Data

CHAPTER 4: MORE ON TWO VARIABLE DATA

Sec. 4.2 – Cautions about Correlation and Regression

Page 2

CAUTIONS ABOUT CORRELATION AND REGRESSION

Recall from Chapter 3:

Correlation and regression describe only linear relationships.

Correlation and the least-squares regression line (LSRL) are not resistant. One influential point or incorrectly entered data point can greatly change the correlation and the regression line.

Always plot your data before interpreting regression or correlation.

Page 3

EXTRAPOLATION

Extrapolation is the use of a regression line for prediction far outside the range of values of the explanatory variable x that you used to obtain the line or curve. Such predictions are often not accurate.

Example: Suppose that you have data on a child’s growth between ages 3 and 8. You find a strong linear relationship between age x and height y. If you fit a regression line to these data and use it to predict the child’s height at age 25, the line predicts a height of about 8 feet.

Don’t stray far from the range of x values that actually appear in your data.

Page 4

LURKING VARIABLES

Sometimes the relationship between two variables is influenced by other variables that we did not measure or even think about.

A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.

The relationship between two variables can be strongly influenced by lurking variables. A lurking variable can falsely suggest a strong relationship between x and y, or it can hide a relationship that is really there.

Page 5

LURKING VARIABLES

Because lurking variables are often unrecognized and unmeasured, detecting their effect is a challenge.

Many lurking variables change systematically over time. One way to detect whether time has an influence is to plot the residuals and the response variable against the time order, if it is available.

Page 6

LURKING VARIABLES

Page 7

THE QUESTION OF CAUSATION

In many studies of the relationship between two variables, the goal is to establish that changes in the explanatory variable cause changes in the response variable.

Even when a strong association is present, the conclusion that this association is due to a causal link between the variables is often elusive.

Page 8

EXPLAINING ASSOCIATION

Strong associations can generally be explained by one of three relationships:

1. Causation
2. Common Response
3. Confounding

Variables x and y show a strong association (dashed line). This association may be the result of any of several causal relationships (solid arrows).

Page 9

EXPLAINING ASSOCIATION

Causation: x causes y.

Common Response: x and y are both reacting to a lurking variable z.

Confounding: x may cause y, but y may instead be caused by a confounding variable z.

Page 10

CAUSATION

Causation is not easily established.

The best evidence for causation comes from experiments that change x while holding all other factors fixed.

Even a very strong association between two variables is not by itself good evidence that there is a cause-and-effect link between the variables.
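A common way to approximate "holding all other factors fixed" is random assignment: if a coin flip decides x, other factors are balanced between groups on average. The Python sketch below compares that with an observational study. All of it is simulated: z is a hypothetical lurking factor that raises y by 5, x has no effect on y at all, and in the observational version people with z = 1 choose x = 1 more often. `mean_difference` and `response` are hypothetical helpers.

```python
import random
import statistics

rng = random.Random(3)
n = 10_000


def mean_difference(x, y):
    """Hypothetical helper: mean y in the x = 1 group minus mean y in the x = 0 group."""
    treated = [yi for xi, yi in zip(x, y) if xi == 1]
    control = [yi for xi, yi in zip(x, y) if xi == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def response(zi):
    """Simulated response: depends only on the lurking factor z, never on x."""
    return 5 * zi + rng.gauss(0, 1)


z = [rng.randint(0, 1) for _ in range(n)]
y = [response(zi) for zi in z]

# Observational study: subjects with z = 1 pick x = 1 80% of the time, others 20%.
x_obs = [1 if rng.random() < (0.8 if zi == 1 else 0.2) else 0 for zi in z]

# Experiment: x is assigned by a coin flip that ignores z.
x_exp = [rng.randint(0, 1) for _ in range(n)]

obs_diff = mean_difference(x_obs, y)
exp_diff = mean_difference(x_exp, y)
print(f"observational difference: {obs_diff:.2f}")  # driven by z, not x
print(f"randomized difference:    {exp_diff:.2f}")  # near the true effect of x, which is 0
```

The observational comparison shows a large gap (about 3 in expectation) even though x does nothing, while the randomized comparison lands near 0.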

Page 11

EXAMPLES OF DIRECT CAUSATION

The following relationships are examples of direct causation, but “causation” is not a simple idea. Refer to p. 233 for explanations.

1. x = mother’s BMI; y = daughter’s BMI

2. x = amount of saccharin in a rat’s diet; y = count of tumors in the rat’s bladder

Page 12

COMMON RESPONSE

Beware of lurking variables when thinking about an association between two variables.

The observed association between the variables x and y is explained by a lurking variable z. Both x and y change in response to changes in z. This common response creates an association even though there may be no direct causal link between x and y.

Page 13

EXAMPLES OF COMMON RESPONSE

The following relationships are examples of how common response can create an association. Refer to p. 233 for explanations.

3. x = a high school senior’s SAT score; y = the student’s first-year college GPA

4. x = monthly flow of money into stock mutual funds; y = monthly rate of return for the stock market

Page 14

CONFOUNDING

Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables.

Confounding of several variables often prevents us from drawing conclusions about causation.
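What "cannot be distinguished" means can be shown with two made-up explanations of the same data. In this Python sketch the records are hypothetical and z happens to equal x for every person. World A says only x matters and world B says only z matters, yet both produce identical observations. The function names are hypothetical.

```python
import statistics

# Hypothetical observational records (x, z); z always equals x here,
# so the effects of x and z are confounded.
records = [(0, 0), (1, 1), (1, 1), (0, 0), (1, 1), (0, 0)]


def response_if_x_matters(x, z):
    """Hypothetical world A: only x affects the response."""
    return 10 + 3 * x


def response_if_z_matters(x, z):
    """Hypothetical world B: only z affects the response."""
    return 10 + 3 * z


def group_gap(y):
    """Mean response for x = 1 minus mean response for x = 0."""
    ones = [yi for (xi, _), yi in zip(records, y) if xi == 1]
    zeros = [yi for (xi, _), yi in zip(records, y) if xi == 0]
    return statistics.fmean(ones) - statistics.fmean(zeros)


y_a = [response_if_x_matters(x, z) for x, z in records]
y_b = [response_if_z_matters(x, z) for x, z in records]

print(y_a == y_b)                      # True: identical observations
print(group_gap(y_a), group_gap(y_b))  # 3.0 3.0
```

Both worlds show the same 3-unit gap between the x = 1 and x = 0 groups, so the data alone cannot say whether x or z is responsible.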

Page 15

EXAMPLES OF CONFOUNDING

The following relationships are examples of confounding. Refer to p. 234 for explanations.

5. x = whether a person regularly attends religious services; y = how long the person lives

6. x = the number of years of education a worker has; y = the worker’s income

Page 16

Homework: pp. 237-239, #33-36, 38 & 41